How to import files from subdirectories and name them with subdirectory name R

I'd like to import files (of different lengths) recursively from sub-directories and put them into one data.frame, having one column with the subdirectory name and one column with the file name (minus the extension):

e.g. folder structure
IsolatedData
  00
    tap-4.out
    cl_pressure.out
  15
    tap-4.out
    cl_pressure.out

So far I have:

setwd("~/Documents/IsolatedData")
l <- list.files(pattern = ".out$", recursive = TRUE)
p <- bind_rows(lapply(seq_along(l), function(i) {
  chars <- strsplit(l[i], "/")
  cbind(data.frame(Pressure = read.table(l[i], header = FALSE, skip = 2,
                                          nrow = length(readLines(l[i])))),
        Angle = chars[[1]][1], Location = chars[[1]][1])
}), .id = "id")

But I get an error saying line 43 doesn't have 2 elements.

I've also seen this approach using dplyr, which looks neat, but I can't get it to work: http://www.machinegurning.com/rstats/map_df/

tbl <-
  list.files(recursive=T,pattern=".out$")%>% 
  map_df(~data_frame(x=.x),.id="id")

Here's a workflow with the map functions from purrr within the tidyverse.

I generated a bunch of csv files to work with to mimic your file structure and some simple data. I threw in 2 lines of junk data at the beginning of each file, since you said you were trying to skip the top 2 lines.

library(tidyverse)

setwd("~/_R/SO/nested")

walk(paste0("folder", 1:3), dir.create)

list.files() %>%
    walk(function(folderpath) {
        map(1:4, function(i) {
            df <- tibble(
                x1 = sample(letters[1:3], 10, replace = T),
                x2 = rnorm(10)
            )
            dummy <- tibble(
                x1 = c("junk line 1", "junk line 2"),
                x2 = c(0)
            )
            bind_rows(dummy, df) %>%
                write_csv(sprintf("%s/file%s.out", folderpath, i))
        })
    })

That produces the following file structure:

├── folder1
|  ├── file1.out
|  ├── file2.out
|  ├── file3.out
|  └── file4.out
├── folder2
|  ├── file1.out
|  ├── file2.out
|  ├── file3.out
|  └── file4.out
└── folder3
   ├── file1.out
   ├── file2.out
   ├── file3.out
   └── file4.out

Then I used list.files(recursive = T) to get the paths to these files, used str_extract to pull out the folder and file name for each, read the csv file skipping the dummy text, and added the folder and file names as columns so they end up in the dataframe.

Since I did this with map_dfr, I get a tibble back in which the dataframes from each iteration are all row-bound together.

all_data <- list.files(recursive = T) %>%
    map_dfr(function(path) {
        # any characters from beginning of path until /
        foldername <- str_extract(path, "^.+(?=/)")
        # any characters between / and .out at end
        filename <- str_extract(path, "(?<=/).+(?=\\.out$)")

        # skip = 3 to skip over names and first 2 lines
        # could instead use col_names = c("x1", "x2")
        read_csv(path, skip = 3, col_names = F) %>%
            mutate(folder = foldername, file = filename)
    })

head(all_data)
#> # A tibble: 6 x 4
#>   X1        X2 folder  file 
#>   <chr>  <dbl> <chr>   <chr>
#> 1 b      0.858 folder1 file1
#> 2 b      0.544 folder1 file1
#> 3 a     -0.180 folder1 file1
#> 4 b      1.14  folder1 file1
#> 5 b      0.725 folder1 file1
#> 6 c      1.05  folder1 file1

Created on 2018-04-21 by the reprex package (v0.2.0).
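
If your real .out files aren't comma-separated, the same pattern works with one of the other readr readers swapped in. Here's a hedged sketch, assuming tab-separated files with only two header lines to skip (read_tsv is a guess about the delimiter; read_table would be the choice for generic whitespace):

library(tidyverse)

# same approach as above, but with read_tsv for tab-separated .out files
all_data <- list.files(recursive = T, pattern = "\\.out$") %>%
    map_dfr(function(path) {
        # any characters from beginning of path until /
        foldername <- str_extract(path, "^.+(?=/)")
        # any characters between / and .out at end
        filename <- str_extract(path, "(?<=/).+(?=\\.out$)")

        # skip = 2: only the two header lines, assuming no column-name row
        read_tsv(path, skip = 2, col_names = F) %>%
            mutate(folder = foldername, file = filename)
    })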

Can you try:

library(tidyverse)    

tbl <-
  list.files(recursive = T, pattern = ".out$") %>% 
  map_dfr(read_table, skip = 2, .id = "filepath")
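
Note that with an unnamed vector of paths, .id will just fill filepath with the element index (1, 2, 3, ...). If you want the actual path in that column, one small tweak is to name the vector with set_names() first; a minimal sketch, assuming whitespace-delimited files:

library(tidyverse)

tbl <-
  list.files(recursive = T, pattern = "\\.out$") %>%
  set_names() %>%   # name each path with itself so .id picks up the path
  map_dfr(read_table, skip = 2, .id = "filepath")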

I am guessing from your program that your ".out" files consist of a single column of data. If so, you can use scan instead of read.table. I am also guessing that you want the folder name in a column called Angle, the file name (minus extension) in a column called Location, and the data in a column called Pressure. If that is correct, the following should work:

setwd("~/Documents/IsolatedData")
l <- list.files(pattern = "\\.out$", recursive = TRUE)
p <- data.frame()
for (i in seq_along(l)) {
  pt <- data.frame(Angle    = strsplit(l[i], "/")[[1]][1],         # subdirectory name
                   Location = sub("\\.out$", "", basename(l[i])),  # file name minus extension
                   Pressure = scan(l[i], skip = 2))                 # single column of data
  p <- rbind(p, pt)
}

I know it is unfashionable to give an answer that just uses base R, particularly one involving a loop. However, for things like iterating through files in a directory, IMHO it is a perfectly reasonable thing to do, not least for readability and ease of debugging. Of course, as I expect you know, growing an object with rbind in a loop (or with apply, for that matter) is not a great idea if you are dealing with big data, but I suspect that is not the case here.
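
For completeness, here is a sketch of the same thing restructured so that p is not grown inside the loop: build a list of per-file data frames with lapply and bind them once at the end (same assumptions about the file layout as above).

setwd("~/Documents/IsolatedData")
l <- list.files(pattern = "\\.out$", recursive = TRUE)

# read each file into its own data.frame, then bind them all in one go
parts <- lapply(l, function(f) {
  data.frame(Angle    = strsplit(f, "/")[[1]][1],
             Location = sub("\\.out$", "", basename(f)),
             Pressure = scan(f, skip = 2))
})
p <- do.call(rbind, parts)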

Comments
  • "I can't get it to work" what's the issue ?
  • the tidy verse example doesn't actually import the data, just gives me a tibble of file names
  • Small point, not an attempt at an answer: The pattern should be "\\.out$". The one you have will match anything the ends in "out".
  • Thank you very much for this. I've replaced read_csv with read_table2 and added the following inside read_table2: col_types = cols( TimeStep = col_integer(), Value = col_character() )
  • Sure, there's a lot of options in the readr functions: read_tsv for tab-separated, or read_table for columns separated by other sets of whitespace
  • Thanks for this. The files are tab-separated with two lines of header that I would need to skip. Can I substitute read_csv for something like read_table(skip = 2)?
  • Yes, read the purrr::map documentation: rdocumentation.org/packages/purrr/versions/0.2.4/topics/map
  • Thank you. I'm missing something here, though: it makes a tibble with 65 rows (64 containing NA). The closest answer I can get is with my first piece of code, but it only works correctly when run in a single folder.