Unpacking a list in an R dataframe

I have a dataframe in which one field is a list-column whose elements are character vectors of varying lengths. I would like to extract each element of those vectors into its own field so that I can then gather the results into a long dataframe with one row per list element per id.

Here is an example dataframe

dat <- structure(list(id = c("509935", "727889", "864607", "1234243", 
        "1020959", "221975"), some_date = c("2/09/1967", "28/04/1976", 
        "22/12/2017", "7/02/2006", "10/03/2019", "21/10/1935"), df_list = list(
            "018084131", c("062197171", "062171593"), c("064601923", 
            "068994009", "069831651"), c("071141584", "073129537"), c("061498574", 
            "065859718", "067251995", "069447806"), "064623976")), class = c("tbl_df", 
        "tbl", "data.frame"), row.names = c(NA, -6L))

I have come up with code that produces the final result I want; however, I have not done this the DRY way. Here is what I have tried.

res_n is a function as follows:

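library(dplyr)
library(purrr)
library(tidyr)

# Return the n-th element of a vector (NA if it has fewer than n elements)
res_n <- function(field, n) {
    field[n]
}
dat <- dat %>% mutate(res1 = map(df_list, res_n, 1))
dat <- dat %>% mutate(res2 = map(df_list, res_n, 2))
dat <- dat %>% mutate(res3 = map(df_list, res_n, 3))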

This returns a data frame with each of the first three list elements from df_list in its own column.

From this I can achieve what I set out to do and produce a final dataframe of results, as follows:

dat_final <- gather(dat, test, labno, -df_list, -some_date, -id) %>% 
    select(-df_list) %>% 
    mutate(labno = as.integer(labno)) %>% 
    filter(!is.na(labno))

To avoid the repetitive (non-DRY) approach above, I resorted to a for loop to try to eliminate the repeated code. I'm struggling to get it to work in the way I need to achieve the final result. This is the for loop I tried.

 for (i in 3) {
     dat %>% mutate(paste(res, i, sep = '_') = map(results, res_n, i)) }

Can someone help me refine the code to eliminate the repetitive lines that produce the result?
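For reference, a loop of the kind attempted above can be made to run. This is only a minimal sketch, not necessarily the best approach: it assumes dplyr and purrr are installed, uses a trimmed three-row stand-in for dat (some_date omitted), and builds each column name with `!!` and `:=`, because mutate() does not accept a paste() call on the left of `=`. The name n_max is introduced here purely for illustration.

```r
library(dplyr)
library(purrr)

# Trimmed stand-in for the example data (assumption: same structure as dat)
dat <- tibble::tibble(
  id = c("509935", "727889", "864607"),
  df_list = list("018084131",
                 c("062197171", "062171593"),
                 c("064601923", "068994009", "069831651"))
)

# Longest vector in the list-column decides how many columns are needed
n_max <- max(lengths(dat$df_list))

for (i in seq_len(n_max)) {
  # `!!name := value` creates a column whose name is computed at run time;
  # .x[i] is NA when a vector has fewer than i elements
  dat <- dat %>%
    mutate(!!paste0("res", i) := map_chr(df_list, ~ .x[i]))
}

names(dat)
```

Note that the loop has to iterate over seq_len(n_max) rather than a single number, and the result of mutate() has to be assigned back to dat on each pass.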

If the final goal is to get data in long format, we can use unnest from tidyr

tidyr::unnest(dat, cols = df_list)

#   id      some_date  df_list  
#   <chr>   <chr>      <chr>    
# 1 509935  2/09/1967  018084131
# 2 727889  28/04/1976 062197171
# 3 727889  28/04/1976 062171593
# 4 864607  22/12/2017 064601923
# 5 864607  22/12/2017 068994009
# 6 864607  22/12/2017 069831651
# 7 1234243 7/02/2006  071141584
# 8 1234243 7/02/2006  073129537
# 9 1020959 10/03/2019 061498574
#10 1020959 10/03/2019 065859718
#11 1020959 10/03/2019 067251995
#12 1020959 10/03/2019 069447806
#13 221975  21/10/1935 064623976

Instead of using repeated map, we can make use of unnest_wider

library(dplyr)
library(tidyr)
library(stringr)
out <- dat %>%
         unnest_wider(df_list, names_repair = ~ 
                     str_remove(str_c("res", .x), "[.]+"))
out
# A tibble: 6 x 6
#  id      some_date  res1      res2      res3      res4     
#  <chr>   <chr>      <chr>     <chr>     <chr>     <chr>    
#1 509935  2/09/1967  018084131 <NA>      <NA>      <NA>     
#2 727889  28/04/1976 062197171 062171593 <NA>      <NA>     
#3 864607  22/12/2017 064601923 068994009 069831651 <NA>     
#4 1234243 7/02/2006  071141584 073129537 <NA>      <NA>     
#5 1020959 10/03/2019 061498574 065859718 067251995 069447806
#6 221975  21/10/1935 064623976 <NA>      <NA>      <NA>     
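How unnest_wider names columns for unnamed vectors has changed between tidyr releases, so the names_repair call above may behave differently (or error) on a newer installation. As a hedged alternative, assuming a reasonably recent tidyr (1.2.0 or later, as far as I know), names_sep can be supplied so that the new columns are numbered automatically. The object name wide is introduced here only for illustration.

```r
library(tidyr)

# Same example data as in the question, rebuilt so the block runs on its own
dat <- tibble::tibble(
  id = c("509935", "727889", "864607", "1234243", "1020959", "221975"),
  some_date = c("2/09/1967", "28/04/1976", "22/12/2017",
                "7/02/2006", "10/03/2019", "21/10/1935"),
  df_list = list(
    "018084131",
    c("062197171", "062171593"),
    c("064601923", "068994009", "069831651"),
    c("071141584", "073129537"),
    c("061498574", "065859718", "067251995", "069447806"),
    "064623976"
  )
)

# names_sep pastes the outer column name and the element position together,
# which should give columns such as df_list_1, df_list_2, ...
wide <- unnest_wider(dat, df_list, names_sep = "_")
names(wide)
```

If the res1, res2, ... names are needed, the new columns can be renamed afterwards, for example with dplyr::rename_with().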

EDIT: Based on @Phil's comments

Now, reshape to 'long' with pivot_longer

out %>% 
    pivot_longer(cols = starts_with('res'), values_drop_na = TRUE) %>%
    mutate(value = as.integer(value))
# A tibble: 13 x 4
#   id      some_date  name     value
#   <chr>   <chr>      <chr>    <int>
# 1 509935  2/09/1967  res1  18084131
# 2 727889  28/04/1976 res1  62197171
# 3 727889  28/04/1976 res2  62171593
# 4 864607  22/12/2017 res1  64601923
# 5 864607  22/12/2017 res2  68994009
# 6 864607  22/12/2017 res3  69831651
# 7 1234243 7/02/2006  res1  71141584
# 8 1234243 7/02/2006  res2  73129537
# 9 1020959 10/03/2019 res1  61498574
#10 1020959 10/03/2019 res2  65859718
#11 1020959 10/03/2019 res3  67251995
#12 1020959 10/03/2019 res4  69447806
#13 221975  21/10/1935 res1  64623976

NOTE: If we check ?unnest, the usage shows that several arguments (not the function itself) have a deprecated lifecycle

nest(.data, ..., .key = deprecated())

unnest(data, cols, ..., keep_empty = FALSE, ptype = NULL, names_sep = NULL, names_repair = "check_unique", .drop = deprecated(), .id = deprecated(), .sep = deprecated(), .preserve = deprecated())

and the description in ?hoist is

hoist(), unnest_longer(), and unnest_wider() provide tools for rectangling, collapsing deeply nested lists into regular columns.


Also, if the intention is not to get the intermediate wide format, just use unnest_longer

dat %>%
      unnest_longer(df_list)
# A tibble: 13 x 3
#   id      some_date  df_list  
#   <chr>   <chr>      <chr>    
# 1 509935  2/09/1967  018084131
# 2 727889  28/04/1976 062197171
# 3 727889  28/04/1976 062171593
# 4 864607  22/12/2017 064601923
# 5 864607  22/12/2017 068994009
# 6 864607  22/12/2017 069831651
# 7 1234243 7/02/2006  071141584
# 8 1234243 7/02/2006  073129537
# 9 1020959 10/03/2019 061498574
#10 1020959 10/03/2019 065859718
#11 1020959 10/03/2019 067251995
#12 1020959 10/03/2019 069447806
#13 221975  21/10/1935 064623976

Or using base R

merge(setNames(stack(setNames(dat$df_list, dat$id))[2:1], 
      c("id", "values")), dat[-3])
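The same long result can also be sketched in base R without stack() or merge(), by repeating each row's id and date once per element of its vector. This is an illustrative variant rather than the method above; the names n, long and labno are chosen here (labno follows the question's final column name), and the example data is rebuilt as a plain data.frame so the block runs on its own.

```r
# Example data as a plain data.frame with a list-column
dat <- data.frame(
  id = c("509935", "727889", "864607", "1234243", "1020959", "221975"),
  some_date = c("2/09/1967", "28/04/1976", "22/12/2017",
                "7/02/2006", "10/03/2019", "21/10/1935"),
  stringsAsFactors = FALSE
)
dat$df_list <- list(
  "018084131",
  c("062197171", "062171593"),
  c("064601923", "068994009", "069831651"),
  c("071141584", "073129537"),
  c("061498574", "065859718", "067251995", "069447806"),
  "064623976"
)

# Number of elements in each row's vector
n <- lengths(dat$df_list)

# Repeat the scalar columns n times per row and flatten the list-column
long <- data.frame(
  id = rep(dat$id, times = n),
  some_date = rep(dat$some_date, times = n),
  labno = unlist(dat$df_list),
  stringsAsFactors = FALSE
)
long
```

Unlike merge(), which sorts by the key column by default, this keeps the rows in their original order.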

Base R solution:

# Split, apply, combine in base R:
# split the data frame by id, unlist each id's df_list, repeat the id once per
# unlisted element, merge that back onto the remaining columns (dropping
# df_list), then row-bind the per-id data frames into a single data frame.
data.frame(do.call("rbind", lapply(split(dat, dat$id), function(x) {
      unlisted_df_list <- unlist(x$df_list)
      rolled_out_df <- data.frame(id = rep(x$id, length(unlisted_df_list)),
                                  df_list = unlisted_df_list,
                                  stringsAsFactors = FALSE)
      merge(x[, names(x) != "df_list"], rolled_out_df, by = "id", all.x = TRUE)
    })),
  row.names = NULL)

Comments
  • @akrun - sorry, it was a function to name each new column in the mapping. Added it now.
  • You could avoid the rename_at() by including the change right into the names_repair argument: unnest_wider(dat, df_list, names_repair = ~ str_remove(str_c("res", .x), "[.]+"))
  • @akrun - I feel like I have learned a lot this morning with unnest_wider and pivot_long. These will help a lot.
  • I don't think unnest as a function is deprecated but some arguments passed to unnest are deprecated.