Why does `mutate_all()` fail on a data frame read in from an `xlsx` file?
I have a data frame (df) that I initially read in from an xlsx document. I am trying to create a new df with all the missing values replaced by 999999. When I run the following command:
LPAv0.4.2 <- LPAv0.3 %>% mutate_all(funs(replace(., is.na(.), 999999)))
I get the following error:
13. stop(structure(list(message = "Evaluation error: 'origin' must be supplied.", call = mutate_impl(.data, dots), cppstack = NULL), .Names = c("message", "call", "cppstack"), class = c("Rcpp::eval_error", "C++Error", "error", "condition")))
12. mutate_impl(.data, dots)
11. mutate.tbl_df(.tbl, !(!(!funs)))
10. mutate(.tbl, !(!(!funs)))
9. mutate_all(., funs(replace(., is.na(.), 999999)))
8. function_list[[k]](value)
7. withVisible(function_list[[k]](value))
6. freduce(value, `_function_list`)
5. `_fseq`(`_lhs`)
4. eval(expr, envir, enclos)
3. eval(quote(`_fseq`(`_lhs`)), env, env)
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1. LPAv0.3 %>% mutate_all(funs(replace(., is.na(.), 999999)))
The weird thing is that if I write LPAv0.3 out to a csv, then read it back in, the LPAv0.4.2 <- LPAv0.3 %>% mutate_all(funs(replace(., is.na(.), 999999))) command works as expected. However, if I write out to an xlsx file and then read it back in, it fails again with the error above.
Any idea why this is happening? Also, any idea how I can replace all the missing values without having to write the data out of R and import it back in?
Thanks in advance.
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
16. stop("'origin' must be supplied")
15. as.POSIXct.numeric(value)
14. as.POSIXct(value)
13. `[<-.POSIXct`(`*tmp*`, thisvar, value = 99999)
12. `[<-`(`*tmp*`, thisvar, value = 99999)
11. `[<-.data.frame`(`*tmp*`, list, value = 99999)
10. `[<-`(`*tmp*`, list, value = 99999)
9. replace(., is.na(.), 99999)
8. function_list[[k]](value)
7. withVisible(function_list[[k]](value))
6. freduce(value, `_function_list`)
5. `_fseq`(`_lhs`)
4. eval(expr, envir, enclos)
3. eval(quote(`_fseq`(`_lhs`)), env, env)
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1. LPAv0.3 %>% replace(., is.na(.), 99999)
What you're looking for is this line:
LPAv0.4.2 <- LPAv0.3 %>% replace(., is.na(.), 99999)
Let me explain this a bit while we're here.
First, base R functions and readxl can't write .xlsx files (despite the fact that Excel itself can read a variety of formats). However, the tidyverse does have a function for this: write_excel_csv (it lives in the readr package, not readxl), which writes a .csv that Excel will pick up without a problem.
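A minimal sketch of that round trip, assuming readr is installed and LPAv0.3 is the data frame from the question:

```r
library(readr)  # write_excel_csv() is provided by readr, part of the tidyverse

# Writes a CSV with a UTF-8 byte order mark so Excel detects the encoding
write_excel_csv(LPAv0.3, "LPAv0.3.csv")
```

The file name here is hypothetical; any path works.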
Base R and readxl functions do not rely on the name of the file you supply; they use it only as an identifier, and write or read the data in a format-specific pattern. You can check this yourself: renaming an .xlsx file to .csv will give you nothing but an error when you try to open it with Excel.
File-reading functions expect that you know the file format beforehand and use the appropriate function. In your case, to read an Excel file (.xlsx) you need the read_excel function from the readxl package, though there are also similar functions in several other packages.
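A minimal read sketch, assuming the file is named LPAv0.3.xlsx (a hypothetical path):

```r
library(readxl)

# read_excel() detects xls vs xlsx from the file contents, not the extension
LPAv0.3 <- read_excel("LPAv0.3.xlsx")
```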
The error below is caused by missing values in POSIXct columns:

Error in as.POSIXct.numeric(value) : 'origin' must be supplied

So you can try something like this. Here I have excluded all POSIXct columns and replaced the remaining columns' missing values:
library(tidyverse)
library(lubridate)

LPAv0.3 %>%
  mutate_at(vars(-one_of(names(.)[sapply(., is.POSIXct)])),
            funs(replace(., is.na(.), 999999)))
Date columns in your file are read as POSIXct variables by readxl. If you write these to a csv and read it in again, such a column is read as a factor (or character if you use stringsAsFactors = FALSE). If you have any missing values in a column with a POSIXct or Date class, you need to think carefully about what you are replacing them with: 999999 would need to be converted to a date value, which in turn requires an origin. Any method you use will run into this problem. If you do not have any missing values in date columns (as in your sample data), and they are confined to other (numeric or text) columns, then a simple solution is:
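To see why, here is a minimal reproduction of the failure with a toy POSIXct vector (not the OP's data):

```r
x <- as.POSIXct(c("2021-01-01", NA), tz = "UTC")

# Assigning a bare number into a POSIXct vector goes through
# as.POSIXct.numeric(), which cannot interpret 999999 without an origin:
try(x[is.na(x)] <- 999999)  # Error: 'origin' must be supplied

# Supplying an origin makes the intent explicit
# (here, 999999 seconds after the Unix epoch):
x[is.na(x)] <- as.POSIXct(999999, origin = "1970-01-01", tz = "UTC")
```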
LPAv0.4.2 <- LPAv0.3
LPAv0.4.2[is.na(LPAv0.4.2)] <- 999999
You don't have to use a tidyverse verb for everything :-) I appreciate I am not completely answering your question as to WHY the code you gave throws an error even when there are no missing date values. Incidentally, being part of the tidyverse, readxl will give you a tibble, whereas read.csv will give you a normal data frame. That will not make a difference in this case, but I thought I would point it out in case it causes other issues, e.g. with indexing.
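A quick illustration of that indexing difference, with toy data rather than the OP's:

```r
library(tibble)

df <- data.frame(a = 1:3)
tb <- tibble(a = 1:3)

# A data frame drops single-column selections to a plain vector;
# a tibble never drops dimensions with [ and stays a 1-column tibble.
df[, "a"]
tb[, "a"]
```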
Thanks for the communal brains—I truly appreciated it.
So, I used the following line suggested in this thread:
LPAv0.4.2 <- LPAv0.3 %>% mutate_if(is.numeric, funs(if_else(is.na(.), 999, .)))
And it works as expected. I appreciate all the input. Thanks all for allowing me to siphon some knowledge!
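For later readers: funs() was deprecated in dplyr 0.8, so on current dplyr the same idea is usually written with across(). A sketch, using the 999999 sentinel from the question:

```r
library(dplyr)

# across() + where() replaces the deprecated mutate_if()/funs() pattern;
# replace() (rather than if_else()) avoids strict type-matching on
# integer columns, since 999999 is a double literal.
LPAv0.4.2 <- LPAv0.3 %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 999999)))
```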