Import with xlsx package in R gives NA, <NA> and empty entries, can´t delete NA values

xlsx r
how to read multiple excel files in r
import multiple excel sheets into r
read_xlsx r
r read xlsx from url
r read xlsx date format
rstudio read xlsx
readxl package in r

I am importing data from an xlsx (https://www.dropbox.com/s/r5sn5pio5rnprdq/gesammelte%20Daten_1707.xlsx) file with read.xlsx

setwd("C:/***//Kultivierungen//1707_ADH//")
pH <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="pH")
OD <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="OD")
Glc <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="Glucose")
Ac <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="Acetate")

I want to delete the NA Values with

OD <- OD[rowSums(is.na(OD))==0,]
Glc <- Glc[rowSums(is.na(Glc))==0,]
Ac <- Ac[rowSums(is.na(Ac))==0,]
pH <- pH[rowSums(is.na(pH))==0,]

..which works fine for the OD and pH data, but not for Ac and Glc. The result before deleting the NA Values looks like this:

  time.in.h               SPL1 SPL1_Error               SPL2 SPL2_Error               SPL3 SPL3_Error
1  0.000000               <NA>       <NA>               <NA>       <NA>               <NA>       <NA>
2  1.502222               <NA>       <NA>               <NA>       <NA>               <NA>       <NA>
3  3.687778 0.0602636534839925       0.06 0.0502197112366604       0.09 0.0301318267419962       0.03
4 10.248889                                                                                          
5 16.248333  0.118460019743337       0.06 0.0829220138203356       0.12  0.106614017769003       0.18
6 21.653056 0.0644511581067472       0.03 0.0161127895266868       0.15 0.0483383685800604       0.12
7 29.653333                                                                                          
8 37.652778                                                                                          
9 43.391667  0.342347696879643       0.18  0.271025260029718       0.18  0.727488855869242       0.24

And after deleting the NA Values..:

  time.in.h               SPL1 SPL1_Error               SPL2 SPL2_Error               SPL3 SPL3_Error
3  3.687778 0.0602636534839925       0.06 0.0502197112366604       0.09 0.0301318267419962       0.03
4 10.248889                                                                                          
5 16.248333  0.118460019743337       0.06 0.0829220138203356       0.12  0.106614017769003       0.18
6 21.653056 0.0644511581067472       0.03 0.0161127895266868       0.15 0.0483383685800604       0.12
7 29.653333                                                                                          
8 37.652778                                                                                          
9 43.391667  0.342347696879643       0.18  0.271025260029718       0.18  0.727488855869242       0.24

str() returns the following:

> str(Glc)
'data.frame':   9 obs. of  17 variables:
 $ time.in.h : num  0 1.5 3.69 10.25 16.25 ...
 $ SPL1      : Factor w/ 5 levels "","0.0602636534839925",..: NA NA 2 1 4 3 1 1 5
 $ SPL1_Error: Factor w/ 4 levels "","0.03","0.06",..: NA NA 3 1 3 2 1 1 4
 $ SPL2      : Factor w/ 5 levels "","0.0161127895266868",..: NA NA 3 1 4 2 1 1 5
 $ SPL2_Error: Factor w/ 5 levels "","0.09","0.12",..: NA NA 2 1 3 4 1 1 5

It has worked fine before with a different set of data/xlsx file, I tried to rule out all format-issues in the xlsx file as well, but couldn´t find anything....anyone had this before?

It seems that the empty cells in the Glucose and Acetate-sheet are recognized as text, although I am not sure why (Excel is not really my expertise..).

When I replace the empty cells in a column in the xlsx-file with 0 and then I delete those 0's again read.xlsx does import it as numeric vector instead of a factor and assigns NA to the empty cells. Then, you can use data <- data[rowSums(is.na(data))==0,] to remove the rows that contain NA's.

Can't tell you what exactly is going on here, but the above solution seems to work.

Writing Data From R to Excel Files (xls|xlsx) - Easy Guides, This tutorial on reading and importing Excel files into R will give an overview of XLConnect; xlsx Package; gdata Package; Readxl Package made in your Excel file to avoid extra columns or NA's to be added to your file;  A more recent entrant into the ranks of the .xlsx and .xls data importing libraries is Hadley Wickham’s 2016 readxl package. By leveraging R’s established RapidXML C++ library, readxl can increase both the speed and the ease of loading excel documents into R.

I'm not completely sure what's your problem but here's some fixes:

Do you want to replace NA values with zeros? In that case you could try:

Glc[is.na(Glc)] <- 0

Or do you want to delete the observations with NAs? In that case it seems to be working correctly for the first two observations. But e.g. the fourth observation seems to have something else (zero?) than NA in the xlsx-file.

R Tutorial on Reading and Importing Excel Files into R, Imports rJava, xlsxjars, grDevices, utils. LazyLoad The xlsx package gives programatic control of Excel files using R. A high level API allows the user In R character NA's are simply characters, which for Excel means. "NA"  Data[Data=="no info"] <- NA Or, transform your data to a csv and use read.csv, or as commented use another package that take care of missing values. Edit use XLConnect package: The more performant XLConnect package takes care of missing values using setMissingValue function. Here the equivalent code can be written as:

I had the same problem and worked it out. In Excel, instead of delete use "Clear All" then save your Excel file and import it R. This way R does not show any N/A.

[PDF] Package 'xlsx', read.xlsx( xlsxFile, sheet = 1, startRow = 1, colNames = TRUE, rowNames = FALSE, A character vector of strings which are to be interpreted as NA. "​readTest.xlsx", package = "openxlsx") df1 <- read.xlsx(xlsxFile = xlsxFile, sheet = 1,  The xlsx package gives programmatic control of Excel files using R. A high-level API allows the user to read a sheet of an xlsx document into a data.frame and write a data.frame to a file . Lower level functionality permits the direct manipulation of sheets, rows, and cells.

read.xlsx function, read.xlsx( xlsxFile, sheet = 1, startRow = 1, colNames = TRUE, rowNames = FALSE, A character vector of strings which are to be interpreted as NA. Blank cells  Another method to Import Data in R from Excel is using xlsx package, which I used to access Excel files. The first row should contain variable names. It is necessary that while using read.xlsx function, we mention the sheet index or the sheet name. If the required dataset is bigger, then read.xlsx2() function is used.

An Update on Importing Excel Data in R, This post provides an update on importing spreadsheets into R and This code uses the httr package to read in a .xls file from a url that is  Importing Excel files into R using readxl package. The readxl package, developed by Hadley Wickham, can be used to easily import Excel files (xls|xlsx) into R without any external dependencies.

Importing Data with RStudio – RStudio Support, Importing data into R is a necessary step that, at times, can become time intensive. To ease this task, RStudio includes new features to import data from: csv, xls, xlsx, sav, and in general, character delimited files using the readr package. an specific Excel sheet; Skip the first N rows; Select NA identifiers. There are many solutions to import and export Excel files using R software. The different ways to connect R and Excel has been already discussed in our previous article [R Excel essentials : Read, write and format Excel files using R]. xlsx package is one of the powerful R packages to read, write and format Excel files. It is a java-based solution and it is available for Windows, Mac and Linux.

Comments
  • There seem to be empty cells in your data (e.g. SPL1, row 4,7,8), which I don't understand. Usually when there are empty cells in the xlsx-file that is imported, read.xlsx defines those as NA. Can you provide a reproducible example (of both your data in xlsx-file and data.frame in R)? Also see: stackoverflow.com/questions/5963269
  • This is what a sheet in my xlsx file looks like: > d1 <- rbind(c("time", "Sample", "Error"), c("","",""), c(1,3.4,0.5),c("","",""), c(1,3.4,0.5)) > d1 [,1] [,2] [,3] [1,] "time" "Sample" "Error" [2,] "" "" "" [3,] "1" "3.4" "0.5" [4,] "" "" "" [5,] "1" "3.4" "0.5"
  • I want to delete the rows with empty cells. Normally R assigns NA to those cells, for whatever reason, read.xlsx only imports it this way: > d1_import <- rbind(c("time", "Sample", "Error"), c("<NA>","<NA>","<NA>"), c(1,3.4,0.5),c("","",""), c(1,3.4,0.5)) > d1_import [,1] [,2] [,3] [1,] "time" "Sample" "Error" [2,] "<NA>" "<NA>" "<NA>" [3,] "1" "3.4" "0.5" [4,] "" "" "" [5,] "1" "3.4" "0.5"
  • Are any of the cells in excel text-formatted? (read.xlsx imports those as factor) If so you first have to convert those to numbers (see office.microsoft.com/en-001/excel-help/…). And do you use "" in Excel? (that will also be recognized as text and transformed to factor)
  • that was one of the first things I did, I also removed all of the formulas etc, just pasted the values back into the sheet and afterwards I checked the correct format again.. dropbox.com/s/r5sn5pio5rnprdq/gesammelte%20Daten_1707.xlsx
  • Thanks a lot for your help. I did a small manual workaround in the meantime as well...I deleted the whole Glucose and Acetate sheet and re-pasted only the cells containing numbers (or text for the first row) and it worked too... funky
  • Replacing the NA-Values (or better deleting the whole row containing NA-values) works perfectly fine. The Problem is that read.xlsx only writes NA-values for the first two empty rows in the Glc dataset but not for the following empty rows. It works fine in a different dataset that is in the same xlsx file :/