Finding the Mean of a Column in an R Data Set by Using for Loops to Remove Missing Values

I have a data set with air quality data. The data frame has 153 rows and 5 columns. I want to find the mean of the first column of this data frame. The column contains missing values, so I want to exclude them when computing the mean. I also want to do this using control structures (for loops and if-else statements).

I have tried the code below. To have a reproducible example, I created 'y' in place of the actual air quality data set.

y <- c(1,2,3,NA,5,6,NA,NA,9,10,11,NA,13,NA,15)
x <- matrix(y,nrow=15)

for(i in 1:15){
   if(is.na(data.frame[i,1]) == FALSE){
   New.Vec <- c(x[i,1])
   }
}
print(mean(New.Vec))

I expected the output to be the mean. Instead, I received this error:

Error: object 'New.Vec' not found

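I can't see your data, but it is probably something like this. The vector needs to be initialized before the loop, and each non-missing value should be appended to it instead of overwriting it. It is better to avoid loops in R when you can...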

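myDataFrame <- read.csv("hw1_data.csv")

# Initialize an empty vector so New.Vec exists before the loop
New.Vec <- c()
# Loop over every row instead of hard-coding 153
for(i in seq_len(nrow(myDataFrame))){
   # Append only the non-missing values of the first column
   if(!is.na(myDataFrame[i,1])){
      New.Vec <- c(New.Vec, myDataFrame[i,1])
   }
}
print(mean(New.Vec))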
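For the reproducible example in the question, the same idea can be written with an explicit if-else and a running total instead of a growing vector. This is a sketch, not part of the original answer; the names total, count and n_missing are illustrative.

```r
# Reproducible example from the question
y <- c(1,2,3,NA,5,6,NA,NA,9,10,11,NA,13,NA,15)
x <- matrix(y, nrow = 15)

total <- 0       # running sum of the non-missing values
count <- 0       # number of non-missing values seen so far
n_missing <- 0   # number of NA values skipped

for (i in seq_len(nrow(x))) {
  if (is.na(x[i, 1])) {
    n_missing <- n_missing + 1
  } else {
    total <- total + x[i, 1]
    count <- count + 1
  }
}

loop_mean <- total / count
print(loop_mean)   # [1] 7.5
```

Keeping a sum and a count avoids building a new copy of New.Vec on every iteration, which is what c(New.Vec, ...) does.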

How to deal with missing values in R: how do I find missing values in a column? The mean of a column in R can be calculated with the mean() function, which takes the column (a vector) as its argument and returns that column's mean. Related tasks include the mean of a single column, the mean of multiple columns (for example with dplyr), and row-wise means.
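A base R sketch of those three cases follows; dplyr is not used here, and the data frame df with columns a and b is hypothetical.

```r
# Hypothetical data frame with missing values
df <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))

# Mean of a single column, ignoring NAs
single <- mean(df$a, na.rm = TRUE)

# Means of several columns at once
multi <- colMeans(df[, c("a", "b")], na.rm = TRUE)

# Row-wise means (one value per row)
by_row <- rowMeans(df, na.rm = TRUE)

print(single)  # [1] 2
print(multi)
print(by_row)
```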

One line of code; no need for a for loop.

mean(your_data_frame$name_of_the_first_column, na.rm = TRUE)

Setting na.rm = TRUE makes the mean function ignore NAs.
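A small sketch of the difference na.rm makes, using the vector y from the question; the data frame df and its column name value are hypothetical.

```r
y <- c(1,2,3,NA,5,6,NA,NA,9,10,11,NA,13,NA,15)

# Without na.rm, a single NA makes the whole result NA
without_rm <- mean(y)
print(without_rm)   # [1] NA

# With na.rm = TRUE, the NAs are dropped before averaging
with_rm <- mean(y, na.rm = TRUE)
print(with_rm)      # [1] 7.5

# The same call on a column of a (hypothetical) data frame
df <- data.frame(value = y)
col_mean <- mean(df$value, na.rm = TRUE)
print(col_mean)     # [1] 7.5
```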

Exclude Missing Values (UC Business Analytics R Programming Guide): missing values arise when an observation is missing. A dataset with many NAs needs them taken care of; one option is to drop them with na.omit() and then use an apply-style function to compute the mean of each column. Applying the mean function to real data: the same mean() call that works on a simplified example vector also works on a column of a real data set, such as the built-in iris data set.
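One detail worth seeing in code: na.omit() on a data frame drops every row that has an NA in any column, so column means computed afterwards can differ from per-column means with na.rm = TRUE. This sketch uses R's built-in airquality data set as a stand-in; it is an assumption that it resembles the question's hw1_data.csv.

```r
data(airquality)   # built-in data set from the datasets package

# Option 1: drop rows with an NA in any column, then average each column
means_omit <- colMeans(na.omit(airquality))

# Option 2: drop NAs column by column (sapply is one of the apply family)
means_rm <- sapply(airquality, mean, na.rm = TRUE)

print(means_omit["Ozone"])
print(means_rm["Ozone"])
```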

Here, we can make use of na.aggregate from zoo

library(zoo)
df1[] <- na.aggregate(df1)

This assumes that 'df1' is a data.frame with all numeric columns and that you want to fill each NA element with the mean of its column. By default, na.aggregate uses mean as its aggregation function (the FUN argument).
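If installing zoo is not an option, the default behaviour described above (fill each NA with its column's mean) can be sketched in base R. This is an illustrative equivalent, not zoo's implementation, and df1 here is a small hypothetical data frame.

```r
# Hypothetical all-numeric data frame with some NAs
df1 <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 8))

# For each column, replace the NA entries with that column's mean,
# computed over the non-missing values only
df1[] <- lapply(df1, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

print(df1)
```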

How to Replace Missing Values (NA) in R: na.omit & na.rm. To identify missing values in your dataset, use is.na(). You can also find the count and the proportion of missing values in your dataset with sum(is.na(dt)) and mean(is.na(dt)); in that tutorial's example these return 2 and 0.2222222. Another useful function for dealing with missing values is na.omit(), which deletes incomplete observations. To replace missing values with the mean, first collect the names of the columns that contain missing values (that tutorial stores them in a list called list_na), then compute each mean with the argument na.rm = TRUE. That argument is required because the columns have missing data; it tells R to ignore them.
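A short sketch of counting missing values, using the vector y from the question; the two-column data frame df (the second column is just y reversed) is hypothetical.

```r
y <- c(1,2,3,NA,5,6,NA,NA,9,10,11,NA,13,NA,15)
df <- data.frame(a = y, b = rev(y))   # hypothetical two-column example

n_missing    <- sum(is.na(y))       # number of NAs in the vector
prop_missing <- mean(is.na(y))      # proportion of NAs (TRUE counts as 1)
per_column   <- colSums(is.na(df))  # number of NAs in each column
complete     <- na.omit(df)         # keeps only rows with no NA in any column

print(n_missing)        # [1] 5
print(per_column)
print(nrow(complete))   # [1] 8
```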

R 2.4 - for() Loops and Handling Missing Observations: this video discusses for() loops and how to handle missing observations. The basic syntax for calculating a mean in R is mean(x, trim = 0, na.rm = FALSE, ...). Here x is the input vector, trim is the fraction of observations to drop from each end of the sorted vector, and na.rm indicates whether missing values are removed from the input vector.
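A sketch of how trim and na.rm combine, using the vector y from the question. The NAs are removed first, so trim = 0.1 applies to the 10 remaining values and drops one value from each end of the sorted vector.

```r
y <- c(1,2,3,NA,5,6,NA,NA,9,10,11,NA,13,NA,15)

plain   <- mean(y, na.rm = TRUE)              # average of the 10 non-missing values
trimmed <- mean(y, trim = 0.1, na.rm = TRUE)  # drops 1 and 15, averages the other 8

print(plain)    # [1] 7.5
print(trimmed)  # [1] 7.375
```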

Removing NAs in R dataframes: a video with an explanation of dealing with NAs in a data frame. To delete a column of a data frame directly, set that column to NULL, the reserved word that represents the null object in R.
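A minimal sketch of removing a column by setting it to NULL, plus a related base R idiom that keeps only the columns with no NA; the data frames and column names are hypothetical.

```r
# Hypothetical data frame; column b contains an NA
df <- data.frame(a = 1:3, b = c(NA, 5, 6), c = 7:9)

# Setting a column to NULL removes it
df$b <- NULL
print(names(df))   # [1] "a" "c"

# Alternatively, keep only the columns that have no missing values
df2 <- data.frame(a = 1:3, b = c(NA, 5, 6), c = 7:9)
no_na_cols <- df2[, colSums(is.na(df2)) == 0, drop = FALSE]
print(names(no_na_cols))   # [1] "a" "c"
```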

Dealing with Missing Values (UC Business Analytics R Programming Guide): a common task in data analysis is dealing with missing values. is.na(df$col4) tests for NAs in a specific data frame column (in that guide's example it returns FALSE FALSE FALSE TRUE). Missing values can also be recoded with the mean, and setting na.rm = TRUE makes a function such as mean() compute its result over all non-missing values. R max and min functions: min() and max() use exactly the same syntax, including the na.rm argument for removing NA values.
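A sketch of both ideas on the vector y from the question: max() and min() with na.rm, and recoding the NAs with the mean of the observed values.

```r
y <- c(1,2,3,NA,5,6,NA,NA,9,10,11,NA,13,NA,15)

# max() and min() take the same na.rm argument as mean()
y_max <- max(y, na.rm = TRUE)
y_min <- min(y, na.rm = TRUE)

# Recode the missing values with the mean of the non-missing values
y_filled <- y
y_filled[is.na(y_filled)] <- mean(y, na.rm = TRUE)

print(c(y_min, y_max))  # [1]  1 15
print(y_filled)
```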

Comments
  • It's easier to help with a reproducible example. For one thing, you don't need to increment i, that's handled by the for loop. For another, it's unclear what you're doing with the <- FALSE part, since there isn't any condition being tested. Maybe you mean ==? I have a feeling the line that assigns New.Vec isn't actually getting evaluated, but can't say for sure without being able to run your code.
  • @camille - Thank you! The pointers help a lot. I have removed the increment i and added the == . However the error New.Vec still exists. I am editing the question to a reproducible example. So you can check it out in a bit and give your inputs :)
  • Now you should be getting an error because you're trying to subset data.frame instead of x. It also seems like you're just reassigning New.Vec on each iteration. Either way, in R a loop for something like this shouldn't be necessary.
  • Unfortunately this produces the same error too. Here's the data, if that provides assistance: <d396qusza40orc.cloudfront.net/rprog/data/quiz1_data.zip>
  • edited, vector needed to be initialized
  • This does work. Since I am new to programming and R, I was trying to make it work using loops! But thank you for your help @Ben G :)
  • Thank you for the help! However, when I call mean(df1), it returns 'NA' and produces a warning saying "argument is not numeric or logical: returning NA". The edited code that I ran is: df1 <- read.csv("hw1_data.csv") df1[] <- na.aggregate(df1) mean(df1[1])
  • @AshreetSangotra If it works, please accept the answer.
  • @AshreetSangotra The error is very specific. As I mentioned in the post, I assume that your columns are numeric. If your columns are not numeric, you may need to check why they are not.
  • The columns are numeric, with the exception of the column names. It produces a result with colMeans(). You can find the data file in this comment if that would provide any assistance. <d396qusza40orc.cloudfront.net/rprog/data/quiz1_data.zip>
  • @AshreetSangotra Can you use dput on a few rows to show an example and post it in your question?
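The warning from mean(df1[1]) is consistent with how R subsets data frames: single brackets return a one-column data frame, which mean() does not accept, while double brackets return the column itself as a vector. A sketch with a hypothetical df1 (not the actual hw1_data.csv):

```r
# Hypothetical data frame standing in for df1
df1 <- data.frame(a = c(1, NA, 3), b = c(2, 4, 6))

one_col_df  <- df1[1]     # single brackets: still a data.frame
one_col_vec <- df1[[1]]   # double brackets: the column as a numeric vector

print(class(one_col_df))   # [1] "data.frame"
print(class(one_col_vec))  # [1] "numeric"

# mean(one_col_df) warns "argument is not numeric or logical: returning NA"
m <- mean(one_col_vec, na.rm = TRUE)

# colMeans() accepts the whole data frame directly
m_all <- colMeans(df1, na.rm = TRUE)
print(m)      # [1] 2
print(m_all)
```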