How to fill missing values in a vector with the mean of the values before and after the missing one


I am trying to impute values in a vector in R. The conditions for the imputation are:

  • Find all NA values
  • Then check if they have an existing value before and after them
  • Also check if the value which follows the NA is larger than the value before the NA
  • If the conditions are met, calculate a mean taking the values before and after.
  • Replace the NA value with the imputed one
# example one
input_one = c(1,NA,3,4,NA,6,NA,NA)

# example two
input_two = c(NA,NA,3,4,5,6,NA,NA)

# example three
input_three = c(NA,NA,3,4,NA,6,NA,NA)

I started writing code to detect the values that can be imputed, but I got stuck with the following:

# incomplete function to detect the values
sapply(split(!is.na(input[c(rbind(which(is.na(c(input)))-1, which(is.na(c(input)))+1))]), 
             rep(1:(length(!is.na(input[c(which(is.na(c(input)))-1, which(is.na(c(input)))+1)]))/2), each = 2)), all)

This, however, only detects the NAs that might be imputable, and it only works with example one. It is incomplete and, unfortunately, very hard to read and understand.

Any help with this would be highly appreciated.


We can use dplyr's lag and lead functions for that:

input_three = c(NA,NA,3,4,NA,6,NA,NA)

library(dplyr)
ifelse(is.na(input_three) & lead(input_three) > lag(input_three),
       (lag(input_three)  + lead(input_three))/ 2,
       input_three)

Returns:

[1] NA NA  3  4  5  6 NA NA
Edit

Explanation:

We use ifelse, which is the vectorized version of if: the test and both branches are evaluated element by element across the vectors. First we test whether an element is NA and whether the following element is greater than the previous one. To get the previous and following elements we can use dplyr's lag and lead functions:

lag offsets a vector to the right (default is 1 step):

lag(1:5)

Returns:

[1] NA  1  2  3  4

lead offsets a vector to the left:

lead(1:5)

Returns:

[1]  2  3  4  5 NA

Now to the 'test' clause of ifelse:

is.na(input_three) & lead(input_three) > lag(input_three)

Which returns:

[1]    NA    NA FALSE FALSE  TRUE FALSE    NA    NA
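The mix of NA and FALSE in that result follows from R's three-valued logic. A minimal base R sketch (the variable names are only illustrative) shows why a non-missing element always gives FALSE, while a missing element with a missing neighbour gives NA, which ifelse then passes through as NA:

```r
# FALSE & NA is FALSE: a non-missing element never needs imputing,
# whatever the comparison of its neighbours yields
known_value <- FALSE & NA

# TRUE & NA is NA: the element is missing, but a neighbour is missing too
missing_neighbour <- TRUE & NA

# ifelse() returns NA wherever its test is NA
picked <- ifelse(c(TRUE, FALSE, NA), "yes", "no")
```

Because the positions where the test is NA are exactly positions where the original element is already NA, returning NA there leaves those elements unchanged.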

Then, where the test evaluates to TRUE, we return the sum of the previous and following elements divided by 2; otherwise we return the original element.
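For readers who prefer not to load dplyr, the same logic can be sketched in base R. This is only a sketch under stated assumptions: the function name impute_between is hypothetical, and the shifted vectors are built by hand to mimic lag and lead with their default offset of 1.

```r
# Hypothetical helper: base R sketch of the lag/lead approach
impute_between <- function(x) {
  n <- length(x)
  if (n < 3) return(x)        # no element can have two neighbours
  prev <- c(NA, x[-n])        # previous element, like dplyr::lag(x)
  nxt  <- c(x[-1], NA)        # following element, like dplyr::lead(x)
  # TRUE only for an NA whose neighbours both exist and are increasing
  fill <- is.na(x) & !is.na(prev) & !is.na(nxt) & nxt > prev
  x[fill] <- (prev[fill] + nxt[fill]) / 2
  x
}

impute_between(c(1, NA, 3, 4, NA, 6, NA, NA))
# [1]  1  2  3  4  5  6 NA NA
```

The explicit !is.na() checks keep the logical index free of NA, so it can be used directly for subassignment.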



Here's an example using the imputeTS library. It handles runs of more than one NA, calculates the mean only if the next valid observation is greater than the last valid observation, and leaves NAs at the beginning and end untouched.

library(imputeTS)
library(dplyr)  # provides lag() and lead()
myimpute <- function(series) {
    # Find where each NA is
    nalocations <- is.na(series)
    # Find the previous and the next observation for each row
    last1 <- lag(series)
    next1 <- lead(series)
    # Carry forward the last and next observations over sequences of NA
    # Each row will then get a last and next that can be averaged
    cflast <- na_locf(last1, na_remaining = 'keep')
    cfnext <- na_locf(next1, option = 'nocb', na_remaining = 'keep')
    # Make a data frame 
    df <- data.frame(series, nalocations, last1, cflast, next1, cfnext)
    # Calculate the mean where there is currently a NA
    # making sure that the next is greater than the last
    df$mean <- ifelse(df$nalocations, ifelse(df$cflast < df$cfnext, (df$cflast+df$cfnext)/2, NA), NA)
    imputedseries <- ifelse(df$nalocations, ifelse(!is.na(df$mean), df$mean, NA), series)
    # list(df, imputedseries)  # uncomment and return this instead to inspect the intermediate data frame
    imputedseries
}
myimpute(c(NA,NA,3,4,NA,NA,6,NA,NA,8,NA,7,NA,NA,9,NA,11,NA,NA))

# [1] NA NA  3  4  5  5  6  7  7  8 NA  7  8  8  9 10 11 NA NA
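The carry-forward idea can also be sketched in base R, for cases where installing imputeTS is not an option. This is a sketch under stated assumptions, not the package's method: the function name impute_gaps is hypothetical, and the last and next observations are located with cummax and cummin over position indices.

```r
# Hypothetical helper: fill runs of NA with the mean of the surrounding
# observations, but only when the next observation exceeds the last one
impute_gaps <- function(x) {
  n <- length(x)
  if (n == 0) return(x)
  idx <- seq_along(x)
  obs <- !is.na(x)
  # position of the most recent observation (0 if there is none yet)
  last_idx <- cummax(ifelse(obs, idx, 0L))
  # position of the next observation (n + 1 if there is none left)
  next_idx <- rev(cummin(rev(ifelse(obs, idx, n + 1L))))
  # the NA padding makes the sentinel positions 0 and n + 1 resolve to NA
  last_val <- c(NA, x)[last_idx + 1L]
  next_val <- c(x, NA)[next_idx]
  fill <- !obs & !is.na(last_val) & !is.na(next_val) & next_val > last_val
  x[fill] <- (last_val[fill] + next_val[fill]) / 2
  x
}

impute_gaps(c(NA, NA, 3, 4, NA, NA, 6, NA, NA, 8, NA, 7, NA, NA, 9, NA, 11, NA, NA))
# [1] NA NA  3  4  5  5  6  7  7  8 NA  7  8  8  9 10 11 NA NA
```

Every NA in a run receives the same mean, matching the output of myimpute above; the NA between 8 and 7 stays missing because the next observation is not greater than the last.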



Here's an alternative that uses zoo::rollapply():

library(zoo)

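fill_sandwiched_na <- function(f) rollapply(c(NA, f, NA), 3, FUN = function(x) {
  # x is c(previous, current, following); the NA padding leaves both ends untouched
  y <- mean(x[-2])
  # impute only a missing value whose neighbours both exist and are increasing
  if (is.na(x[2]) && !is.na(y) && x[3] > x[1]) y else x[2]
})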

fill_sandwiched_na(input_one)
[1]  1  2  3  4  5  6 NA NA

fill_sandwiched_na(input_two)
[1] NA NA  3  4  5  6 NA NA

fill_sandwiched_na(input_three)
[1] NA NA  3  4  5  6 NA NA








