Substitute NA values depending on position in dataframe

I would like to replace each NA value with the average of the values in the previous and next rows. When the NA is in the first or last row, I would like to simply repeat the value from the next or previous row, respectively. My real data contain negative and decimal values.

My input:

1.0   NA    1.0
NA    2.0   2.0
3.0   3.0   NA

My expected output:

1.0   2.0   1.0
2.0   2.0   2.0
3.0   3.0   2.0

Cheers!

You could also use the na.approx function from the zoo package. Note that it behaves slightly differently from the solution by @flodel when there are two consecutive NA values. For the first and last rows you could then use na.locf.

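library(zoo)
y <- na.approx(x)  # interpolate NAs that have a value on both sides, per column
# fill NAs left in the last row from the row above
y[nrow(y), ] <- na.locf(y[(nrow(y)-1):nrow(y), ])[2, ]
# fill NAs left in the first row from the row below
y[1, ] <- na.locf(y[1:2, ], fromLast=TRUE)[1, ]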

EDIT: @Grothendieck pointed out that this was much too complicated. You can combine the entire code above into one line:

na.approx(x, rule=2)
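
For readers without zoo installed, here is a minimal base-R sketch of the same idea. It assumes that rule = 2 has the meaning it has in stats::approx (values outside the observed range take the nearest observed value), and the helper name fill_na is made up for this illustration. It is not the zoo implementation, and it needs at least two non-NA values per column.

```r
# Hypothetical helper: linear interpolation inside the column,
# nearest observed value at the ends (rule = 2 in stats::approx).
fill_na <- function(v) {
  idx <- seq_along(v)
  ok <- !is.na(v)
  # approx() needs at least two non-NA points for linear interpolation
  approx(idx[ok], v[ok], xout = idx, rule = 2)$y
}

# The input from the question, entered column by column
x <- matrix(c(1, NA, 3, NA, 2, 3, 1, 2, NA), 3, 3)

filled <- apply(x, 2, fill_na)  # apply the helper to every column
print(filled)
```

On the question's input this reproduces the expected output shown in the question.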

All vectorized after turning your data into a matrix (which will also make computation faster):

x <- matrix(c(2, NA, 3, NA, 2, 3, 1, 2, NA), 3, 3)

n <- rbind(tail(x, -1), NA) # a matrix of next values (x shifted up one row)
p <- rbind(NA, head(x, -1)) # a matrix of previous values (x shifted down one row)
m <- matrix(rowMeans(cbind(as.vector(p),
                           as.vector(n)), na.rm = TRUE), nrow(x)) # replacements

ifelse(is.na(x), m, x)
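
As a usage sketch, the same steps can be wrapped in a function and run on the input from the question. The function name fill_with_neighbour_mean is hypothetical, and a data frame would first need as.matrix() (assuming all columns are numeric).

```r
# Hypothetical wrapper around the vectorised steps above.
fill_with_neighbour_mean <- function(x) {
  nxt <- rbind(tail(x, -1), NA)   # value one row below each cell
  prv <- rbind(NA, head(x, -1))   # value one row above each cell
  # mean of the two neighbours; na.rm = TRUE falls back to the single
  # available neighbour in the first and last rows
  m <- matrix(rowMeans(cbind(as.vector(prv), as.vector(nxt)), na.rm = TRUE),
              nrow(x))
  ifelse(is.na(x), m, x)
}

# The input from the question, entered column by column
x <- matrix(c(1, NA, 3, NA, 2, 3, 1, 2, NA), 3, 3)
res <- fill_with_neighbour_mean(x)
print(res)
```

Note that a cell whose two neighbours are both NA gets NaN from rowMeans, so long runs of NA are not fully filled by this approach.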

Quite simple to solve:

library(imputeTS)
na_interpolation(x)  # called na.interpolation() in imputeTS releases before 3.0

That's it already.

Comments
  • To clarify: is the NA in column 1 replaced by the mean of the two values immediately above and below (1.0 and 3.0), or by the mean of the two complete rows above and below (mean(c(1.0, NA, 1.0, 3.0, 3.0, NA)))?
  • Yes, it is the mean of the two values immediately above and below, not of the entire column! Does that answer your question? Thank you for your help.
  • 'Substitute a value with the average of the previous and next values' is called interpolation, and 'repeat the last non-NA value' is called filling, with carry-forward/backward.
  • Anyway, this uses information from the same column as the NA to replace it, right? Always based on the values above and below... What is the difference for consecutive values?
  • or just: na.approx(x, rule = 2) or na.approx(x, rule = 2, method = "constant") depending on what you want.
  • Suppose in one column, you have 1, 2, NA, NA, 5. Then na.approx will give you 1, 2, 3, 4, 5. @flodel's answer will give you 1, 2, 2, 5, 5. Both seem like reasonable answers, just to slightly different questions.
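
The difference described in the last comment can be checked with a small base-R sketch. It uses stats::approx as a stand-in for na.approx and a one-column version of the neighbour-mean idea from @flodel's answer; the variable names are chosen for this illustration only.

```r
v <- c(1, 2, NA, NA, 5)

# Linear interpolation across the whole gap
idx <- seq_along(v)
ok <- !is.na(v)
interp <- approx(idx[ok], v[ok], xout = idx, rule = 2)$y

# Mean of the immediate neighbours, ignoring neighbours that are NA
prv <- c(NA, head(v, -1))   # value one position before
nxt <- c(tail(v, -1), NA)   # value one position after
nbr <- ifelse(is.na(v), rowMeans(cbind(prv, nxt), na.rm = TRUE), v)

print(interp)  # 1 2 3 4 5
print(nbr)     # 1 2 2 5 5
```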