Impute missing values with mean of nearest neighbors in column

I have a DataFrame:

import pandas as pd

df = pd.DataFrame(data=[676, 0, 670, 0, 668], index=['2012-01-31 00:00:00','2012-02-29 00:00:00',
                                                     '2012-03-31 00:00:00','2012-04-30 00:00:00',
                                                     '2012-05-31 00:00:00'])
df.index.name = "Date"
df.columns = ["Number"]

Which looks like:

                     Number
Date                       
2012-01-31 00:00:00     676
2012-02-29 00:00:00       0
2012-03-31 00:00:00     670
2012-04-30 00:00:00       0
2012-05-31 00:00:00     668

How can I impute the 2nd and 4th values with (676+670)/2 and (670+668)/2, respectively?

I can save the values as an np.array and impute them in the array, but that's ridiculous!

I use the where method to keep every non-zero value and replace the rest. The zeros are first converted to np.nan so that the fill methods can work on them: ffill fills each NaN with the previous valid value and bfill fills it with the following valid value. Adding the two results and dividing by 2 gives the mean of the nearest neighbors.

import numpy as np
nan_df = df.replace(to_replace=0, value=np.nan)  # treat zeros as missing
df.where(df != 0,
 other=(nan_df.ffill() + nan_df.bfill())/2)

                     Number
Date                       
2012-01-31 00:00:00   676.0
2012-02-29 00:00:00   673.0
2012-03-31 00:00:00   670.0
2012-04-30 00:00:00   669.0
2012-05-31 00:00:00   668.0
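
For reference, here is a self-contained sketch of the same idea that can be run as is. The helper name fill_zeros_with_neighbor_mean is made up for illustration, and the index is parsed with pd.to_datetime, which the question does not do. Note that a zero in the first or last row has no neighbor on one side, so with this approach it would end up as NaN.

```python
import numpy as np
import pandas as pd


def fill_zeros_with_neighbor_mean(frame: pd.DataFrame) -> pd.DataFrame:
    """Replace zeros with the mean of the nearest non-zero values above and below."""
    missing = frame.replace(0, np.nan)   # treat zeros as missing
    prev_valid = missing.ffill()         # nearest valid value above each gap
    next_valid = missing.bfill()         # nearest valid value below each gap
    return missing.fillna((prev_valid + next_valid) / 2)


df = pd.DataFrame(
    {"Number": [676, 0, 670, 0, 668]},
    index=pd.to_datetime(["2012-01-31", "2012-02-29", "2012-03-31",
                          "2012-04-30", "2012-05-31"]),
)
df.index.name = "Date"

result = fill_zeros_with_neighbor_mean(df)
print(result)
```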

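# Use apply to replace each zero in Number with the mean of the rows directly above and below.
# A zero in the first row is left unchanged; a zero in the last row would raise an IndexError.
rows = df.reset_index()
df['Number'] = rows.apply(
    lambda x: rows.iloc[[x.name - 1, x.name + 1]]['Number'].mean()
    if x.name > 0 and x.Number == 0 else x.Number, axis=1).values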

df
Out[1440]: 
                     Number
Date                       
2012-01-31 00:00:00   676.0
2012-02-29 00:00:00   673.0
2012-03-31 00:00:00   670.0
2012-04-30 00:00:00   669.0
2012-05-31 00:00:00   668.0
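
The row-by-row apply can probably be avoided. Below is a rough vectorized sketch of the same "mean of the rows directly above and below" rule using shift and mask. It works on a plain Series for brevity (the variable names are my own), and it assumes the neighbors of a zero are themselves non-zero. A zero in the first or last row would become NaN here.

```python
import pandas as pd

s = pd.Series([676, 0, 670, 0, 668], name="Number", dtype="float64")

# Mean of the value one row above and one row below each position.
neighbor_mean = (s.shift(1) + s.shift(-1)) / 2

# Swap in the neighbor mean only where the original value is zero.
filled = s.mask(s == 0, neighbor_mean)
print(filled)
```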

@spies006's answer can be adapted to test for the missing entries explicitly:

nan_df = df.replace(to_replace=0, value=np.nan)
df.where(nan_df.notna(), other=(nan_df.ffill() + nan_df.bfill())/2)

The condition can be simplified to this:

df.where(df != 0, other=(nan_df.ffill() + nan_df.bfill())/2)
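
If several zeros occur in a row, the ffill/bfill average gives every value in the gap the same number (the mean of the two valid values around the gap). A possible alternative, not taken from the answers above, is Series.interpolate, which spreads the values linearly across the gap. The data below is made up to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a run of two consecutive zeros.
s = pd.Series([676, 0, 0, 670, 0, 668], name="Number")
missing = s.replace(0, np.nan)  # treat zeros as missing

# Every value in a gap gets the mean of the valid values on either side.
neighbor_avg = missing.fillna((missing.ffill() + missing.bfill()) / 2)

# Linear interpolation treats the rows as equally spaced.
interpolated = missing.interpolate(method="linear")

print(pd.DataFrame({"neighbor_avg": neighbor_avg, "interpolated": interpolated}))
```

For an isolated zero both approaches give the same result, since the midpoint of a one-row gap is the mean of its two neighbors.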

Comments
  • Could you please provide a little explanation?
  • @LadenkovVladislav Does that make sense now?
  • Thanks! Wonderful methods!
  • Is this possible for more than one NaN value? What would that look like?