Remove NAs in each column separately and join them - Python

For example, if I have this data:

X1  X2  X3 
a   b   Na 
Na  Na  Na 
b   Na  a 
c   c   Na
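To make the example reproducible, the table above can be built as a DataFrame. This is a minimal sketch that assumes the `Na` cells are meant to be real missing values (`np.nan`), not the literal string `'Na'`:

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing cells are np.nan
# (assumption: "Na" in the table stands for a real missing value)
df = pd.DataFrame({
    "X1": ["a", np.nan, "b", "c"],
    "X2": ["b", np.nan, np.nan, "c"],
    "X3": [np.nan, np.nan, "a", np.nan],
})

print(df.isna().sum())  # number of missing values per column
```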

The final result would be something like:

    X1  X2  X3 
    a   b   a
    b   c   Na
    c   Na  Na

I tried this function:

df.apply(lambda x: pd.Series(pd.unique(x)))

but I get:

    X1  X2  X3 
    a   b   Na 
    b   c   a 
    c   Na  

How can I use this function but make pd.unique(x) ignore the NAs?

Thanks!

I think you need Series.dropna:

df = df.apply(lambda x: pd.Series(x.dropna().to_numpy()))

print (df)
  X1   X2   X3
0  a    b    a
1  b    c  NaN
2  c  NaN  NaN
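The `.to_numpy()` call matters here: `df.apply` reassembles the returned Series by aligning them on their index labels, so a Series that keeps its original labels puts each value back in its original row. A small sketch comparing both variants on the question's data (rebuilt with `np.nan` as the missing value, which is an assumption about the real data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "X1": ["a", np.nan, "b", "c"],
    "X2": ["b", np.nan, np.nan, "c"],
    "X3": [np.nan, np.nan, "a", np.nan],
})

# Keeps the original labels: "a" in X3 keeps label 2, so it stays in that row
kept_labels = df.apply(lambda x: pd.Series(x.dropna()))

# Drops the labels: values are renumbered 0, 1, 2, ... and end up packed at the top
packed = df.apply(lambda x: pd.Series(x.dropna().to_numpy()))

print(kept_labels)
print(packed)
```

`x.dropna().reset_index(drop=True)`, used in another answer, achieves the same relabeling as `.to_numpy()`.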

To improve performance, it is possible to use a slightly modified version of the justify function by Divakar:

import numpy as np
import pandas as pd

def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as empty; np.nan is supported
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if pd.isna(invalid_val):
        # changed: NaN never compares equal to itself, so use notnull for the mask
        mask = pd.notnull(a)
    else:
        mask = a != invalid_val
    justified_mask = np.sort(mask, axis=axis)
    if side in ('up', 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    # changed: object dtype so strings and NaN can share the output array
    out = np.full(a.shape, invalid_val, dtype=object)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

df = pd.DataFrame(justify(df.values, invalid_val=np.nan, side='up', axis=0), 
                  columns=df.columns).dropna(how='all')
print (df)
  X1   X2   X3
0  a    b    a
1  b    c  NaN
2  c  NaN  NaN
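The core trick in `justify` is that sorting a boolean mask along an axis moves every `False` before every `True`; flipping it then puts the `True` cells (the slots to fill) at the top. Boolean-mask assignment consumes values in row-major order, which is why the function goes through the transposes for `axis=0`. A tiny sketch on a hand-made mask, where `None` stands in for a missing value (the arrays are illustrative only):

```python
import numpy as np

# Validity mask for the question's 4x3 table: True where a value is present
mask = np.array([
    [True,  True,  False],
    [False, False, False],
    [True,  False, True],
    [True,  True,  False],
])

# Sorting each column puts False first and True last; flipping moves True to the top
justified_mask = np.flip(np.sort(mask, axis=0), axis=0)

values = np.array([
    ["a", "b", None],
    [None, None, None],
    ["b", None, "a"],
    ["c", "c", None],
], dtype=object)

out = np.full(values.shape, None, dtype=object)
# Working on the transposes makes the assignment fill column by column
out.T[justified_mask.T] = values.T[mask.T]

print(justified_mask.astype(int))
print(out)
```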

IIUC, here's a NumPy-based approach:

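import numpy as np
import pandas as pd

# a stable sort keeps the non-missing values in their original order
a = np.take_along_axis(df.values, df.isna().values.argsort(0, kind='stable'), 0)
pd.DataFrame(a, columns=df.columns)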

    X1   X2   X3
0    a    b    a
1    b    c  NaN
2    c  NaN  NaN
3  NaN  NaN  NaN
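This approach keeps the original row count, so the last row is entirely `NaN`. If the output should match the question exactly, one option is to drop rows that are all missing afterwards. A minimal sketch wrapping it up; the helper name `pack_up` is hypothetical, and `kind="stable"` is used so that the non-missing values keep their original order:

```python
import numpy as np
import pandas as pd

def pack_up(df: pd.DataFrame) -> pd.DataFrame:
    """Move non-missing values to the top of each column (hypothetical helper)."""
    # False (present) sorts before True (missing); the stable sort preserves order among ties
    order = df.isna().to_numpy().argsort(axis=0, kind="stable")
    packed = np.take_along_axis(df.to_numpy(), order, axis=0)
    # Drop rows that ended up entirely missing
    return pd.DataFrame(packed, columns=df.columns).dropna(how="all")

df = pd.DataFrame({
    "X1": ["a", np.nan, "b", "c"],
    "X2": ["b", np.nan, np.nan, "c"],
    "X3": [np.nan, np.nan, "a", np.nan],
})
print(pack_up(df))
```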

Double-check that your missing values are actual np.nan values; otherwise you can use:

df.replace('Na', float('nan'), inplace=True)
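If the data holds the literal string `Na` (as the question's table suggests), `isna` and `dropna` will not treat those cells as missing until they are converted. A short sketch, assuming the placeholder is exactly the string `'Na'`; `replace` without `inplace=True` returns a new frame:

```python
import numpy as np
import pandas as pd

# Same table as in the question, with "Na" stored as text
raw = pd.DataFrame({
    "X1": ["a", "Na", "b", "c"],
    "X2": ["b", "Na", "Na", "c"],
    "X3": ["Na", "Na", "a", "Na"],
})

print(raw.isna().sum().sum())  # 0: the strings are not seen as missing

clean = raw.replace("Na", np.nan)
result = clean.apply(lambda x: pd.Series(x.dropna().to_numpy()))
print(result)
```

When loading from a file, passing `na_values=['Na']` to `pd.read_csv` should do the conversion at read time instead.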

df.apply(lambda x: x.dropna().reset_index(drop=True))

Or:

df.apply(lambda x: x.dropna().tolist()).apply(pd.Series).T


    X1  X2  X3
0   a   b   a
1   b   c   NaN
2   c   NaN NaN

Another solution, using pd.concat():

print( pd.concat((df[c].dropna().reset_index(drop=True) for c in df.columns), axis=1) )

Prints:

  X1   X2   X3
0  a    b    a
1  b    c  NaN
2  c  NaN  NaN

I would just add .dropna() to your function:

df.apply(lambda x: pd.Series(pd.unique(x.dropna())))

hope this helps
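Keeping `pd.unique` also removes repeated values within a column, which the plain `dropna` answers do not. The question's data has no repeats, so both give the same result there; a hypothetical column with a duplicate shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" appears twice in X1
df = pd.DataFrame({
    "X1": ["a", np.nan, "a", "c"],
    "X2": ["b", np.nan, np.nan, "c"],
})

# pd.unique on its own keeps NaN as one of the "unique" values
with_nan = pd.unique(df["X1"])

# dropna first, then unique: no NaN and no repeats
unique_only = df.apply(lambda x: pd.Series(pd.unique(x.dropna())))

# dropna only: repeats are kept
all_values = df.apply(lambda x: pd.Series(x.dropna().to_numpy()))

print(with_nan)
print(unique_only)
print(all_values)
```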

Comments
  • Great! Exactly what I was looking for