When I convert my NumPy array to a DataFrame, all values become NaN

import pandas as pd
import impyute.imputation.cs as imp

Data = pd.DataFrame(data = imp.em(Data),columns = columns)

When I run the code above, all my values are converted to NaN, as shown below. Can someone help me see where I am going wrong?


     Time  LymphNodeStatus    ...      MeanPerimeter  TumorSize
0      31              5.0    ...             117.50        5.0
1      61              2.0    ...             122.80        3.0
2     116              0.0    ...             137.50        2.5
3     123              0.0    ...              77.58        2.0
4      27              0.0    ...             135.10        3.5
5      77              0.0    ...              84.60        2.5


     Time  LymphNodeStatus    ...      MeanPerimeter  TumorSize
0     NaN              NaN    ...                NaN        NaN
1     NaN              NaN    ...                NaN        NaN
2     NaN              NaN    ...                NaN        NaN
3     NaN              NaN    ...                NaN        NaN
4     NaN              NaN    ...                NaN        NaN
5     NaN              NaN    ...                NaN        NaN


Solution first

Instead of passing columns to pd.DataFrame, just manually assign column names:

data = pd.DataFrame(imp.em(data))
data.columns = columns


The error lies in Data = pd.DataFrame(data = imp.em(Data),columns = columns).

imp.em has a decorator @preprocess which converts input into a numpy.array if it is a pandas.DataFrame.

if pd_DataFrame and isinstance(args[0], pd_DataFrame):
    args[0] = args[0].as_matrix()
    return pd_DataFrame(fn(*args, **kwargs))

It therefore returns a dataframe reconstructed from a matrix, having range(data.shape[1]) as column names.
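The round trip described above can be reproduced without impyute installed. This is a minimal sketch: fake_em is a hypothetical stand-in that performs no imputation and only mimics the "DataFrame to array and back" behaviour attributed to @preprocess.

```python
import pandas as pd

def fake_em(data):
    """Hypothetical stand-in for imp.em: no imputation, only the label-dropping round trip."""
    values = data.to_numpy()      # the column labels are lost at this point
    return pd.DataFrame(values)   # new columns are the integers 0, 1, ...

original = pd.DataFrame({"time": [1, 2, 3], "size": [3, 2, 1]})
columns = original.columns

returned = fake_em(original)
print(list(returned.columns))     # integer labels, not "time" and "size"

# Passing the old names now selects columns that do not exist in `returned`.
broken = pd.DataFrame(returned, columns=columns)
print(broken)

# Assigning the names afterwards relabels the columns by position instead.
fixed = returned.copy()
fixed.columns = columns
print(fixed)
```

Here broken contains only NaN, while fixed carries the original values under the original names.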

As shown below, when pd.DataFrame is instantiated from another pd.DataFrame with column names that do not match, all the contents become NaN.

You can test this with:

import pandas as pd
from impyute.util import preprocess

@preprocess  # the same decorator that wraps imp.em
def test(data):
    return data

data = pd.DataFrame({"time": [1,2,3], "size": [3,2,1]})
columns = data.columns

data = pd.DataFrame(test(data), columns = columns)

   size  time
0   NaN   NaN
1   NaN   NaN
2   NaN   NaN

When you instantiate a pd.DataFrame from an existing pd.DataFrame, the columns argument specifies which columns of the original dataframe you want to use.

It does not re-label the dataframe. This is not odd; it is how pandas reindexing is designed to work:

By default values in the new index that do not have corresponding records in the dataframe are assigned NaN.

# Make a new pseudo dataset
data = pd.DataFrame({"time": [1,2,3], "size": [3,2,1]})
   size  time
0     3     1
1     2     2
2     1     3

# Make a new dataset from the original `data`
data = pd.DataFrame(data, columns = ["a", "b"])
     a    b
0  NaN  NaN
1  NaN  NaN
2  NaN  NaN
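To make the distinction between selecting and relabelling concrete, here is a small sketch using the same pseudo dataset. The names subset and relabelled are only illustrative, and rename is one of several ways to relabel columns.

```python
import pandas as pd

data = pd.DataFrame({"time": [1, 2, 3], "size": [3, 2, 1]})

# `columns` selects: only the existing "time" column is kept.
subset = pd.DataFrame(data, columns=["time"])
print(subset)

# Relabelling has to be requested explicitly, for example with rename.
relabelled = data.rename(columns={"time": "a", "size": "b"})
print(relabelled)
```

With rename the values stay in place and only the labels change, which is what the question was trying to achieve.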


There may be a bug in the impyute library. You are using the em function, which fills in missing values using the expectation-maximization algorithm. You can try without that function:

df = pd.DataFrame(data = Data ,columns = columns)

You can raise this as an issue with the library after confirming. To confirm, first load the data as in the example above, then check whether it contains null values using the df.isnull() method.
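A minimal sketch of that check follows. The frame here is made-up illustrative data with two of the question's column names, not the asker's real dataset.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; the NaN entries are placed by hand.
df = pd.DataFrame({
    "Time": [31.0, 61.0, np.nan],
    "TumorSize": [5.0, np.nan, 2.5],
})

missing_per_column = df.isnull().sum()        # number of NaN values in each column
has_missing = bool(df.isnull().any().any())   # True if any cell at all is NaN

print(missing_per_column)
print(has_missing)
```

If every column reports as many missing values as there are rows, the data was already lost before any imputation could help.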


Data = pd.DataFrame(data = np.array(imp.em(Data)),columns = columns)

Doing this solved the issue I was facing. I guess the em function does not return a NumPy array.
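A likely reason this works is that np.array() discards the DataFrame's labels, so the new column names are applied by position. This is a sketch under that assumption; labelled is a hypothetical stand-in for whatever imp.em returns.

```python
import numpy as np
import pandas as pd

# Stand-in for the result of imp.em: a DataFrame with integer column labels.
labelled = pd.DataFrame({0: [1.0, 2.0], 1: [3.0, 4.0]})
columns = ["Time", "TumorSize"]

values = np.array(labelled)   # plain 2-D ndarray, no row or column labels
relabelled = pd.DataFrame(values, columns=columns)

print(type(values))
print(relabelled)
```

Because values has no labels to match against, pandas has nothing to reindex and simply attaches the given names.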


  • My column names are correct, as I used Data.columns and stored the result in a list named 'columns'.
  • @JACK Happy to help. If any answer solved your issue, please mark it as accepted.
  • Yes, sure. Sorry, I didn't know about that, my bad.