How to make a full matrix from a dense pandas DataFrame


I have a pandas df in the form:

   A  B    C
0  2  1  428
1  4  3   14
2  5  5  177

I want an array where A gives the row, B the column, and C the value. The tricky part is that the array should be full in terms of indices, i.e. it should contain every row and column index, not only those that appear in the DataFrame:

[[  0.   0.   0.   0.   0.]
 [428.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.]
 [  0.   0.  14.   0.   0.]
 [  0.   0.   0.   0. 177.]]

with the remaining positions filled with zeros. I can do that with a series of for loops, but is there a smarter way?

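You can use NumPy's ndarray.put, which writes values at flat (row-major) indices:

import numpy as np

arr = np.zeros((df['A'].max(), df['B'].max()))

# flat index = row * n_cols + col, after shifting the 1-based labels to 0-based
idx = (df['A'] - 1) * df['B'].max() + (df['B'] - 1)
arr.put(idx, df['C'])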

Output:

[[  0.   0.   0.   0.   0.]
 [428.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.]
 [  0.   0.  14.   0.   0.]
 [  0.   0.   0.   0. 177.]]
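
The same result can also be reached without computing a flat index, by assigning through a pair of index arrays. This is a minimal sketch rather than part of the original answer; the names rows and cols are introduced here for illustration, and the DataFrame is rebuilt from the question so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Data from the question.
df = pd.DataFrame({"A": [2, 4, 5], "B": [1, 3, 5], "C": [428, 14, 177]})

# Shift the 1-based labels in A and B to 0-based positions.
rows = df["A"].to_numpy() - 1
cols = df["B"].to_numpy() - 1

# Allocate the full array, then assign all values in one vectorised step.
arr = np.zeros((rows.max() + 1, cols.max() + 1))
arr[rows, cols] = df["C"].to_numpy()
print(arr)
```

If the same (A, B) pair occurs more than once, this kind of assignment keeps only one of the values, so it assumes the pairs are unique.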

If you need a matrix where indices start at zero:

arr = np.zeros((df['A'].max() + 1, df['B'].max() + 1))

# multiply by the number of columns (B), not the number of rows (A)
idx = df['A'] * (df['B'].max() + 1) + df['B']
arr.put(idx, df['C'])

Output:

[[  0.   0.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.   0.]
 [  0. 428.   0.   0.   0.   0.]
 [  0.   0.   0.   0.   0.   0.]
 [  0.   0.   0.  14.   0.   0.]
 [  0.   0.   0.   0.   0. 177.]]
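
A flat index into a row-major array is row * n_cols + col, which only matters visibly when the array is not square. As a hedged sketch, np.ravel_multi_index can compute that index from the shape instead of doing it by hand. The frame df2 and its values are hypothetical, chosen to be non-square.

```python
import numpy as np
import pandas as pd

# Hypothetical non-square data: row labels go up to 2, column labels up to 4.
df2 = pd.DataFrame({"A": [0, 2, 1], "B": [4, 1, 3], "C": [10, 20, 30]})

shape = (int(df2["A"].max()) + 1, int(df2["B"].max()) + 1)  # (3, 5)
arr2 = np.zeros(shape)

# ravel_multi_index computes row * n_cols + col for a row-major array.
flat = np.ravel_multi_index((df2["A"].to_numpy(), df2["B"].to_numpy()), shape)
arr2.put(flat, df2["C"].to_numpy())
print(arr2)
```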


Use DataFrame.pivot with DataFrame.reindex:

import pandas as pd

s = pd.concat([df['A'], df['B']])  # Series.append was removed in pandas 2.0
r = range(s.min(), s.max() + 1)
# r = range(1, 6)  # if you want to select a specific range
new_df = (df.pivot(index='A', columns='B', values='C')
            .reindex(index=r, columns=r)
            .fillna(0)
            .rename_axis(columns=None, index=None))

print(new_df)
       1    2     3    4      5
1    0.0  0.0   0.0  0.0    0.0
2  428.0  0.0   0.0  0.0    0.0
3    0.0  0.0   0.0  0.0    0.0
4    0.0  0.0  14.0  0.0    0.0
5    0.0  0.0   0.0  0.0  177.0

To get an array:

new_df.to_numpy()
#new_df.values
array([[  0.,   0.,   0.,   0.,   0.],
       [428.,   0.,   0.,   0.,   0.],
       [  0.,   0.,   0.,   0.,   0.],
       [  0.,   0.,  14.,   0.,   0.],
       [  0.,   0.,   0.,   0., 177.]])


I have found one more way to solve this, using scipy (note that the result is zero-indexed):

from scipy import sparse

sparse.coo_matrix((df['C'], (df['A'], df['B']))).toarray()

Output:

array([[  0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0],
       [  0, 428,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0],
       [  0,   0,   0,  14,   0,   0],
       [  0,   0,   0,   0,   0, 177]])


Comments
  • I don't know who edited it, but thank you very much for the help. Sorry for the mistakes, it's my very first time here.
  • "the tricky part is array should be full, in sense of indices": can you please elaborate on this?
  • Don't forget to accept an answer :) meta.stackexchange.com/questions/5234/… @user12774760
  • great, thank you for the tip:)
  • Yes, that's perfect, thank you very much, kind stranger