Unexpected behavior in assigning 2d numpy array to pandas DataFrame


I have the following code:

import numpy as np
import pandas as pd

x = pd.DataFrame(np.zeros((4, 1)), columns=['A'])
y = np.random.randn(4, 2)
x['A'] = y

I expect it to throw an exception because of the shape mismatch, but pandas silently accepts the assignment: the first column of y is assigned to x['A'].

Is this an intentional design? If so, what is the rationale behind it?

I tried both pandas 0.21 and 0.23.
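For readers who want this kind of assignment to fail loudly regardless of pandas version, one option is to check the shape yourself before assigning. The sketch below is an assumption, not part of pandas: `assign_column_strict` is a hypothetical helper that accepts a 1-D array of length `len(df)` or a single-column 2-D array and raises `ValueError` for anything else.

```python
import numpy as np
import pandas as pd


def assign_column_strict(df: pd.DataFrame, column: str, values) -> None:
    """Assign `values` to df[column], raising unless it is exactly one column.

    Hypothetical helper for illustration; pandas does not provide it.
    """
    arr = np.asarray(values)
    # A (n, 1) array is unambiguous, so flatten it to 1-D.
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr[:, 0]
    # Anything that is not one value per row is rejected.
    if arr.ndim != 1 or arr.shape[0] != len(df):
        raise ValueError(
            f"expected {len(df)} values for column {column!r}, got shape {arr.shape}"
        )
    df[column] = arr


x = pd.DataFrame(np.zeros((4, 1)), columns=["A"])
y = np.random.randn(4, 2)

rejected = False
try:
    assign_column_strict(x, "A", y)  # (4, 2): two columns for one target
except ValueError as exc:
    rejected = True
    print("rejected:", exc)

assign_column_strict(x, "A", y[:, [0]])  # (4, 1) slice is accepted
```

The helper only adds a check in front of the ordinary `df[column] = arr` assignment, so it does not depend on how a particular pandas release treats 2-D values.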


Thanks to those who tried to help. However, nobody has given a satisfactory answer, and the bounty is about to expire.

Let me emphasize what I expect from an answer:

  1. Is this design intentional? Is it a bug? Is it a design flaw?
  2. What is the rationale for designing it this way?

Since the bounty is about to expire, I accepted the most-voted answer, but it does not answer the questions above.

The values in y form an un-indexed matrix. The assignment x['A'] = y works here because pandas takes the first item of the matrix (its first column) and assigns it to column 'A'.

Similarly,

x = pd.DataFrame(np.zeros((4, 2)), columns=['A', 'B'])
y = np.random.randn(4, 2)
x[['A', 'B']] = y

will also work, because pandas discards any extra data. If you pass fewer columns, say:

x = pd.DataFrame(np.zeros((4, 2)), columns=['A', 'B'])
y = np.random.randn(4, 1)
x[['A', 'B']] = y

That will also work, because the same values are assigned to both columns. This case is similar to x['A'] = 0, which replaces all the data in column A with zeros.
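The question reports this lenient behavior for pandas 0.21 and 0.23, and it may not hold in other versions, so a safer habit is to make the intended shape explicit before assigning. A minimal sketch: slice out the column you actually want with `y[:, 0]`, or repeat a single column with `np.repeat` so the array already has one column per target.

```python
import numpy as np
import pandas as pd

# Case 1: take the first column of a (4, 2) array explicitly.
x = pd.DataFrame(np.zeros((4, 1)), columns=["A"])
y = np.random.randn(4, 2)
x["A"] = y[:, 0]  # 1-D slice: exactly one value per row

# Case 2: write one (4, 1) column into both A and B by repeating it explicitly.
x2 = pd.DataFrame(np.zeros((4, 2)), columns=["A", "B"])
y2 = np.random.randn(4, 1)
x2[["A", "B"]] = np.repeat(y2, 2, axis=1)  # shape (4, 2), one column per target
```

In both cases the right-hand side already has exactly the shape of the target, so the result does not rely on pandas silently dropping or broadcasting data.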


For

x = pd.DataFrame(np.zeros((4, 1)), columns=['A'])
y = np.random.randn(4, 2)

running x['A'] = y copies the first column of y into column A. The same thing happens if we try different column counts, such as:

x = pd.DataFrame(np.zeros((4, 3)), columns=['A','B','C'])
y = np.random.randn(4, 2)

and run x['A'] = y: the first column is again copied into A. But if we write x = y, the name x is simply rebound to the y matrix, so x is no longer a DataFrame. So I guess the ambiguity arises because we are assigning a NumPy matrix to a single DataFrame column. Hope this explains it.
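The `x = y` case is plain Python name binding rather than a pandas operation. A small sketch to illustrate: after the assignment, the name `x` refers to the NumPy array itself, and no DataFrame is filled or modified.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.zeros((4, 3)), columns=["A", "B", "C"])
y = np.random.randn(4, 2)

x = y  # rebinds the name x; the old DataFrame is not touched
print(type(x).__name__)  # ndarray
print(x is y)            # True: both names refer to the same array object
```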


A pandas Series is backed by a NumPy array; since this is a single column, pandas treats it as one object whose reference has changed.

>>> import numpy as np
>>> x = np.zeros((4,1))
>>> x = np.random.randn(4,2)
>>> y= np.zeros((4,1))
>>> y
array([[0.],
       [0.],
       [0.],
       [0.]])
>>> x
array([[-1.00731291, -0.37151425],
       [-0.78154847, -0.72854126],
       [-0.98566253,  1.68786232],
       [ 0.12614892,  0.41804799]])
>>> y = x
>>> y
array([[-1.00731291, -0.37151425],
       [-0.78154847, -0.72854126],
       [-0.98566253,  1.68786232],
       [ 0.12614892,  0.41804799]])
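The session above shows that `y = x` makes both names refer to the same array. A short sketch (plain NumPy, no pandas involved) of what that implies: a write through one name is visible through the other, whereas `x.copy()` produces independent data.

```python
import numpy as np

x = np.random.randn(4, 2)
original = x[0, 0]

y = x         # alias: y and x are the same object
z = x.copy()  # independent copy with its own buffer

y[0, 0] = 100.0          # writes through the alias, so x changes too
print(x[0, 0])           # 100.0
print(z[0, 0] == original)  # True: the copy is unaffected
```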





Comments
  • Seems to be a peculiarity with 'A' already being a column. For instance, x['B'] = y gives you the expected ValueError: Wrong number of items passed 2, placement implies 1
  • I would expect this to raise a KeyError instead...
  • oh yeah, there is one too.
  • What do you mean by "un-indexed matrix", and what is the first item of y? The first column?
  • @LiuSha DataFrame and Series have an index. np.random.randn returns a plain NumPy array, which has no index, so it's un-indexed.
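The first comment's observation can be reproduced with a short sketch: assigning the same (4, 2) array to a column that does not exist yet raises `ValueError`. The exact error message differs between pandas versions (the comment quotes the 0.23 wording), so the sketch only checks the exception type.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.zeros((4, 1)), columns=["A"])
y = np.random.randn(4, 2)

raised = False
try:
    x["B"] = y  # new column 'B', but y carries two columns of data
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```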