Combine two columns while giving priority to the first one
From this question, I have two DataFrames and want to merge them so that dfB is left-joined onto dfA, replacing NaN values with non-NaN values wherever I have them.
>>> dfA
  s_name  geo    zip  date  value
0      A  zip  60601  2010    NaN  # In the earlier question, this was None
1      B  zip  60601  2010    NaN  # rather than NaN, which was
2      C  zip  60601  2010    NaN  # a mistake.
3      D  zip  60601  2010    NaN
>>> dfB
  s_name  geo    zip  date  value
0      A  zip  60601  2010    1.0
1      B  zip  60601  2010    NaN
3      D  zip  60601  2010    4.0
Merging them, I see:
>>> new = pd.merge(dfA,dfB,on=["s_name","geo", "geoid", "date"],how="left")
>>> new.head()
  name    geo zip  date  value_x  value_y
0    A  state  01  2009      NaN      1.0
1    B  state  01  2010      NaN      NaN
2    C  state  01  2011      NaN      NaN
3    D  state  01  2012      NaN      4.0
4    E  state  01  2013      NaN      5.0
I can't be sure that value_y is always populated and value_x is always NaN. I want a single merged column, call it value, that holds whichever value is not NaN. I tried this:
>>> new["value"] = new.apply(lambda r: r.value_x or r.value_y, axis=1)
>>> new.head()
  name    geo zip  date  value_x  value_y  value
0    A  state  01  2009      NaN      1.0    NaN
1    B  state  01  2010      NaN      NaN    NaN
2    C  state  01  2011      NaN      NaN    NaN
3    D  state  01  2012      NaN      4.0    NaN
4    E  state  01  2013      NaN      5.0    NaN
This makes sense in that NaN propagates, but it is not what I'm looking for. I'd like logic that returns whichever value is present, rather than NaN whenever one of the two is missing.
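For what it's worth, the all-NaN column seems to come from Python truthiness rather than from NaN arithmetic: `or` returns its first truthy operand, and NaN is a non-zero float, so it counts as truthy. A minimal sketch in plain Python (the `first_truthy` helper is only an illustrative name, not part of pandas):

```python
import math

nan = float("nan")

def first_truthy(a, b):
    # Same expression as the lambda body: `or` returns the first truthy operand.
    return a or b

# NaN is a non-zero float, so it is truthy and `or` stops at it.
print(bool(nan))
print(math.isnan(first_truthy(nan, 1.0)))

# None is falsy, so `or` falls through to the second operand.
print(first_truthy(None, 1.0))

# Caveat: 0.0 is falsy too, so a legitimate zero would be skipped.
print(first_truthy(0.0, 5.0))
```

The last line suggests that even the None-based version is fragile when a real value can be zero, which is one reason to test for missing values explicitly instead of relying on `or`.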
I'd like the logic that None gives me. You can see:
>>> new["value_z"] = None
>>> new.head()
  name    geo zip  date  value_x  value_y  value value_z
0    A  state  01  2009      NaN      1.0    NaN    None
1    B  state  01  2010      NaN      NaN    NaN    None
2    C  state  01  2011      NaN      NaN    NaN    None
3    D  state  01  2012      NaN      4.0    NaN    None
4    E  state  01  2013      NaN      5.0    NaN    None
>>> new["value2"] = new.apply(lambda r: r.value_z or r.value_y, axis=1)
>>> new.head()
  name    geo zip  date  value_x  value_y  value value_z  value2
0    A  state  01  2009      NaN      1.0    NaN    None     1.0
1    B  state  01  2010      NaN      NaN    NaN    None     NaN
2    C  state  01  2011      NaN      NaN    NaN    None     NaN
3    D  state  01  2012      NaN      4.0    NaN    None     4.0
4    E  state  01  2013      NaN      5.0    NaN    None     5.0
The logic that creates value2 is the behavior I'm looking for, not the NaN-propagating result in value.
What's the best way to do this?
If you have a preference for value_x, you could try:
df.value_x = df.value_x.fillna(df.value_y)
df.pop('value_y')

Or, as a single statement:

df.value_x = df.value_x.fillna(df.pop('value_y'))
>>> df
  name    geo  zip  date  value_x
0    A  state    1  2009      1.0
1    B  state    1  2010      NaN
2    C  state    1  2011      NaN
3    D  state    1  2012      4.0
4    E  state    1  2013      5.0
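To check that value_x really takes priority, here is a small self-contained sketch. The frame is made up for illustration (it is not the data from the question), and row C deliberately has both columns populated:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the merged frame; row C has both values present.
df = pd.DataFrame({
    "name": list("ABCDE"),
    "value_x": [np.nan, np.nan, 3.0, np.nan, np.nan],
    "value_y": [1.0, np.nan, 30.0, 4.0, 5.0],
})

# Keep value_x where it is present, otherwise take value_y.
# pop() returns the value_y column and removes it from df in the same step.
df["value_x"] = df["value_x"].fillna(df.pop("value_y"))
print(df)
```

Row C keeps 3.0 from value_x rather than 30.0 from value_y, and row B stays NaN because neither side has a value.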
combine_first will work after the merge:
dfC = pd.merge(dfA, dfB, on=["s_name", "geo", "zip", "date"], how="left")
dfC['value'] = dfC.pop('value_x').combine_first(dfC.pop('value_y'))
dfC
  s_name  geo    zip  date  value
0      A  zip  60601  2010    1.0
1      B  zip  60601  2010    NaN
2      C  zip  60601  2010    NaN
3      D  zip  60601  2010    4.0
combine_first gives preference to "value_x" over "value_y". You can also write this as:
dfC = pd.merge(dfA, dfB, on=["s_name", "geo", "zip", "date"], how="left")
dfC['value_x'] = dfC['value_x'].combine_first(dfC.pop('value_y'))
dfC
  s_name  geo    zip  date  value_x
0      A  zip  60601  2010      1.0
1      B  zip  60601  2010      NaN
2      C  zip  60601  2010      NaN
3      D  zip  60601  2010      4.0
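For anyone who wants to run this end to end, the sketch below rebuilds dfA and dfB from the tables in the question. The constructor calls and the `keys` variable are my own reconstruction, since the question only shows the printed frames:

```python
import numpy as np
import pandas as pd

keys = ["s_name", "geo", "zip", "date"]

# Reconstructed from the printed frames in the question.
dfA = pd.DataFrame({
    "s_name": ["A", "B", "C", "D"],
    "geo": ["zip"] * 4,
    "zip": [60601] * 4,
    "date": [2010] * 4,
    "value": [np.nan] * 4,
})
dfB = pd.DataFrame({
    "s_name": ["A", "B", "D"],
    "geo": ["zip"] * 3,
    "zip": [60601] * 3,
    "date": [2010] * 3,
    "value": [1.0, np.nan, 4.0],
}, index=[0, 1, 3])

# The left merge keeps every row of dfA; the clashing "value" columns
# come back as value_x (from dfA) and value_y (from dfB).
dfC = pd.merge(dfA, dfB, on=keys, how="left")

# Take value_x where present, else value_y, and drop both helper columns.
dfC["value"] = dfC.pop("value_x").combine_first(dfC.pop("value_y"))
print(dfC)
```

Row C has no match in dfB, so it ends up NaN, as does row B, where both sides are NaN.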
This technically works by hammering out the logic, but is ugly and feels like a hack (I believe it gives preference to value_x due to operator short-circuiting?):
>>> new["value3"] = new.apply(lambda r: (not(pd.isna(r.value_x)) or r.value_y) or (r.value_x or not(pd.isna(r.value_y))), axis=1)
>>> new.head()
  name    geo zip  date  value_x  value_y  value value_z  value2  value3
0    A  state  01  2009      NaN      1.0    NaN    None     1.0     1.0
1    B  state  01  2010      NaN      NaN    NaN    None     NaN     NaN
2    C  state  01  2011      NaN      NaN    NaN    None     NaN     NaN
3    D  state  01  2012      NaN      4.0    NaN    None     4.0     4.0
4    E  state  01  2013      NaN      5.0    NaN    None     5.0     5.0
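One caveat, as far as I can tell: when value_x is present, `not(pd.isna(r.value_x))` is True and `or` short-circuits on that boolean, so the row gets True rather than value_x. The sample data never hits that case because value_x is always NaN. The sketch below uses a made-up frame with one populated value_x, next to an explicit conditional that does prefer value_x (the `hack` and `prefer_x` names and the data are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has value_x present.
new = pd.DataFrame({
    "value_x": [np.nan, np.nan, 2.0],
    "value_y": [1.0, np.nan, 9.0],
})

def hack(r):
    # The expression from the question, unchanged.
    return (not(pd.isna(r.value_x)) or r.value_y) or (r.value_x or not(pd.isna(r.value_y)))

def prefer_x(r):
    # Explicit version: use value_x unless it is missing.
    return r.value_y if pd.isna(r.value_x) else r.value_x

hack_result = new.apply(hack, axis=1)
explicit = new.apply(prefer_x, axis=1)

print(hack_result.tolist())  # last element is True, not 2.0
print(explicit.tolist())     # last element is 2.0
```

The row-wise `apply` is only here to mirror the question; the vectorized `fillna` and `combine_first` forms express the same preference and avoid the per-row Python call.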