How to do complex selection in pandas?


I have a df like below:

President   Start Date  End Date
B Clinton   1992-01-01  1999-12-31
G Bush      2000-01-01  2007-12-31
B Obama     2008-01-01  2015-12-31
D Trump     2016-01-01  2019-12-31 # not too far away!!

I want to create another df, something like this:

timestamp   President
1992-01-01  B Clinton
1992-01-02  B Clinton
...
2000-01-01  G Bush
...

Basically, I want to create a dataframe whose index is a timestamp and whose content is selected based on a condition on the two date columns of the other df.

I feel there is a way to do this within pandas, but I am not sure how. I tried np.piecewise, but generating the conditions seems very hard. How could I do this?

This is another unnesting problem

# the column names contain spaces, so use bracket access rather than attribute access
df['New'] = [pd.date_range(x, y).tolist() for x, y in zip(df['Start Date'], df['End Date'])]

unnesting(df,['New'])

FYI I have pasted the function here

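import numpy as np
import pandas as pd

def unnesting(df, explode):
    # repeat each original index label once per element of its list
    idx = df.index.repeat(df[explode[0]].str.len())
    # flatten every list column into one long column
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # re-attach the remaining columns by index label
    return df1.join(df.drop(columns=explode), how='left')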
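
For what it's worth, the same unnesting idea can probably be written with the built-in DataFrame.explode, assuming pandas 0.25 or later. This is only a sketch: the sample frame is typed in from the question, and the names result and timestamp are my own choices, not part of the answer above.

```python
import pandas as pd

# Sample data typed in from the question's table.
df = pd.DataFrame({
    'President': ['B Clinton', 'G Bush', 'B Obama', 'D Trump'],
    'Start Date': pd.to_datetime(['1992-01-01', '2000-01-01', '2008-01-01', '2016-01-01']),
    'End Date': pd.to_datetime(['1999-12-31', '2007-12-31', '2015-12-31', '2019-12-31']),
})

# One list of daily timestamps per president.
df['timestamp'] = [pd.date_range(start, end).tolist()
                   for start, end in zip(df['Start Date'], df['End Date'])]

# explode() turns each list element into its own row (assumes pandas >= 0.25).
result = df[['President', 'timestamp']].explode('timestamp')

# The exploded column is object dtype, so convert it back explicitly.
result['timestamp'] = pd.to_datetime(result['timestamp'])
result = result.set_index('timestamp')

print(result.head())
```

Here explode does the work of the unnesting helper for a single list column; the helper is still useful if you need to unnest several columns at once.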


You can use pd.date_range to create range of dates from start and end values. Make sure Start and End dates are in datetime format.

s = df.set_index('President').apply(lambda x: pd.Series(pd.date_range(x['Start Date'], x['End Date'])), axis = 1).stack().reset_index(1, drop = True)

new_df = pd.DataFrame(s.index.values, index=s, columns = ['President'] )



            President
1992-01-01  B Clinton
1992-01-02  B Clinton
1992-01-03  B Clinton
1992-01-04  B Clinton
1992-01-05  B Clinton
1992-01-06  B Clinton
1992-01-07  B Clinton
1992-01-08  B Clinton
1992-01-09  B Clinton
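
If the dates were read in as strings, the conversion to datetime mentioned above could look roughly like this. It is a minimal sketch on a made-up two-row frame; first_term is a name I introduced just for the check at the end.

```python
import pandas as pd

# Hypothetical input where the dates arrive as plain strings.
df = pd.DataFrame({
    'President': ['B Clinton', 'G Bush'],
    'Start Date': ['1992-01-01', '2000-01-01'],
    'End Date': ['1999-12-31', '2007-12-31'],
})

# Convert both columns to datetime64 before building the ranges.
for col in ['Start Date', 'End Date']:
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)

# Each row now yields a daily DatetimeIndex covering that term.
first_term = pd.date_range(df.loc[0, 'Start Date'], df.loc[0, 'End Date'])
print(first_term[0], first_term[-1], len(first_term))
```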


Perhaps you could use a PeriodIndex instead of a DatetimeIndex because you are dealing with regularly spaced intervals of time, i.e., years.

# create a list of PeriodIndex objects with annual frequency
# ('A' is the annual alias; recent pandas releases spell it 'Y')
p_idxs = [pd.period_range(start, end, freq='A') for idx, (start, end) in df[['Start Date', 'End Date']].iterrows()]

# for each PeriodIndex create a DataFrame where 
# the number of president instances matches the length of the PeriodIndex object
df_list = []
for pres, p_idx in zip(df['President'].tolist(), p_idxs):
    df_ = pd.DataFrame(data=len(p_idx)*[pres], index=p_idx, columns=['President'])
    df_list.append(df_)

# concatenate everything to get the desired output
df_desired = pd.concat(df_list, axis=0)
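
Note that the annual frequency gives one row per year. If you need the daily rows shown in the question, the same PeriodIndex idea might be adapted with freq='D' and then converted back to timestamps. This is a sketch on a made-up two-row frame, and pieces and daily are names I picked for illustration.

```python
import pandas as pd

# Hypothetical two-row sample in the question's layout.
df = pd.DataFrame({
    'President': ['B Clinton', 'G Bush'],
    'Start Date': ['1992-01-01', '2000-01-01'],
    'End Date': ['1999-12-31', '2007-12-31'],
})

pieces = []
for pres, start, end in zip(df['President'], df['Start Date'], df['End Date']):
    # Daily periods covering one term.
    days = pd.period_range(start, end, freq='D')
    # The scalar name is broadcast over the whole index.
    pieces.append(pd.DataFrame({'President': pres}, index=days))

daily = pd.concat(pieces)

# Convert the PeriodIndex to a DatetimeIndex to match the requested output.
daily.index = daily.index.to_timestamp()

print(daily.head())
```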


Comments
  • You may not want to use apply. :-) stackoverflow.com/questions/54432583/…