Expanding pandas data frame with date range in columns

I have a pandas dataframe with dates and strings similar to this:

Start        End           Note    Item
2016-10-22   2016-11-05    Z       A
2017-02-11   2017-02-25    W       B

I need to expand/transform it to the below, filling in the weeks (W-SAT) between the Start and End columns and forward filling the data in Note and Item:

Start        Note    Item
2016-10-22   Z       A
2016-10-29   Z       A
2016-11-05   Z       A
2017-02-11   W       B
2017-02-18   W       B
2017-02-25   W       B

What's the best way to do this with pandas? Some sort of multi-index apply?

You can iterate over the rows, build a new DataFrame for each one, and then concatenate them together:

pd.concat([pd.DataFrame({'Start': pd.date_range(row.Start, row.End, freq='W-SAT'),
               'Note': row.Note,
               'Item': row.Item}, columns=['Start', 'Note', 'Item']) 
           for i, row in df.iterrows()], ignore_index=True)

       Start Note Item
0 2016-10-22    Z    A
1 2016-10-29    Z    A
2 2016-11-05    Z    A
3 2017-02-11    W    B
4 2017-02-18    W    B
5 2017-02-25    W    B
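
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes Start and End are already datetime64 columns (built here with pd.to_datetime), and it swaps iterrows for itertuples, which is a variation on the answer rather than part of it. The name expanded is only illustrative.

```python
import pandas as pd

# Sample frame from the question; Start and End are parsed to datetime64.
df = pd.DataFrame({
    'Start': pd.to_datetime(['2016-10-22', '2017-02-11']),
    'End': pd.to_datetime(['2016-11-05', '2017-02-25']),
    'Note': ['Z', 'W'],
    'Item': ['A', 'B'],
})

# One small frame per original row: the weekly Saturdays between Start and End,
# with the scalar Note and Item values broadcast to every generated date.
expanded = pd.concat(
    [pd.DataFrame({'Start': pd.date_range(row.Start, row.End, freq='W-SAT'),
                   'Note': row.Note,
                   'Item': row.Item})
     for row in df.itertuples(index=False)],
    ignore_index=True)

print(expanded)
```

This still builds one small frame per row, so the cost grows with the number of rows.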

You don't need iteration at all.

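df_start_end = df.melt(id_vars=['Note', 'Item'], value_name='date')

# 'W-SAT' matches the Saturday weeks in the question; ffill() replaces pad(), which newer pandas versions no longer provide
df = (df_start_end.groupby('Note')[['date', 'Item']]
      .apply(lambda x: x.set_index('date').resample('W-SAT').ffill())
      .reset_index())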
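
A self-contained sketch of the same melt-and-resample idea on the question's data. Two things here are assumptions and not part of the answer: it groups on a helper column named row (the original row number) so that rows sharing a Note are not merged, and it assumes every Start is a Saturday strictly before its End.

```python
import pandas as pd

# Sample frame from the question; Start and End are parsed to datetime64.
df = pd.DataFrame({
    'Start': pd.to_datetime(['2016-10-22', '2017-02-11']),
    'End': pd.to_datetime(['2016-11-05', '2017-02-25']),
    'Note': ['Z', 'W'],
    'Item': ['A', 'B'],
})

# Long format: one line for each Start and one for each End, keyed by the
# original row number ('row' is a hypothetical helper column).
long = (df.rename_axis('row').reset_index()
          .melt(id_vars=['row', 'Note', 'Item'], value_name='date'))

# Within each original row, resample to weekly Saturdays and forward fill
# Note and Item across the weeks between Start and End.
expanded = (long.groupby('row')[['date', 'Note', 'Item']]
                .apply(lambda g: g.set_index('date').resample('W-SAT').ffill())
                .reset_index()
                .drop(columns='row')
                .rename(columns={'date': 'Start'}))

print(expanded)
```

There is no explicit Python loop over rows, but groupby().apply() still calls the lambda once per group, so it is worth timing on a large frame before relying on it.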

If the number of unique values of df['End'] - df['Start'] is not too large, but the number of rows in your dataset is large, then the following function will be much faster than looping over your dataset:

import numpy as np
import pandas as pd

def date_expander(dataframe: pd.DataFrame,
                  start_dt_colname: str,
                  end_dt_colname: str,
                  time_unit: str,
                  new_colname: str,
                  end_inclusive: bool) -> pd.DataFrame:
    td = pd.Timedelta(1, time_unit)

    # add a timediff column:
    dataframe['_dt_diff'] = dataframe[end_dt_colname] - dataframe[start_dt_colname]

    # get the maximum timediff:
    max_diff = int((dataframe['_dt_diff'] / td).max())

    # for each possible timediff, get the intermediate time-differences:
    df_diffs = pd.concat([pd.DataFrame({'_to_add': np.arange(0, dt_diff + end_inclusive) * td}).assign(_dt_diff=dt_diff * td)
                          for dt_diff in range(max_diff + 1)])

    # join to the original dataframe
    data_expanded = dataframe.merge(df_diffs, on='_dt_diff')

    # the new dt column is just start plus the intermediate diffs:
    data_expanded[new_colname] = data_expanded[start_dt_colname] + data_expanded['_to_add']

    # remove start-end cols, as well as temp cols used for calculations:
    data_expanded = data_expanded.drop(columns=[start_dt_colname, end_dt_colname, '_to_add', '_dt_diff'])

    # remove the temporary column so the caller's dataframe is left as it was:
    del dataframe['_dt_diff']

    return data_expanded
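
For comparison, here is a different vectorized sketch that does not depend on how many distinct durations there are. It is not the function above: it repeats each row once per week and adds a running week offset. It assumes the frame has a unique index and that the step is exactly one week; step, n_steps and offsets are illustrative names.

```python
import pandas as pd

# Sample frame from the question; Start and End are parsed to datetime64.
df = pd.DataFrame({
    'Start': pd.to_datetime(['2016-10-22', '2017-02-11']),
    'End': pd.to_datetime(['2016-11-05', '2017-02-25']),
    'Note': ['Z', 'W'],
    'Item': ['A', 'B'],
})

step = pd.Timedelta(weeks=1)

# Number of weekly rows each original row expands to (End inclusive).
n_steps = ((df['End'] - df['Start']) // step + 1).to_numpy()

# Repeat every row n_steps times, keeping the original index labels.
expanded = df.loc[df.index.repeat(n_steps)].copy()

# Number the copies 0, 1, 2, ... within each original row.
offsets = expanded.groupby(level=0).cumcount().to_numpy()

# Shift Start forward by that many weeks, then drop End and renumber.
expanded['Start'] = expanded['Start'] + pd.to_timedelta(offsets, unit='W')
expanded = expanded.drop(columns='End').reset_index(drop=True)

print(expanded)
```

The dates are generated from Start, so if Start is not a Saturday the output will not land on Saturdays either.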

Comments
  • Thanks! That is very helpful. I was definitely overthinking it.
  • Is there another way to accomplish this without iterating row by row?
  • I would like to know how to do this without iterrows as well. iterrows is slow for a dataset with about 300,000 rows to process.
  • @ihightower See my answer below
  • @ihightower see my answer please. My answer does not need iteration.