Expand rows by date range using start and end date

Consider a data frame of the form

       idnum      start        end
1993.1    17 1993-01-01 1993-12-31
1993.2    17 1993-01-01 1993-12-31
1993.3    17 1993-01-01 1993-12-31

with start and end being of type Date

 $ idnum : int  17 17 17 17 27 27
 $ start : Date, format: "1993-01-01" "1993-01-01" "1993-01-01" "1993-01-01" ...
 $ end   : Date, format: "1993-12-31" "1993-12-31" "1993-12-31" "1993-12-31" ...

I would like to create a new data frame that instead has monthly observations for every row, one for each month between start and end (including the boundaries):

Desired Output

idnum       month
   17  1993-01-01
   17  1993-02-01
   17  1993-03-01
...
   17  1993-11-01
   17  1993-12-01

I'm not sure what format month should have; at some point I will want to group by idnum and month for regressions on the rest of the data set.

So far, for a single row, seq(from=test[1,'start'], to=test[1, 'end'], by='1 month') gives me the right sequence, but as soon as I try to apply that to the whole data frame, it fails:

> foo <- apply(test, 1, function(x) seq(x['start'], to=x['end'], by='1 month'))
Error in to - from : non-numeric argument to binary operator
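
A likely explanation for this error, sketched on made-up data rather than the asker's data set: apply() first converts the data frame with as.matrix(), and because the Date columns are not numeric, the whole matrix becomes character. seq() then receives strings instead of Dates. Indexing the data frame row by row keeps the Date class:

```r
# Hypothetical two-row stand-in for the data frame in the question
test <- data.frame(
  idnum = c(17L, 27L),
  start = as.Date(c("1993-01-01", "1993-03-01")),
  end   = as.Date(c("1993-12-31", "1993-05-31"))
)

# apply() operates on as.matrix(test); the Date columns force a character matrix
m <- as.matrix(test)
is.character(m)

# Indexing the data frame itself keeps start and end as Date objects
per_row <- lapply(seq_len(nrow(test)), function(i) {
  seq(test$start[i], test$end[i], by = "1 month")
})
lengths(per_row)
```

The exact wording of the error may differ between R versions, but the cause (character input to seq) is the same.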

Using data.table:

require(data.table) ## 1.9.2+
setDT(df)[ , list(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]

# you may use dot notation as a shorthand alias of list in j:
setDT(df)[ , .(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]

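setDT converts df to a data.table by reference. Grouping with by = 1:nrow(df) treats each row as its own group, so seq() receives a single start and a single end, and idnum and month are built row by row as required.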
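
For anyone who wants to try the row-wise grouping on a small example, here is a minimal self-contained sketch. The sample data and the grouping column name row are assumptions made for illustration, and as.data.table() is used instead of setDT() so the original data frame is left untouched:

```r
library(data.table)

# Hypothetical sample data; column names follow the question
df <- data.frame(
  idnum = c(17L, 27L),
  start = as.Date(c("1993-01-01", "1993-03-01")),
  end   = as.Date(c("1993-12-31", "1993-05-31"))
)

# as.data.table() copies df; setDT(df) would convert it in place instead
dt <- as.data.table(df)

# One group per row, so seq() always sees a single start and end date.
# Naming the grouping expression gives the extra column a predictable name.
out <- dt[, .(idnum = idnum, month = seq(start, end, by = "month")),
          by = .(row = seq_len(nrow(dt)))]

print(out)
```

The row column records which input row each month came from; drop it with out[, row := NULL] if it is not needed.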

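Using dplyr:

library(dplyr)
test %>%
    group_by(idnum) %>%
    summarize(start=min(start),end=max(end)) %>%
    # summarize() drops the grouping, so process the one-row-per-idnum result row by row
    rowwise() %>%
    do(data.frame(idnum=.$idnum, month=seq(.$start,.$end,by="1 month")))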

Note that this does not generate a sequence between start and end for each row; instead it generates one sequence between min(start) and max(end) for each idnum. If you want the former:

test %>%
    rowwise() %>%
    do(data.frame(idnum=.$idnum, month=seq(.$start,.$end,by="1 month")))

Updated2

With new versions of purrr (0.3.0) and dplyr (0.8.0), this can be done with map2:

library(dplyr)
library(purrr)
library(tidyr)  # provides unnest()
test %>%
    # sequence of monthly dates for each corresponding start, end elements
    transmute(idnum, month = map2(start, end, seq, by = "1 month")) %>%
    # unnest the list column
    unnest(month) %>%
    # remove any duplicate rows
    distinct()

Updated

Based on @Ananda Mahto's comments

 library(reshape2)  # assumption: melt() on a list is reshape2::melt
 res1 <- melt(setNames(lapply(1:nrow(test), function(x) seq(test[x, "start"],
 test[x, "end"], by = "1 month")), test$idnum))

Also,

  res2 <- setNames(do.call(`rbind`,
          with(test, 
          Map(`expand.grid`,idnum,
          Map(`seq`, start, end, by='1 month')))), c("idnum", "month"))


  head(res1)
 #  idnum      month
 #1    17 1993-01-01
 #2    17 1993-02-01
 #3    17 1993-03-01
 #4    17 1993-04-01
 #5    17 1993-05-01
 #6    17 1993-06-01
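
The base R one-liners above are fairly dense, so here is a hedged step-by-step sketch of the same idea using only base functions. The sample data are made up for illustration; the approach (Map over start and end, then repeat idnum to match) is a variation on the answer, not the author's exact code:

```r
# Hypothetical sample data; column names follow the question
test <- data.frame(
  idnum = c(17L, 27L),
  start = as.Date(c("1993-01-01", "1993-03-01")),
  end   = as.Date(c("1993-12-31", "1993-05-31"))
)

# One monthly sequence per row; Map() pairs each start with its end
months <- Map(seq, test$start, test$end, by = "1 month")

# Repeat each idnum once per generated month and combine the sequences.
# do.call(c, ...) keeps the Date class, which unlist() would drop.
out <- data.frame(
  idnum = rep(test$idnum, lengths(months)),
  month = do.call(c, months)
)

head(out)
```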

tidyverse answer

Data

df <- structure(list(idnum = c(17L, 17L, 17L), start = structure(c(8401, 
8401, 8401), class = "Date"), end = structure(c(8765, 8765, 8765
), class = "Date")), class = "data.frame", .Names = c("idnum", 
"start", "end"), row.names = c(NA, -3L))

Answer and output

library(tidyverse)
df %>%
  nest(start, end) %>%
  mutate(data = map(data, ~seq(unique(.x$start), unique(.x$end), 1))) %>%
  unnest(data)

# A tibble: 365 x 2
#    idnum data
#    <int> <date>
#  1    17 1993-01-01
#  2    17 1993-01-02
#  3    17 1993-01-03
#  4    17 1993-01-04
#  5    17 1993-01-05
#  6    17 1993-01-06
#  7    17 1993-01-07
#  8    17 1993-01-08
#  9    17 1993-01-09
# 10    17 1993-01-10
# ... with 355 more rows

Considering you want one sequence of months per ID (in this case per "idnum"), a different tidyverse possibility for this task could be using tidyr::complete():

df %>%
 gather(var, date, -idnum) %>%
 group_by(idnum) %>%
 distinct() %>%
 complete(date = seq.Date(min(date), max(date), by = "month")) %>%
 select(-var)

   date       idnum
   <date>     <int>
 1 1993-01-01    17
 2 1993-02-01    17
 3 1993-03-01    17
 4 1993-04-01    17
 5 1993-05-01    17
 6 1993-06-01    17
 7 1993-07-01    17
 8 1993-08-01    17
 9 1993-09-01    17
10 1993-10-01    17
11 1993-11-01    17
12 1993-12-01    17

First, it transforms the data from wide to long format, excluding the variable "idnum". Second, it groups the data by "idnum". Third, it removes the duplicate rows per "idnum". Fourth, by using seq.Date() inside tidyr::complete(), it generates a sequence of months (per "idnum") that starts with the first month in the data and ends with the last month in the data. Finally, it removes the redundant "var" variable.

If you want one sequence of months for every row instead, you can modify the above code to:

df %>%
 rowid_to_column() %>%
 gather(var, date, -c(idnum, rowid)) %>%
 group_by(rowid) %>%
 complete(date = seq.Date(min(date), max(date), by = "month")) %>%
 fill(idnum, .direction = "down") %>%
 select(-var)

   rowid date       idnum
   <int> <date>     <int>
 1     1 1993-01-01    17
 2     1 1993-02-01    17
 3     1 1993-03-01    17
 4     1 1993-04-01    17
 5     1 1993-05-01    17
 6     1 1993-06-01    17
 7     1 1993-07-01    17
 8     1 1993-08-01    17
 9     1 1993-09-01    17
10     1 1993-10-01    17
11     1 1993-11-01    17
12     1 1993-12-01    17
13     2 1993-01-01    17
14     2 1993-02-01    17
15     2 1993-03-01    17

In this case, it first generates a unique row ID. Second, it transforms the data from wide to long format, excluding the variables "idnum" and "rowid". Third, it groups the data by "rowid". Fourth, it generates the sequence of months per row ID. Finally, it fills the missing values in "idnum" and removes the redundant "var" variable.

Comments
  • As a beginner in R, how am I supposed to judge the answers? Is there a way to check them for efficiency, as %timeit in Python?
  • The most efficient answer as far as I can tell. A short follow-up: say I'd have actually a long list of columns that I want in the new dataframe, not just idnum. Is there an elegant way of providing these? Replacing idnum=idnum with colnames(df) surely won't work.
  • On a smallish dataset of about 40k records, this is 25x faster than the dplyr::rowwise() option.
  • How to use multiple columns in place of idnum ?
  • @jeganathanvelu better to ask as a separate question.
  • +1. I had done melt(setNames(lapply(1:nrow(test), function(x) seq(test[x, "start"], test[x, "end"], by = "1 month")), test$idnum)) to avoid calling data.frame unnecessarily.
  • If all these methods work with my R version, how do I choose one? I am a complete beginner here... are some of these methods better generalizable to similar solutions, or newer and less likely to be deprecated? Is there a performance routine I could use to check them?
  • @Ananda Mahto. Thanks I replaced my code with yours.
  • @FooBar, part personal preference, part "what code will I be able to understand 6 months from now?", part "how big is my data?" There are a lot of different reasons to pick one approach over the other. The "microbenchmark" package helps you figure out which approaches are most efficient in terms of computing time.
  • @FooBar, for me, if the datasets are considerably big, dplyr or data.table based solutions would generally be faster. It is difficult to predict which one will be deprecated.
  • dplyr ver 0.7.4 gives Error: Each column must either be a list of vectors or a list of data frames [data]
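
Following up on the benchmarking question in the comments, here is a minimal sketch of how the microbenchmark package mentioned there could be used to compare two approaches. The sample data and the helper names expand_map and expand_loop are made up for illustration; swap in whichever answers you want to compare:

```r
library(microbenchmark)

# Hypothetical sample data: 100 rows with two alternating date ranges
test <- data.frame(
  idnum = rep(c(17L, 27L), 50),
  start = rep(as.Date(c("1993-01-01", "1993-03-01")), 50),
  end   = rep(as.Date(c("1993-12-31", "1993-05-31")), 50)
)

# Approach 1: build all sequences with Map(), then assemble one data frame
expand_map <- function(d) {
  months <- Map(seq, d$start, d$end, by = "1 month")
  data.frame(idnum = rep(d$idnum, lengths(months)), month = do.call(c, months))
}

# Approach 2: build one small data frame per row and rbind them
expand_loop <- function(d) {
  pieces <- lapply(seq_len(nrow(d)), function(i) {
    data.frame(idnum = d$idnum[i],
               month = seq(d$start[i], d$end[i], by = "1 month"))
  })
  do.call(rbind, pieces)
}

# Run each expression 10 times and summarise the timings
timings <- microbenchmark(
  map  = expand_map(test),
  loop = expand_loop(test),
  times = 10L
)
print(timings)
```

Timings depend heavily on data size and package versions, so it is worth benchmarking on data that resembles your own rather than relying on figures quoted elsewhere.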