Filtering several date columns in R
I have a data frame with 3 columns. (https://pastebin.com/DFqUuuDp)
The first two columns ("Time1", "Time2") contain date-time data. Both are of class POSIXct, in the format "%Y-%m-%d %H:%M:%S".
What I ultimately need is a subset of rows: for a particular time in Time1, only the rows where Time2
- is one day before Time1 (here one day is not always 24 hours; it means the calendar day before, basically "yesterday")
- and Time2 is before 12:00 (noon).
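To see why "one day" has to mean the calendar day rather than 24 hours, here is a small base-R sketch. It assumes a time zone with daylight saving time; "Europe/Berlin" is used purely as an illustration (the OP's data is in UTC, where every day has 24 hours). In that zone the clocks went back one hour on 30 October 2016, so that calendar day lasted 25 hours.

```r
# Illustration only: "Europe/Berlin" is an assumed DST time zone, not the OP's (UTC)
tz <- "Europe/Berlin"
time1 <- as.POSIXct("2016-10-31 00:00:00", tz = tz)
time2 <- as.POSIXct("2016-10-30 00:00:00", tz = tz)

# Elapsed time between the two midnights: 25 hours, because DST ended on 2016-10-30
elapsed <- as.numeric(difftime(time1, time2, units = "hours"))
print(elapsed)

# Calendar-day comparison: is time2 on the day before time1 (in the same time zone)?
is_yesterday <- as.Date(time2, tz = tz) == as.Date(time1, tz = tz) - 1
print(is_yesterday)
```

A rule such as "Time1 - Time2 <= 24 hours" would reject this pair, while the calendar-day comparison accepts it.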
A correct example:
+---------------------+---------------------+
| Time1               | Time2               |
+---------------------+---------------------+
| 2016-11-01 00:00:00 | 2016-10-31 00:00:00 |
+---------------------+---------------------+
A wrong example:
+---------------------+---------------------+
| Time1               | Time2               |
+---------------------+---------------------+
| 2016-11-01 00:00:00 | 2016-10-31 12:00:00 |
+---------------------+---------------------+
In the uploaded file, I manually added the third column ("Value") to mark the rows I want to end up with after filtering: the rows with "TRUE" are the ones I am interested in.
I solved it with two for loops, but it is very slow on large tables.
The question has been tagged data.table. In addition, the sample datasets provided by the OP are of class data.table. Therefore, I feel obliged to post a data.table solution:
library(data.table)
DT[as.IDate(Time1) - 1L == as.IDate(Time2) & hour(Time2) < 12]
                 Time1      Time2 Value
1: 2016-11-01 00:00:00 2016-10-31  TRUE
2: 2016-11-01 00:30:00 2016-10-31  TRUE
3: 2016-11-01 01:00:00 2016-10-31  TRUE
4: 2016-11-01 01:30:00 2016-10-31  TRUE
# check if result is correct
identical(DT[as.IDate(Time1) - 1L == as.IDate(Time2) & hour(Time2) < 12L],
          DT[Value == "TRUE"])
[1] TRUE
as.IDate() converts to IDate, a Date class with integer storage for fast sorting and grouping. So we can use integer arithmetic to compute the previous day. hour() is also provided by the data.table package and returns the hour of the day as an integer value.
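Base R Date objects also count whole days, so subtracting 1 gives the previous calendar day, and as.POSIXlt()$hour gives the hour as an integer. The sketch below expresses the same two conditions without data.table. It assumes a small hypothetical data frame `df` with the OP's column layout and UTC timestamps; it is not taken from the OP's data.

```r
# Hypothetical sample in the OP's layout (UTC timestamps); not the OP's full data
df <- data.frame(
  Time1 = as.POSIXct(rep("2016-11-01 00:30:00", 4), tz = "UTC"),
  Time2 = as.POSIXct(c("2016-10-30 00:00:00", "2016-10-31 00:00:00",
                       "2016-10-31 12:00:00", "2016-11-01 00:00:00"), tz = "UTC")
)

# Condition 1: Time2 falls on the calendar day before Time1
prev_day <- as.Date(df$Time1, tz = "UTC") - 1 == as.Date(df$Time2, tz = "UTC")
# Condition 2: Time2 is before noon (POSIXlt hour runs from 0 to 23)
before_noon <- as.POSIXlt(df$Time2)$hour < 12

result <- df[prev_day & before_noon, ]
print(result)
```

Only the second row (Time2 = 2016-10-31 00:00:00) passes both conditions.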
Reproducible data
Data copied from the pastebin link provided by the OP on 2018-05-29 at 22:00 UTC, but with the .internal.selfref pointer removed:
DT <- structure(
  list(
    Time1 = structure(
      c(1477958400, 1477958400, 1477958400, 1477958400, 1477958400, 1477958400, 1477958400,
        1477960200, 1477960200, 1477960200, 1477960200, 1477960200, 1477960200, 1477960200,
        1477962000, 1477962000, 1477962000, 1477962000, 1477962000, 1477962000, 1477962000,
        1477963800, 1477963800, 1477963800, 1477963800, 1477963800, 1477963800, 1477963800),
      class = c("POSIXct", "POSIXt"), tzone = "UTC"),
    Time2 = structure(
      c(1477699200, 1477742400, 1477785600, 1477828800, 1477872000, 1477915200, 1477958400,
        1477699200, 1477742400, 1477785600, 1477828800, 1477872000, 1477915200, 1477958400,
        1477699200, 1477742400, 1477785600, 1477828800, 1477872000, 1477915200, 1477958400,
        1477699200, 1477742400, 1477785600, 1477828800, 1477872000, 1477915200, 1477958400),
      class = c("POSIXct", "POSIXt"), tzone = "UTC"),
    Value = c("FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE",
              "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE",
              "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE",
              "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE")),
  .Names = c("Time1", "Time2", "Value"),
  row.names = c(NA, -28L),
  class = c("data.table", "data.frame"))
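The dput() data above has a regular structure: four Time1 values 30 minutes apart, each paired with the same seven Time2 values spaced 12 hours apart, starting at 2016-10-29 00:00 UTC. Assuming that pattern is all there is to it, an equivalent plain data.frame can be rebuilt with rep() and seq(); `DT_df` is a hypothetical name chosen for this sketch.

```r
# Four Time1 stamps, 30 minutes apart, each repeated for the seven Time2 values
Time1 <- rep(seq(as.POSIXct("2016-11-01 00:00:00", tz = "UTC"),
                 by = "30 min", length.out = 4), each = 7)
# Seven Time2 stamps, 12 hours apart, repeated once per Time1 value
Time2 <- rep(seq(as.POSIXct("2016-10-29 00:00:00", tz = "UTC"),
                 by = "12 hours", length.out = 7), times = 4)
# "TRUE" marks 2016-10-31 00:00:00, the fifth Time2 value in each group
Value <- rep(c("FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE"), times = 4)

DT_df <- data.frame(Time1, Time2, Value)

# Same rule as the data.table answer, written with base Date arithmetic
cond <- with(DT_df, as.Date(Time1, tz = "UTC") - 1 == as.Date(Time2, tz = "UTC") &
               as.POSIXlt(Time2)$hour < 12)
print(DT_df[cond, ])
```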
The following solution works on the data shared by the OP:
library(dplyr)
library(lubridate)
df %>%
  filter(as.Date(Time2) + days(1) == as.Date(Time1) &
           as.integer(format(Time2, "%H")) < 12)  # compare the hour as a number, not a string
#                  Time1      Time2 Value
# 1  2016-11-01 00:00:00 2016-10-31  TRUE
# 2  2016-11-01 00:30:00 2016-10-31  TRUE
# 3  2016-11-01 01:00:00 2016-10-31  TRUE
# 4  2016-11-01 01:30:00 2016-10-31  TRUE
# 5  2016-11-01 02:00:00 2016-10-31  TRUE
# 6  2016-11-01 02:30:00 2016-10-31  TRUE
# 7  2016-11-01 03:00:00 2016-10-31  TRUE
# 8  2016-11-01 03:30:00 2016-10-31  TRUE
# 9  2016-11-01 04:00:00 2016-10-31  TRUE
# 10 2016-11-01 04:30:00 2016-10-31  TRUE
# and so on
A solution using subset and the lubridate package could use the following approach:
- Add 1 day to Time2 and then check that Time1 and the shifted Time2 fall on the same day.
- Format Time2 as HHMMSS and then check that it is less than 120000 (noon).
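The second step turns the clock time into an integer such as 93000 for 09:30:00, so "before noon" becomes a plain numeric comparison with 120000. A quick base-R check with a few hypothetical timestamps:

```r
# Hypothetical timestamps around noon (UTC)
x <- as.POSIXct(c("2016-10-31 00:00:00", "2016-10-31 11:59:59",
                  "2016-10-31 12:00:00", "2016-10-31 18:30:00"), tz = "UTC")

# "%H%M%S" gives strings such as "115959"; as.integer() drops leading zeros ("000000" -> 0)
hhmmss <- as.integer(format(x, "%H%M%S"))
print(hhmmss)

# Step 2: keep only times strictly before 12:00:00
print(hhmmss < 120000)
```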
The code:
library(lubridate)
subset(df, format(Time1, "%Y%m%d") == format(Time2 + days(1), "%Y%m%d") &
         as.integer(format(Time2, "%H%M%S")) < 120000)
#                   Time1      Time2 Value
# 19  2016-11-01 00:00:00 2016-10-31  TRUE
# 39  2016-11-01 00:30:00 2016-10-31  TRUE
# 59  2016-11-01 01:00:00 2016-10-31  TRUE
# 79  2016-11-01 01:30:00 2016-10-31  TRUE
# 99  2016-11-01 02:00:00 2016-10-31  TRUE
# 119 2016-11-01 02:30:00 2016-10-31  TRUE
# 139 2016-11-01 03:00:00 2016-10-31  TRUE
# 159 2016-11-01 03:30:00 2016-10-31  TRUE
# 179 2016-11-01 04:00:00 2016-10-31  TRUE
# and so on
Note: Time2 is 00:00:00 for every row in the subset. When all values of a date-time column are at midnight, R prints only the date, which is why the time does not appear in the output above.
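This follows R's default formatting rule for date-times (described in ?strptime): when no format is given and every element is at midnight, only the date is shown. A small base-R check with hypothetical values:

```r
midnights <- as.POSIXct(c("2016-10-31 00:00:00", "2016-10-31 00:00:00"), tz = "UTC")
mixed     <- as.POSIXct(c("2016-10-31 00:00:00", "2016-10-31 12:00:00"), tz = "UTC")

# All elements at midnight: only the date is shown
print(format(midnights))
# At least one non-midnight element: the time is shown for every element
print(format(mixed))
```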
Data:
head(df, 20)
#         Time1               Time2 Value
# 1  2016-11-01 2016-10-22 00:00:00 FALSE
# 2  2016-11-01 2016-10-22 12:00:00 FALSE
# 3  2016-11-01 2016-10-23 00:00:00 FALSE
# 4  2016-11-01 2016-10-23 12:00:00 FALSE
# 5  2016-11-01 2016-10-24 00:00:00 FALSE
# 6  2016-11-01 2016-10-24 12:00:00 FALSE
# 7  2016-11-01 2016-10-25 00:00:00 FALSE
# 8  2016-11-01 2016-10-25 12:00:00 FALSE
# 9  2016-11-01 2016-10-26 00:00:00 FALSE
# 10 2016-11-01 2016-10-26 12:00:00 FALSE
# 11 2016-11-01 2016-10-27 00:00:00 FALSE
# 12 2016-11-01 2016-10-27 12:00:00 FALSE
# 13 2016-11-01 2016-10-28 00:00:00 FALSE
# 14 2016-11-01 2016-10-28 12:00:00 FALSE
# 15 2016-11-01 2016-10-29 00:00:00 FALSE
# 16 2016-11-01 2016-10-29 12:00:00 FALSE
# 17 2016-11-01 2016-10-30 00:00:00 FALSE
# 18 2016-11-01 2016-10-30 12:00:00 FALSE
# 19 2016-11-01 2016-10-31 00:00:00  TRUE
# 20 2016-11-01 2016-10-31 12:00:00 FALSE
Comments
- First off, the code from pastebin is giving an error of "Error: unexpected '<' in " -3000L". What did you do in the for loop? This doesn't seem like a problem that needs to be solved with loops. Have you tried using dplyr and just putting in the conditions in a filter? i.e. df %>% filter(time2-time1 <= 1 day & time2 < 12am)?
- Sorry, I re-uploaded the file. It was a data.table before; now I have uploaded a data.frame. I don't know how to formulate (time2-time1 <= 1 day & time2 < 12am). How would you code the 'time2-time1 <= 1 day' part?
- @UDE_Student Your data is very big. Why don't you just share dput(head(df,20))? It will be easier for everyone.
- Yes, you are right. I updated my initial post with the shorter version: pastebin.com/DFqUuuDp
- I have added a possible solution in base R using lubridate. Have a look.
- Good use of as.IDate! Actually, the data provided by the OP was a data.frame when I took it. And I wanted to provide a base-R solution so that the original row names are kept in the result, giving the OP an idea which rows were selected.
- Great solution. I will give the accepted answer to this question because it does not use the lubridate package.
- Thank you. I just want to point out that I find lubridate to be a very handy package in many circumstances. However, it is not necessary to use the days() function or lubridate's date arithmetic here, because we just need to subtract 1 to get the previous day. Both Date and IDate count days; the difference is that IDate uses integer as its storage mode.
- Question regarding DT[as.IDate(Time1) - 1L == as.IDate(Time2) & hour(Time2) < 12]: I am trying to adjust this so that, instead of Time2 being exactly one calendar day before Time1, Time2 is one OR more calendar days before Time1. If I use >=, it is technically correct, but it gives me more than one result, whereas I am only interested in "the latest possible single" Time2. Any ideas?
- OK, I solved it with: filtered_dt <- filtered_dt[Time1 > Time2, lapply(.SD, last), by = Time1]
- Two minor comments. 1) You don't need to use $ in a dplyr chain. 2) Perhaps you should try your solution on the data shared by the OP and check whether it returns valid output.
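The follow-up question in the comments (for each Time1, keep only the latest Time2 that lies one or more calendar days earlier and before noon) can also be sketched in base R. This is an illustration, not the commenter's data.table code: it assumes a hypothetical data frame `df` shaped like the OP's data with UTC timestamps, and it picks the maximum Time2 per Time1 explicitly instead of relying on row order as last() does.

```r
# Hypothetical sample in the OP's layout (UTC timestamps)
df <- data.frame(
  Time1 = as.POSIXct(rep(c("2016-11-01 00:00:00", "2016-11-01 00:30:00"), each = 3),
                     tz = "UTC"),
  Time2 = as.POSIXct(rep(c("2016-10-30 00:00:00", "2016-10-31 00:00:00",
                           "2016-10-31 12:00:00"), times = 2), tz = "UTC")
)

# Time2 on an earlier calendar day than Time1, and before noon
keep <- as.Date(df$Time2, tz = "UTC") < as.Date(df$Time1, tz = "UTC") &
  as.POSIXlt(df$Time2)$hour < 12
cand <- df[keep, ]

# Within each Time1 group, flag the row(s) holding the latest remaining Time2
is_latest <- ave(as.numeric(cand$Time2), as.numeric(cand$Time1), FUN = max) ==
  as.numeric(cand$Time2)
latest <- cand[is_latest, ]
print(latest)
```

Using ave() with max keeps the original row names, so the selected rows can be traced back to the input.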