Combine two datasets, based on POSIXct values

Related searches

I am struggling with combining two datasets with each other.

Dataset1 is containing a "before time" and a "after time", and a "channel".

Dataset2 is containing just one "time" and a "channel" column as well.

I want to add an binary column (Yes/No) to Dataset1, with this logic: If there is a row in Dataset2, where the channel == channel, and Time is within the "before" and "after" time, I want to have "YES". Else "NO".

Data1

ID   Channel   before_time   after_time 
1       A1  2019-09-02 20:13:00 2019-09-02 20:33:00
2       B1  2019-09-02 20:03:00 2019-09-02 20:23:00
3       C1  2019-09-02 20:23:00 2019-09-02 20:43:00
4       D1  2019-09-02 20:23:00 2019-09-02 20:43:00

Data2

ID_B     Channel_B    Time_B
Hallo       A1        2019-09-02 20:23:00
Hi          B2        2019-09-02 20:05:00
Hoi         C1        2019-09-02 22:23:00

Desired Output

ID   Channel   before_time   after_time                     Available
1       A1  2019-09-02 20:13:00 2019-09-02 20:33:00         Yes  # Channel == Channel, Time between before & after
2       B1  2019-09-02 20:03:00 2019-09-02 20:23:00          No  # Channel != Channel
3       C1  2019-09-02 20:23:00 2019-09-02 20:43:00          No  # Time is not between before & after
4       D1  2019-09-02 20:23:00 2019-09-02 20:43:00          No  # There is no matching data where channel is D1

Desired Output 2 (comments Solutions)

Adding extra columns from the second data set (Data2).

ID   Channel   before_time   after_time                     Available   ID_B     
1       A1  2019-09-02 20:13:00 2019-09-02 20:33:00          Yes        Hallo       
2       B1  2019-09-02 20:03:00 2019-09-02 20:23:00          No         x 
3       C1  2019-09-02 20:23:00 2019-09-02 20:43:00          No         x
4       D1  2019-09-02 20:23:00 2019-09-02 20:43:00          No         x

Reproducible example (the data):

ID <- c("1", "2", "3", "4")
channel <- c("A1", "B1", "C1", "D1)
#startdate <- as.POSIXct(c("2019-09-02 20:23:00", "2019-09-02 20:13:00", "2019-09-02 20:33:00", "2019-09-02 20:33:00"))
before_time <- as.POSIXct(c("2019-09-02 20:13:00", "2019-09-02 20:03:00", "2019-09-02 20:23:00", "2019-09-02 20:23:00"))
after_time  <- as.POSIXct(c("2019-09-02 20:33:00", "2019-09-02 20:23:00", "2019-09-02 20:43:00","2019-09-02 20:43:00"))
data1 <- data.frame(ID, channel,   before_time, after_time)
View(data1)


ID_B <- c("Hallo", "Hi", "Hoi")
channel_B <- c("A1", "B2", "C1")
Time_B <- as.POSIXct(c("2019-09-02 20:23:00", "2019-09-02 20:05:00", "2019-09-02 22:23:00"))
data2 <- data.frame(ID_B, channel_B, Time_B)
View(data2)

As mentioned by arg0naut91, here is non-equi join in data.table:

library(data.table)
setDT(data1)
setDT(data2)
data1[, c("Available", "ID_B") :=
        data2[.SD, on=.(channel_B=channel, Time_B>=before_time, Time_B<=after_time), 
            by=.EACHI, .(.N > 0, ID_B)][, (1L:3L) := NULL]
    ]

output:

   ID channel         before_time          after_time Available  ID_B
1:  1      A1 2019-09-02 20:13:00 2019-09-02 20:33:00      TRUE Hallo
2:  2      B1 2019-09-02 20:03:00 2019-09-02 20:23:00     FALSE  <NA>
3:  3      C1 2019-09-02 20:23:00 2019-09-02 20:43:00     FALSE  <NA>
4:  4      D1 2019-09-02 20:23:00 2019-09-02 20:43:00     FALSE  <NA>

Combine two datasets, based on POSIXct values, Combine two datasets, based on POSIXct values. 发布于 2020-05-03 06:20:53. I am struggling with combining two datasets with each other. Dataset1 is� In a many-to-one join, one of your datasets will have many rows in the merge column that repeat the same values (such as 1, 1, 3, 5, 5), while the merge column in the other dataset will not have repeat values (such as 1, 3, 5). As you might have guessed, in a many-to-many join, both of your merge columns will have repeat values.

Here is a base R solution, using merge + ifelse, i.e.,

dfout <- subset(within(merge(data1,data2[-1],by.x = "channel",by.y = "channel_B",all.x = T), 
                       Available <- ifelse(!is.na(Time_B)& Time_B>=before_time & Time_B<=after_time,"Yes","No")),
                select = -Time_B)

such that

> dfout
  channel ID         before_time          after_time Available
1      A1  1 2019-09-02 20:13:00 2019-09-02 20:33:00       Yes
2      B1  2 2019-09-02 20:03:00 2019-09-02 20:23:00        No
3      C1  3 2019-09-02 20:23:00 2019-09-02 20:43:00        No
4      D1  4 2019-09-02 20:23:00 2019-09-02 20:43:00        No

Merge dataframes on timestamps and time intervals using data.table , Because some Receiver numbers are associated with more than one Station , it is important to merge these two datasets on the timestamp� In order to merge the three data sets, this program . sorts FINANCE and COMPANY by Name. merges COMPANY and FINANCE into a temporary data set, TEMP. sorts TEMP by IdNumber. merges TEMP and REPERTORY by IdNumber. The following output displays the resulting data set, ALL: Match-Merging Data Sets That Lack a Common Variable

This solution should work:

df<-(cbind(data1,data2)

df<-df%>%mutate(ID=as.integer(ID),
        channel=as.character(channel),
        ID_B=as.character(ID_B),
        channel_B=as.character(channel_B))

df%>%
  mutate(available= ifelse(channel==channel_B & Time_B >= before_time & Time_B <= after_time, "yes","no"))%>%
  select(-ID_B,Time_B,-channel_B)


 # A tibble: 3 x 5
 ID channel before_time         after_time          available
 <int> <chr>   <dttm>              <dttm>              <chr>    
 1     1 A1      2019-09-02 20:13:00 2019-09-02 20:33:00 yes      
 2     2 B1      2019-09-02 20:03:00 2019-09-02 20:23:00 no       
 3     3 C1      2019-09-02 20:23:00 2019-09-02 20:43:00 no  

Data Cleaning, Merging, and Wrangling in R, Often times, it is necessary to combine two variables from different datasets similar to how VLOOKUP is used in Excel to join two variables based on certain criteria. If you are POSIXct and format to calculate differences between seconds. The default is to use the columns with common names between the two data frames. all, all.x, all.y: Logical values that specify the type of merge. The default value is all=FALSE (meaning that only the matching rows are returned). That last group of arguments — all, all.x and all.y — deserves some explanation. These arguments determine the

It may also make sense to look into sqldf, e.g.:

library(sqldf)

sqldf('SELECT t1.ID, t1.channel, 
      t1.before_time, t1.after_time, 
      CASE WHEN t2.ID_B IS NULL THEN "No" ELSE "Yes" END Available 
      FROM data1 t1 LEFT JOIN data2 t2 ON t1.channel = t2.channel_B
      AND t2.Time_B BETWEEN t1.before_time AND t1.after_time')

Output:

  ID channel         before_time          after_time Available
1  1      A1 2019-09-02 20:13:00 2019-09-02 20:33:00       Yes
2  2      B1 2019-09-02 20:03:00 2019-09-02 20:23:00        No
3  3      C1 2019-09-02 20:23:00 2019-09-02 20:43:00        No
4  4      D1 2019-09-02 20:23:00 2019-09-02 20:43:00        No

Here is also one way of doing this in dplyr:

library(dplyr)

data1 %>%
  left_join(data2,
            by = c('channel' = 'channel_B')
            ) %>%
  mutate(
    Available = ifelse(
      !is.na(Time_B) & Time_B >= before_time & Time_B <= after_time, 'Yes', 'No')
    ) %>%
  select(-ends_with('_B'))

Adding additional column:

# sqldf

sqldf('SELECT t1.ID, t1.channel, 
      t1.before_time, t1.after_time, 
      CASE WHEN t2.ID_B IS NULL THEN "No" ELSE "Yes" END Available,
      CASE WHEN t2.ID_B IS NULL THEN "x" ELSE t2.ID_B END ID_B
      FROM data1 t1 LEFT JOIN data2 t2 ON t1.channel = t2.channel_B
      AND t2.Time_B BETWEEN t1.before_time AND t1.after_time')

# dplyr

data1 %>%
  left_join(data2,
            by = c('channel' = 'channel_B')
  ) %>%
  mutate(
    Available = ifelse(
      !is.na(Time_B) & Time_B >= before_time & Time_B <= after_time, 'Yes', 'No'),
    ID_B = ifelse(
      Available == 'Yes', as.character(ID_B), 'x')
  ) %>%
  select(-Time_B)

merge: Merge Two Data Frames, values which cannot be matched. See match . This is intended to be used for merging on one column, so these are incomparable values of that column. Every time you combine data, there has to be a identical and unique variable in the datasets you combine. After this is met and done you are knowable to merge data in R with the below coding. To merge two dataframes with a outer join in R, use the below coding: # Outer join mymergedata1 <- merge(x = df1, y = df2, by = "var1", all = TRUE)

Combining and merging data sets, To join two data frames (datasets) vertically we can use the bind_rows function. combo = bind_rows(jan,feb) glimpse(combo) ## Observations: 59 ## Variables:� The following figure shows the results of match-merging. Match-merging combines observations from two or more SAS data sets into a single observation in a new data set based on the values of one or more common variables. Data set COMBINED shows the results. Match-Merging Two Data Sets. Updating

In Excel, you can combine or merge text from two or more cells, as well as columns and rows,into one cell. If you have a lot of rows of data where you want to combine text, you can simply start typing the combined text in an adjacent column and Excel will fill in the rest for you.

Comment and share: How to combine and analyze data from multiple data sets using Excel Power Pivot By Susan Harkins Susan Sales Harkins is an IT consultant, specializing in desktop solutions.

Comments
  • Have your real data equal number of rows?
  • Look into non-equi joins, update joins, this can be done with data.table. Joining in dplyr is not (yet) so advanced, but if you do a full join and then compare the dates should also work.
  • @ArmandoGonzálezDíaz the real data does not have equal number of rows. I changed my example.
  • @arg0naut91 appreciate the advice. Has added the 'data.table' tag and will look into full-joins.
  • Great, @chinsoon12. However, if I run your example I receive an error (or is it me?): Error in set(i, j = lc, value = newfactor) : .SD is locked. Updating .SD by reference using := or set are reserved for future use. Use := in j directly. Or use copy(.SD) as a (slow) last resort, until shallow() is exported.
  • can you use an updated version of data.table? and if you cannot upgrade your version, you will need to replace .SD with copy(.SD) or data1
  • Your solution works. Quick question, since i have some trouble with the DT syntax: would it be possible to add columns from data2 if there is a TRUE (match)? For example, adding 'ID_B' to the output file?
  • Can you show some example data? Not in front of computer now
  • @Roverflow, i have updated the code to add the new column
  • I tried this solution. However, it is leading to: Error: Expecting a single value: [extent=3].
  • Could you please upload the solution including the data examples that are posted above? I am still receiving errors (although I adjusted the names of the data1 to df etc.). Many thanks in advance!
  • df=cbind(data1,data2)