merge multiple dataframes based on matching timestamp


I have 6 data frames, all with unique column names and the same number of columns. The data was collected over the same time period.

Each data frame has a timestamp column and holds minute averages, but some data frames have missing data, so their row counts are not equal.

I would like to merge the data frames so that all 6 are displayed side by side, but only at times where data is present in all 6 data frames, i.e. limited by the df with the fewest rows, which is "H1_min".

> head(H1_min)
            h1min h1temp h1humid   h1db     h1hz
1 2015-09-06 00:00:00   21.5   73.10 39.252 117.1900
2 2015-09-06 00:02:00   21.5   72.50 39.434 125.0000
3 2015-09-06 00:03:00   21.5   72.65 39.338 127.9325
4 2015-09-06 00:04:00   21.5   73.00 39.206 148.4400
5 2015-09-06 00:06:00   21.5   73.00 39.253 144.5350
6 2015-09-06 00:07:00   21.5   72.30 39.293 156.2500

The column names of the other data frames follow the same pattern, with h1 replaced by h2 through h6.

dput(head(H2_min))

"2015-09-08 20:21:00", "2015-09-08 20:22:00", "2015-09-08 20:23:00", 
"2015-09-08 20:24:00", "2015-09-08 20:25:00", "2015-09-08 20:26:00", 
"2015-09-08 20:27:00", "2015-09-08 20:28:00", "2015-09-08 20:29:00", 
"2015-09-08 20:30:00", "2015-09-08 20:31:00", "2015-09-08 20:32:00", 
"2015-09-08 20:33:00", "2015-09-08 20:34:00", "2015-09-08 20:35:00"
), class = "factor"), h2temp = c(23.4, 23.4, 23.3, 23.2, 23.2, 
23.1), h2humid = c(38.5, 38.3, 38.05, 38.1, 38.6, 38.6), h2db = c(38.834, 
38.655, 38.679, 38.695, 38.806, 38.702), h2hz = c(191.41, 152.34, 
162.11, 113.28, 121.09, 164.06)), .Names = c("h2min", "h2temp", 
"h2humid", "h2db", "h2hz"), row.names = c(NA, 6L), class = "data.frame")

dput(head(H4_min))

"2015-09-08 17:10:00", "2015-09-08 17:11:00", "2015-09-08 17:12:00", 
"2015-09-08 17:13:00"), class = "factor"), h4temp = c(27.2, 27.2, 
27.2, 27.2, 27.2, 27.2), h4humid = c(33.5, 33.5, 33.5, 33.5, 
33.5, 33.5), h4db = c(36.8225, 36.921, 36.8766666666667, 36.91, 
36.8336666666667, 36.768), h4hz = c(134.765, 136.068333333333, 
137.373333333333, 126.3, 139.323333333333, 128.906666666667)), .Names =       
c("h4min", "h4temp", "h4humid", "h4db", "h4hz"), row.names = c(NA, 6L), class = "data.frame")

This attempt yields the error shown underneath it:

H_min<-merge(H1_min, H2_min, H3_min, H4_min, H5_min, H6_min, by.x = 'row.names', by.y ='h1_min')

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

Another way to do this is to convert the data.frames to xts objects and then use merge.xts(...), which merges based on the timestamp automatically, and then convert the result back to a data.frame.

Most of the code below is just to create reproducible sample data. The actual work is in the 6 lines at the end.

# create representative example - you have this already
# format() with an explicit pattern keeps "00:00:00" on the midnight entry
time <- format(as.POSIXct("2015-09-06") + 60*(0:30), "%Y-%m-%d %H:%M:%S")
temp = c(23.4, 23.4, 23.3, 23.2, 23.2, 23.1)
humid = c(38.5, 38.3, 38.05, 38.1, 38.6, 38.6)
db = c(38.834, 38.655, 38.679, 38.695, 38.806, 38.702)
hz = c(191.41, 152.34, 162.11, 113.28, 121.09, 164.06)
set.seed(123)   # for reproducible example
get.df <- function(n, name) {
  df <- data.frame(min=sort(sample(time,n)), 
                   temp=sample(temp,n, replace=TRUE), 
                   humid=sample(humid,n,replace=TRUE),
                   db = sample(db,n,replace=TRUE),
                   hz = sample(hz,n,replace=TRUE))
  names(df) <- paste0(name,names(df))
  df
}
H1 <- get.df(20,"h1")    # 20 rows at random times
H2 <- get.df(20,"h2")    # 20 rows at random times
H3 <- get.df(25,"h3")    # 25 rows at random times
H4 <- get.df(30,"h4")    # 30 rows at random times
# you start here
library(xts)
lst <- list(H1, H2, H3, H4)
xts.lst <- lapply(lst, function(df) xts(df[,2:ncol(df)], order.by=as.POSIXct(df[[1]])))
result <- do.call(merge.xts, c(xts.lst, all=FALSE))
result <- data.frame(result)
head(result)
#                     h1temp h1humid   h1db   h1hz h2temp h2humid   h2db   h2hz h3temp h3humid   h3db   h3hz h4temp h4humid   h4db   h4hz
# 2015-09-06 00:03:00   23.2   38.05 38.679 162.11   23.4    38.5 38.695 121.09   23.3    38.3 38.702 191.41   23.4    38.5 38.679 162.11
# 2015-09-06 00:04:00   23.1   38.05 38.655 121.09   23.4    38.3 38.679 152.34   23.2    38.1 38.679 121.09   23.1    38.3 38.834 121.09
# 2015-09-06 00:09:00   23.2   38.50 38.679 162.11   23.4    38.5 38.655 113.28   23.3    38.3 38.834 191.41   23.4    38.6 38.655 191.41
# 2015-09-06 00:12:00   23.4   38.30 38.806 164.06   23.4    38.3 38.679 164.06   23.4    38.6 38.834 162.11   23.4    38.3 38.655 121.09
# 2015-09-06 00:13:00   23.4   38.60 38.679 152.34   23.2    38.6 38.655 164.06   23.3    38.6 38.679 162.11   23.4    38.5 38.679 121.09
# 2015-09-06 00:14:00   23.1   38.50 38.806 191.41   23.2    38.6 38.695 152.34   23.4    38.6 38.834 162.11   23.3    38.5 38.834 191.41
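
For comparison, the same inner join can be sketched in base R without xts. merge() only takes two data frames at a time, so the idea is to give every frame the same key column name and chain merge() with Reduce(). The three small frames and the key name "min" below are made up for illustration; they are not the question's data.

```r
# Hypothetical miniature versions of the question's data frames
H1 <- data.frame(h1min  = c("2015-09-06 00:00:00", "2015-09-06 00:02:00", "2015-09-06 00:03:00"),
                 h1temp = c(21.5, 21.5, 21.5))
H2 <- data.frame(h2min  = c("2015-09-06 00:00:00", "2015-09-06 00:01:00", "2015-09-06 00:03:00"),
                 h2temp = c(23.4, 23.4, 23.3))
H3 <- data.frame(h3min  = c("2015-09-06 00:00:00", "2015-09-06 00:03:00", "2015-09-06 00:04:00"),
                 h3temp = c(27.2, 27.2, 27.2))

# Rename the first (timestamp) column of each frame to a shared key name
rename_key <- function(df) {
  names(df)[1] <- "min"   # "min" is an arbitrary name chosen for this sketch
  df
}
frames <- lapply(list(H1, H2, H3), rename_key)

# Chain the pairwise merges; all = FALSE keeps only timestamps present in every frame
result <- Reduce(function(x, y) merge(x, y, by = "min", all = FALSE), frames)
print(result)
```

Only the timestamps shared by all three frames survive, and the value columns keep their original names.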


library(dplyr)
library(magrittr)
library(tidyr)

H1_min = 
  tibble(   # data_frame() is deprecated; dplyr re-exports tibble()
    h1min = c("2015-09-06 00:00:00", "2015-09-06 00:02:00"),
    h1temp = c(21.5, 21.5),
    h1humid = c(73.10, 72.50),
    h1db = c(39.252, 39.434),
    h1hz = c(117.1900, 125.000) )

H2_min = H1_min %>% mutate(h1hz = c(117.1900, NA))

answer = 
  list(H1_min, H2_min) %>%
  lapply(. %>% setNames(c("min",
                          "temp",
                          "humid",
                          "db",
                          "hz"))) %>%
  bind_rows(.id = "location") %>%
  gather(variable, value, -location, -min) %>%
  mutate(prefix = "h") %>%
  unite(new_variable, prefix, location, variable, sep = "") %>%
  spread(new_variable, value) %>%
  filter(complete.cases(.))


A slightly simpler way to do this, based on @jlhoward's answer:

library(xts)

# df1 and df2 are your data frames; column 1 holds the timestamps
qxts1 <- xts(df1[,-1], order.by = df1[,1])
qxts2 <- xts(df2[,-1], order.by = df2[,1])

xts.lst = list(qxts1, qxts2)
result <- do.call(merge.xts, c(xts.lst, all=FALSE))
result <- data.frame(result)

For xts or zoo, make sure the timestamp column passed to order.by is a time-based class such as Date, POSIXct, chron, ...
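
A minimal sketch of that conversion, assuming the timestamps arrive as a factor (as the dput output in the question suggests). The small df1 here is made up for illustration, and the UTC time zone is an assumption; use whatever zone your data was recorded in.

```r
# Hypothetical two-row frame with a factor timestamp column
df1 <- data.frame(h1min  = factor(c("2015-09-06 00:00:00", "2015-09-06 00:02:00")),
                  h1temp = c(21.5, 21.5))

# Go through as.character() so the factor labels are parsed, not the integer codes
df1$h1min <- as.POSIXct(as.character(df1$h1min), tz = "UTC")

str(df1$h1min)
# df1[, 1] is now a valid order.by argument for xts()
```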


Comments
  • Data with spaces is very hard to enter. Please supply the output of dput(head(H1_min)). Such output for an additional data frame would also be helpful.
  • Sure, added it for a second dataframe
  • @Evan that is not dput output...it should start with structure(...
  • It was the tail end of the output since the output was too large to scroll to the top. Would you like to see something else?
  • @Evan Yes, something else. The full output of dput is useful, anything else is noise. Reduce the number of rows by using the second argument to head if necessary, but don't trim what dput reports.
  • Thank you for the response! I actually prefer c(xts.lst, all=TRUE) since it displays the gaps when the sensors failed. Thanks again!
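
The all=TRUE variant mentioned in the last comment can be sketched as follows. The two tiny series are made up for illustration and the UTC time zone is an assumption; the point is only to contrast the inner join (all = FALSE) with the outer join (all = TRUE), which fills missing readings with NA.

```r
library(xts)

# Hypothetical sensors with partly overlapping minute stamps
t0 <- as.POSIXct("2015-09-06 00:00:00", tz = "UTC")
a <- xts(cbind(h1temp = c(21.5, 21.6)), order.by = t0 + 60 * c(0, 1))
b <- xts(cbind(h2temp = c(23.4, 23.3)), order.by = t0 + 60 * c(1, 2))

inner  <- merge(a, b, all = FALSE)  # only timestamps present in both series
outer_ <- merge(a, b, all = TRUE)   # every timestamp, NA where a sensor has no reading

print(inner)
print(outer_)
```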