How to use merge() to update a table in R
I'm trying to figure out how to use merge() to update a database.
Here is an example. Take the data frame
foo <- data.frame(index=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA))
Which has the following values
  index value
1     a   100
2     b   101
3     c    NA
4     d    NA
And the data frame
bar <- data.frame(index=c('c', 'd'), value=c(200, 201))
Which has the following values:
  index value
1     c   200
2     d   201
When I run the following merge() call to update the values for foo:
merge(foo, bar, by='index', all=T)
It results in this output:
  index value.x value.y
1     a     100      NA
2     b     101      NA
3     c      NA     200
4     d      NA     201
I'd like the output of merge() to avoid creating, in this specific example, value.x and value.y, and to retain only the original column value. Is there a simple way of doing this?
Doesn't merge() always bind columns together? Does replace() work?
foo$value <- replace(foo$value, foo$index %in% bar$index, bar$value)
Or use match(), which looks up each key of bar in foo, so the row order of bar does not have to agree with foo:
foo$value[match(bar$index, foo$index)] <- bar$value
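The match() assignment above fails when bar holds a key that foo lacks, because match() then returns NA for that key. A minimal sketch of one way to guard against this, assuming unmatched rows of bar should simply be skipped (the helper names pos and hit are illustrative, not from the original answer):

```r
foo <- data.frame(index = c('a', 'b', 'c', 'd'), value = c(100, 101, NA, NA))
# bar is deliberately out of order and contains a key ('e') that foo lacks
bar <- data.frame(index = c('d', 'c', 'e'), value = c(201, 200, 215))

pos <- match(bar$index, foo$index)  # row of foo for each row of bar, NA if absent
hit <- !is.na(pos)                  # rows of bar that have a partner in foo

# assign only the matched rows, so no NA ever reaches the subscript
foo$value[pos[hit]] <- bar$value[hit]
foo
```

Rows of bar with no partner in foo are ignored here; appending them instead would need a separate rbind() step.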
I would also like to present an SQL solution using the sqldf library and R's integrated SQLite database. I like the simplicity, accuracy and power of SQL.
Accuracy: I can define exactly which rows I want to change without considering the order of a data.frame (foo.id = bar.id).
Power: in the WHERE clause inside SET and in the final WHERE (third row) I can define all the conditions that the update should respect.
Simplicity: the syntax is more readable than indexing into vectors, matrices or data frames.
library(sqldf)
# I changed index to id since index does not work.
# Obviously index is a keyword in SQLite.
(foo <- data.frame(id=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA)))
(bar <- data.frame(id=c('c', 'd'), value=c(200, 201)))
sqldf(c(paste("UPDATE foo",
              " SET value = (SELECT bar.value FROM bar WHERE foo.id = bar.id)",
              " WHERE value IS NULL"),
        " SELECT * FROM main.foo"))
  id value
1  a   100
2  b   101
3  c   200
4  d   201
Similar issues: r equivalent of sql update? R sqlite: update with two tables
The optimal solution uses data.table:
library(data.table)
setDT(foo)
setDT(bar)
foo[bar, on="index", value:=i.value]
foo
#    index value
# 1:     a   100
# 2:     b   101
# 3:     c   200
# 4:     d   201
The first argument of the [ method for data.table is named i, so we can refer to a column of the table passed as the i argument by using the i. prefix.
merge() only merges in new data. For instance, if you had a data set of average income for a few cities, and a separate data set of the populations of those cities, you would use merge() to merge one set of data into the other.
Like apeescape said, replace() is probably what you want.
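If merge() is still preferred, the two suffixed columns it creates can be collapsed back into one afterwards. A minimal sketch, assuming the original value should be kept wherever it is present and bar should only fill the gaps (the names merged and result are illustrative):

```r
foo <- data.frame(index = c('a', 'b', 'c', 'd'), value = c(100, 101, NA, NA))
bar <- data.frame(index = c('c', 'd'), value = c(200, 201))

# left join: keeps every row of foo and adds value.x (foo) and value.y (bar)
merged <- merge(foo, bar, by = 'index', all.x = TRUE)

# keep foo's value where present, otherwise take bar's
merged$value <- ifelse(is.na(merged$value.x), merged$value.y, merged$value.x)

# drop the suffixed helper columns
result <- merged[, c('index', 'value')]
result
```

To let bar overwrite existing values instead, test is.na(merged$value.y) and swap the two branches of ifelse().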
Another approach could be:
Remove the NAs from the first data frame
Use rbind to append the data instead of using merge:
These are the original two data frames:
foo <- data.frame(index=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA))
bar <- data.frame(index=c('c', 'd'), value=c(200, 201))
(1) Use the negation of is.na to remove the NAs:
foo_new <- foo[!is.na(foo$value),]
(2) Bind the data frames and you'll get the answer you were looking for
new_df <- rbind(foo_new,bar)
new_df
  index value
1     a   100
2     b   101
3     c   200
4     d   201
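The rbind approach relies on every NA row of foo having a replacement in bar. A variant sketch that drops rows by key rather than by NA, so that it also appends keys that are new in bar (an assumption about the desired behaviour; the name kept is illustrative):

```r
foo <- data.frame(index = c('a', 'b', 'c', 'd'), value = c(100, 101, NA, NA))
# bar updates 'c' and 'd' and introduces a new key 'e'
bar <- data.frame(index = c('c', 'd', 'e'), value = c(200, 201, 215))

# keep only the rows of foo that bar does not supersede
kept <- foo[!(foo$index %in% bar$index), ]

# append bar, then restore a predictable row order and row names
new_df <- rbind(kept, bar)
new_df <- new_df[order(new_df$index), ]
rownames(new_df) <- NULL
new_df
```

Note that in this variant bar always wins for a shared key, even when foo already holds a non-missing value.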
- What should the result be if there are no nulls?
- Did you ever get an answer to this question? I am looking for a solution to this same problem.
- I wonder too why merge does not have, say, an overwrite=TRUE parameter which would kick in when by is provided. It is inconvenient to delete columns manually every time you want to rerun a merge.
- See also: Replace missing values (NA) in one data set with values from another where columns match
- One wrinkle with using replace() is that if the ordering in bar is not the same as in foo, it won't work properly. For example, if you try running the above example after bar <- bar[c(2,1),], the end result does not come out correct.
- match() does work for my example. In reality, it turns out that my actual use case is more complicated, where I would like to match across multiple columns and not just a simple vector. I don't think match() works when you would like to match across multiple columns of a data frame.
- Thank you! The idea of using match() is good. However, suppose bar has another element that is not contained in foo (we want to update and add the new rows): bar <- data.frame(index=c('c', 'd','e'), value=c(200, 201,215)). Then match() gives an error: Error in foo$value[match(bar$index, foo$index)] <- bar$value : NAs are not allowed in subscripted assignments. Any ideas how to overcome that?
- What if you have multiple columns for indices?
- The SQL statement can run over multiple lines, so paste is not needed.
- @Grothendieck Thanks for this info.
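For the multi-column case raised in the comments, merge() accepts a vector of key columns in by, which a single match() call on one vector does not cover. A minimal sketch with two hypothetical key columns k1 and k2 (these names and the sample data are illustrative only), again filling only the missing values:

```r
# k1 and k2 are hypothetical key columns used only for illustration
foo <- data.frame(k1 = c('a', 'a', 'b', 'b'), k2 = c(1, 2, 1, 2),
                  value = c(100, 101, NA, NA))
bar <- data.frame(k1 = c('b', 'b'), k2 = c(2, 1), value = c(201, 200))

# left join on both keys; value.x comes from foo, value.y from bar
merged <- merge(foo, bar, by = c('k1', 'k2'), all.x = TRUE)

# fill foo's gaps from bar, then drop the suffixed columns
merged$value <- ifelse(is.na(merged$value.x), merged$value.y, merged$value.x)
result <- merged[order(merged$k1, merged$k2), c('k1', 'k2', 'value')]
rownames(result) <- NULL
result
```

The data.table update join shown earlier should extend the same way by passing both key names to on, though that is not shown in the original answer.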