How to use merge() to update a table in R

I'm trying to figure out how to use merge() to update a database.

Here is an example. Take the data frame foo:

foo <- data.frame(index=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA))

Which has the following values

  index value
1     a   100
2     b   101
3     c    NA
4     d    NA

And the data frame bar

bar <- data.frame(index=c('c', 'd'), value=c(200, 201))

Which has the following values:

  index value
1     c   200
2     d   201

When I run the following merge() function to update the values for c and d

merge(foo, bar, by='index', all=TRUE)

It results in this output:

  index value.x value.y
1     a     100      NA
2     b     101      NA
3     c      NA     200
4     d      NA     201

I'd like the output of merge() to avoid creating value.x and value.y (in this specific example) and to retain only the original value column. Is there a simple way of doing this?

Doesn't merge() always bind columns together? Does replace() work?

foo$value <- replace(foo$value, foo$index %in% bar$index, bar$value)

or match(), which pairs the rows by index rather than by position:

foo$value[match(bar$index, foo$index)] <- bar$value
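The comments raise two cases the one-liner above does not cover: bar listed in a different order, and bar containing an index that foo lacks. A minimal base R sketch of one way to handle both follows. It assumes the goal is "update where the key exists, append where it does not"; bar2 is a hypothetical variant of bar made up for illustration.

```r
foo <- data.frame(index = c("a", "b", "c", "d"), value = c(100, 101, NA, NA))
# bar2 is hypothetical: bar reordered, plus a key "e" that foo does not have
bar2 <- data.frame(index = c("d", "c", "e"), value = c(201, 200, 215))

pos <- match(bar2$index, foo$index)  # row of foo for each row of bar2, NA if absent
hit <- !is.na(pos)

foo$value[pos[hit]] <- bar2$value[hit]  # update the keys that exist in foo
foo <- rbind(foo, bar2[!hit, ])         # append the keys that do not
rownames(foo) <- NULL
foo
```

Dropping the NA positions before assigning is what avoids the "NAs are not allowed in subscripted assignments" error.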

I would also like to present an SQL solution using the sqldf library and R's integrated SQLite database. I like the simplicity, accuracy and power of SQL. Accuracy: I can define exactly which objects (rows) I want to change without considering the order of a data.frame (foo.id = bar.id). Power: in the WHERE clause after SET and in the WHERE on the third line I can define all the conditions I want to consider for the update. Simplicity: the syntax is more readable than indexing into vectors, matrices or data frames.

library(sqldf)

# I changed index to id since index does not work.
#   Apparently INDEX is a keyword in SQLite.

(foo <- data.frame(id=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA)))
(bar <- data.frame(id=c('c', 'd'), value=c(200, 201)))

sqldf(c(paste("UPDATE foo"
             ," SET value = (SELECT bar.value FROM bar WHERE foo.id = bar.id)"
             ," WHERE value IS NULL"
             )
        , " SELECT * FROM main.foo"
    )
)

Which gives

  id value
1  a   100
2  b   101
3  c   200
4  d   201

Similar issues: r equivalent of sql update? R sqlite: update with two tables

The optimal solution using data.table

library(data.table)
setDT(foo)
setDT(bar)
foo[bar, on="index", value:=i.value]
foo
#   index value
#1:     a   100
#2:     b   101
#3:     c   200
#4:     d   201

The first argument of the data.table [ method is named i, so we can refer to a column of the table passed as i by using the i. prefix.
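Several comments ask about matching on more than one index column. With data.table, on= accepts a character vector of column names, so the same update join can be sketched as below. The tables foo2 and bar2 and the key columns grp and id are hypothetical, made up for illustration.

```r
library(data.table)

# Hypothetical two-column key (grp, id); not part of the original question
foo2 <- data.table(grp = c("x", "x", "y", "y"), id = c(1, 2, 1, 2),
                   value = c(100, 101, NA, NA))
bar2 <- data.table(grp = c("y", "y"), id = c(2, 1), value = c(201, 200))

# Update by reference: rows of foo2 that match bar2 on both key columns
foo2[bar2, on = c("grp", "id"), value := i.value]
foo2
```

The row order of bar2 does not matter here, because rows are paired on the key columns rather than by position.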

merge() only merges in new data. For instance, if you had a data set of average income for a few cities, and a separate data set of the populations of those cities, you would use merge() to merge in one set of data into the other.

Like apeescape said, replace() is probably what you want.
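If merge() is still preferred (for instance because by= also accepts several key columns), the two suffixed columns can be collapsed afterwards. This is only a sketch, and it assumes a non-NA value from bar should win over the value in foo.

```r
foo <- data.frame(index = c("a", "b", "c", "d"), value = c(100, 101, NA, NA))
bar <- data.frame(index = c("c", "d"), value = c(200, 201))

m <- merge(foo, bar, by = "index", all = TRUE)  # produces value.x and value.y

# Take bar's value where there is one, otherwise keep foo's value
m$value <- ifelse(is.na(m$value.y), m$value.x, m$value.y)
result <- m[, c("index", "value")]
result
```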

Another approach could be:

  1. Remove the NAs from the first data frame

  2. Use rbind to append the data instead of using merge:

These are the original two data frames:

foo <- data.frame(index=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA))
bar <- data.frame(index=c('c', 'd'), value=c(200, 201))

(1) Use the negation of is.na to remove the NAs:

foo_new <- foo[!is.na(foo$value),]

(2) Bind the data frames and you'll get the answer you were looking for

new_df <- rbind(foo_new,bar)

new_df
  index value
1     a   100
2     b   101
3     c   200
4     d   201
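Removing the NA rows works here because the NAs in foo happen to be exactly the rows that bar replaces. A variant that does not rely on that coincidence is to drop the rows of foo whose index appears in bar, whatever their current value. It is sketched below under the assumption that bar should always win.

```r
foo <- data.frame(index = c("a", "b", "c", "d"), value = c(100, 101, NA, NA))
bar <- data.frame(index = c("c", "d"), value = c(200, 201))

kept <- foo[!(foo$index %in% bar$index), ]  # rows of foo that bar does not replace
new_df <- rbind(kept, bar)
new_df <- new_df[order(new_df$index), ]     # restore a predictable row order
rownames(new_df) <- NULL
new_df
```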

Comments
  • What the result should be in case of no nulls?
  • Did you ever get answer to this question? I am looking for a solution for this same problem.
  • I wonder too why merge does not have, say, an overwrite=TRUE parameter which would kick in when by is provided. It is inconvenient to delete columns manually every time you want to rerun a merge.
  • See also: Replace missing values (NA) in one data set with values from another where columns match
  • One wrinkle with using replace() is that if the ordering in bar is not the same as in foo, it won't work properly. For example, if you try running the above example after bar <- bar[c(2,1),], the end result does not come out correct.
  • Yes, match() does work for my example. In reality, it turns out that my actual use case is more complicated, where I would like to match across multiple columns and not just a simple vector. I don't think match() works when you would like to match across multiple columns of a dataframe.
  • Thank you! the idea to use the match() is good... however, if bar is to have another element that is not contained in foo (we want to update and add the new stuff) bar <- data.frame(index=c('c', 'd','e'), value=c(200, 201,215)) Then when we try to use match, we get an error. Error in foo$value[match(bar$index, foo$index)] <- bar$value : NAs are not allowed in subscripted assignments Any ideas how to overcome that?
  • What if you have multiple columns for indices?
  • The SQL statement can run over multiple lines so paste is not needed.
  • @Grothendieck Thanks for this info.