Conditionally replacing values in a variable based on values in multiple other variables in R

I have the following line of code, which replaces a value in one variable (var2) based on the value of another variable (var1):

df$var2[df$var1 > 0] <- NA

However, I would like to extend this and replace values in a variable (e.g. var5) based on the values in multiple other variables (var1, var2, var3, var4), which are stored in columns 13:16 of the data frame.

I tried

df$var5[df[c(13:16)] > 0] <- NA

which does not work correctly. I would like to know why, and how best to amend the code.

Here is a base R solution, a slight modification of your code:

df$var5[rowSums(df[13:16] > 0)>0] <- NA

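Note that df[c(13:16)] > 0 gives you a logical matrix, but you need a single logical vector with one element per row to subset df$var5. rowSums() counts the TRUEs in each row, so rowSums(...) > 0 is TRUE for every row in which at least one of the four columns is greater than 0.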
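
To see the difference between the matrix and the per-row vector concretely, here is a minimal sketch on a made-up data frame (the names toy, a, b and target are illustrative and do not come from the question):

```r
# Toy data: two condition columns and one target column (names are made up)
toy <- data.frame(a = c(0, 2, 0, NA),
                  b = c(0, 0, 3, 0),
                  target = c(10, 20, 30, 40))

m <- toy[c("a", "b")] > 0   # logical matrix, one row per observation
dim(m)                      # 4 rows, 2 columns: not a per-row vector

# Count the TRUEs in each row; na.rm = TRUE makes an NA count as "not > 0"
hits <- rowSums(m, na.rm = TRUE)

# At least one TRUE in the row -> set the target to NA
toy$target[hits > 0] <- NA
toy$target
```

Without na.rm = TRUE, any row containing an NA gets an NA count and is skipped by the assignment, even if another column in that row is greater than 0.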

Example

Given df as

df <- structure(list(X1 = c(0L, 3L, 0L, 1L, 4L, 2L, NA, 1L, 2L, 2L, 
0L, 4L, 4L, 1L, NA, NA, 1L, 0L, 4L, 4L), X2 = c(0L, 0L, NA, 4L, 
4L, 1L, 1L, NA, 0L, 3L, 0L, 3L, 2L, NA, 1L, 1L, NA, 3L, 3L, 3L
), X3 = c(1L, 3L, 0L, NA, 0L, 3L, 0L, NA, 1L, 2L, 1L, NA, NA, 
1L, 4L, 1L, NA, NA, NA, 0L), X4 = c(2L, 2L, NA, 3L, NA, 2L, 0L, 
3L, 4L, 0L, 0L, NA, 3L, 4L, 4L, 3L, NA, 4L, 3L, 3L), X5 = c(0L, 
4L, 4L, NA, 0L, 0L, 2L, NA, 1L, 1L, 2L, NA, 1L, 3L, 2L, 4L, 1L, 
1L, 0L, 2L), X6 = c(2L, 1L, 1L, 4L, 1L, 4L, 3L, 4L, 3L, NA, 0L, 
2L, 1L, 2L, 2L, 0L, 4L, NA, NA, NA), X7 = c(3L, 3L, 0L, 4L, 4L, 
NA, 0L, 2L, NA, 2L, NA, 2L, 2L, 3L, 0L, 0L, 3L, 1L, NA, 0L), 
    X8 = c(1L, 2L, 3L, 0L, 2L, 4L, 2L, 3L, 1L, 0L, 3L, 0L, 3L, 
    1L, 4L, 1L, 1L, 1L, 2L, 0L), X9 = c(1L, 2L, 2L, 2L, NA, 2L, 
    4L, 2L, 0L, 1L, 3L, 1L, 1L, 3L, 4L, 0L, 4L, 4L, 4L, 3L), 
    X10 = c(NA, NA, 3L, NA, 3L, 1L, 0L, 2L, 0L, NA, 0L, 3L, 4L, 
    0L, 2L, 3L, 4L, 3L, 0L, 0L), X11 = c(4L, 4L, 0L, 4L, 3L, 
    1L, NA, 1L, 0L, 4L, 4L, NA, NA, 1L, NA, NA, 4L, 1L, NA, NA
    ), X12 = c(3L, 1L, 4L, 4L, 3L, 3L, 0L, 1L, 3L, 0L, 0L, 2L, 
    0L, 0L, NA, NA, NA, 3L, 2L, 4L), X13 = c(2L, 4L, 0L, 0L, 
    0L, NA, 4L, 3L, 3L, 3L, NA, 3L, 4L, 1L, 3L, 0L, 3L, NA, 3L, 
    4L), X14 = c(3L, 1L, 1L, 1L, 0L, 0L, 3L, 3L, 4L, 4L, NA, 
    0L, 4L, 3L, NA, 0L, 1L, 0L, 4L, 1L), X15 = c(2L, 2L, 1L, 
    0L, 3L, 1L, 4L, 4L, 2L, 1L, 3L, 2L, 2L, NA, NA, 0L, 3L, 4L, 
    3L, NA), X16 = c(4L, 2L, 2L, 0L, 0L, 1L, 4L, 0L, 2L, 1L, 
    3L, 0L, 2L, 0L, NA, 4L, 3L, 1L, 4L, 4L), resp = c(1.86666666666667, 
    2.26666666666667, 1.5, 2.07692307692308, 1.92857142857143, 
    1.78571428571429, 1.92857142857143, 2.23076923076923, 1.73333333333333, 
    1.71428571428571, 1.46153846153846, 1.83333333333333, 2.35714285714286, 
    1.64285714285714, 2.6, 1.30769230769231, 2.66666666666667, 
    2, 2.66666666666667, 2.15384615384615)), row.names = c(NA, 
-20L), class = "data.frame")

> df
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16     resp
1   0  0  1  2  0  2  3  1  1  NA   4   3   2   3   2   4 1.866667
2   3  0  3  2  4  1  3  2  2  NA   4   1   4   1   2   2 2.266667
3   0 NA  0 NA  4  1  0  3  2   3   0   4   0   1   1   2 1.500000
4   1  4 NA  3 NA  4  4  0  2  NA   4   4   0   1   0   0 2.076923
5   4  4  0 NA  0  1  4  2 NA   3   3   3   0   0   3   0 1.928571
6   2  1  3  2  0  4 NA  4  2   1   1   3  NA   0   1   1 1.785714
7  NA  1  0  0  2  3  0  2  4   0  NA   0   4   3   4   4 1.928571
8   1 NA NA  3 NA  4  2  3  2   2   1   1   3   3   4   0 2.230769
9   2  0  1  4  1  3 NA  1  0   0   0   3   3   4   2   2 1.733333
10  2  3  2  0  1 NA  2  0  1  NA   4   0   3   4   1   1 1.714286
11  0  0  1  0  2  0 NA  3  3   0   4   0  NA  NA   3   3 1.461538
12  4  3 NA NA NA  2  2  0  1   3  NA   2   3   0   2   0 1.833333
13  4  2 NA  3  1  1  2  3  1   4  NA   0   4   4   2   2 2.357143
14  1 NA  1  4  3  2  3  1  3   0   1   0   1   3  NA   0 1.642857
15 NA  1  4  4  2  2  0  4  4   2  NA  NA   3  NA  NA  NA 2.600000
16 NA  1  1  3  4  0  0  1  0   3  NA  NA   0   0   0   4 1.307692
17  1 NA NA NA  1  4  3  1  4   4   4  NA   3   1   3   3 2.666667
18  0  3 NA  4  1 NA  1  1  4   3   1   3  NA   0   4   1 2.000000
19  4  3 NA  3  0 NA NA  2  4   0  NA   2   3   4   3   4 2.666667
20  4  3  0  3  2 NA  0  0  3   0  NA   4   4   1  NA   4 2.153846

then

df$resp[rowSums(df[12:16] > 0, na.rm = TRUE) > 0] <- NA

such that

> df
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 resp
1   0  0  1  2  0  2  3  1  1  NA   4   3   2   3   2   4   NA
2   3  0  3  2  4  1  3  2  2  NA   4   1   4   1   2   2   NA
3   0 NA  0 NA  4  1  0  3  2   3   0   4   0   1   1   2   NA
4   1  4 NA  3 NA  4  4  0  2  NA   4   4   0   1   0   0   NA
5   4  4  0 NA  0  1  4  2 NA   3   3   3   0   0   3   0   NA
6   2  1  3  2  0  4 NA  4  2   1   1   3  NA   0   1   1   NA
7  NA  1  0  0  2  3  0  2  4   0  NA   0   4   3   4   4   NA
8   1 NA NA  3 NA  4  2  3  2   2   1   1   3   3   4   0   NA
9   2  0  1  4  1  3 NA  1  0   0   0   3   3   4   2   2   NA
10  2  3  2  0  1 NA  2  0  1  NA   4   0   3   4   1   1   NA
11  0  0  1  0  2  0 NA  3  3   0   4   0  NA  NA   3   3   NA
12  4  3 NA NA NA  2  2  0  1   3  NA   2   3   0   2   0   NA
13  4  2 NA  3  1  1  2  3  1   4  NA   0   4   4   2   2   NA
14  1 NA  1  4  3  2  3  1  3   0   1   0   1   3  NA   0   NA
15 NA  1  4  4  2  2  0  4  4   2  NA  NA   3  NA  NA  NA   NA
16 NA  1  1  3  4  0  0  1  0   3  NA  NA   0   0   0   4   NA
17  1 NA NA NA  1  4  3  1  4   4   4  NA   3   1   3   3   NA
18  0  3 NA  4  1 NA  1  1  4   3   1   3  NA   0   4   1   NA
19  4  3 NA  3  0 NA NA  2  4   0  NA   2   3   4   3   4   NA
20  4  3  0  3  2 NA  0  0  3   0  NA   4   4   1  NA   4   NA

First, some dummy data:

library(data.table)
dt1 <- data.table(
 "V1" = rnorm(10,0,1),
 "V2" = rnorm(10,0,1),
 "V3" = rnorm(10,0,1),
 "V4" = rnorm(10,0,1),
 "V5" = rnorm(10,0,1))

Then for one variable

dt1[V1 < 0, V6 := NA]

And for multiple

dt1[V1 < 0 & V2 < 0 & V3 <0, V5 := NA]
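
If V5 should become NA when any (rather than all) of the columns is negative, and the columns may themselves contain NA, one possible sketch is the following. The table dt2 and its values are made up for illustration, and NA_real_ is used so that the replacement matches the numeric type of V5:

```r
library(data.table)

# Small made-up table with some NAs in the condition columns
dt2 <- data.table(V1 = c(-1, 1, NA, 2),
                  V2 = c( 1, 1, -2, NA),
                  V3 = c( 1, 1,  1, 3),
                  V5 = c(10, 20, 30, 40))

# TRUE for rows where at least one of V1:V3 is negative;
# na.rm = TRUE means an NA never counts as negative
any_neg <- rowSums(dt2[, .(V1, V2, V3)] < 0, na.rm = TRUE) > 0

# Set V5 to NA by reference for those rows only
dt2[any_neg, V5 := NA_real_]
dt2$V5
```

Note that this is not the same as testing (V1 + V2 + V3) < 0, which checks the sign of the sum rather than the sign of each column.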

You can get a better understanding by looking at smaller parts of your code.

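First look at df[c(13:16)] > 0: the result is a logical matrix containing TRUE and FALSE (and NA where the data are missing), with one row per observation and one column per variable. It is of no use in df$var5[df[c(13:16)] > 0], because R treats the matrix as one long logical vector (column by column) that is four times as long as var5, so its positions no longer correspond to the rows of the data frame.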

So what can you do? You can use apply to transform this matrix into a vector:

idx <- apply(df[c(13:16)] > 0, 1, all)

This results in a vector which contains TRUE if all elements in a row are TRUE and FALSE otherwise (replace all with any if a single TRUE in the row should be enough). Finally, you can use df$var5[idx] <- NA
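
Since the comments indicate that a single column above 0 should be enough, a variant based on any may be closer to what is needed. A minimal sketch, wrapped in a hypothetical helper any_positive() (not part of base R) and run on made-up data:

```r
# Hypothetical helper: TRUE for each row where at least one of the
# given columns is greater than 0; NAs are ignored via na.rm = TRUE
any_positive <- function(data, cols) {
  apply(data[cols] > 0, 1, any, na.rm = TRUE)
}

# Made-up data: columns 1:3 are the conditions, column 4 is the target
ex <- data.frame(c1 = c(0, 1, NA, 0),
                 c2 = c(0, 0, 0, NA),
                 c3 = c(0, 0, 5, 0),
                 out = c(1.5, 2.5, 3.5, 4.5))

idx <- any_positive(ex, 1:3)
ex$out[idx] <- NA
ex$out
```

Passing na.rm = TRUE through apply() means a row is flagged only when some non-missing value is greater than 0, so rows that hold nothing but zeros and NAs keep their original value.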

Here is a tidyverse answer. First, we create a dummy dataset. For printing purposes I have only created one with 9 variables (and 10 rows), rather than the 16 variables in your data:

library(tidyverse)

set.seed(1)
df <-
  replicate(9, sample(0:4, size = 10, replace = TRUE)) %>% 
  as_tibble() %>% 
  set_names(paste0("var", 1:9))

df
#> # A tibble: 10 x 9
#>     var1  var2  var3  var4  var5  var6  var7  var8  var9
#>    <int> <int> <int> <int> <int> <int> <int> <int> <int>
#>  1     0     4     4     3     2     0     3     2     1
#>  2     3     4     1     3     1     3     3     1     1
#>  3     0     1     1     3     1     4     0     3     4
#>  4     1     1     0     1     4     0     4     2     1
#>  5     4     0     3     3     1     0     4     4     4
#>  6     2     4     0     0     0     3     0     1     3
#>  7     1     4     3     0     2     4     0     1     4
#>  8     2     0     2     3     2     4     2     0     3
#>  9     2     0     1     0     3     3     1     2     0
#> 10     0     4     1     1     2     4     1     2     2

Next, we conditionally mutate var5 so that it becomes NA only if all of the variables var6:var9 are greater than 0, and otherwise keeps its original value:

df <- 
  df %>% 
  mutate(
    var5 = ifelse(var6 > 0 & var7 > 0 & var8 > 0 & var9 > 0, NA, var5)
  )

df
#> # A tibble: 10 x 9
#>     var1  var2  var3  var4  var5  var6  var7  var8  var9
#>    <int> <int> <int> <int> <int> <int> <int> <int> <int>
#>  1     0     4     4     3     2     0     3     2     1
#>  2     3     4     1     3    NA     3     3     1     1
#>  3     0     1     1     3     1     4     0     3     4
#>  4     1     1     0     1     4     0     4     2     1
#>  5     4     0     3     3     1     0     4     4     4
#>  6     2     4     0     0     0     3     0     1     3
#>  7     1     4     3     0     2     4     0     1     4
#>  8     2     0     2     3     2     4     2     0     3
#>  9     2     0     1     0     3     3     1     2     0
#> 10     0     4     1     1    NA     4     1     2     2

Created on 2020-01-22 by the reprex package (v0.3.0)

EDIT

Based on your comment below, we use the | operator to say 'or' instead of & to say 'and'. First, we create a new dummy dataset with many more 0s for demonstration purposes:

library(tidyverse)

set.seed(1)
df <-
  replicate(9, sample(c(rep(0, 10), 1:4), size = 10, replace = TRUE)) %>% 
  as_tibble() %>% 
  set_names(paste0("var", 1:9))

df
#> # A tibble: 10 x 9
#>     var1  var2  var3  var4  var5  var6  var7  var8  var9
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     0     1     0     4     0     2     0     0     0
#>  2     0     0     0     0     2     0     4     0     3
#>  3     0     0     0     2     0     0     0     0     0
#>  4     0     0     0     0     0     0     3     0     0
#>  5     0     0     0     0     2     0     0     2     0
#>  6     3     0     4     0     0     0     0     0     0
#>  7     0     0     0     0     0     0     4     0     4
#>  8     1     4     0     0     0     0     0     0     0
#>  9     4     0     0     0     0     0     0     1     0
#> 10     0     0     0     0     0     0     0     0     3

And now we replace & with |:

df <- 
  df %>% 
  mutate(
    var5 = ifelse(var6 > 0 | var7 > 0 | var8 > 0 | var9 > 0, NA, var5)
  )

df
#> # A tibble: 10 x 9
#>     var1  var2  var3  var4  var5  var6  var7  var8  var9
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     0     1     0     4    NA     2     0     0     0
#>  2     0     0     0     0    NA     0     4     0     3
#>  3     0     0     0     2     0     0     0     0     0
#>  4     0     0     0     0    NA     0     3     0     0
#>  5     0     0     0     0    NA     0     0     2     0
#>  6     3     0     4     0     0     0     0     0     0
#>  7     0     0     0     0    NA     0     4     0     4
#>  8     1     4     0     0     0     0     0     0     0
#>  9     4     0     0     0    NA     0     0     1     0
#> 10     0     0     0     0    NA     0     0     0     3

Created on 2020-01-22 by the reprex package (v0.3.0)
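
With many condition columns, typing out every | gets tedious. Assuming dplyr 1.0.4 or later is available, if_any() can express the same "or" rule over a column range. The tibble d below is made up for illustration:

```r
library(dplyr)

# Made-up data: var5 is the target, var6:var9 are the condition columns
d <- tibble(var5 = c(1, 2, 3),
            var6 = c(0, 0, 2),
            var7 = c(0, 4, 0),
            var8 = c(0, 0, 0),
            var9 = c(0, 0, 0))

# Set var5 to NA wherever any of var6:var9 is greater than 0
d <- d %>%
  mutate(var5 = ifelse(if_any(var6:var9, ~ .x > 0), NA, var5))

d$var5
```

The column range var6:var9 relies on the columns being adjacent in the data; a selection helper such as starts_with() could be used instead if they are not.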

Comments
  • Thanks for your answer. I do not understand the ==4 element. Could you please explain?
  • @EvaBalgova the number of columns from 13 to 16
  • @ThomaslsCoding, this does not work for me and I am not sure why. It works fine with one variable, but not when I want to use a range of variables. I have a lot of NA's in the data and tried to add na.rm = TRUE to the code line but it still does not change var5 to NA despite values greater than 0 in the other variables.
  • @EvaBalgova I guess I got your point now... you have or conditions over columns, rather than &, is it?
  • @EvaBalgova I guess what you need is df$resp[rowSums(df[12:16]>0,na.rm = T)>0] <- NA. You can check my updates
  • rg255, thanks for your reply. dt1[V1 < 0 & V2 < 0 & V3 <0, V5 := NA] changes V5 to NA if all the variables are <0, I would need to change V5 to NA if any variable v1:v3 is <0.
  • dt1[(V1+V2+V3) < 0, V5 :=NA] solved my problem, however it does not work on my real data set. I do have a lot of NA values in all variables and I am wondering if that could be a problem and if so, how to deal with it so that the NA values are treated as 0.
  • Substitute the & for | (the "or" conditional operator)
  • Also, use !(is.na(V1))... (or without the exclamation mark as needed) to deal with the NAs
  • Thank you for your reply, but my aim is to replace the values in var5 not only if all of the variables in var6:var9 are >0, but if any one of these variables is >0. In other words, if var6:var9 are all 0, then the value in var5 remains the same; as soon as there is a value greater than 0 in any of the variables var6:var9, the value in var5 should become NA.
  • OK I understand - please see my edit in the answer above