R: more efficient solution than this for-loop

I wrote a working for loop, but it is slow over thousands of rows and I'm looking for a more efficient alternative. Thanks in advance!

The task:

  • If column a matches column b, column d becomes NA.
  • If column a does not match b, but b matches c, then column e becomes NA.

The for loop:

for (i in seq_len(nrow(data))) {
  # a matches b: blank out d
  if (data$a[i] == data$b[i]) {data$d[i] <- NA}
  # a differs from b but b matches c: blank out e
  if (data$a[i] != data$b[i] && data$b[i] == data$c[i]) {data$e[i] <- NA}
}

An example:

a    b    c    d    e
F    G    G    1    10
F    G    F    5    10
F    F    F    2    8

Would become:

a    b    c    d    e
F    G    G    1    NA
F    G    F    5    10
F    F    F    NA    8
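The question does not show how `data` is built. A minimal reproducible sketch, assuming the letter columns are plain character vectors (read.table() would otherwise convert an all-"F" column to logical FALSE, which breaks the comparisons), builds the example table and runs the row-by-row loop, written with seq_len() and &&, which behave the same as 1:nrow() and & on non-empty data:

```r
# Reproducible version of the example table; letter columns kept as character
data <- data.frame(
  a = c("F", "F", "F"),
  b = c("G", "G", "F"),
  c = c("G", "F", "F"),
  d = c(1, 5, 2),
  e = c(10, 10, 8),
  stringsAsFactors = FALSE
)

# Row-by-row loop implementing the two rules from the question
for (i in seq_len(nrow(data))) {
  if (data$a[i] == data$b[i]) {
    data$d[i] <- NA
  }
  if (data$a[i] != data$b[i] && data$b[i] == data$c[i]) {
    data$e[i] <- NA
  }
}

print(data)
```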

If you're concerned about speed and efficiency, I'd recommend data.table (though vectorizing the operations on a plain data.frame, as recommended by @parfait, would probably speed things up more than enough).

library(data.table)

DT <- fread("a    b    c    d    e
             F    G    G    1    10
             F    G    F    5    10
             F    F    F    2    8")
print(DT)
#    a b c d  e
# 1: F G G 1 10
# 2: F G F 5 10
# 3: F F F 2  8

DT[a == b, d := NA]            # a matches b: blank out d
DT[a != b & b == c, e := NA]   # a differs from b but b matches c: blank out e

print(DT)
#    a b c  d  e
# 1: F G G  1 NA
# 2: F G F  5 10
# 3: F F F NA  8

A Tutorial on Loops in R - Usage and Alternatives: a tutorial that looks at the constructs available in R for looping. A loop is nothing more than a way of automating a multi-step process by organizing sequences of actions, or 'batch' processes, and this pays off in terms of efficiency. One of its example loops keeps running as long as the answer is not the expected 42. Loops are slower in R than in C++ because R is an interpreted language rather than a compiled one; just-in-time (JIT) compilation, enabled by default since R 3.4, makes R loops faster, but still not as fast. R loops are therefore not that bad as long as you don't use too many iterations (say, no more than 100,000).
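The per-iteration cost of R loops can be observed directly on this problem. A rough sketch (the 10,000-row random data and the helper names loop_version() and vectorized_version() are made up for illustration) times the row-by-row loop against computing each condition once for all rows, and checks that both blank out the same cells. Timings depend on the machine, so none are quoted here.

```r
# Hypothetical benchmark data: 10,000 rows of random letters and numbers
set.seed(1)
n <- 10000
big <- data.frame(
  a = sample(c("F", "G"), n, replace = TRUE),
  b = sample(c("F", "G"), n, replace = TRUE),
  c = sample(c("F", "G"), n, replace = TRUE),
  d = runif(n),
  e = runif(n),
  stringsAsFactors = FALSE
)

# Row-by-row loop, as in the question
loop_version <- function(data) {
  for (i in seq_len(nrow(data))) {
    if (data$a[i] == data$b[i]) data$d[i] <- NA
    if (data$a[i] != data$b[i] && data$b[i] == data$c[i]) data$e[i] <- NA
  }
  data
}

# Vectorized: evaluate each condition once for all rows, then assign
vectorized_version <- function(data) {
  ab <- data$a == data$b
  bc <- data$b == data$c
  data$d[ab] <- NA
  data$e[!ab & bc] <- NA
  data
}

print(system.time(res_loop <- loop_version(big)))
print(system.time(res_vec <- vectorized_version(big)))

# Both versions should blank out exactly the same cells
identical(res_loop$d, res_vec$d)
identical(res_loop$e, res_vec$e)
```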

Suppose df is your data; then:

ab <- with(df, a==b)
bc <- with(df, b==c)

df$d[ab] <- NA
df$e[!ab & bc] <- NA

which would result in

#   a b c  d  e
# 1 F G G  1 NA
# 2 F G F  5 10
# 3 F F F NA  8

7 Efficient optimisation: for the exam-result test example, if_else() works fine and is much faster than base R's ifelse() (although it is still around 3 times slower than the hard-coded solution). However, if you are sorting inside a loop, or in a shiny application, then it can be … The loops tutorial also covers when each looping construct should be used and how to use alternatives, such as R's vectorization feature, instead.
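Applied to this question, the comparison in that excerpt maps onto ifelse() versus assigning NA directly into the selected rows. The sketch is base R only (dplyr's if_else() is left out so it runs without extra packages) and assumes the example table is a data frame with character letter columns and numeric d and e.

```r
df <- data.frame(
  a = c("F", "F", "F"),
  b = c("G", "G", "F"),
  c = c("G", "F", "F"),
  d = c(1, 5, 2),
  e = c(10, 10, 8),
  stringsAsFactors = FALSE
)

ab <- df$a == df$b   # rows where a matches b
bc <- df$b == df$c   # rows where b matches c

# Version 1: ifelse() builds a complete new column for every row
with_ifelse <- df
with_ifelse$d <- ifelse(ab, NA, df$d)
with_ifelse$e <- ifelse(!ab & bc, NA, df$e)

# Version 2: assign NA only to the selected rows
with_index <- df
with_index$d[ab] <- NA
with_index$e[!ab & bc] <- NA

identical(with_ifelse$d, with_index$d)
identical(with_ifelse$e, with_index$e)
```

One design caveat: ifelse() takes its result type from the values it actually uses, so if every row matched, `ifelse(ab, NA, df$d)` would return a logical vector and turn d into a logical column. The indexing version always keeps the column's original type.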

We could create a list of quosures and splice it into mutate() with !!!:

library(tidyverse)
qs <- setNames(quos(d*NA^(a == b), e*NA^((!(a ==b) & (b == c)))), c("d", "e"))
df1 %>%
    mutate(!!! qs)
#  a b c  d  e
#1 F G G  1 NA
#2 F G F  5 10
#3 F F F NA  8
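This answer relies on a small arithmetic trick: NA^FALSE is NA^0, which R defines as 1, while NA^TRUE is NA^1, which stays NA. Multiplying a column by NA^condition therefore keeps the value where the condition is FALSE and yields NA where it is TRUE. As an illustration that is not part of the original answer, the same expressions also work with base R's transform(), here on a copy of the example table, with a != b written in place of the equivalent !(a == b).

```r
# y^0 is always 1 in R, even for NA; NA^1 stays NA
NA^FALSE   # 1
NA^TRUE    # NA

df1 <- data.frame(
  a = c("F", "F", "F"),
  b = c("G", "G", "F"),
  c = c("G", "F", "F"),
  d = c(1, 5, 2),
  e = c(10, 10, 8),
  stringsAsFactors = FALSE
)

# Both new columns are computed from the original a, b, c
res <- transform(df1,
                 d = d * NA^(a == b),
                 e = e * NA^(a != b & b == c))
print(res)
```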

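Why loops are slow in R – Florian Privé – R(cpp) enthusiast: the first solution is to pre-allocate the whole result once (if you know its size in advance) and fill it in, rather than growing it. (The post also quotes the common reaction "Wow, sapply() is so much faster than loops!") Tutorials often claim that solutions using loops are less efficient than vectorized solutions using apply functions such as lapply() and sapply(), and that the latter are often better. Strictly speaking, lapply() and sapply() are not vectorized: they still call an R function once per element, so a pre-allocated for loop is often comparable in speed, and their main advantage is readability. Nevertheless, as a beginner in R, it is good to have a basic understanding of loops and how to write them.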
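To make the pre-allocation point concrete, a sketch (computing the squares of 1 to n is an arbitrary stand-in task) builds the same vector four ways: growing it with c(), filling a pre-allocated vector, vapply() (a type-strict sibling of sapply()), and a single vectorized expression. Wrapping any of the steps in system.time() shows the relative costs on a given machine; no timings are claimed here.

```r
n <- 10000

# Growing the result: every c() call allocates a new, longer vector
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Pre-allocating once and filling in place
prealloc <- numeric(n)
for (i in seq_len(n)) {
  prealloc[i] <- i^2
}

# Apply-family version: tidier, but still one R function call per element
applied <- vapply(seq_len(n), function(i) i^2, numeric(1))

# Truly vectorized: a single call whose loop runs in C
vectorized <- seq_len(n)^2

identical(grown, prealloc)
identical(prealloc, applied)
identical(applied, vectorized)
```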

Strategies to Speedup R Code: the for-loop in R can be very slow in its raw, un-optimised form, especially … The results are again faster by orders of magnitude, but slower than …

2. Double for loop. Write a double for loop that prints 30 numbers (1:10, 2:11, 3:12), i.e. three clusters of ten numbers each. The outer loop determines the number of clusters (3) via its length; the inner loop determines the numbers to be printed (1 to 10 for the first cluster). Each cluster starts one number higher than the previous one.
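A possible solution to the double-loop exercise, assuming the 30 numbers may be collected into one pre-allocated vector and printed at the end rather than printed one at a time:

```r
starts <- 1:3                          # one start value per cluster
out <- integer(length(starts) * 10)    # pre-allocated: 30 numbers in total
k <- 0                                 # position in out
for (s in starts) {                    # outer loop: one pass per cluster
  for (j in 1:10) {                    # inner loop: ten numbers per cluster
    k <- k + 1
    out[k] <- s + j - 1L               # 1:10, then 2:11, then 3:12
  }
}
print(out)
```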

21 Iteration: allocating the output up front is very important for efficiency; if you grow the result at each iteration (using c(), for example), the for loop will be very slow. A better solution is to save the results in a list and then combine them into a single vector after the loop is done. A while loop is also more general than a for loop, because you can rewrite any for loop as a while loop, but you can't rewrite every while loop as a for loop.

7.3 Efficient base R: in R there is often more than one way to solve a problem. This section highlights standard tricks or alternative methods that may improve performance.
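A sketch of the two points from the 21 Iteration excerpt: collect per-iteration pieces of varying length in a pre-allocated list and combine them once with unlist(), then rewrite that same for loop as a while loop. The last part shows a case that needs a while loop because the number of iterations is not known in advance (the halving rule is an arbitrary example).

```r
n <- 5

# Save each piece in a pre-allocated list, then combine once at the end
pieces <- vector("list", n)
for (i in seq_len(n)) {
  pieces[[i]] <- rep(i, i)      # pieces have different lengths
}
combined <- unlist(pieces)

# The same for loop rewritten as a while loop (integer counter via 1L)
pieces2 <- vector("list", n)
i <- 1L
while (i <= n) {
  pieces2[[i]] <- rep(i, i)
  i <- i + 1L
}
combined2 <- unlist(pieces2)

# A while loop with an unknown iteration count: halve x until it drops below 1
x <- 100
steps <- 0
while (x >= 1) {
  x <- x / 2
  steps <- steps + 1
}
steps
```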

24 Improving performance: some solutions might be slower initially, but end up being faster because they're … For example, calling mean.default() is quite a bit faster than calling mean() for small vectors. The loops in a vectorised function are written in C instead of R, and loops in C are much faster because they have much less overhead.

While loop in R: the while loop, shown in the middle of figure 1, is made of an initialization block as before, followed by a logical condition. The condition is typically a comparison between a control variable and a value using greater/less than or equal to, although any expression that evaluates to a logical value can be used, …
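A rough way to observe the mean() versus mean.default() difference: mean() is an S3 generic, so every call first dispatches to mean.default(), and calling the method directly skips that step. This is only safe when the input is known to be a plain numeric vector; the 100-element vector and the 100,000 repetitions are arbitrary choices, and the timings will vary by machine.

```r
x <- runif(100)   # a small vector, where dispatch overhead is noticeable

# Generic: each call goes through S3 dispatch before reaching mean.default()
t_generic <- system.time(for (i in 1:100000) mean(x))
# Method called directly: no dispatch step
t_method <- system.time(for (i in 1:100000) mean.default(x))

print(t_generic)
print(t_method)

# Same answer either way
identical(mean(x), mean.default(x))
```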

Comments
  • Look into ?ifelse: data$d <- ifelse(data$a == data$b, NA, data$d); data$e <- ifelse(data$a != data$b & data$b == data$c, NA, data$e)
  • No problem, lots of resources out there for using data.table, but here's a link to the main "Getting Started" page on the official site: github.com/Rdatatable/data.table/wiki/Getting-started
  • Trying to be overkill?
  • @denis Just showing another way. It would be useful when several expressions have already been created and you want to apply them to the data.