Create counter for runs of TRUE among FALSE and NA, by group

I have a little nut to crack.

I have a data.frame where runs of TRUE are separated by runs of one or more FALSE or NA:

   group criterium
1      A        NA
2      A      TRUE
3      A      TRUE
4      A      TRUE
5      A     FALSE
6      A     FALSE
7      A      TRUE
8      A      TRUE
9      A     FALSE
10     A      TRUE
11     A      TRUE
12     A      TRUE
13     B        NA
14     B     FALSE
15     B      TRUE
16     B      TRUE
17     B      TRUE
18     B     FALSE

structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 
-18L))

I want to number the runs of TRUE in column criterium in ascending order while disregarding the FALSE and NA values. The goal is to have a unique, consecutive ID for each run of TRUE, within each group.

So the result should look like:

    group criterium goal
1      A        NA   NA
2      A      TRUE    1
3      A      TRUE    1
4      A      TRUE    1
5      A     FALSE   NA
6      A     FALSE   NA
7      A      TRUE    2
8      A      TRUE    2
9      A     FALSE   NA
10     A      TRUE    3
11     A      TRUE    3
12     A      TRUE    3
13     B        NA   NA
14     B     FALSE   NA
15     B      TRUE    1
16     B      TRUE    1
17     B      TRUE    1
18     B     FALSE   NA

I'm sure there is a relatively easy way to do this; I just can't think of one. I experimented with dense_rank() and other window functions of dplyr, but to no avail.

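Another data.table approach:

library(data.table)
setDT(dt)  # dt is the data.frame from the question, converted by reference
# cr: run id over the whole column; goal: renumber the TRUE runs within each group
dt[, cr := rleid(criterium)][
    (criterium), goal := rleid(cr), by=.(group)]
dt[, cr := NULL]  # drop the helper column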

Maybe I have over-complicated this but one way with dplyr is

library(dplyr)

df %>%
  mutate(temp = replace(criterium, is.na(criterium), FALSE), 
         temp1 = cumsum(!temp)) %>%
   group_by(temp1) %>%
   mutate(goal =  +(row_number() == which.max(temp) & any(temp))) %>%
   group_by(group) %>%
   mutate(goal = ifelse(temp, cumsum(goal), NA)) %>%
   select(-temp, -temp1)

#  group criterium  goal
#   <fct> <lgl>     <int>
# 1 A     NA           NA
# 2 A     TRUE          1
# 3 A     TRUE          1
# 4 A     TRUE          1
# 5 A     FALSE        NA
# 6 A     FALSE        NA
# 7 A     TRUE          2
# 8 A     TRUE          2
# 9 A     FALSE        NA
#10 A     TRUE          3
#11 A     TRUE          3
#12 A     TRUE          3
#13 B     NA           NA
#14 B     FALSE        NA
#15 B     TRUE          1
#16 B     TRUE          1
#17 B     TRUE          1
#18 B     FALSE        NA

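We first replace the NAs in the criterium column with FALSE (temp) and take the cumulative sum of its negation (temp1). This counter only increases on FALSE rows, so every run of TRUE shares one temp1 value with the FALSE row just before it. We group_by temp1 and assign 1 to the first TRUE value in each of these groups. Finally, grouping by group, we take the cumulative sum of these markers for TRUE rows and return NA for FALSE and NA rows.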
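The same idea (mark the first TRUE of every run, then take a cumulative sum within each group) can also be sketched in base R without helper columns. This is only a minimal sketch, not the exact method of the answer above; the helper name run_id is made up for illustration, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Data from the question
df <- data.frame(
  group = factor(rep(c("A", "B"), times = c(12, 6))),
  criterium = c(NA, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE,
                TRUE, TRUE, TRUE, NA, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Hypothetical helper: number the runs of TRUE in a logical vector
run_id <- function(x) {
  x <- !is.na(x) & x                      # treat NA as FALSE
  starts <- x & !c(FALSE, head(x, -1))    # TRUE exactly where a run of TRUE begins
  ifelse(x, cumsum(starts), NA_integer_)  # run number on TRUE rows, NA elsewhere
}

# Apply the helper separately to each group and put the pieces back in row order
df$goal <- unsplit(lapply(split(df$criterium, df$group), run_id), df$group)
df
```

Because the helper is applied per group, the numbering restarts at 1 in group B.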

Count Number of TRUE Values in Logical Vector in R (2 Examples), This article shows how to count the number of TRUE values in a logical vector in Create example vector x1 # Print example vector # FALSE TRUE TRUE FALSE TRUE A typical problem for the counting of TRUEs in a vector are NA values. directed acyclic graph of dependences among subproblems, such that finding a shortest path in this DAG is equivalent to solving the dynamic program. Solution: False. We saw a counter-example where we couldn’t do this in the matrix parenthesization problem. (o) T F [2 points] Every problem in NP can be solved in exponential time. Solution: True.

A data.table option using rle

library(data.table)
DT <- as.data.table(dat)
DT[, goal := {
  r <- rle(replace(criterium, is.na(criterium), FALSE))
  r$values <- with(r, cumsum(values) * values)          
  out <- inverse.rle(r)                                 
  replace(out, out == 0, NA)
}, by = group]
DT
#    group criterium goal
# 1:     A        NA   NA
# 2:     A      TRUE    1
# 3:     A      TRUE    1
# 4:     A      TRUE    1
# 5:     A     FALSE   NA
# 6:     A     FALSE   NA
# 7:     A      TRUE    2
# 8:     A      TRUE    2
# 9:     A     FALSE   NA
#10:     A      TRUE    3
#11:     A      TRUE    3
#12:     A      TRUE    3
#13:     B        NA   NA
#14:     B     FALSE   NA
#15:     B      TRUE    1
#16:     B      TRUE    1
#17:     B      TRUE    1
#18:     B     FALSE   NA

step by step

When we call r <- rle(replace(criterium, is.na(criterium), FALSE)) we get an object of class rle (shown here for the whole column, without grouping):

r
#Run Length Encoding
#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1
#  values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE ...

We manipulate the values component in the following way:

r$values <- with(r, cumsum(values) * values)
r
#Run Length Encoding
#  lengths: int [1:9] 1 3 2 2 1 3 2 3 1
#  values : int [1:9] 0 1 0 2 0 3 0 4 0 

That is, we replaced the TRUEs with the cumulative sum of values and set the FALSEs to 0. Now inverse.rle returns a vector in which each element of values is repeated the corresponding number of times given in lengths:

out <- inverse.rle(r)
out
# [1] 0 1 1 1 0 0 2 2 0 3 3 3 0 0 4 4 4 0 
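To make the round trip through rle and inverse.rle easier to follow, here is a small sketch on a made-up toy vector (x is not part of the original data). It relies only on the fact that inverse.rle repeats each element of values by the matching element of lengths.

```r
# Toy logical vector, made up for illustration
x <- c(FALSE, TRUE, TRUE, FALSE, TRUE)

r <- rle(x)
r$lengths  # run lengths: 1 2 1 1
r$values   # run values:  FALSE TRUE FALSE TRUE

# inverse.rle() gives the same result as rep(values, lengths)
roundtrip <- inverse.rle(r)
same_as_rep <- rep(r$values, r$lengths)

# Number the TRUE runs and leave 0 for the FALSE runs
r$values <- cumsum(r$values) * r$values
numbered <- inverse.rle(r)
numbered
```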

This is almost what OP wants but we need to replace the 0s with NA

replace(out, out == 0, NA)

This is done for each group.

data

dat <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), criterium = c(NA, TRUE, TRUE, TRUE, 
FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE, 
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA, 
-18L))

For a pure base R solution, we can create a custom function based on rle and apply it per group, i.e.

f1 <- function(x) {
    x[is.na(x)] <- FALSE
    rle1 <- rle(x)
    y <- rle1$values
    rle1$values[!y] <- 0
    rle1$values[y] <- cumsum(rle1$values[y])
    return(inverse.rle(rle1))
}


do.call(rbind,
        lapply(split(df, df$group), function(i) {
          i$goal <- f1(i$criterium)
          i$goal <- replace(i$goal, is.na(i$criterium) | !i$criterium, NA)
          i
        }))

Of course, if you prefer, you can apply it via dplyr, i.e.

library(dplyr)

df %>% 
 group_by(group) %>% 
 mutate(goal = f1(criterium), 
        goal = replace(goal, is.na(criterium)|!criterium, NA))

which gives,

# A tibble: 18 x 3
# Groups:   group [2]
   group criterium  goal
   <fct> <lgl>     <dbl>
 1 A     NA           NA
 2 A     TRUE          1
 3 A     TRUE          1
 4 A     TRUE          1
 5 A     FALSE        NA
 6 A     FALSE        NA
 7 A     TRUE          2
 8 A     TRUE          2
 9 A     FALSE        NA
10 A     TRUE          3
11 A     TRUE          3
12 A     TRUE          3
13 B     NA           NA
14 B     FALSE        NA
15 B     TRUE          1
16 B     TRUE          1
17 B     TRUE          1
18 B     FALSE        NA

Comments
  • you can just about grab what you need with this work of beauty; as.numeric(as.factor(cumsum(is.na(d$criterium^NA)) + d$criterium^NA)) -- just needs to be applied by group
  • that is a really funny solution. Very good job!
  • In your example all of group A comes first, then group B. We don't need to handle cases with group=A, criterium=TRUE interspersed with group=B, criterium=TRUE?
  • No, when group A stops so stops the sequence for group A.
  • But I'm suggesting if you construct an example with group=A, criterium=TRUE followed by group=B, criterium=TRUE (with no FALSEs in between), would that get a new 'goal' number or not? Some of the answers here will fail because they don't group by group or consider the discontinuity in group.
  • Wow, impressive. Thanks for introducing me to rle and inverse.rle. Greetings to Leipzig.
  • @Humpelstielzchen You're welcome. Will try to simplify and explain the logic a bit.
  • Thanks! I was dissecting your answer just like that. Your answer taught me the most. But chinsoon12 is just a devil of a fellow. ^^
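
The concern raised in the comments about group boundaries can be checked with a small base R sketch. The data frame d and the helper number_true_runs are made up for illustration: group A ends on TRUE and group B starts on TRUE, so the two runs touch with no FALSE in between.

```r
# Made-up data: group A ends on TRUE and group B starts on TRUE
d <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  criterium = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
)

# Hypothetical helper: number the runs of TRUE, NA elsewhere
number_true_runs <- function(x) {
  r <- rle(!is.na(x) & x)                                    # treat NA as FALSE
  r$values <- ifelse(r$values, cumsum(r$values), NA_integer_)
  inverse.rle(r)
}

# Ignoring the grouping: the last run of A and the first run of B merge
# into one run, and the numbering does not restart in B
ungrouped <- number_true_runs(d$criterium)

# Numbering within each group: B starts again at 1
grouped <- unsplit(lapply(split(d$criterium, d$group), number_true_runs), d$group)

cbind(d, ungrouped, grouped)
```

On input like this, the ungrouped numbering differs from the per-group numbering, so it is worth testing any chosen approach against data where runs touch across a group boundary.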