In R: help using rle() function in dataframe

r rle by group
r run length encoding

I am trying to find the number of consecutive runs of '1' values from a dataframe of over 1M obs. of 11 binary variables. I have looked at a number of similar questions on here, but none deal with lengthy dataframes like mine.

I can find the consecutive runs of '1's individually row-by-row, but I'm looking for a solution that can deal with my entire dataframe a bit more elegantly.

Simple example data:

test <- data.frame(v1=c(1,0,1),v2=c(1,1,1),v3=c(0,1,1),v4=c(1,1,0),v5=c(1,1,1))
test
vtest <- unlist(test[1,])  # first row as a plain numeric vector; rle() needs an atomic vector
vtest

r <- rle(vtest)
r$lengths[r$values == 1]   # lengths of the runs of 1s in row 1
row1_max <- max(r$lengths[r$values == 1])
row1_max

What's the best way for me to find the max consecutive runs of '1' for each row of my dataframe without having to find each one individually by row?

My real dataset also contains an ID# variable that identifies each record uniquely, and I ultimately want to know the max consecutive runs by ID#, so any additional help there would be much appreciated.

Thanks in advance!

You can use apply to apply a function to each row of your data frame:

apply(test, 1, function(x) {
  r <- rle(x)
  max(r$lengths[as.logical(r$values)])
})

This returns the maximum number of consecutive 1s per row:

[1] 2 4 3
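
The question also asks for the result by ID, and a row with no 1s at all would make max() return -Inf with a warning. Here is a minimal sketch covering both points. The column name `id` and the helper `max_run_of_ones` are made up for illustration, not taken from the real dataset:

```r
# Hypothetical helper: longest run of 1s in a 0/1 vector, 0 if there are none.
max_run_of_ones <- function(x) {
  r <- rle(x)
  ones <- r$lengths[r$values == 1]
  if (length(ones) == 0) 0L else max(ones)
}

# Example data from the question, plus an assumed ID column named `id`.
test <- data.frame(id = c("a", "b", "c"),
                   v1 = c(1, 0, 1), v2 = c(1, 1, 1), v3 = c(0, 1, 1),
                   v4 = c(1, 1, 0), v5 = c(1, 1, 1))

# Pass only the binary columns to apply(), so the ID column does not
# turn the underlying matrix into character.
bin_cols <- setdiff(names(test), "id")
result <- data.frame(id = test$id,
                     max_run = apply(test[bin_cols], 1, max_run_of_ones))
result
```

For the example data, max_run comes out as 2, 4 and 3 for ids a, b and c.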

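I would use a combination of functions from the apply family, chained with the %>% pipe (loaded here via dplyr):

library(dplyr)  # provides the %>% pipe

apply(test, 1, rle) %>%
  lapply(function(x) x$lengths[x$values == 1]) %>%  # keep only the runs of 1s
  vapply(max, numeric(1))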

[1] 2 4 3
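
With over 1M rows, calling rle() once per row may be slow. An alternative that does not use rle() at all is to loop over the 11 columns instead of the rows and keep a running count per row. This is only a sketch, and it assumes the columns hold nothing but 0 and 1 (no NA):

```r
# Example data from the question.
test <- data.frame(v1 = c(1, 0, 1), v2 = c(1, 1, 1), v3 = c(0, 1, 1),
                   v4 = c(1, 1, 0), v5 = c(1, 1, 1))

current <- integer(nrow(test))  # length of the run of 1s ending at the current column
best    <- integer(nrow(test))  # longest run seen so far in each row

for (col in test) {
  # Extend the run where the column is 1, reset it to 0 where the column is 0.
  current <- (current + 1L) * (col == 1)
  best <- pmax(best, current)
}
best
```

Each pass through the loop is a vectorized operation over all rows, so the work is 11 vector operations rather than 1M function calls. On the example data `best` is 2, 4, 3, matching the apply() answers.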

I'm assuming your data frame is tidy and that the binary variables are in columns:

set.seed(1)
event <- sample(1:3,365*3,replace=TRUE) # proxy for one of your columns
runs <- rle(event)
sum(runs$lengths >= 6 & runs$values == 1)
[1] 2

I'm currently working on finding the row numbers where the runs of 6 or more begin.
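
One possible way to get those starting positions, as a sketch rather than a finished solution: the first index of each run is 1 plus the total length of the runs before it, which cumsum() gives directly. The names `run_starts`, `long_runs` and `starts` are made up here:

```r
set.seed(1)
event <- sample(1:3, 365 * 3, replace = TRUE)  # same proxy column as above
runs <- rle(event)

# First index of every run: 1, then 1 + cumulative lengths of the preceding runs.
run_starts <- cumsum(c(1L, head(runs$lengths, -1)))

# Keep the starts of the runs of 1s that are at least 6 long.
long_runs <- runs$lengths >= 6 & runs$values == 1
starts <- run_starts[long_runs]
starts
```

The values in `starts` index into `event`, so in a data frame they would be the row numbers of the column that was passed to rle().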

Comments
  • So long as your dataframe only includes 0 and 1 you don't need the as.logical(). R will coerce for you.
  • @CarlWitthoft This is not true. Try, e.g., c(1,2,3)[c(1,0,1)].
  • @SvenHohenstein oops, my bad. Blame it on my post-nap fog.
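
To illustrate the point made in the comments: a numeric 0/1 vector used inside `[` is treated as positions, not as a logical mask, so dropping as.logical() can silently pick the wrong runs. A small sketch using row 2 of the example data (the names `by_position` and `by_mask` are made up):

```r
r <- rle(c(0, 1, 1, 1, 1))   # row 2 of the example data
r$lengths                    # run lengths: 1 4
r$values                     # run values:  0 1

# Numeric 0/1 is treated as positions: index 0 is dropped, index 1 picks the first run.
by_position <- r$lengths[r$values]

# A logical mask picks the runs whose value is 1.
by_mask <- r$lengths[as.logical(r$values)]

max(by_position)  # 1, the length of the run of 0s
max(by_mask)      # 4, the longest run of 1s
```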