Calculating difference that starts over when a two-level factor alternates

I'm trying to calculate the difference between consecutive values of a variable and then, eventually, the cumulative sum of those differences. The calculations are conditional on a factor that has two levels, and the computations need to start over each time the factor alternates back and forth.

Let's consider x to be a time stamp, and ant to be an antenna that detects an individual.

Hopefully my sample data makes this clear.

Using dplyr, I've tried group_by(ant), but that doesn't reset the difference to zero when the individual is subsequently detected at the other antenna.
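
To see why group_by(ant) does not restart, here is a small base R illustration on a hypothetical toy vector (not the question's data). ave() stands in for group_by() + mutate() only to keep the sketch free of package dependencies; lag_diff and run_id are illustrative names. Grouping by ant alone pools both "n" streaks into one group, whereas a run id that increments whenever ant changes keeps them apart.

```r
# Hypothetical toy data: time stamps and the antenna that detected each one
x   <- c(1.0, 2.0, 3.5, 4.0, 6.0, 7.5)
ant <- c("n", "n", "s", "s", "n", "n")

# Difference to the previous value within a group, 0 for the group's first row
lag_diff <- function(v) c(0, diff(v))

# Grouping by 'ant' only: rows 5-6 land in the same group as rows 1-2
by_ant <- ave(x, ant, FUN = lag_diff)

# Grouping by contiguous runs: a new id starts whenever 'ant' changes
run_id <- cumsum(c(TRUE, ant[-1] != ant[-length(ant)]))
by_run <- ave(x, run_id, FUN = lag_diff)

print(by_ant)  # 0.0 1.0 0.0 0.5 4.0 1.5 (row 5 is measured from row 2)
print(by_run)  # 0.0 1.0 0.0 0.5 0.0 1.5 (row 5 restarts at 0)
```

The only difference between the two calls is the grouping vector, which is exactly what the answers below change.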

I have found other posts describing cumulative sums with restarts but none of them quite get at what I am trying to accomplish.

I am not tied to dplyr, but I am looking for assistance on keeping this scalable.

set.seed(14)
test <- data.frame(x = sort(round(runif(20, 0, 10), 2), decreasing = FALSE),
                   ant = sample(c("n", "s"), replace = TRUE, size = 20))

library(dplyr)
test %>%
    group_by(ant) %>%
    mutate(diff = x - lag(x))

The result I am looking for is:

   x    ant diff
1.64    n   0
2.54    n   0.9
3.53    s   0
3.82    s   0.29
4.28    s   0.46
4.74    s   0.46
4.86    n   0
5.11    s   0
5.53    s   0.42
5.95    n   0
6.38    s   0
6.73    n   0
 7.4    s   0
8.51    s   1.11
8.52    s   0.01
8.57    n   0
8.91    s   0
9.33    n   0
9.57    s   0
9.83    s   0.26

From here, I should be able to figure out how to get the cumulative sum for each streak of the factor.
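
Since the restart depends only on whether ant differs from the previous row, both the differences and the cumulative sums can also be computed without any grouping. The base R sketch below is an alternative not taken from the answers on this page; it hard-codes the x and ant values from the expected-result table above, and new_run, run_start and result are illustrative names.

```r
# x and ant as listed in the expected-result table in the question
x <- c(1.64, 2.54, 3.53, 3.82, 4.28, 4.74, 4.86, 5.11, 5.53, 5.95,
       6.38, 6.73, 7.40, 8.51, 8.52, 8.57, 8.91, 9.33, 9.57, 9.83)
ant <- c("n", "n", "s", "s", "s", "s", "n", "s", "s", "n",
         "s", "n", "s", "s", "s", "n", "s", "n", "s", "s")

# TRUE where a new streak starts: the first row, or 'ant' differs from the row before
new_run <- c(TRUE, ant[-1] != ant[-length(ant)])

# Plain lagged difference, then zeroed at the start of every streak
d <- c(0, diff(x))
d[new_run] <- 0

# Index of the first row of the current streak (start indices only ever grow)
run_start <- cummax(ifelse(new_run, seq_along(x), 0L))

# x minus the first x of its streak, which equals the running sum of d
cs <- x - x[run_start]

result <- data.frame(x, ant, diff = round(d, 2), cumsum = round(cs, 2))
print(result)
```

Everything here is a single vectorised pass with no per-group function calls, which is one reason it may scale well; the grouped dplyr and data.table answers are arguably easier to read.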

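A solution similar to Uwe's, but using only tidyverse functions, is:

library(tidyverse)
test %>%
  mutate(seq_chg = ant != lag(ant)) %>%   # TRUE where the antenna changes (NA in row 1)
  replace_na(list(seq_chg = TRUE)) %>%    # treat row 1 as the start of a streak
  mutate(seq_id = cumsum(seq_chg)) %>%    # streak counter
  group_by(seq_id) %>%
  mutate(diff = x - lag(x)) %>%           # NA for the first row of each streak
  replace_na(list(diff = 0))              # ...which is set to 0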

Result

# A tibble: 20 x 5
# Groups:   seq_id [12]
       x ant   seq_chg seq_id    diff
   <dbl> <fct> <lgl>    <int>   <dbl>
 1  1.64 n     TRUE         1 0      
 2  2.54 n     FALSE        1 0.9    
 3  3.53 s     TRUE         2 0      
 4  3.82 s     FALSE        2 0.29   
 5  4.28 s     FALSE        2 0.46   
 6  4.74 s     FALSE        2 0.46   
 7  4.86 n     TRUE         3 0      
 8  5.11 s     TRUE         4 0      
 9  5.53 s     FALSE        4 0.420  
10  5.95 n     TRUE         5 0      
11  6.38 s     TRUE         6 0      
12  6.73 n     TRUE         7 0      
13  7.4  s     TRUE         8 0      
14  8.51 s     FALSE        8 1.11   
15  8.52 s     FALSE        8 0.01000
16  8.57 n     TRUE         9 0      
17  8.91 s     TRUE        10 0      
18  9.33 n     TRUE        11 0      
19  9.57 s     TRUE        12 0      
20  9.83 s     FALSE       12 0.260  
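
For readers less familiar with lag() and replace_na(), the first three steps of the pipeline can be mirrored one-to-one in base R. This is a sketch assuming the ant values printed in the result above; prev is an illustrative name.

```r
# 'ant' column as printed in the tidyverse result above
ant <- c("n", "n", "s", "s", "s", "s", "n", "s", "s", "n",
         "s", "n", "s", "s", "s", "n", "s", "n", "s", "s")

prev    <- c(NA, ant[-length(ant)])  # previous value, NA for row 1 (what lag(ant) returns)
seq_chg <- ant != prev               # TRUE where the antenna changed; NA in row 1
seq_chg[is.na(seq_chg)] <- TRUE      # mirrors replace_na(list(seq_chg = TRUE))
seq_id  <- cumsum(seq_chg)           # streak counter, as in the seq_id column
print(seq_id)
```

The result matches the seq_id column, including the 12 groups reported in the tibble header.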

We need to group by the run-length id of 'ant', which creates a new id whenever the 'ant' value switches to the other value.

library(tidyverse)
library(data.table)
test %>% 
  group_by(grp = rleid(ant)) %>% # rleid from data.table
  mutate(diff1 = c(0, diff(x))) %>% 
  #or use the OP's code
  # mutate(diff1 = x - lag(x, default = first(x))) %>% 
  ungroup %>% 
  select(-grp) # remove the created grp column
# A tibble: 20 x 4
#       x ant    diff diff1
#   <int> <chr> <int> <int>
# 1     1 n         0     0
# 2     2 s         0     0
# 3     3 s         1     1
# 4     4 n         0     0
# 5     5 s         0     0
# 6     6 n         0     0
# 7     7 s         0     0
# 8     8 s         1     1
# 9     9 s         1     1
#10    10 s         1     1
#11    11 s         1     1
#12    12 n         0     0
#13    13 s         0     0
#14    14 n         0     0
#15    15 s         0     0
#16    16 n         0     0
#17    17 n         1     1
#18    18 n         1     1
#19    19 n         1     1
#20    20 s         0     0
Data used in this answer:
test <- structure(list(x = 1:20, ant = c("n", "s", "s", "n", "s", "n", 
"s", "s", "s", "s", "s", "n", "s", "n", "s", "n", "n", "n", "n", 
"s"), diff = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 
 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)), class = "data.frame", 
 row.names = c(NA, -20L))
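
If data.table is not available, a run-length id that serves the same purpose as rleid() here can be sketched with base R's rle(): number the runs and repeat each number by its run length. This is an assumed substitute, not the answer's code; runs, run_id and diff1 are illustrative names. Applied to the data above, it reproduces the diff column.

```r
# Data from the answer above: x = 1:20 and the 'ant' sequence
x   <- 1:20
ant <- c("n", "s", "s", "n", "s", "n", "s", "s", "s", "s",
         "s", "n", "s", "n", "s", "n", "n", "n", "n", "s")

# rle() collapses consecutive equal values into runs; numbering the runs and
# repeating each number by its run length gives an rleid()-style grouping id
runs   <- rle(ant)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Within each run: 0 for the first row, then the difference to the previous row
diff1 <- ave(x, run_id, FUN = function(v) c(0, diff(v)))
print(diff1)
```

rle() expects an atomic vector, so if ant is stored as a factor it may need as.character() first.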

The OP has requested

to calculate the difference and then eventually the cumulative sum of the differences of a variable. The calculations [...] need to start over when the factor alternates back-and-forth.

Computing the differences

The rleid() function from the data.table package can be used to identify changes in ant:

library(data.table)
setDT(test)[, diff := c(0, diff(x)), by = rleid(ant)]
test
       x ant diff
 1: 1.64   n 0.00
 2: 2.54   n 0.90
 3: 3.53   s 0.00
 4: 3.82   s 0.29
 5: 4.28   s 0.46
 6: 4.74   s 0.46
 7: 4.86   n 0.00
 8: 5.11   s 0.00
 9: 5.53   s 0.42
10: 5.95   n 0.00
11: 6.38   s 0.00
12: 6.73   n 0.00
13: 7.40   s 0.00
14: 8.51   s 1.11
15: 8.52   s 0.01
16: 8.57   n 0.00
17: 8.91   s 0.00
18: 9.33   n 0.00
19: 9.57   s 0.00
20: 9.83   s 0.26

Or, using shift():

setDT(test)[, diff := x - shift(x, fill = x[1]), by = rleid(ant)]

Computing the cumulative sums directly

If I understand correctly, computing the differences was only meant as an intermediate step toward the final calculation of the cumulative sums, which need to start over when the factor alternates back and forth.

This can be done directly because, within each streak of identical values of ant, the cumulative sum of the differences of x equals x minus the first value of x in that streak:

setDT(test)[, cumsum := x - x[1L], by = rleid(ant)]
test
       x ant diff cumsum
 1: 1.64   n 0.00   0.00
 2: 2.54   n 0.90   0.90
 3: 3.53   s 0.00   0.00
 4: 3.82   s 0.29   0.29
 5: 4.28   s 0.46   0.75
 6: 4.74   s 0.46   1.21
 7: 4.86   n 0.00   0.00
 8: 5.11   s 0.00   0.00
 9: 5.53   s 0.42   0.42
10: 5.95   n 0.00   0.00
11: 6.38   s 0.00   0.00
12: 6.73   n 0.00   0.00
13: 7.40   s 0.00   0.00
14: 8.51   s 1.11   1.11
15: 8.52   s 0.01   1.12
16: 8.57   n 0.00   0.00
17: 8.91   s 0.00   0.00
18: 9.33   n 0.00   0.00
19: 9.57   s 0.00   0.00
20: 9.83   s 0.26   0.26
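
The identity behind x - x[1L] is a telescoping sum. Writing one streak of identical ant values as rows s, ..., e, with the within-streak differences defined as in the diff column (d_s = 0 and d_i = x_i - x_{i-1} for s < i <= e):

```latex
\sum_{j=s}^{i} d_j \;=\; 0 + \sum_{j=s+1}^{i} \left( x_j - x_{j-1} \right) \;=\; x_i - x_s, \qquad s \le i \le e .
```

For example, rows 3 to 6 (ant = s) give 0 + 0.29 + 0.46 + 0.46 = 1.21 = 4.74 - 3.53, which matches the cumsum column.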

Comments
  • @akun: Thanks! I thought this was it! Oddly, your answer appeared to work the first time I ran it; after a reboot, it did not. Very strange, and I can't get it to reproduce the result again with my data. Now the code you provided does not restart the difference computation with each change in 'ant'. I have edited my question with more complex values for 'x' in the hope of making clearer what result I am aiming for.
  • @akun: I figured it out! There was a conflict with the loaded 'plyr' package.
  • @Scolopax: If plyr is loaded, you can use dplyr::mutate(..
  • @akun: That's helpful information for lots of tasks! It does work as a solution, as does FlorianBrezina's.
  • Thanks. This is close, but it doesn't calculate the difference between records of variable 'x'. I have edited the post in the hope of clarifying the process I'm looking for.
  • Thank you for clarifying your question and providing a new sample dataset. I have updated my answer which I hope does now meet your requirements.