Dummy variable "switch-point" in R

I have a dummy variable that serves as a flag for a number of conditions in my data set. I can't figure out how to write a function that marks the point at which the flag makes its "final switch" -- that is, takes a value that will not change for the rest of the data frame. In the example below, the final switch happens at the 7th observation: from there on, every flag is "y".

  dplyr::tibble(
    observation = c(seq(1,10)),
    crop = c(runif(3,1,25),
              runif(1,50,100),
              runif(2,1,10),
              runif(4,50,100)),
    flag = c(rep("n", 3),
             rep("y", 1),
             rep("n", 2),
             rep("y", 4)))

Which yields:

   observation  crop flag 
         <int> <dbl> <chr>
 1           1 13.3  n    
 2           2  4.34 n    
 3           3 17.1  n    
 4           4 80.5  y    
 5           5  9.62 n    
 6           6  8.39 n    
 7           7 92.6  y    
 8           8 74.1  y    
 9           9 95.3  y    
10          10 69.9  y    

I've tried creating a second flag that marks every switch and returns the "final" switch/flag variable, but over my whole data frame that will likely be highly inefficient. Any suggestions are welcome and appreciated.

One way to do this may be to create a flag that cumulatively sums occurrences of flag switches.

cumsum_na <- function(x){
  x[which(is.na(x))] <- 0
  return(cumsum(x))
}

df <- dplyr::tibble(
    observation = c(seq(1,10)),
    crop = c(runif(3,1,25),
              runif(1,50,100),
              runif(2,1,10),
              runif(4,50,100)),
    flag = c(rep("n", 3),
             rep("y", 1),
             rep("n", 2),
             rep("y", 4)))

library(dplyr)  # provides %>%, mutate() and lag()

df %>%
  mutate(flag2 = ifelse(flag != lag(flag), 1, 0) %>%
               cumsum_na)

# A tibble: 10 x 4
   observation  crop flag  flag2
         <int> <dbl> <chr> <dbl>
 1           1 12.1  n         0
 2           2 11.2  n         0
 3           3  4.66 n         0
 4           4 61.6  y         1
 5           5  6.00 n         2
 6           6  9.54 n         2
 7           7 67.6  y         3
 8           8 86.7  y         3
 9           9 91.6  y         3
10          10 84.5  y         3

You can then do whatever you need with the flag2 column. For example, filter for its maximum value and take the first row, which gives you the first observation of the final, constant state.
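The filter-then-first-row step described above could look roughly like the sketch below. It assumes dplyr is installed, uses a deterministic copy of the example flag column (the random crop values are left out), and swaps the cumsum_na helper for coalesce(), which is just a compact way to treat the leading NA from lag() as "no switch". The name switch_point is made up for illustration.

```r
library(dplyr)

# Deterministic copy of the example flag column (crop values omitted).
df <- tibble(
  observation = 1:10,
  flag = c(rep("n", 3), "y", rep("n", 2), rep("y", 4))
)

switch_point <- df %>%
  # Run counter: goes up by one each time flag differs from the previous row.
  mutate(flag2 = cumsum(coalesce(flag != lag(flag), FALSE))) %>%
  # Keep only the final run, then take its first row.
  filter(flag2 == max(flag2)) %>%
  slice(1)

switch_point$observation
```

For the example data this should point at observation 7. Note that if the flag never switches, the "final run" is the whole data frame and the result is simply row 1.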

I count all the "n" values first; once the final "n" has been reached, I take the index of the next observation.

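i <- 0                          # number of "n" flags seen so far
j <- 1                          # row currently being inspected
n_total <- sum(df$flag == "n")  # total number of "n" flags
while (i < n_total) {
  if (df$flag[j] == "n") i <- i + 1
  j <- j + 1
}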

After the loop, j is the index you are looking for (7 in the example).
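Since efficiency over a large data frame was a concern, here is a loop-free sketch of the same idea in base R. It is an alternative, not the loop above rewritten: the first variant assumes there is at least one "n" and that the column ends in "y"; the second uses rle() and only assumes the column is non-empty. The variable names are made up for illustration.

```r
flag <- c(rep("n", 3), "y", rep("n", 2), rep("y", 4))

# Variant 1: the final switch is the row right after the last "n".
j_last_n <- max(which(flag == "n")) + 1

# Variant 2: run-length encode the column and locate the start of the last run.
runs <- rle(flag)
j_rle <- length(flag) - tail(runs$lengths, 1) + 1

c(j_last_n, j_rle)
```

Both variants should agree with the loop on the example data. The rle() version also works when the final state is "n" rather than "y".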

We can make use of rleid from data.table

library(data.table)
setDT(df)[, flag2 := rleid(flag)]
df
#    observation      crop flag flag2
# 1:           1 21.472985    n     1
# 2:           2 21.563190    n     1
# 3:           3  1.393184    n     1
# 4:           4 88.422562    y     2
# 5:           5  6.383627    n     3
# 6:           6  8.484030    n     3
# 7:           7 86.998953    y     4
# 8:           8 62.220592    y     4
# 9:           9 93.141503    y     4
#10:          10 96.006885    y     4
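To get from the run ids to the actual switch point, one option is to keep the rows of the last run and take the first of them. This is a minimal sketch assuming data.table is installed; it rebuilds the example flag column deterministically, and dt and switch_row are illustrative names.

```r
library(data.table)

dt <- data.table(
  observation = 1:10,
  flag = c(rep("n", 3), "y", rep("n", 2), rep("y", 4))
)
dt[, flag2 := rleid(flag)]  # run id, starting at 1

# Rows belonging to the last run, then the first of those rows.
switch_row <- dt[flag2 == max(flag2)][1]
switch_row$observation
```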

Comments
  • By constant, I mean never to switch again for the rest of the data frame/observations. Will clarify that in original question.
  • I edited my answer; let me know if this doesn't solve your problem!
  • Appreciate your help on this.
  • rleid seems useful. I'm trying to keep most of my operations in tibbles/dplyr, but I appreciate you offering this answer. Good to know about.
  • @BSHuniversity No problem. You can keep the dplyr syntax and just write df %>% mutate(flag2 = rleid(flag)); rleid still comes from data.table.