Create a column Status based on the conditions on different states in R

I have a data frame like this:

ID <- c(1,2,3,4,5,5,5,6,6)
States <- c(NA,NA,"All Locked","All Not Locked","All Locked","All Locked"
                   ,"All Not Locked","All Not Locked","All Not Locked")
ToolID <- c(NA,NA,"SWP","SWP","SWP","SWP","SWP","SWP","SWP")
Measurement <- c("Length","Breadth","Width","Height","Time","Time"
                   ,"Time","Mass","Mass")
Location <- c("US","US","UK","UK","US","US","US","UK","UK")

df1 <- data.frame(ID,States,ToolID,Measurement,Location)

I am trying to do some data manipulation on this data frame using the following conditions:

For each ID (grouped),
    if States is NA, then Status = "No Status"
    if the States column contains at least one "All Locked" (count >= 1), then Status = "Lock Status"
    if the States column contains no "All Locked" (count = 0), then Status = "No Lock Status"

My desired output is

  ID ToolID Measurement Location         Status
   1     NA      Length       US      No Status
   2     NA     Breadth       US      No Status
   3    SWP       Width       UK    Lock Status
   4    SWP      Height       UK No Lock Status
   5    SWP        Time       US    Lock Status
   6    SWP        Mass       UK No Lock Status

I am trying to do it this way but getting the logic wrong

df1$Status <- ifelse(df1$States == NA, "No Status",
                ifelse((count(df1$States == "All Locked") >=1),
                  "Lock Status",
                  ifelse((count(df1$States == "All Locked") <1),
                    "No Lock Status", NA)))

Can someone point me in the right direction? I would like to apply this to a bigger dataset, so a fast solution would help me a lot.

To test for NA elements, use is.na (a comparison such as States == NA always returns NA). Also note that dplyr::count works on data.frames/tbls, not on vectors.
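As a minimal base R sketch of the is.na point (the vector `states` and the label "Has Status" are made up for illustration and are not part of the question's data):

```r
# Illustrative vector, not the question's data frame
states <- c(NA, "All Locked", "All Not Locked")

# Comparing with NA never yields TRUE or FALSE, only NA
eq_na <- states == NA

# is.na() gives the logical vector that ifelse() needs
na_flag <- is.na(states)

# "Has Status" is a placeholder label for this sketch
status <- ifelse(na_flag, "No Status", "Has Status")
print(status)
```

Because `states == NA` is NA everywhere, an ifelse built on it returns NA for every row, which is why the attempt in the question cannot work as written.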

Here, we group by 'ID' and check whether the 'States' column contains at least one "All Locked"; if so, 'States' is set to "All Locked" for the entire group. Instead of using a separate mutate for this, the change is made inside a second group_by with add = TRUE, which adds the new grouping variable to the existing group (since dplyr 1.0.0 this argument is spelled .add). We then get the frequency by 'ID' and 'States' and, based on the condition, change the values in 'States'.

library(dplyr)
df1 %>% 
  group_by(ID) %>%
  group_by(States = if("All Locked" %in% States) "All Locked" 
              else States, add = TRUE) %>% 
  mutate(n = n()) %>%
  ungroup %>% 
  mutate(States = c("No Lock Status", "Lock Status")[1+ 
                (States == "All Locked" & n >=1)], 
          States = replace(States, is.na(States), "No Status")) %>%
  select(-n) %>% 
  distinct

Here's a short, clean idiom using dplyr::case_when. First we compute Status as a summary statistic, the proportion of States that are "All Locked" (a value between 0 and 1, or NA), then we immediately recode the Status column into the corresponding output string:

df1 %>%
    group_by(ID) %>%
    summarize(ToolID=ToolID[1], Measurement=Measurement[1], Location=Location[1],
      Status = sum( States=="All Locked")/n() ) %>%
    mutate(Status = case_when(
      is.na(Status)         ~ "No Status",
      Status == 1           ~ "Lock Status",
      Status == 0           ~ "No Lock Status",
      between(Status, 0, 1) ~ as.character(NA) ))

Output:

     ID ToolID Measurement Location Status        
  <dbl> <fctr> <fctr>      <fctr>   <chr>         
1  1.00 NA     Length      US       No Status     
2  2.00 NA     Breadth     US       No Status     
3  3.00 SWP    Width       UK       Lock Status   
4  4.00 SWP    Height      UK       No Lock Status
5  5.00 SWP    Time        US       NA            
6  6.00 SWP    Mass        UK       No Lock Status
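If ID 5 should come out as "Lock Status" instead, as in the desired output in the question, one option is to test each group for any "All Locked" rather than computing a proportion. The following is a base R sketch under that assumption; `classify_states` is a hypothetical helper name, and `stringsAsFactors = FALSE` is set explicitly so the columns are plain character vectors regardless of R version.

```r
# Rebuild the example data from the question as character columns
df1 <- data.frame(
  ID = c(1, 2, 3, 4, 5, 5, 5, 6, 6),
  States = c(NA, NA, "All Locked", "All Not Locked", "All Locked", "All Locked",
             "All Not Locked", "All Not Locked", "All Not Locked"),
  ToolID = c(NA, NA, "SWP", "SWP", "SWP", "SWP", "SWP", "SWP", "SWP"),
  Measurement = c("Length", "Breadth", "Width", "Height", "Time", "Time",
                  "Time", "Mass", "Mass"),
  Location = c("US", "US", "UK", "UK", "US", "US", "US", "UK", "UK"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply the three rules to the States of one ID
classify_states <- function(states) {
  if (all(is.na(states))) {
    "No Status"
  } else if (any(states == "All Locked", na.rm = TRUE)) {
    "Lock Status"
  } else {
    "No Lock Status"
  }
}

# One Status per ID, named by ID
status_by_id <- tapply(df1$States, df1$ID, classify_states)

# One row per ID for the remaining columns, then attach the Status
result <- unique(df1[, c("ID", "ToolID", "Measurement", "Location")])
result$Status <- as.vector(status_by_id[as.character(result$ID)])
print(result)
```

This relies on ToolID, Measurement and Location being constant within each ID, as they are in the example data; otherwise unique() would keep more than one row per ID.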

The any() function is well suited for the aggregation here: per ID, any(States == "All Locked") returns NA, TRUE, or FALSE. Joining with a lookup table then converts NA, TRUE, and FALSE, respectively, into the Status values the OP expects.

The approach can be implemented in data.table syntax as well as dplyr style.
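Before the package-based versions, here is a small base R sketch of the idea on three hand-written groups. The vectors are illustrative stand-ins for the States of IDs 1, 5 and 6, and match() is used here only as a lightweight substitute for the join.

```r
# any() per group: all-NA group, mixed group, group without "All Locked"
st <- c(
  any(c(NA, NA) == "All Locked"),
  any(c("All Locked", "All Locked", "All Not Locked") == "All Locked"),
  any(c("All Not Locked", "All Not Locked") == "All Locked")
)

# Lookup table mapping the three logical outcomes to Status labels
lut <- data.frame(
  st = c(NA, TRUE, FALSE),
  Status = c("No Status", "Lock Status", "No Lock Status"),
  stringsAsFactors = FALSE
)

# match() treats NA as matching NA, so the all-NA group finds its label
status <- lut$Status[match(st, lut$st)]
print(status)
```

One caveat: a group that mixes NA with only "All Not Locked" values would also yield NA from any(), and so would be labelled "No Status". The example data contains no such group.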

Create lookup table

This will be used by the data.table and the dplyr variants.

library(data.table)
lut <- data.table(st = c(NA, TRUE, FALSE), 
                  Status = c("No Status", "Lock Status", "No Lock Status"))

data.table version

library(data.table)
# aggregate by ID
agg <- setDT(df1)[, .(st = any(States == "All Locked")), by = ID][
  #  join with lookup table
  lut, on = "st"][, -"st"]
# join with df1 to prepend other columns
unique(df1[, -"States"])[agg, on = "ID"]

   ID ToolID Measurement Location         Status
1:  1   <NA>      Length       US      No Status
2:  2   <NA>     Breadth       US      No Status
3:  3    SWP       Width       UK    Lock Status
4:  5    SWP        Time       US    Lock Status
5:  4    SWP      Height       UK No Lock Status
6:  6    SWP        Mass       UK No Lock Status

dplyr version

library(dplyr)
agg <- df1 %>%
  group_by(ID) %>%
  summarize(st = any(States == "All Locked")) %>%
  left_join(lut) %>%
  select(-st)
df1 %>%
  select(-States) %>%
  unique() %>%
  left_join(agg)

  ID ToolID Measurement Location         Status
1  1   <NA>      Length       US      No Status
2  2   <NA>     Breadth       US      No Status
3  3    SWP       Width       UK    Lock Status
4  4    SWP      Height       UK No Lock Status
5  5    SWP        Time       US    Lock Status
6  6    SWP        Mass       UK No Lock Status

Comments
  • Looks like you want the summarized output instead of mutate. So include a distinct after the ungroup
  • I already applied distinct to your code sample, but as per the logic it returns 2 rows for ID = 5 when I only need 1 row. Please see my comment below on your solution.
  • It is not clear in the logic to remove the rows
  • Please clarify your desired output for ID 5: two of its three states are "All Locked". Going by the intent of your last ifelse(..., NA), should it give Status = NA rather than "Lock Status"? See my answer.
  • Thanks for the solution akrun but it is not grouping by ID. I would like only one row per ID. Please see my desired output. I didn't mention "for each ID" in my question previously but now edited it. apologies for that.
  • @Sharath The grouping is only to get the frequency count after that it is ungrouped. It is as per your logic showed where count should get the frequency by group
  • @Sharath Your expected output is 6 rows while the input data is more
  • What I meant is this: in the above case, ID = 5 has 3 values for States: "All Locked", "All Locked", "All Not Locked". Since the count of states equal to "All Locked" is at least 1, it should return only 1 row for that ID, with the status "Lock Status". ID = 6 doesn't have any "All Locked" in States, so it should return only 1 row with the status "No Lock Status". Am I making sense? I am sorry if I am confusing you.
  • Perfect. This is exactly how I wanted. Thank you so much for your help. I apologize for the confusion in explaining what I wanted.
  • Note the output for ID 5: two of its three states are "All Locked". Going by the intent of your last ifelse(..., NA), should it give Status = NA rather than "Lock Status"?