Create a column Status based on the conditions on different states in R
I have a data frame like this:
ID <- c(1,2,3,4,5,5,5,6,6)
States <- c(NA,NA,"All Locked","All Not Locked","All Locked","All Locked",
            "All Not Locked","All Not Locked","All Not Locked")
ToolID <- c(NA,NA,"SWP","SWP","SWP","SWP","SWP","SWP","SWP")
Measurement <- c("Length","Breadth","Width","Height","Time","Time",
                 "Time","Mass","Mass")
Location <- c("US","US","UK","UK","US","US","US","UK","UK")
df1 <- data.frame(ID,States,ToolID,Measurement,Location)
I am trying to do some data manipulation on this data frame using the following conditions
For each ID (grouped):
- if States = NA, then the Status = "No Status"
- if the States column contains at least one (count >= 1) "All Locked", then the Status = "Lock Status"
- if the States column doesn't contain any (count = 0) "All Locked", then the Status = "No Lock Status"
My desired output is
ID  ToolID  Measurement  Location  Status
1   NA      Length       US        No Status
2   NA      Breadth      US        No Status
3   SWP     Width        UK        Lock Status
4   SWP     Height       UK        No Lock Status
5   SWP     Time         US        Lock Status
6   SWP     Mass         UK        No Lock Status
I am trying to do it this way but getting the logic wrong
df1$Status <- ifelse(df1$States == NA, "No Status",
              ifelse((count(df1$States == "All Locked") >=1), "Lock Status",
              ifelse((count(df1$States == "All Locked") <1), "No Lock Status", NA)))
Can someone point me in the right direction? I would like to apply this to my bigger dataset, so a fast solution would help me a lot.
To check for NA elements, use is.na(); a comparison with == NA always yields NA. dplyr::count works on data frames, not on logical vectors.
Here, we group by 'ID' and check if there is at least one "All Locked" in the 'States' column; if so, 'States' is changed to "All Locked" for the entire group. Instead of using mutate to do this, the change is made within group_by, with add=TRUE to add the new grouping variable alongside the existing group. We then get the frequency by 'ID' and 'States' and, based on the condition, change the values in 'States'.
library(dplyr)
df1 %>%
  group_by(ID) %>%
  # add = TRUE keeps ID as a grouping variable (spelled .add in dplyr >= 1.0.0)
  group_by(States = if("All Locked" %in% States) "All Locked" else States,
           add = TRUE) %>%
  mutate(n = n()) %>%
  ungroup %>%
  mutate(States = c("No Lock Status", "Lock Status")[1 + (States == "All Locked" & n >= 1)],
         States = replace(States, is.na(States), "No Status")) %>%
  select(-n) %>%
  distinct
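The NA point from this answer can be checked in isolation. The snippet below is a minimal base R sketch on a three-element vector; the label "Has Status" is a hypothetical placeholder used only for illustration and is not part of the question.

```r
# Small stand-in for the States column
States <- c(NA, "All Locked", "All Not Locked")

# Comparing with == propagates NA instead of returning TRUE/FALSE
eq_result <- States == NA

# is.na() is the test that detects missing values
na_result <- is.na(States)

# So an ifelse() condition on missing values should use is.na()
# ("Has Status" is a hypothetical label for this sketch only)
status <- ifelse(is.na(States), "No Status", "Has Status")
```

Because every element of `States == NA` is itself NA, the outer `ifelse()` in the question can never reach its "No Status" branch.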
Here's a short, clean idiom using dplyr's case_when.
First we compute Status as a summary statistic, the proportion of States which are "All Locked" (between 0 and 1, or NA); then we immediately recode the Status column into the corresponding string output:
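library(dplyr)
df1 %>%
  group_by(ID) %>%
  # first() keeps one value per ID so that each group collapses to a single row
  summarize(ToolID = first(ToolID),
            Measurement = first(Measurement),
            Location = first(Location),
            Status = sum(States == "All Locked") / n()) %>%
  mutate(Status = case_when(
    is.na(Status)         ~ "No Status",
    Status == 1           ~ "Lock Status",
    Status == 0           ~ "No Lock Status",
    between(Status, 0, 1) ~ NA_character_  # mixed groups (e.g. 2 of 3 locked) stay NA
  ))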
     ID ToolID Measurement Location Status
  <dbl> <fctr> <fctr>      <fctr>   <chr>
1  1.00 NA     Length      US       No Status
2  2.00 NA     Breadth     US       No Status
3  3.00 SWP    Width       UK       Lock Status
4  4.00 SWP    Height      UK       No Lock Status
5  5.00 SWP    Time        US       NA
6  6.00 SWP    Mass        UK       No Lock Status
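The output above leaves ID 5 as NA because that group is only partly locked. If the "at least one All Locked" rule from the question is what is wanted, one possible variation is sketched below. It is not the code from this answer: it assumes dplyr is installed, rebuilds df1 from the question so that it runs on its own, and uses `first()` on the assumption that the other columns are constant within an ID.

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(
  ID = c(1, 2, 3, 4, 5, 5, 5, 6, 6),
  States = c(NA, NA, "All Locked", "All Not Locked", "All Locked", "All Locked",
             "All Not Locked", "All Not Locked", "All Not Locked"),
  ToolID = c(NA, NA, "SWP", "SWP", "SWP", "SWP", "SWP", "SWP", "SWP"),
  Measurement = c("Length", "Breadth", "Width", "Height", "Time", "Time",
                  "Time", "Mass", "Mass"),
  Location = c("US", "US", "UK", "UK", "US", "US", "US", "UK", "UK")
)

res <- df1 %>%
  group_by(ID) %>%
  summarise(
    # assumption: these columns do not vary within an ID
    ToolID = first(ToolID),
    Measurement = first(Measurement),
    Location = first(Location),
    # conditions are checked in order; each is a single TRUE/FALSE per group
    Status = case_when(
      all(is.na(States))                        ~ "No Status",
      any(States == "All Locked", na.rm = TRUE) ~ "Lock Status",
      TRUE                                      ~ "No Lock Status"
    )
  )
```

With this ordering, ID 5 is reported as "Lock Status", which matches the desired output in the question rather than the proportion-based reading.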
The any() function is well suited for aggregation here. Joining with a lookup table converts NA, TRUE, and FALSE, respectively, into the Status values the OP expects.
The approach can be implemented in data.table syntax as well as dplyr syntax.
Create lookup table
This will be used by both the data.table and the dplyr implementations.
library(data.table)
lut <- data.table(st = c(NA, TRUE, FALSE),
                  Status = c("No Status", "Lock Status", "No Lock Status"))
library(data.table)
# aggregate by ID
agg <- setDT(df1)[, .(st = any(States == "All Locked")), by = ID][
  # join with lookup table
  lut, on = "st"][, -"st"]
# join with df1 to prepend other columns
unique(df1[, -"States"])[agg, on = "ID"]
   ID ToolID Measurement Location Status
1:  1 <NA>   Length      US       No Status
2:  2 <NA>   Breadth     US       No Status
3:  3 SWP    Width       UK       Lock Status
4:  5 SWP    Time        US       Lock Status
5:  4 SWP    Height      UK       No Lock Status
6:  6 SWP    Mass        UK       No Lock Status
library(dplyr)
agg <- df1 %>%
  group_by(ID) %>%
  summarize(st = any(States == "All Locked")) %>%
  left_join(lut) %>%
  select(-st)
df1 %>%
  select(-States) %>%
  unique() %>%
  left_join(agg)
  ID ToolID Measurement Location Status
1  1 <NA>   Length      US       No Status
2  2 <NA>   Breadth     US       No Status
3  3 SWP    Width       UK       Lock Status
4  4 SWP    Height      UK       No Lock Status
5  5 SWP    Time        US       Lock Status
6  6 SWP    Mass        UK       No Lock Status
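For readers without either package, the same any()-plus-lookup idea can be sketched in base R. This is an illustrative variant and not part of the answer above: the names `st`, `lut_st`, `lut_status` and `status_by_id` are made up for the sketch, and it relies on `match()` treating NA as matching NA.

```r
# ID and States columns from the question
ID <- c(1, 2, 3, 4, 5, 5, 5, 6, 6)
States <- c(NA, NA, "All Locked", "All Not Locked", "All Locked", "All Locked",
            "All Not Locked", "All Not Locked", "All Not Locked")

# Aggregate by ID: any() gives NA, TRUE or FALSE for each group
st <- vapply(split(States == "All Locked", ID), any, logical(1))

# Lookup vectors playing the role of the lookup table
lut_st     <- c(NA, TRUE, FALSE)
lut_status <- c("No Status", "Lock Status", "No Lock Status")

# match() pairs NA with NA, so all three cases are translated
status_by_id <- data.frame(ID = as.numeric(names(st)),
                           Status = lut_status[match(st, lut_st)])
```

The result has one row per ID and can be merged back onto the de-duplicated remaining columns, as the two implementations above do with a join.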
- Looks like you want the summarized output instead of mutate. So include a distinct at the end.
- I already did distinct on your code sample, but as per the logic it returns 2 rows for ID = 5 when I only need 1 row. Please see my comment below on your solution.
- It is not clear in the logic to remove the rows
- Please clarify your desired output for ID 5; it has 2/3 locked states; according to the intent of your last ifelse(..., NA), should it give Status=NA, not "All Locked"? See my answer.
- Thanks for the solution akrun but it is not grouping by ID. I would like only one row per ID. Please see my desired output. I didn't mention "for each ID" in my question previously but now edited it. apologies for that.
- @Sharath The grouping is only to get the frequency count; after that it is ungrouped. It follows the logic you showed, where count should get the frequency by group.
- @Sharath Your expected output is 6 rows while the input data is more
- What I meant is this: in the above case, ID = 5 has 3 values for States: "All Locked", "All Locked", "All Not Locked". Since the count of States equal to "All Locked" is at least 1, it should return only 1 row for that ID with the status "Lock Status". ID = 6 doesn't have any "All Locked" in States, so it should return only 1 row with the status "No Lock Status". Am I making sense? I am sorry if I am confusing you.
- Perfect. This is exactly how I wanted. Thank you so much for your help. I apologize for the confusion in explaining what I wanted.
- Note the output for ID 5; it has 2/3 locked states; according to the intent of your last ifelse(..., NA), should it give Status=NA, not "All Locked"?