Assign a grouping variable based on the data frame rows present in R
I have a list like this in R:
    cat1
    cat7
    cat10
    cat4
    frog
    dino11
    dino12
    dino15
    rabbit
I need to make a new dataframe that looks like:
    cat1    frog
    cat7    frog
    cat10   frog
    cat4    frog
    dino11  rabbit
    dino12  rabbit
    dino15  rabbit
We create a grouping variable by flagging the elements of 'v1' that contain no digits, taking the cumulative sum of that flag and lagging it by one. Within each group we then create a new column 'v2' holding the last element of 'v1', remove the last row of the group, and finally select the columns of interest.
    library(tidyverse)

    df %>%
      group_by(grp = lag(cumsum(grepl("^[^0-9]+$", v1)), default = 0)) %>%
      mutate(v2 = last(v1)) %>%
      slice(-n()) %>%
      ungroup() %>%
      select(-grp)
    # A tibble: 7 x 2
    #   v1     v2
    #   <chr>  <chr>
    # 1 cat1   frog
    # 2 cat7   frog
    # 3 cat10  frog
    # 4 cat4   frog
    # 5 dino11 rabbit
    # 6 dino12 rabbit
    # 7 dino15 rabbit
df <- structure(list(v1 = c("cat1", "cat7", "cat10", "cat4", "frog", "dino11", "dino12", "dino15", "rabbit")), .Names = "v1", class = "data.frame", row.names = c(NA, -9L))
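To see how the grouping variable comes about, here is a minimal sketch of the intermediate steps in base R. The names `is_label`, `cs` and `grp` are made up for illustration, and `c(0, head(cs, -1))` is used as a stand-in for `lag(cs, default = 0)` so the snippet runs without dplyr.

```r
v1 <- c("cat1", "cat7", "cat10", "cat4", "frog",
        "dino11", "dino12", "dino15", "rabbit")

# TRUE where the element contains no digits (the "label" rows)
is_label <- grepl("^[^0-9]+$", v1)

# running count of labels seen so far; it increases ON each label row
cs <- cumsum(is_label)

# shift by one so that each label row stays in the group it closes
# (stand-in for dplyr's lag(cs, default = 0))
grp <- c(0, head(cs, -1))

data.frame(v1, is_label, cs, grp)
```

Without the lag, "frog" would already belong to the second group, so `last(v1)` would pick the wrong label for the cat rows.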
Similar to @akrun's answer, but with data.table:
    library(data.table)
    setDT(df)

    df[, .(
      anum = v1[-.N],
      a    = v1[.N]
    ), by = .(g = cumsum(!(shift(v1) %like% "\\d")))]

       g   anum      a
    1: 1   cat1   frog
    2: 1   cat7   frog
    3: 1  cat10   frog
    4: 1   cat4   frog
    5: 2 dino11 rabbit
    6: 2 dino12 rabbit
    7: 2 dino15 rabbit
With base R only, you can do it with
    where <- grepl("[[:digit:]]", x)
    r <- rle(where)
    A <- x[where]
    B <- rep.int(x[!where], times = r$lengths[r$values])
    data.frame(A, B)
    #        A      B
    # 1   cat1   frog
    # 2   cat7   frog
    # 3  cat10   frog
    # 4   cat4   frog
    # 5 dino11 rabbit
    # 6 dino12 rabbit
    # 7 dino15 rabbit
x <- scan(what = character(), text = " cat1 cat7 cat10 cat4 frog dino11 dino12 dino15 rabbit ")
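For anyone wondering what `rle()` contributes here, the following is a small walkthrough of the intermediate objects. It assumes, as in the example data, that every run of numbered items is followed by exactly one label without digits; the names `times` and `labels` are only for illustration.

```r
x <- c("cat1", "cat7", "cat10", "cat4", "frog",
       "dino11", "dino12", "dino15", "rabbit")

# TRUE for the numbered items, FALSE for the labels
where <- grepl("[[:digit:]]", x)

# run-length encoding of the TRUE/FALSE pattern
r <- rle(where)
r$lengths  # 4 1 3 1
r$values   # TRUE FALSE TRUE FALSE

# keep only the lengths of the TRUE runs: how many items precede each label
times <- r$lengths[r$values]   # 4 3
labels <- x[!where]            # "frog" "rabbit"

# repeat each label once per item in its run
B <- rep.int(labels, times = times)
```

If two labels could appear in a row, or a run of items had no closing label, `labels` and `times` would differ in length and `rep.int()` would fail, so this approach depends on the data being as regular as shown.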
- You might want to clarify what your data is. Is it DF = data.frame(c("cat1", "cat7", "cat10", "cat4", "frog", "dino11", "dino12", "dino15", "rabbit"))? "List" has a specific meaning in R.
- @Frank right, I think I was in a hurry. (But am not sure.)