Assign a grouping variable based on the data frame rows present in R

I have a list like this in R:

cat1  
cat7  
cat10  
cat4  
frog  
dino11  
dino12  
dino15  
rabbit  

I need to make a new dataframe that looks like:

cat1 frog  
cat7 frog  
cat10 frog  
cat4 frog  
dino11 rabbit  
dino12 rabbit  
dino15 rabbit

Ideas? Thanks!

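We create a grouping variable from the entries of 'v1' that contain no digits (the labels): cumsum() counts them and lag() shifts that count down one row, so each label becomes the last row of its group. We then create a new column 'v2' holding the last element of 'v1' in each group, remove that last (label) row from each group, and drop the helper column.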

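library(tidyverse)
df %>%
  # the group id increments on the row *after* each label (entry with no digits)
  group_by(grp = lag(cumsum(grepl("^[^0-9]+$", v1)), default = 0)) %>%
  mutate(v2 = last(v1)) %>%   # the label is the last row of each group
  slice(-n()) %>%             # drop the label row itself
  ungroup() %>%
  select(-grp)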
# A tibble: 7 x 2
#  v1     v2    
#  <chr>  <chr> 
#1 cat1   frog  
#2 cat7   frog  
#3 cat10  frog  
#4 cat4   frog  
#5 dino11 rabbit
#6 dino12 rabbit
#7 dino15 rabbit
Data:
df <- structure(list(v1 = c("cat1", "cat7", "cat10", "cat4", "frog", 
"dino11", "dino12", "dino15", "rabbit")), .Names = "v1",
 class = "data.frame", row.names = c(NA, -9L))
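To make the grouping step in the dplyr answer easier to follow, here is a minimal base R sketch of what lag(cumsum(grepl(...)), default = 0) computes on the example data. The names is_label and grp are illustrative only, and c(0, head(..., -1)) stands in for dplyr's lag() with default = 0.

```r
v1 <- c("cat1", "cat7", "cat10", "cat4", "frog",
        "dino11", "dino12", "dino15", "rabbit")

# TRUE for entries that contain no digits at all (the group labels)
is_label <- grepl("^[^0-9]+$", v1)

# cumsum() increments the counter *on* each label row; shifting it down one
# position (a lag with default 0) moves the increment to the row after the label
grp <- c(0, head(cumsum(is_label), -1))

print(grp)  # 0 0 0 0 0 1 1 1 1
print(data.frame(v1, is_label, grp))
```

Rows 1-5 (cat1 to frog) land in group 0 and rows 6-9 (dino11 to rabbit) in group 1, so each label is the last row of its group, which is why mutate(v2 = last(v1)) followed by slice(-n()) gives the desired result.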

Similar to @akrun's answer, but with data.table:

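library(data.table)
setDT(df)   # convert df to a data.table by reference

df[, .(
  anum = v1[-.N],   # all rows of the group except the last
  a = v1[.N]        # the last row of the group (the label)
), by=.(g = cumsum(!(shift(v1) %like% "\\d")))]   # a new group starts after each digit-free entry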

   g   anum      a
1: 1   cat1   frog
2: 1   cat7   frog
3: 1  cat10   frog
4: 1   cat4   frog
5: 2 dino11 rabbit
6: 2 dino12 rabbit
7: 2 dino15 rabbit
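The data.table answer builds the group id from shift() and then, within each group, keeps v1[-.N] (all but the last row) and v1[.N] (the last row). For readers without data.table, a rough base R equivalent of that per-group step might use split(); it is a sketch under the same assumption that every group ends with exactly one label row, and the names prev_has_digit and res are illustrative.

```r
v1 <- c("cat1", "cat7", "cat10", "cat4", "frog",
        "dino11", "dino12", "dino15", "rabbit")

# Mirrors cumsum(!(shift(v1) %like% "\\d")): a new group starts on the row
# right after a digit-free entry, and the first row starts group 1
prev_has_digit <- c(FALSE, grepl("\\d", head(v1, -1)))
g <- cumsum(!prev_has_digit)

# Within each group, all but the last element are members; the last is the label
res <- do.call(rbind, lapply(split(v1, g), function(vals) {
  data.frame(anum = head(vals, -1), a = tail(vals, 1),
             stringsAsFactors = FALSE)
}))
rownames(res) <- NULL  # drop the "1.1", "1.2", ... names added by rbind
print(res)
```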

With base R only, you can do it with grepl and rle.

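where <- grepl("[[:digit:]]", x)   # TRUE for entries containing a digit
r <- rle(where)                    # runs of TRUE (members) and FALSE (labels)
A <- x[where]                      # the member entries
B <- rep.int(x[!where], times = r$lengths[r$values])  # repeat each label once per member in its run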

data.frame(A, B)
#       A      B
#1   cat1   frog
#2   cat7   frog
#3  cat10   frog
#4   cat4   frog
#5 dino11 rabbit
#6 dino12 rabbit
#7 dino15 rabbit

Data:

x <- scan(what = character(), text = "
cat1  
cat7  
cat10  
cat4  
frog  
dino11  
dino12  
dino15  
rabbit  
")

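The rle() answer assumes each run of digit entries is followed by exactly one label. A sketch of an alternative (a swapped-in technique, not part of the answer above) looks up, for every digit entry, the position of the next label with findInterval(); the names item_idx, label_idx, next_label and out are illustrative.

```r
x <- c("cat1", "cat7", "cat10", "cat4", "frog",
       "dino11", "dino12", "dino15", "rabbit")

where <- grepl("[[:digit:]]", x)   # TRUE for member entries
item_idx  <- which(where)          # positions of members
label_idx <- which(!where)         # positions of labels

# findInterval() counts how many labels sit at or before each member;
# adding 1 picks the first label after it (NA if there is none)
next_label <- label_idx[findInterval(item_idx, label_idx) + 1]

out <- data.frame(A = x[item_idx], B = x[next_label], stringsAsFactors = FALSE)
print(out)
```

Unlike rep.int() with the run lengths, this does not error when two labels are adjacent (for example c("a1", "frog", "rabbit", "b1", "cat")): a label with no members before it is simply dropped, and any members after the last label get NA in B.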
Comments
  • You might want to clarify what your data is. DF = data.frame(c("cat1", "cat7", "cat10", "cat4", "frog", "dino11", "dino12", "dino15", "rabbit"))? "List" has a specific meaning in R.
  • @Frank right, I think I was in a hurry. (But am not sure.)