dplyr mutate + unlist issue

I'm trying to extract part of a character string in a data frame.

d<-data.frame(a=c("aa_bb_cc", "ddd_eee_fff", "sss_rrr_eee"))

I'd like to get the "bb", "eee" and "rrr" parts into a new column. When I use a construction like the one below, it works fine:

library(stringr)
unlist(str_split(d$a[1],"_"))[2]
unlist(str_split(d$a[2],"_"))[2]

So I applied it inside mutate (dplyr):

library(dplyr)
t<-d %>% mutate(new1=(unlist(str_split(a,"_"))[2]))

But the result is "bb" in every row. What am I doing wrong?

When you do

d %>% mutate(new1=(unlist(str_split(a,"_"))[2]))

it passes the whole column a to str_split at once. So this is equivalent to

unlist(str_split(d$a, "_"))
#[1] "aa"  "bb"  "cc"  "ddd" "eee" "fff" "sss" "rrr" "eee"

and when you subset that flattened vector and take the 2nd element, you get

unlist(str_split(d$a, "_"))[2]
#[1] "bb"

Hence this single value is recycled and assigned to every row.
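A small diagnostic sketch of the same point: str_split already returns a list with one element per row, and it is unlist() that throws the row structure away. The data frame is rebuilt with stringsAsFactors = FALSE (an assumption, so that a is a character column on any R version).

```r
library(stringr)

d <- data.frame(a = c("aa_bb_cc", "ddd_eee_fff", "sss_rrr_eee"),
                stringsAsFactors = FALSE)

pieces <- str_split(d$a, "_")
length(pieces)          # 3: one list element per row
length(unlist(pieces))  # 9: unlist() flattens all rows into one vector
# [2] on the flattened vector is a single value, which mutate() then recycles
unlist(pieces)[2]
```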


To resolve this you can add rowwise() before mutate(). It makes mutate() pass the value of a to str_split one row at a time, which gives the desired output.

library(tidyverse)

d %>%
  rowwise() %>%
  mutate(new1= unlist(str_split(a,"_"))[2])

#   a           new1
#   <fct>       <chr>
# 1 aa_bb_cc    bb
# 2 ddd_eee_fff eee
# 3 sss_rrr_eee rrr
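A vectorised alternative to rowwise() is a sketch that keeps the list returned by str_split and pulls the 2nd element out of each entry with purrr::map_chr (purrr is an extra assumption here; it is part of the tidyverse). Passing a number as the function makes map_chr extract that position from every element.

```r
library(dplyr)
library(stringr)
library(purrr)

d <- data.frame(a = c("aa_bb_cc", "ddd_eee_fff", "sss_rrr_eee"),
                stringsAsFactors = FALSE)

# str_split() gives one character vector per row; map_chr(.x, 2) takes
# the 2nd element of each, so no rowwise() grouping is needed
res <- d %>%
  mutate(new1 = map_chr(str_split(a, "_"), 2))
res
```

One practical difference: rowwise() leaves the result grouped by row (a rowwise_df), so later verbs may need ungroup(), whereas the map_chr version returns an ordinary data frame.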

Another, safer option is to use separate to split the string into several columns on the delimiter and then select the relevant column:

d %>%
  separate(a, into = c("one", "two", "three"), sep = "_", remove = FALSE) %>%
  select(a, two)

#            a two
#1    aa_bb_cc  bb
#2 ddd_eee_fff eee
#3 sss_rrr_eee rrr
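A variant sketch, assuming tidyr 0.8.0 or later: separate accepts NA in into for pieces you want to discard, so the select() step becomes unnecessary.

```r
library(dplyr)
library(tidyr)

d <- data.frame(a = c("aa_bb_cc", "ddd_eee_fff", "sss_rrr_eee"),
                stringsAsFactors = FALSE)

# NA entries in `into` drop the 1st and 3rd pieces; remove = FALSE keeps `a`
res <- d %>%
  separate(a, into = c(NA, "two", NA), sep = "_", remove = FALSE)
res
```

In tidyr 1.3.0 and later, separate() is superseded by separate_wider_delim(), which covers the same use case.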

You can also use a base R option with sapply and strsplit:

sapply(strsplit(as.character(d$a), "_"), "[[", 2)
#[1] "bb"  "eee" "rrr"
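A slightly stricter base R sketch uses vapply, which guarantees exactly one character value per row, and fixed = TRUE so "_" is matched literally rather than as a regular expression. Using p[2] instead of "[[" also means a string without "_" gives NA rather than a "subscript out of bounds" error.

```r
d <- data.frame(a = c("aa_bb_cc", "ddd_eee_fff", "sss_rrr_eee"),
                stringsAsFactors = FALSE)

pieces <- strsplit(d$a, "_", fixed = TRUE)
# vapply() checks that each call returns a single character value
d$new <- vapply(pieces, function(p) p[2], character(1))
d
```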

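str_split_fixed returns a character matrix with one row per string and n columns, so the second column (V2 after conversion) holds the part you want. The unlist() call is not needed because the result is already a matrix:

d1 <- as.data.frame(str_split_fixed(d$a, "_", n = 3))
d1$V2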

Hope this works
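The same idea also works directly inside mutate. This is a sketch, assuming every string has at most three "_"-separated parts; because the matrix has one row per input string, column 2 lines up with the rows of d.

```r
library(dplyr)
library(stringr)

d <- data.frame(a = c("aa_bb_cc", "ddd_eee_fff", "sss_rrr_eee"),
                stringsAsFactors = FALSE)

# [, 2] selects the second piece of every row at once (vectorised)
res <- d %>%
  mutate(new1 = str_split_fixed(a, "_", n = 3)[, 2])
res
```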

Maybe a good excuse to start using (highly customisable) regular expressions:

d[["new"]] <- gsub(".*_(.*)_.*", "\\1", d[["a"]])
d
            a new
1    aa_bb_cc  bb
2 ddd_eee_fff eee
3 sss_rrr_eee rrr
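One caveat with the greedy pattern above: it works for exactly three parts, but with more parts it returns the second-to-last field. The sketch below compares it with an anchored pattern (my own variant, not from the answer) that skips the first field and captures everything up to the next "_".

```r
x <- c("aa_bb_cc", "a_b_c_d")

# Greedy .* pushes the capture group as far right as it can go
greedy <- gsub(".*_(.*)_.*", "\\1", x)
greedy
# Anchored: skip the first field, capture the 2nd field, drop the rest
anchored <- sub("^[^_]*_([^_]*).*$", "\\1", x)
anchored
```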

We can use str_extract

library(tidyverse)
d %>% 
   mutate(new = str_extract(a, "(?<=_)[^_]+"))
#            a new
#1    aa_bb_cc  bb
#2 ddd_eee_fff eee
#3 sss_rrr_eee rrr
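If you prefer a capture group to a lookbehind, str_match is a possible alternative. It returns a matrix whose first column is the whole match and whose second column is the first capture group; the pattern here is an illustrative assumption that extracts the text between the 1st and 2nd "_".

```r
library(dplyr)
library(stringr)

d <- data.frame(a = c("aa_bb_cc", "ddd_eee_fff", "sss_rrr_eee"),
                stringsAsFactors = FALSE)

# Column 2 of str_match() holds the captured 2nd field for each row
res <- d %>%
  mutate(new = str_match(a, "^[^_]*_([^_]*)")[, 2])
res
```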

Or with base R

d$new <- read.table(text = as.character(d$a), header = FALSE, sep="_")[,2]
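A slightly more explicit version of the read.table approach names the fields and keeps them as character. The column names one, two and three are illustrative, and the sketch assumes every string has exactly three "_"-separated parts.

```r
d <- data.frame(a = c("aa_bb_cc", "ddd_eee_fff", "sss_rrr_eee"),
                stringsAsFactors = FALSE)

# Each string is read as one line of "_"-separated text
parts <- read.table(text = d$a, sep = "_", header = FALSE,
                    col.names = c("one", "two", "three"),
                    stringsAsFactors = FALSE)
d$new <- parts$two
d
```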

Comments
  • The always-fun function word, e.g. stringr::word(d$a, 2, sep = '_')
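The word() suggestion also fits inside mutate. This sketch passes the separator as fixed("_") so it is matched literally; word's default separator is fixed(" ").

```r
library(dplyr)
library(stringr)

d <- data.frame(a = c("aa_bb_cc", "ddd_eee_fff", "sss_rrr_eee"),
                stringsAsFactors = FALSE)

# word(x, 2, sep = ...) returns the 2nd "_"-separated word of each string
res <- d %>%
  mutate(new = word(a, 2, sep = fixed("_")))
res
```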