tidyr separate column values into character and numeric using regex


I'd like to separate column values using tidyr::separate and a regular expression, but I am new to regular expressions.

df <- data.frame(A=c("enc0","enc10","enc25","enc100","harab0","harab25","harab100","requi0","requi25","requi100"), stringsAsFactors=FALSE)

This is what I've tried

library(tidyr)
df %>%
   separate(A, c("name","value"), sep="[a-z]+")

Bad Output

   name value
1           0
2          10
3          25
4         100
5           0
# etc

How do I save the name column as well?
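
A likely reason the name column comes out empty: whatever sep matches is treated as the separator and thrown away, and here the pattern matches the letters themselves. A minimal base-R sketch of the same splitting behaviour, using strsplit() as a stand-in for illustration (an assumption; separate() has its own implementation):

```r
# The separator pattern consumes the letters, so the piece before it is empty
# and only the digits survive.
pieces <- strsplit(c("enc10", "harab25"), "[a-z]+")
pieces
```

Each element of pieces holds an empty string followed by the digits, which mirrors the blank name column in the output above.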




If you really want to do it with separate() (though I don't see the point), you can add one more step. Using the same regex as @WiktorStribiżew:

library(dplyr)  # for mutate()
# Insert "_" between the letters and the digits, then split on it
df %>%
  mutate(A = gsub('^([a-z]+)(\\d+)$', '\\1_\\2', A)) %>%
  separate(A, into = c('name', 'value'), sep = '_')
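
The same regex can also be handed straight to tidyr::extract(), which takes one capturing group per output column and so skips the intermediate "_" step. A minimal sketch, assuming a shortened copy of the question's data and convert = TRUE to get integer values:

```r
library(tidyr)

# Shortened copy of the question's data (illustrative subset).
df <- data.frame(A = c("enc0", "enc10", "harab25", "requi100"),
                 stringsAsFactors = FALSE)

# One capture group per output column: letters first, then digits.
# convert = TRUE runs type.convert() so `value` becomes integer.
res <- tidyr::extract(df, A, into = c("name", "value"),
                      regex = "^([a-z]+)(\\d+)$", convert = TRUE)
res
```

Calling it as tidyr::extract() avoids a clash with magrittr::extract() if that package happens to be attached.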



For a bare R version without a lookaround-based regex, define the regular expression first:

> re <- "[a-zA-Z][0-9]"

Then use two substr() calls to return the two desired components: the part up to the start of the match and the part after it.

> with(df,
      data.frame(name=substr(A, 1L, regexpr(re, A)), 
                 value=substr(A, regexpr(re, A) + 1L, 1000L))
      )
    name value
1    enc     0
2    enc    10
3    enc    25
4    enc   100
5  harab     0
6  harab    25
7  harab   100
8  requi     0
9  requi    25
10 requi   100

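The regex here matches a letter, [a-zA-Z], immediately followed by a digit, [0-9]. regexpr() returns the position of that letter, which is where the name ends and one character before the value begins. I believe this is what reshape() does if the sep argument is specified as "".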
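
To make the positions concrete, here is a small sketch on two of the values. It uses nchar() as the end position in place of the 1000L placeholder above, which is a substitution of mine and not part of the original answer:

```r
re <- "[a-zA-Z][0-9]"
x  <- c("enc10", "harab100")

# regexpr() returns the start of the first match, i.e. the index of the
# last letter before the digits.
pos <- as.integer(regexpr(re, x))
pos

name  <- substr(x, 1L, pos)             # letters up to and including pos
value <- substr(x, pos + 1L, nchar(x))  # everything after pos
data.frame(name = name, value = value)
```

Note that value is still a character vector here; wrap it in as.integer() if numbers are needed.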



You could use the unglue package:

library(unglue)
unglue_unnest(df, A, "{name=\\D+}{value}")
#>     name value
#> 1    enc     0
#> 2    enc    10
#> 3    enc    25
#> 4    enc   100
#> 5  harab     0
#> 6  harab    25
#> 7  harab   100
#> 8  requi     0
#> 9  requi    25
#> 10 requi   100

Created on 2019-10-08 by the reprex package (v0.3.0)
