Splitting strings into number and string (with missings)


I am trying to separate numbers and characters in a column of strings. So far I have been using tidyr::separate for doing this, but am encountering errors for "unusual" cases.

Suppose I have the following data

df <- data.frame(c1 = c("5.5K", "2M", "3.1", "M"))

And I want to obtain a data frame with columns

data.frame(c2 = c("5.5", "2", "3.1", NA),
c3 = c("K", "M", NA, "M"))

So far I have been using tidyr::separate

df %>%
separate(c1, into =c("c2", "c3"), sep = "(?<=[0-9])(?=[A-Za-z])")

But this only works for the first three cases. I realize this is because the lookbehind (?<=...) and the lookahead (?=...) must both match at the same position, so a string with no digit before the letter is never split. How would one modify this code to capture the cases where the numbers are missing before the letters? I have been trying to use the extract function too, but without success.
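
To make the failure concrete, here is a small base R sketch (not part of the original question) that checks which strings contain a position where the separator pattern matches. The variable names x, pattern and has_split are only illustrative.

```r
# Sample values from the question
x <- c("5.5K", "2M", "3.1", "M")

# Same separator as in the separate() call; lookarounds need perl = TRUE
pattern <- "(?<=[0-9])(?=[A-Za-z])"

# TRUE where a digit is immediately followed by a letter
has_split <- grepl(pattern, x, perl = TRUE)
print(data.frame(x, has_split))
```

Neither "3.1" nor "M" contains a split point. For "3.1" the missing piece is filled with NA on the right, which happens to be the desired result, but for "M" the whole string ends up in c2 instead of c3.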

Edit: I suppose one solution is to break this up into

df$c2 <- as.numeric(str_extract(df$c1, "[0-9.]+"))
df$c3 <- str_extract(df$c1, "[A-Za-z]+")

But I was curious whether there were other ways to handle it.
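
For comparison, the same two-step idea can be sketched in base R without stringr. The helper name extract_first is hypothetical and was written for this illustration; it returns the first match of a pattern, or NA where nothing matches.

```r
# Hypothetical helper: first match of `pattern` in each element of `x`,
# NA where there is no match
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the elements that matched
  out[m != -1] <- regmatches(x, m)
  out
}

x <- c("5.5K", "2M", "3.1", "M")
c2 <- as.numeric(extract_first(x, "[0-9.]+"))  # numeric part, NA if absent
c3 <- extract_first(x, "[A-Za-z]+")            # letter part, NA if absent
print(data.frame(c2, c3))
```

Because each part is extracted independently, a missing number or a missing suffix simply becomes NA in its own column.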


extract(df, c1, into =c("c2", "c3"), "([\\.\\d]*)([a-zA-Z]*)")
#    c2 c3
# 1 5.5  K
# 2   2  M
# 3 3.1   
# 4      M

You can also use separate with just a lookahead, though there may be a more elegant method.

df %>% separate(c1, into =c("c2", "c3"), sep = "(?=[A-Za-z])")
#    c2   c3
# 1 5.5    K
# 2   2    M
# 3 3.1 <NA>
# 4        M


We can use base R sub to remove the letters and the numbers respectively, which gives the two columns.

df$c2 <- sub("[A-Za-z]+", "", df$c1)
df$c3 <- sub("\\d*\\.?\\d*", "", df$c1)

df
#    c1  c2 c3
#1 5.5K 5.5  K
#2   2M   2  M
#3  3.1 3.1   
#4    M      M

You can remove c1 column if not needed later by doing df$c1 <- NULL.


You can also use regex grouping \1 and \2. This is very similar to and adapted from @Ronak Shah's answer but with regex grouping

# data
df <- data.frame(c1 = c("5.5K", "2M", "3.1", "M"))

# keep only numeric
df$c2 <- sub("(\\d*\\.?\\d*)([A-Za-z]*)", "\\1", df$c1)

# keep only alphabets
df$c3 <- sub("(\\d*\\.?\\d*)([A-Za-z]*)", "\\2", df$c1)
# turn empty matches into NA
df[df == ""] <- NA

df
#>     c1   c2   c3
#> 1 5.5K  5.5    K
#> 2   2M    2    M
#> 3  3.1  3.1 <NA>
#> 4    M <NA>    M

Created on 2019-04-16 by the reprex package (v0.2.1)
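
The two sub calls above apply the same pattern twice. As a possible variation (a sketch, not the answerer's code), both groups can be captured in one pass with base R regexec and regmatches. The anchored pattern and the variable names m and parts are assumptions made for this example, and it relies on every input matching the pattern.

```r
df <- data.frame(c1 = c("5.5K", "2M", "3.1", "M"), stringsAsFactors = FALSE)

# Optional number followed by optional letters, anchored at both ends
pattern <- "^([0-9]*\\.?[0-9]*)([A-Za-z]*)$"

# One character vector per row: full match, group 1, group 2
m <- regmatches(df$c1, regexec(pattern, df$c1))
parts <- do.call(rbind, m)

# Empty groups become NA
parts[parts == ""] <- NA

df$c2 <- as.numeric(parts[, 2])
df$c3 <- parts[, 3]
print(df)
```

If some strings might not match the pattern at all, their entries in m would be empty and the rbind step would need extra handling.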


We can use extract from tidyr

library(tidyr)
extract(df, c1, into = c("c2", "c3"), "^([0-9.]*)([A-Z]*)",
        convert = TRUE, remove = FALSE)
#    c1  c2 c3
#1 5.5K 5.5  K
#2   2M 2.0  M
#3  3.1 3.1   
#4    M  NA  M

Or with read.csv from base R

read.csv(text= sub("^([0-9.]*)", "\\1,", df$c1), 
   header = FALSE, stringsAsFactors = FALSE, col.names = c("c2", "c3"))
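
To show what the read.csv approach is doing, the sketch below prints the intermediate comma-separated lines and then parses them. Passing na.strings = "" is an addition not in the answer above; it is assumed here so that the empty suffix is read as NA rather than as an empty string.

```r
x <- c("5.5K", "2M", "3.1", "M")

# Insert a comma after the (possibly empty) leading number
csv_lines <- sub("^([0-9.]*)", "\\1,", x)
print(csv_lines)
# "5.5,K" "2,M" "3.1," ",M"

# Parse the lines as a two-column CSV; blank fields become NA
out <- read.csv(text = csv_lines, header = FALSE, stringsAsFactors = FALSE,
                col.names = c("c2", "c3"), na.strings = "")
print(out)
```

Since read.csv converts column types on its own, c2 comes back as numeric without an explicit as.numeric call.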


You could use the unglue package:

df <- data.frame(c1 = c("5.5K", "2M", "3.1", "M"))

library(unglue)
unglue_unnest(df, c1, "{c2}{c3=\\D*}", convert = TRUE)
#>    c2 c3
#> 1 5.5  K
#> 2 2.0  M
#> 3 3.1   
#> 4  NA  M
