R gsub numbers and space from variables

With gsub I am able to remove the # from these person variables, but the way I am trying to remove the random number is not correct. I would also like to remove the space after the person's name while keeping the space in the middle of the name.

c('mike smith #99','John johnson #2','jeff johnson #50') -> person

c(1:99) -> numbers

person <- gsub("#", "", person, fixed=TRUE)

# MY ISSUE
person <- gsub(numbers, "", person, fixed=TRUE)

df <- data.frame(PERSON = person)

Current Results:

PERSON
mike smith 99
John johnson 2
jeff johnson 50

Expected Results:

PERSON
mike smith
John johnson
jeff johnson

c('mike smith #99','John johnson #2','jeff johnson #50') -> person
sub("\\s+#.*", "", person)
[1] "mike smith"   "John johnson" "jeff johnson"
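
For context on why the attempt in the question has no effect: as far as I know, gsub() works with a single pattern, so when a vector such as numbers is passed, only its first element ("1") is used and R warns about it. A minimal sketch, assuming the person and numbers vectors from the question (the names no_hash, attempt, cleaned and df are only illustrative):

```r
person  <- c("mike smith #99", "John johnson #2", "jeff johnson #50")
numbers <- 1:99

no_hash <- gsub("#", "", person, fixed = TRUE)

# Only numbers[1], i.e. "1", ends up being used as the pattern. None of the
# strings contain a "1", so nothing is removed (the warning is silenced here).
attempt <- suppressWarnings(gsub(numbers, "", no_hash, fixed = TRUE))
identical(attempt, no_hash)

# A single regex removes the space, the # and the digits in one step.
cleaned <- sub("\\s+#.*", "", person)
df <- data.frame(PERSON = cleaned)
```

This is why the "Current Results" in the question still show the numbers.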

Here's another pattern as an alternative:

> gsub("(\\.*)\\s+#.*", "\\1", person)
[1] "mike smith"   "John johnson" "jeff johnson"

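In the above regex, the capture group (\\.*) sits before one or more whitespace characters (\\s+), followed by a # symbol and anything after it. \\1 tells gsub to replace the whole match with whatever that group captured. Note that \\. is an escaped, literal dot, so the group as written matches zero or more periods and captures an empty string here: the match starts at the whitespace and is simply deleted. To capture the name itself, write the group as (.*).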
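
To see what the capture group actually holds, here is a small sketch that wraps \\1 in square brackets. It compares the escaped group used above with an unescaped (.*) group; the variable names escaped and captured are only illustrative:

```r
person <- c("mike smith #99", "John johnson #2", "jeff johnson #50")

# Escaped dot: the group matches zero literal periods, so \\1 is empty and
# the match only covers the whitespace, the # and the digits.
escaped <- sub("(\\.*)\\s+#.*", "[\\1]", person)

# Unescaped dot: the group captures the name, and the match covers the
# whole string.
captured <- sub("(.*)\\s+#.*", "[\\1]", person)

escaped
captured
```

With the escaped dot the brackets end up empty and follow the name; with (.*) they enclose the name.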

An easier way to get your desired output is:

> gsub("\\s+#.*$", "", person)
[1] "mike smith"   "John johnson" "jeff johnson"

The above regex \\s+#.*$ indicates that everything from the whitespace (\\s+) and the # symbol through to the end of the string (.*$) should be removed.

Using str_extract_all from the stringr package (the character class includes uppercase letters so that the "J" in "John" is not dropped):

> library(stringr)
> str_extract_all(person, "[A-Za-z]+", simplify = TRUE)
     [,1]   [,2]     
[1,] "mike" "smith"  
[2,] "John" "johnson"
[3,] "jeff" "johnson"

You can also use:

library(stringi)
stri_extract_all(person, regex="[A-Za-z]+", simplify=TRUE)
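
The extraction approach returns the words separately rather than one name per person. If a single string per person is wanted, the words can be pasted back together. A base R sketch of the same idea, using regmatches() and gregexpr() instead of stringr or stringi (the names words and full_names are only illustrative):

```r
person <- c("mike smith #99", "John johnson #2", "jeff johnson #50")

# Extract every run of letters from each string (a list, one element per person).
words <- regmatches(person, gregexpr("[A-Za-z]+", person))

# Join each person's words back together with a single space.
full_names <- vapply(words, paste, character(1), collapse = " ")
full_names
```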

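This could alternatively be done with read.table. Its default comment.char is "#", so everything from the # onward is discarded as a comment, and strip.white = TRUE removes the trailing space.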

read.table(text = person, sep = "#", strip.white = TRUE, 
  as.is = TRUE, col.names = "PERSON")

giving:

        PERSON
1   mike smith
2 John johnson
3 jeff johnson

We can create the pattern with paste

pat <- paste0("\\s*#(", paste(numbers, collapse = "|"), ")")
gsub(pat, "", person)
#[1] "mike smith"   "John johnson" "jeff johnson"

Note that the above solution builds the pattern from 'numbers'. If the goal is only to remove the # and the digits that follow it, this is enough:

sub("\\s*#\\d+$", "", person)
#[1] "mike smith"   "John johnson" "jeff johnson"
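
One practical difference between the two patterns is what happens to a number that is not in 'numbers'. A small sketch, assuming a hypothetical extra entry "amy lee #100" that is not part of the original data:

```r
numbers <- 1:99
pat <- paste0("\\s*#(", paste(numbers, collapse = "|"), ")")

# "amy lee #100" is a made-up value used only to illustrate the difference.
x <- c("mike smith #99", "amy lee #100")

# Built from 'numbers': 100 is not among the alternatives, so only part of
# it can match and at least one digit is left behind.
from_list <- gsub(pat, "", x)

# Any run of digits after the # at the end of the string is removed.
from_digits <- sub("\\s*#\\d+$", "", x)

from_list
from_digits
```

The \\d+ version does not depend on knowing the range of numbers in advance.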

Or another option is

unlist(strsplit(person, "\\s*#\\d+"))

NOTE: All the above are base R methods


library(tidyverse)
# tibble() replaces the deprecated data_frame()
tibble(person) %>% 
      separate(person, into = c("person", "notneeded"), sep = "\\s+#") %>% 
      select(person)

An alternative is to delete any trailing run of characters other than lowercase letters:

gsub("[^a-z]+$", "", person)
[1] "mike smith"   "John johnson" "jeff johnson"

If you want to allow for words that are all upper case or end with an uppercase character:

gsub("[^a-zA-Z]+$", "", person)

Some names might end with .:

gsub("[^a-zA-Z.]+$", "", person)
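
To see how the three character classes differ, here is a short sketch. The inputs "JOHN DOE #7" and "sam jr. #4" are hypothetical values added for illustration, and the result names are arbitrary:

```r
# The second and third values are made up to exercise the edge cases.
x <- c("mike smith #99", "JOHN DOE #7", "sam jr. #4")

# Lowercase only: an all-uppercase name counts as "non-lowercase" and is
# stripped along with the trailing number.
lower_only <- gsub("[^a-z]+$", "", x)

# Both cases allowed: uppercase names survive, but a trailing "." is removed.
any_case <- gsub("[^a-zA-Z]+$", "", x)

# Both cases and "." allowed: a trailing period is kept.
keep_dot <- gsub("[^a-zA-Z.]+$", "", x)

lower_only
any_case
keep_dot
```

Which class to use depends on what the names in the real data can end with.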

Comments
  • Check the answer
  • I went with the gsub option as it was the first I saw. Thank you for the other answers. Could you let us know what the arguments you are referring to in the line above are?
  • I'm not sure what you're asking for with 'arguments'; in each function call the arguments are explicitly written.
  • (\\.*)\\s+#.* and \\1
  • Is there any need for capturing? It will slow things down. Why not just use sub("\\s+#.*", "", person)?
  • No need for gsub; sub is enough in this example, and in general whenever you include $ in your pattern.