How to replace strings with the matching string from a list?

Say I have a column df1$z containing some "dirty" strings:

> df1$z
 [1] alpha uybkh   kilo-mdjfyrs  lima qxaucnpe gamma-qpnej  
 [5] beta-okmwy    beta-uybkh    gamma mdjfyrs lima qxaucnpe
 [9] beta qpnej    kilo okmwy   
9 Levels: alpha uybkh beta-okmwy beta-uybkh ... lima qxaucnpe

Some of the strings begin with patterns which are included in another vector a.

> a
[1] "alpha" "beta"  "gamma"

I want to replace each string in z that begins with one of these patterns with the corresponding element of a, so that the result is:

# [1] "alpha"         "kilo-mdjfyrs"  "lima qxaucnpe" "gamma"        
# [5] "beta"          "beta"          "gamma"         "lima qxaucnpe"
# [9] "beta"          "kilo okmwy" 

I wrote a function that gets me close, but it applies each replacement separately instead of all at once, and I couldn't manage to combine the results:

> lapply(seq_along(a), function(x) {z[grep(paste0("^", a[x]), z)] <- a[x]; z})
[[1]]
 [1] "beta sfrmyijl" "lima-xudwfkm"  "lima-kirvpys"  "gamma wriygcb"
 [5] "alpha"         "alpha"         "kilo xudwfkm"  "alpha"        
 [9] "gamma wriygcb" "kilo-wvxgar"  

[[2]]
 [1] "beta"           "lima-xudwfkm"   "lima-kirvpys"   "gamma wriygcb" 
 [5] "alpha wvxgar"   "alpha-sfrmyijl" "kilo xudwfkm"   "alpha-kirvpys" 
 [9] "gamma wriygcb"  "kilo-wvxgar"   

[[3]]
 [1] "beta sfrmyijl"  "lima-xudwfkm"   "lima-kirvpys"   "gamma"         
 [5] "alpha wvxgar"   "alpha-sfrmyijl" "kilo xudwfkm"   "alpha-kirvpys" 
 [9] "gamma"          "kilo-wvxgar"   

I also tried some mapply() approaches, which I think could be helpful here, but without success. I looked into some existing answers as well, but couldn't adapt them to my specific problem.

So how would I do this in an efficient base R way? Note that the replacement should be put back into the data frame df1 without disturbing the order of the rows.

Data
a <- c("alpha", "beta", "gamma")
set.seed(105056)
z <- paste0(sample(c(a, "kilo", "lima"), 10, replace=TRUE), 
            sample(c("-", " "), 10, replace=TRUE), 
            replicate(5, paste0(sample(letters, sample(5:9)), collapse="")))
df1 <- data.frame(z, x=rnorm(10))

You may use the following sub solution:

> sub(paste0(".*\\b(",paste(a, collapse="|"),")\\b.*"), "\\1", df1$z)
 [1] "alpha"         "kilo-mdjfyrs"  "lima qxaucnpe" "gamma"         "beta"         
 [6] "beta"          "gamma"         "lima qxaucnpe" "beta"          "kilo okmwy"

The pattern matches any characters before and after a keyword from your a vector and captures the keyword in Group 1. The \1 replacement keeps only the captured keyword and discards all text before and after it. If there is no match, the string is left unchanged.
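
Note that the leading .* lets the keyword sit anywhere in the string, while the question asks for strings that begin with it. A minimal sketch of an anchored variant follows, assuming a small hand-written sample rather than the seeded data from the question; the names anywhere, at_start, res_anywhere and res_at_start are made up for illustration:

```r
a <- c("alpha", "beta", "gamma")

# Hand-written sample (an assumption, not the question's seeded data);
# z is a factor here, as in the question's printed df1$z
df1 <- data.frame(
  z = c("alpha uybkh", "kilo-mdjfyrs", "beta-okmwy", "gamma mdjfyrs", "kilo alpha"),
  stringsAsFactors = TRUE
)

# Pattern from the answer: the keyword may appear anywhere in the string
anywhere <- paste0(".*\\b(", paste(a, collapse = "|"), ")\\b.*")

# Anchored variant: the keyword must be at the start of the string
at_start <- paste0("^(", paste(a, collapse = "|"), ")\\b.*")

res_anywhere <- sub(anywhere, "\\1", df1$z)
res_at_start <- sub(at_start, "\\1", df1$z)

# sub() returns a character vector in the original order,
# so it can be assigned straight back to the column
df1$z <- res_at_start
```

The two patterns differ only on the last element: "kilo alpha" becomes "alpha" with the unanchored pattern and stays "kilo alpha" with the anchored one.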

We could use sub. Collapse 'a' into a single alternation with paste, wrap it in a capture group, and refer to the captured keyword with the backreference (\\1) in the replacement:

sub(paste0(".*\\b(", paste(a, collapse="|"), ")\\b.*"), "\\1", df1$z)
#[1] "alpha"         "kilo-mdjfyrs"  "lima qxaucnpe" "gamma"         "beta"          "beta"          "gamma"        
#[8] "lima qxaucnpe" "beta"          "kilo okmwy"   

NOTE: sub solution was posted first here


Or using str_replace from stringr

library(tidyverse)
df1 %>% 
  mutate(z = str_replace(z, 
      paste0(".*\\b(", paste(a, collapse="|"), ")\\b.*"), "\\1"))
#           z           x
#1          alpha -0.18973111
#2   kilo-mdjfyrs -0.88150363
#3  lima qxaucnpe  0.01665189
#4          gamma  0.62647841
#5           beta -0.29526632
#6           beta  0.42480082
#7          gamma  1.03653486
#8  lima qxaucnpe -1.51910745
#9           beta  1.21504343
#10    kilo okmwy  1.25321421

Here's a somewhat longer but less opaque solution, using ifelse and grepl:

# Anchor each pattern with ^ so that only strings beginning with the keyword match
df1$z <- ifelse(grepl("^alpha", df1$z), a[1],
            ifelse(grepl("^beta", df1$z), a[2],
                   ifelse(grepl("^gamma", df1$z), a[3], as.character(df1$z))))
df1
               z           x
1          alpha -0.18973111
2   kilo-mdjfyrs -0.88150363
3  lima qxaucnpe  0.01665189
4          gamma  0.62647841
5           beta -0.29526632
6           beta  0.42480082
7          gamma  1.03653486
8  lima qxaucnpe -1.51910745
9           beta  1.21504343
10    kilo okmwy  1.25321421
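
The nested ifelse calls hard-code one branch per keyword. If a can have any length, the same idea could be written as a loop that updates one vector in place, which is also what the lapply attempt in the question was missing (it returned a separate copy of z for each keyword). This is only a sketch on hand-written sample data; z_new, key and hit are illustrative names:

```r
a <- c("alpha", "beta", "gamma")

# Hand-written sample (an assumption, not the question's seeded data)
df1 <- data.frame(
  z = c("alpha uybkh", "kilo-mdjfyrs", "beta-okmwy", "gamma mdjfyrs"),
  x = c(1, 2, 3, 4),
  stringsAsFactors = TRUE
)

# Work on a character copy so the factor levels do not get in the way
z_new <- as.character(df1$z)

for (key in a) {
  # startsWith() does a plain prefix test (no regex), base R >= 3.3.0
  hit <- startsWith(z_new, key)
  z_new[hit] <- key
}

# Assign back by position; the row order and the other columns are untouched
df1$z <- z_new
```

Because startsWith() is a plain prefix test, a string such as "alphabet" would also be replaced by "alpha". The regex versions with \\b avoid that.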

Comments
  • Try gsub(paste0(".*\\b(",paste(a, collapse="|"),")\\b.*"), "\\1", df1$z)