Python regex string escaping for re.sub replace argument?

With the re module, the search pattern can be escaped so that it is matched literally, e.g.:

def my_replace(string, src, dst):
    import re
    return re.sub(re.escape(src), dst, string)

While this works for the most part, the dst string may include a backslash sequence such as "\\9".

This causes an issue:

  • \\1, \\2, etc. in dst are meant as literals but will be interpreted as group references.
  • Using re.escape(dst) causes . to be changed to \. in the result.

Is there a way to escape the destination without introducing redundant character escaping?


Example usage:

>>> my_replace("My Foo", "Foo", "Bar")
'My Bar'

So far, so good.


>>> my_replace("My Foo", "Foo", "Bar\\Baz")
...
re.error: bad escape \B at position 3

This tries to interpret \B as having a special meaning.


>>> my_replace("My Foo", "Foo", re.escape("Bar\\Baz"))
'My Bar\\Baz'

Works!


>>> my_replace("My Foo", "Foo", re.escape("Bar\\Baz."))
'My Bar\\Baz\\.'

The . gets escaped when we don't want that.


While str.replace can be used in this case, the question about the destination string remains useful, since there may be times we want other features of re.sub, such as the ability to ignore case.

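In this case only the backslash is interpreted as a special character in the replacement string, so instead of re.escape you can simply double the backslashes in the destination argument.

def my_replace(string, src, dst):
    import re
    # A raw string literal cannot end in a backslash, so r"\" is a syntax error;
    # use ordinary string literals to double each backslash in dst.
    return re.sub(re.escape(src), dst.replace("\\", "\\\\"), string)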
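If the whole of dst should be treated as literal text, another option is to pass a function as the repl argument: re.sub inserts a callable's return value as-is, without processing backslash escapes. The following is a minimal sketch under that assumption; the name my_replace_literal and its flags parameter are illustrative and not part of the original question.

```python
import re

def my_replace_literal(string, src, dst, flags=0):
    # re.escape makes src match literally; the lambda ignores the match
    # object and returns dst, which re.sub inserts without interpreting it.
    return re.sub(re.escape(src), lambda match: dst, string, flags=flags)

print(my_replace_literal("My Foo", "Foo", "Bar\\Baz."))  # My Bar\Baz.
print(my_replace_literal("My foo", "FOO", r"\1 and \9", flags=re.IGNORECASE))  # My \1 and \9
```

This keeps features such as case-insensitive matching available while leaving both the backslash and the . in dst untouched.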

You could resort to split:

import re

haystack = r"some text with stu\ff to replace"
needle = r"stu\ff"
replacement = r"foo.bar"

result = replacement.join(re.split(re.escape(needle), haystack))
print(result)

This should also work with needle at the beginning or end of haystack.
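As a quick check of the split-and-join idea, here is a small sketch wrapped in a function. The name split_replace and the flags parameter are assumptions added for illustration; the examples put the needle at both ends of the haystack and use a replacement that contains a backslash sequence.

```python
import re

def split_replace(haystack, needle, replacement, flags=0):
    # The needle is escaped because re.split treats it as a pattern;
    # str.join never interprets backslashes in the replacement.
    return replacement.join(re.split(re.escape(needle), haystack, flags=flags))

# Needle at the beginning and at the end of the haystack.
print(split_replace(r"stu\ff in the middle stu\ff", r"stu\ff", r"foo\1.bar"))
# foo\1.bar in the middle foo\1.bar

# Case-insensitive matching still works through the flags argument.
print(split_replace("My FOO", "foo", "Bar\\Baz", flags=re.IGNORECASE))
# My Bar\Baz
```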

Your code works fine if you just remove the re.escape; I'm not sure why it is there:

Test 1
import re 

def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = 'abbbbbb'
src = r'(ab)b+'
dst = r'\1z'

print(my_replace(src, dst, string))
Output 1
abz
Test 2
import re


def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = re.escape("abbbbbbBar\\Baz")
src = r'(ab)b+'
dst = r'\1z'

print(my_replace(src, dst, string))
Output 2
abzBar\\Baz
Test 3
import re


def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = re.escape("abbbbbbBar\\Baz")
src = r'(ab)b+'
dst = r'\1' + re.escape('\\z')

print(my_replace(src, dst, string))
Output 3
ab\zBar\\Baz
Test 4

To construct dst, we first have to know whether the replacement uses any capturing groups, such as \1 in this case. We cannot re.escape \1, otherwise the output would contain a literal \1 instead of the text matched by the group. So if there are capturing groups, write those parts of the replacement as they are, then append any other part that needs escaping after passing it through re.escape.

import re


def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = re.escape("abbbbbbBar\\Baz")
src = r'(ab)b+'
dst = r'\1' + re.escape(r'\9z')  # raw string: '\9' is an invalid escape in a normal literal

print(my_replace(src, dst, string))
Output 4
ab\9zBar\\Baz
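The same mix of a group reference and literal text can also be written without escaping anything, by using a function as repl. This is only a sketch: replace_keep_group and literal_tail are hypothetical names, and the helper assumes the pattern has at least one capturing group.

```python
import re

def replace_keep_group(pattern, literal_tail, string):
    # Keep the text captured by group 1 and append literal_tail verbatim;
    # a callable's return value is not scanned for backslash escapes.
    return re.sub(pattern, lambda m: m.group(1) + literal_tail, string)

print(replace_keep_group(r'(ab)b+', r'\9z', 'abbbbbbBar\\Baz'))  # ab\9zBar\Baz

# Equivalent template form: double the backslashes in the literal part only.
dst = r'\1' + r'\9z'.replace('\\', '\\\\')
print(re.sub(r'(ab)b+', dst, 'abbbbbbBar\\Baz'))  # ab\9zBar\Baz
```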

Comments
  • I'm not sure I understand the issue - could you give an example string, src, dst which demonstrates it?
  • Looks like what you really want is src.replace('\\', '\\\\'), as you don't seem to want . to be replaced.
  • @metatoaster Do you mean dst? If this avoids all possible interpretations, then yes.
  • @ideasman42 yes. If you only want just this character this would be a way. If you want multiple modifications from this subset, using str.translate may be more desirable. Best approach is to create a number of test cases (add them to your unit test module) to formalise the problem you are trying to solve.
  • @ideasman42 Did you get a solution to this without replacing the dst variable? In my case the capture groups are being treated as literals without the re.escape().
  • Escape is needed because I don't have control over the arguments; they may contain special characters which need to be interpreted as literals.
  • Test 2 is interpreting the destination; try: dst = r'\9z'