Process escape sequences in a string in Python

Sometimes when I get input from a file or the user, I get a string with escape sequences in it. I would like to process the escape sequences in the same way that Python processes escape sequences in string literals.

For example, let's say myString is defined as:

>>> myString = "spam\\neggs"
>>> print(myString)
spam\neggs

I want a function (I'll call it process) that does this:

>>> print(process(myString))
spam
eggs

It's important that the function can process all of the escape sequences in Python (listed in the table of escape sequences in the Python language reference's section on string literals).

Does Python have a function to do this?


The correct thing to do is to use the 'unicode_escape' codec (on Python 2, 'string_escape') to decode the string.

>>> myString = "spam\\neggs"
>>> decoded_string = bytes(myString, "utf-8").decode("unicode_escape") # python3 
>>> decoded_string = myString.decode('string_escape') # python2
>>> print(decoded_string)
spam
eggs
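
As a concrete version of the process function from the question, here is a minimal Python 3 sketch of this answer. It assumes the input is pure ASCII (an escape like \u00e9 is fine, a literal é is not), which is why it encodes with 'ascii' rather than 'utf-8': a stray non-ASCII character then raises UnicodeEncodeError instead of being silently garbled, for the reasons covered in the "unicode_escape doesn't work in general" answer.

```python
def process(s: str) -> str:
    """Interpret Python backslash escapes in s (sketch: ASCII input only)."""
    # unicode_escape decodes *bytes*, so the text must become bytes first.
    # Encoding as 'ascii' makes non-ASCII input fail loudly instead of
    # producing mojibake.
    return s.encode("ascii").decode("unicode_escape")


print(process("spam\\neggs"))               # prints spam and eggs on separate lines
print(process("caf\\u00e9 \\x41 \\101"))    # café A A
```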

Don't use the AST or eval. Using the string codecs is much safer.



unicode_escape doesn't work in general

It turns out that the string_escape or unicode_escape solution does not work in general -- particularly, it doesn't work in the presence of actual Unicode.

If you can be sure that every non-ASCII character will be escaped (and remember, anything beyond the first 128 characters is non-ASCII), unicode_escape will do the right thing for you. But if there are any literal non-ASCII characters already in your string, things will go wrong.

unicode_escape is fundamentally designed to convert bytes into Unicode text. But in many places -- for example, Python source code -- the source data is already Unicode text.

The only way this can work correctly is if you encode the text into bytes first. UTF-8 is the sensible encoding for all text, so that should work, right?

The following examples are in Python 3, so that the string literals are cleaner, but the same problem exists with slightly different manifestations on both Python 2 and 3.

>>> s = 'naïve \\t test'
>>> print(s.encode('utf-8').decode('unicode_escape'))
naÃ¯ve   test

Well, that's wrong.
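
To see exactly what went wrong, it helps to look at the bytes. This short Python 3 check (the variable names are just for illustration) shows that the single character ï turned into two characters:

```python
s = 'naïve \\t test'
raw = s.encode('utf-8')

# The single character ï became two bytes, 0xC3 and 0xAF.
print(raw[:4])  # b'na\xc3\xaf'

# unicode_escape maps each of those bytes to the code point with the same
# value, so ï comes back as the two characters Ã (U+00C3) and ¯ (U+00AF).
decoded = raw.decode('unicode_escape')
print([hex(ord(c)) for c in decoded[:4]])  # ['0x6e', '0x61', '0xc3', '0xaf']
```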

The new recommended way to use codecs that decode text into text is to call codecs.decode directly. Does that help?

>>> import codecs
>>> print(codecs.decode(s, 'unicode_escape'))
naÃ¯ve   test

Not at all. (Also, the above is a UnicodeError on Python 2.)

The unicode_escape codec, despite its name, turns out to assume that all non-ASCII bytes are in the Latin-1 (ISO-8859-1) encoding. So you would have to do it like this:

>>> print(s.encode('latin-1').decode('unicode_escape'))
naïve    test

But that's terrible. This limits you to the 256 Latin-1 characters, as if Unicode had never been invented at all!

>>> print('Ernő \\t Rubik'.encode('latin-1').decode('unicode_escape'))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0151' in position 3: ordinal not in range(256)

Adding a regular expression to solve the problem

(Surprisingly, we do not now have two problems.)

What we need to do is only apply the unicode_escape decoder to things that we are certain to be ASCII text. In particular, we can make sure only to apply it to valid Python escape sequences, which are guaranteed to be ASCII text.

The plan is, we'll find escape sequences using a regular expression, and use a function as the argument to re.sub to replace them with their unescaped value.

import re
import codecs

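# Hex digits are spelled out so that malformed escapes such as \xZZ are
# left alone instead of being passed to the decoder, which would raise.
ESCAPE_SEQUENCE_RE = re.compile(r'''
    ( \\U[0-9a-fA-F]{8}  # 8-digit hex escapes
    | \\u[0-9a-fA-F]{4}  # 4-digit hex escapes
    | \\x[0-9a-fA-F]{2}  # 2-digit hex escapes
    | \\[0-7]{1,3}       # Octal escapes
    | \\N\{[^}]+\}       # Unicode characters by name
    | \\[\\'"abfnrtv]    # Single-character escapes
    )''', re.UNICODE | re.VERBOSE)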

def decode_escapes(s):
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')

    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

And with that:

>>> print(decode_escapes('Ernő \\t Rubik'))
Ernő     Rubik

The ast.literal_eval function comes close, but it will expect the string to be properly quoted first.

Of course, Python's interpretation of backslash escapes depends on how the string is quoted ("" vs r"" vs u"", triple quotes, etc.), so you may want to wrap the user input in suitable quotes and pass it to literal_eval. Wrapping it in quotes will also prevent literal_eval from returning a number, tuple, dictionary, etc.

Things can still get tricky if the user types unescaped quotes of the same kind you wrap around the string.
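
A minimal sketch of that approach, assuming double quotes as the wrapper; the name process_literal is hypothetical. Because literal_eval works on text rather than bytes, it does not suffer from the Latin-1 problem described above. The isinstance check covers the unescaped-quote problem: input such as ", " closes and reopens the quotes, and literal_eval then returns a tuple.

```python
import ast


def process_literal(s: str) -> str:
    """Interpret backslash escapes in s as a double-quoted Python literal would."""
    # Wrap the text in quotes so it parses as a string literal.
    # literal_eval raises SyntaxError if s contains an unescaped '"',
    # ends with a lone backslash, or contains a real newline character.
    value = ast.literal_eval('"' + s + '"')
    # Quotes inside s can still change what gets parsed: '", "' yields the
    # tuple ('', ''), so make sure we really got a single string back.
    if not isinstance(value, str):
        raise ValueError(f"input does not form a single string literal: {s!r}")
    return value


print(process_literal("spam\\neggs"))
print(process_literal("Ernő \\t Rubik"))  # non-ASCII text passes through unchanged
```

This does not catch every case: input like '" "x' parses as two adjacent literals and silently returns 'x', so treat the result with suspicion if the input may contain double quotes.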

This is a bad way of doing it, but it worked for me when trying to interpret escaped octals passed in a string argument.

import sys
input_string = eval('b"' + sys.argv[1] + '"')

It's worth mentioning that there is a difference between eval and ast.literal_eval (eval being way more unsafe). See Using python's eval() vs. ast.literal_eval()?
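
If you want the same bytes result as the eval one-liner, ast.literal_eval can parse the same b"..." wrapper without executing arbitrary code. A minimal sketch, not the original answer's code; the helper name parse_bytes_arg is hypothetical:

```python
import ast


def parse_bytes_arg(arg: str) -> bytes:
    """Interpret backslash escapes in arg the way a b"..." literal would."""
    # literal_eval only accepts literals, so input such as
    # '"+__import__("os")+b"' raises ValueError instead of running code.
    # Non-ASCII characters raise SyntaxError, since bytes literals must be ASCII.
    value = ast.literal_eval('b"' + arg + '"')
    if not isinstance(value, bytes):
        raise ValueError(f"not a single bytes literal: {arg!r}")
    return value


# Typical use with a command-line argument:
#     input_bytes = parse_bytes_arg(sys.argv[1])
print(parse_bytes_arg(r"\101\102\x43"))  # b'ABC'
```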
