How do I get rid of the b-prefix in a string in python?

python bytes to string
python 3 string decode
python decode
what is unicode in python
python get rid b
python b string to unicode
change b to string python
python encode

A bunch of the tweets I am importing are having this issue where they read

b'I posted a new photo to Facebook'

I gather the b indicates it is a byte. But this is proving problematic because in my CSV files that I end up writing, the b doesn't go away and is interferring in future code.

Is there a simple way to remove this b prefix from my lines of text?

Keep in mind, I seem to need to have the text encoded in utf-8 or tweepy has trouble pulling them from the web.


Here's the link content I'm analyzing:

https://www.dropbox.com/s/sjmsbuhrghj7abt/new_tweets.txt?dl=0

new_tweets = 'content in the link'
Code Attempt
outtweets = [[tweet.text.encode("utf-8").decode("utf-8")] for tweet in new_tweets]
print(outtweets)
Error
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-21-6019064596bf> in <module>()
      1 for screen_name in user_list:
----> 2     get_all_tweets(screen_name,"instance file")

<ipython-input-19-e473b4771186> in get_all_tweets(screen_name, mode)
     99             with open(os.path.join(save_location,'%s.instance' % screen_name), 'w') as f:
    100                 writer = csv.writer(f)
--> 101                 writer.writerows(outtweets)
    102         else:
    103             with open(os.path.join(save_location,'%s.csv' % screen_name), 'w') as f:

C:\Users\Stan Shunpike\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20 
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 64-65: character maps to <undefined>

you need to decode the bytes of you want a string:

b = b'1234'
print(b.decode('utf-8'))  # '1234'

What does the 'b' character do in front of a string literal?, 3 (e.g. when code is automatically converted with 2to3). They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes. You need to decode it to convert it to a string. Check the answer here about bytes literal in python3. In [1]: b'I posted a new photo to Facebook'.decode('utf-8') Out[1]: 'I posted a new photo to Facebook'


It is just letting you know that the object you are printing is not a string, rather a byte object as a byte literal. People explain this in incomplete ways, so here is my take.

Consider creating a byte object by typing a byte literal (literally defining a byte object without actually using a byte object e.g. by typing b'') and converting it into a string object encoded in utf-8. (Note that converting here means decoding)

byte_object= b"test" # byte object by literally typing characters
print(byte_object) # Prints b'test'
print(byte_object.decode('utf8')) # Prints "test" without quotations

You see that we simply apply the .decode(utf8) function.

Bytes in Python

https://docs.python.org/3.3/library/stdtypes.html#bytes

String literals are described by the following lexical definitions:

https://docs.python.org/3.3/reference/lexical_analysis.html#string-and-bytes-literals

stringliteral   ::=  [stringprefix](shortstring | longstring)
stringprefix    ::=  "r" | "u" | "R" | "U"
shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring      ::=  "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::=  shortstringchar | stringescapeseq
longstringitem  ::=  longstringchar | stringescapeseq
shortstringchar ::=  <any source character except "\" or newline or the quote>
longstringchar  ::=  <any source character except "\">
stringescapeseq ::=  "\" <any source character>

bytesliteral   ::=  bytesprefix(shortbytes | longbytes)
bytesprefix    ::=  "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
shortbytes     ::=  "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
longbytes      ::=  "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
shortbytesitem ::=  shortbyteschar | bytesescapeseq
longbytesitem  ::=  longbyteschar | bytesescapeseq
shortbyteschar ::=  <any ASCII character except "\" or newline or the quote>
longbyteschar  ::=  <any ASCII character except "\">
bytesescapeseq ::=  "\" <any ASCII character>

What does a b prefix before a python string mean?, 3 source code, the expression creates a bytes object, not a regular Unicode str object. bytes objects are immutable, just like str strings are. The b prefix signifies a bytes string literal. If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object. If you see it echoed in your Python shell or as part of a list, dict or other container contents, then you see a bytes object represented using this notation.


You need to decode it to convert it to a string. Check the answer here about bytes literal in python3.

In [1]: b'I posted a new photo to Facebook'.decode('utf-8')
Out[1]: 'I posted a new photo to Facebook'

Python Strings decode() method, slices point to the same address in the memory. A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix. The Python 3 documentation states: Bytes literals are always prefixed with 'b' or 'B'; they produce an instance


I got it done by only encoding the output using utf-8. Here is the code example

new_tweets = api.GetUserTimeline(screen_name = user,count=200)
result = new_tweets[0]
try: text = result.text
except: text = ''

with open(file_name, 'a', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerows(text)

i.e: do not encode when collecting data from api, encode the output (print or write) only.

How Python saves memory when storing strings, If you see a sequence in Python like: b'foo' it's technically not a “string.” That's a sequence of bytes. You can render it into a string by decoding it  The b prefix signifies a bytes string literal. If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object. If you see it echoed in your Python shell or as part of a list, dict or other container contents, then you see a bytes object represented using this notation.


On python 3.6 with django 2.0, decode on a byte literal does not works as expected. Yeah i get the right result when i print it, but the b'value' is still there even if you print it right.

This is what im encoding

uid': urlsafe_base64_encode(force_bytes(user.pk)),

This is what im decoding:

uid = force_text(urlsafe_base64_decode(uidb64))

This is what django 2.0 says :

urlsafe_base64_encode(s)[source]

Encodes a bytestring in base64 for use in URLs, stripping any trailing equal signs.

urlsafe_base64_decode(s)[source]

Decodes a base64 encoded string, adding back any trailing equal signs that might have been stripped.


This is my account_activation_email_test.html file

{% autoescape off %}
Hi {{ user.username }},

Please click on the link below to confirm your registration:

http://{{ domain }}{% url 'accounts:activate' uidb64=uid token=token %}
{% endautoescape %}

This is my console response:

Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: Activate Your MySite Account From: webmaster@localhost To: testuser@yahoo.com Date: Fri, 20 Apr 2018 06:26:46 -0000 Message-ID: <152420560682.16725.4597194169307598579@Dash-U>

Hi testuser,

Please click on the link below to confirm your registration:

http://127.0.0.1:8000/activate/b'MjU'/4vi-fasdtRf2db2989413ba/

as you can see uid = b'MjU'

expected uid = MjU


test in console:

$ python
Python 3.6.4 (default, Apr  7 2018, 00:45:33) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from django.utils.http import urlsafe_base64_encode, urlsafe_base64_decode
>>> from django.utils.encoding import force_bytes, force_text
>>> var1=urlsafe_base64_encode(force_bytes(3))
>>> print(var1)
b'Mw'
>>> print(var1.decode())
Mw
>>> 

After investigating it seems like its related to python 3. My workaround was quite simple:

'uid': user.pk,

i receive it as uidb64 on my activate function:

user = User.objects.get(pk=uidb64)

and voila:

Content-Transfer-Encoding: 7bit
Subject: Activate Your MySite Account
From: webmaster@localhost
To: testuser@yahoo.com
Date: Fri, 20 Apr 2018 20:44:46 -0000
Message-ID: <152425708646.11228.13738465662759110946@Dash-U>


Hi testuser,

Please click on the link below to confirm your registration:

http://127.0.0.1:8000/activate/45/4vi-3895fbb6b74016ad1882/

now it works fine. :)

How to get rid of the b-prefix in a string in Python, Accordingly, the items you get out of a Python 3 str are Unicode characters, just bytes literals start with a b prefix. 4 This means that you can use familiar string methods like endswith , replace , strip , translate , upper , and dozens of others  To quote the Python 2.x documentation: A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix. The Python 3 documentation states:


4. Text versus Bytes - Fluent Python [Book], How to encode and decode strings in Python between Unicode, UTF-8 is by default a Unicode string), and even the 'b' prefix is preserved:  The string class has a method replace that can be used to replace substrings in a string. We can use this method to replace characters we want to remove with an empty string. If you want to remove multiple characters from a string in a single line, it's better to use regular expressions.


Encoding and Decoding Strings (in Python 3.x), This type is available as str in Python 3, and unicode in Python 2. In code, we in Python 2. Add a b or u prefix to all strings, unless a native string is desired. However, this did not turn off processing Unicode escapes ( \u. or \U.. ), as the  This program removes all punctuations from a string. We will check each character of the string using for loop. If the character is a punctuation, empty string is assigned to it. To understand this example, you should have the knowledge of the following Python programming topics: Sometimes, we may wish to break a sentence into a list of words.


Strings, In Python 3, strings are represented in Unicode. If we want to represent a byte string, we add the b prefix for string literals. Note that the early Python versions  A Python program is read by a parser. Input to the parser is a stream of tokens, generated by the lexical analyzer. This chapter describes how the lexical analyzer breaks a file into tokens. Python uses the 7-bit ASCII character set for program text. New in version 2.3: An encoding declaration can be used to indicate that string literals and