Python, how to print Japanese, Korean, Chinese strings

python japanese
python print utf-8
python decode chinese characters
python chinese encoding
python print unicode variable
python utf8
python encoding
python write japanese to file

In Python, for Japanese, Chinese, and Korean,Python can not print the correct strings, for example hello in Japanese, Korean and Chinese are:

こんにちは
안녕하세요
你好

And print these strings:

In [1]: f = open('test.txt')

In [2]: for _line in f.readlines():
   ...:     print(_line)
   ...:     
こんにちは

안녕하세요

你好


In [3]: f = open('test.txt')

In [4]: print(f.readlines())
[ '\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf\n', '\xec\x95\x88\xeb\x85\x95\xed\x95\x98\xec\x84\xb8\xec\x9a\x94\n', '\xe4\xbd\xa0\xe5\xa5\xbd\n']

In [5]: a = '你好'

In [6]: a
Out[6]: '\xe4\xbd\xa0\xe5\xa5\xbd'

My Python version is 2.7.11 and OS is Ubuntu 14.04

How to handle these '\xe4\xbd\xa0\xe5\xa5\xbd\n' strings.

Thanks!


First you need to read the text as unicode

import codecs
f = codecs.open('test.txt','r','utf-8')

Second

When you print you should encode it like this

unicodeText.encode('utf-8')

Third

you should insure that your console support unicode display

Use

print sys.getdefaultencoding()

if it doesn't try

reload(sys)
sys.setdefaultencoding('utf-8')

python: Python, how to print Japanese, Korean, Chinese strings, In Python, for Japanese, Chinese, and Korean,Python can not print the correct strings, for example hello in Japanese, Korean and Chinese are:  ***Additional Chinese, Japanese, Korean, and Vietnamese characters, plus more symbols and emojis Note : In the interest of not losing sight of the big picture, there is an additional set of technical features of UTF-8 that aren’t covered here because they are rarely visible to a Python user.


What you see is the difference between

  1. Printing a string
  2. Printing a list

Or more generally, the difference between an objects "informal" and "official" string representation (see documentation).

In the first case, the unicode string will be printed correctly, as you would expect, with the unicode characters.

In the second case, the items of the list will be printed using their representation and not their string value.

for line in f.readlines():
    print line

is the first (good) case, and

print f.readlines()

is the second case.

You can check the difference by this example:

 a = u'ð€œłĸªßð'
 print a
 print a.__repr__()
 l = [a, a]
 print l

This shows the difference between the special __str__() and __repr__() methods which you can play with yourself.

class Person(object):
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name
    def __repr__(self):
        return '<Person name={}>'.format(self.name)

p = Person('Donald')
print p  #  Prints 'Donald' using __str__
p # On the command line, prints '<Person name=Donald>' using __repr__

I.e., the value you see when simply typing an object name on the console is defined by __repr__ while what you see when you use print is defined by __str__.

Detecting Chinese Characters in Unicode Strings - Veritable, CJK in Unicode and Python 3 implementation. Ceshine Lee It is a commonly used acronym for “Chinese, Japanese, and Korean”. The term  The default encoding for Python 2 files is ASCII, so by declaring an encoding you make it possible to use Japanese directly. Use byte string literals, ready encoded. Encode the codepoints by some other means and include them in your byte string literals.


My python version 2.7.11 and operating system is Mac OSX,I write

こんにちは
안녕하세요
你好

to test.txt. My program is :

# -*-coding:utf-8-*-

import json


if __name__ == '__main__':
    f = open("./test.txt", "r")
    a = f.readlines()
    print json.dumps(a, ensure_ascii=False)
    f.close()

run the program, result:

["こんにちは\n", "안녕하세요\n", "你好"]

Unicode & Character Encodings in Python: A Painless Guide – Real , Python's Built-In Functions; Python String Literals: Ways to Skin a Cat; Other If you don't see a character here, then you simply can't express it as printed text ***Additional Chinese, Japanese, Korean, and Vietnamese characters, plus more​  Detecting Chinese Characters in Unicode Strings. It is a commonly used acronym for “Chinese, Japanese, and Korean”. The term “CJK character” generally refers to “Chinese characters


Try this:

import codecs

fp = codecs.open('test.txt', encoding='utf-8')

for line in fp:
    print line

Unicode for non-English characters, Python's strings are in unicode, which allows for characters to be from a much larger ideographic characters used in Chinese, Japanese, and Korean alphabets. If you issue a print statement on a string containing other characters, it may  Strings in Python can be a bit frustrating too, and I want to share some more details of how strings are handled, so that you're aware of them. In Python 3 strings are Unicode based. In early computing characters of strings were limited to one of 256 different values.


Maya Python for Games and Film: A Complete Reference for the Maya , Strings. Python supports what are called raw strings. A raw string, though still a string languages (e.g., Roman characters, Korean, Japanese, Chinese, Arabic)​. Their type is Unicode. var = u'-5'; print(type(var)); You can include escape  4.9.2 Standard Encodings Python comes with a number of codecs built-in, either implemented as C functions or with dictionaries as mapping tables. The following table lists the codecs by name, together with a few common aliases, and the languages for which the encoding is likely used.


Using Python with Chinese, Japanese and Korean, even the large character sets of Chinese-Japanese- Korean (CJK) alphabets. On the face of it, it is a happy situation that Python supports Unicode — it brings unichr(969) >» unicodedata. name (alef ) >» print alef Traceback (most recent  Chinese Version. translation is a python translation package based on website service. It provids google, youdao, baidu, iciba translation service.


Text Processing in Python, If the replacement is a string, the encoder will encode the replacement. The joined output of calls to the encode()/decode() method is the same as if all the single iso2022jp-2, iso-2022-jp-2, Japanese, Korean, Simplified Chinese, Western  On the other hand transliteration (i.e., conveying, in Roman letters, the pronunciation expressed by the text in some other writing system) of languages like Chinese, Japanese or Korean is a very complex issue and this library does not even attempt to address it.