Is there any function like iconv in Python?
I have some CSV files that I need to convert from Shift-JIS to UTF-8.
Here is my code in PHP, which successfully transcodes them to readable text.
$str = utf8_decode($str); $str = iconv('shift-jis', 'utf-8'. '//TRANSLIT', $str); echo $str;
My problem is how to do the same thing in Python.
I don't know PHP, but does this work:
Also, I assume the CSV content comes from a file. There are a few options for opening a file in Python.
with open(myfile, 'rb') as fin:
would be the first, and you would get the data as it is (raw bytes)
with open(myfile, 'r') as fin:
would be the default way of opening a file
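To make the difference concrete: in Python 3 (an assumption, since the snippets in this thread are Python 2 style), open() also accepts an encoding argument, so the Shift-JIS decoding can happen while reading. A minimal sketch; convert_file and the file names are made up for illustration:

```python
import os
import tempfile


def convert_file(src_path, dst_path, src_encoding="shift_jis", dst_encoding="utf-8"):
    """Re-encode a text file. newline="" leaves CSV line endings untouched."""
    with open(src_path, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as fout:
        fout.write(text)


# Demo on a throwaway file (hypothetical names, created in a temp directory).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "shift.csv")
dst = os.path.join(workdir, "utf8.csv")
with open(src, "wb") as f:
    f.write("て,で,と\r\n".encode("shift_jis"))

convert_file(src, dst)
with open(dst, "rb") as f:
    converted = f.read()  # raw UTF-8 bytes of the converted file
```

Opening with 'rb' and calling decode() yourself gives the same result; passing encoding to open() just moves the decode step into the file object.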
I also tried it on my computer with a Shift-JIS text, and the following code worked:
with open("shift.txt", "rb") as fin:
    text = fin.read()
text.decode('shift-jis').encode('utf-8')
The result was the following in UTF-8 (without any errors):
' \xe3\x81\xa6 \xe3\x81\xa7 \xe3\x81\xa8'
OK, I validated my solution :)
The first character is indeed the right one: "\xe3\x81\xa6" is the byte sequence E3 81 A6, which is the correct result.
You can try it yourself at this URL
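The same check can be done without a file. A small sketch in Python 3 (assumed), where the input bytes are the Shift-JIS encodings of the three hiragana whose UTF-8 bytes appear in the result above:

```python
# Shift-JIS bytes for three hiragana characters separated by spaces.
sjis_bytes = b"\x82\xc4 \x82\xc5 \x82\xc6"

text = sjis_bytes.decode("shift_jis")   # bytes -> str
utf8_bytes = text.encode("utf-8")       # str -> UTF-8 bytes

print(utf8_bytes)
```

Each character takes two bytes in Shift-JIS and three in UTF-8, which is why the output is longer than the input.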
For when Python's built-in encodings are insufficient, there's an iconv package on PyPI.
pip install iconv
Unfortunately, the documentation is nonexistent.
pip install iconv_codecs
>>> import iconv_codecs
>>> iconv_codecs.register('ansi_x3.110-1983')
>>> "foo".encode('ansi_x3.110-1983')
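Before installing anything, it may be worth checking whether Python already ships the codec you need. A minimal sketch using only the standard library; python_has_codec is a made-up helper name:

```python
import codecs


def python_has_codec(name):
    """Return True if Python can resolve `name` to a registered codec."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Print availability for a few names, including the one from the example above.
for name in ("shift_jis", "cp932", "shift_jis_2004", "ansi_x3.110-1983"):
    print(name, python_has_codec(name))
```

Only the names that come back False would need an external package such as the ones above.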
iconv · PyPI: This package provides an iconv wrapper as well as a Python codec to convert between Unicode objects and all iconv-provided encodings.
Parameters: in_charset is the input charset; out_charset is the output charset. If you append the string //TRANSLIT to out_charset, transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters.
It would be helpful if you could post the string that you are trying to convert, since this error suggests a problem with the input data. Older versions of PHP failed silently on broken input strings, which makes this hard to diagnose.
According to the documentation, this might also be due to differences between Shift-JIS dialects; try using 'shift_jisx0213' or 'shift_jis_2004' instead.
If using another dialect does not work, you might get away with asking Python to fail silently by passing an error handler such as 'ignore' to decode().
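A rough sketch of both suggestions in Python 3 (assumed): try several Shift-JIS dialects in turn, then fall back to a lossy decode. The helper name decode_japanese and the order of the dialects are my own choices, not something from the documentation:

```python
def decode_japanese(raw, encodings=("shift_jis", "cp932", "shift_jis_2004", "shift_jisx0213")):
    """Try each dialect strictly; fall back to a lossy decode if all fail."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: undecodable bytes become U+FFFD instead of raising.
    # errors="ignore" would drop them silently instead.
    return raw.decode(encodings[0], errors="replace"), encodings[0] + " (lossy)"


# 0x87 0x40 is a circled digit one in cp932 but is not defined in plain shift_jis.
text, used = decode_japanese(b"\x87\x40")
print(used)
```

Failing silently hides data loss, so it is probably better to log which files needed the fallback than to use 'ignore' everywhere.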
utf8_decode on a Shift-JIS encoded string?! That makes no sense.
- Thanks. I tried that already, but I have no idea how to solve the error: UnicodeEncodeError: 'shift_jis' codec can't encode character u'\x83' in position 191: illegal multibyte sequence. If I add
text.replace(u'\x83', u'\'') I get: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 191: ordinal not in range(128)
- Thank you so much for the helpful reply. I finally found the problem: the CSV file needs to be decoded from UTF-8 before converting from Shift-JIS to UTF-8. So
mystring.decode('shift-jis').encode('utf-8') works for me. Thanks again. :)
- I realized it is a problem with the source file. Here is my solution.
text = text.decode('utf-8').encode('iso-8859-1')
text = text.decode('shift-jis').encode('utf-8')
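Why the two steps work: the file held Shift-JIS bytes that had been misread as ISO-8859-1 and saved again as UTF-8, which is also why the PHP version needed utf8_decode first. A sketch of the same repair in Python 3 (assumed); undo_double_encoding is a made-up name and the sample data is constructed here, not taken from the real file:

```python
def undo_double_encoding(raw):
    """Recover text from Shift-JIS bytes that were re-encoded as UTF-8 via ISO-8859-1."""
    # Step 1: UTF-8 -> str -> ISO-8859-1 gives back the original Shift-JIS bytes.
    sjis_bytes = raw.decode("utf-8").encode("iso-8859-1")
    # Step 2: decode those bytes with the encoding they were really in.
    return sjis_bytes.decode("shift_jis")


# Build a broken sample the same way the source file was presumably broken.
original = "\u3066"  # one hiragana character
broken = original.encode("shift_jis").decode("iso-8859-1").encode("utf-8")

fixed = undo_double_encoding(broken)
utf8_out = fixed.encode("utf-8")
print(broken, utf8_out)
```

The 0xc2 byte in the broken sample matches the byte that the 'ascii' codec error earlier in the comments complained about.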
- This answer works fine for transforming strings between encodings, but iconv can also do more than just that. For example, you can use it to transliterate characters, as asked by the OP. The
//TRANSLIT suffix will cause characters that can't be represented by the target encoding to be substituted by something meaningful.
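As far as I know Python's built-in codecs have no direct //TRANSLIT equivalent. A very rough approximation, assuming Python 3 and only the standard library, is to decompose unencodable characters with unicodedata and keep whatever pieces fit the target encoding. rough_translit is a made-up helper and will not match iconv's output in general:

```python
import unicodedata


def rough_translit(text, target="ascii"):
    """Approximate characters the target encoding lacks; '?' when nothing fits."""
    out = []
    for ch in text:
        try:
            ch.encode(target)
            out.append(ch)  # representable as-is
            continue
        except UnicodeEncodeError:
            pass
        # Compatibility decomposition: accented letter -> base letter + mark, etc.
        for part in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(part):
                continue  # drop accents and other combining marks
            try:
                part.encode(target)
                out.append(part)
            except UnicodeEncodeError:
                out.append("?")
    return "".join(out)


print(rough_translit("caf\u00e9"))
```

For a Shift-JIS to UTF-8 conversion none of this is needed, since UTF-8 can represent every character that decodes successfully.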