Force encode from US-ASCII to UTF-8 (iconv)

I'm trying to transcode a bunch of files from US-ASCII to UTF-8.

For that, I'm using iconv:

iconv -f US-ASCII -t UTF-8 file.php > file-utf8.php

The thing is, my original files are US-ASCII encoded, so the conversion does nothing. Apparently this happens because ASCII is a subset of UTF-8...

http://www.linuxquestions.org/questions/linux-software-2/iconv-us-ascii-to-utf-8-or-iso-8859-15-a-705054/

And quoting:

There's no need for the textfile to appear otherwise until non-ascii characters are introduced

True. If I introduce a non-ASCII character in the file and save it, let's say with Eclipse, the file encoding (charset) is switched to UTF-8.

In my case, I'd like to force iconv to transcode the files to UTF-8 anyway, whether they contain non-ASCII characters or not.

Note: The reason is that my PHP code (non-ASCII files...) deals with some non-ASCII strings, and those strings are not interpreted correctly (French):

Il était une fois... l'homme série animée mythique d'Albert

Barillé (Procidis), 1ère

...

EDIT

  • US-ASCII -- is -- a subset of UTF-8 (see Ned's answer below)
  • Meaning that US-ASCII files are actually encoded in UTF-8
  • My problem came from somewhere else

ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded. The bytes in the ASCII file and the bytes that would result from "encoding it to UTF-8" would be exactly the same bytes. There's no difference between them, so there's no need to do anything.

It looks like your problem is that the files are not actually ASCII. You need to determine what encoding they are using, and transcode them properly.
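Both points can be checked from the shell. The following is a minimal sketch, assuming iconv, cmp and mktemp are available; the file names and contents are made up for the demonstration. It shows that converting a pure-ASCII file leaves the bytes untouched, and that iconv refuses a file declared as US-ASCII when it contains a byte above 127.

```shell
# Scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)

# 1. A pure-ASCII file: "converting" it to UTF-8 should change nothing.
printf 'echo "hello";\n' > "$workdir/ascii.php"
iconv -f US-ASCII -t UTF-8 "$workdir/ascii.php" > "$workdir/ascii-utf8.php"

if cmp -s "$workdir/ascii.php" "$workdir/ascii-utf8.php"; then
    same_bytes=yes
else
    same_bytes=no
fi
echo "ASCII file unchanged by conversion: $same_bytes"

# 2. A file that only looks like ASCII: \351 is the single byte 0xE9
#    ("e" with acute accent in ISO-8859-1), which is not valid US-ASCII.
printf 'caf\351\n' > "$workdir/latin1.php"
if iconv -f US-ASCII -t UTF-8 "$workdir/latin1.php" > /dev/null 2>&1; then
    really_ascii=yes
else
    really_ascii=no
fi
echo "latin1.php is really US-ASCII: $really_ascii"
```

If the second check fails on one of your own files, the file is not ASCII, and the source encoding passed to -f has to be the real one.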

I needed file -i myfile.htm to report utf-8 instead of us-ascii (yes, I know it is a subset of UTF-8). So here is a one-liner, inspired by the answers above, that converts all *.htm files on Linux from us-ascii to utf-8 so that file -i reports utf-8. You can change *.htm (2 places in the command) to fit your needs.

So people say you can't and I understand you may be frustrated when asking a question and getting such an answer.

If you really want it to show as utf-8 instead of us-ascii, then you need to do it in two steps.

First:

iconv -f us-ascii -t utf-16 yourfile > yourfileinutf16

Second:

iconv -f utf-16le -t utf-8 yourfileinutf16 > yourfileinutf8

Then, if you run file -i on the result, you'll see the new charset is utf-8.

Hope it helps.
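A likely explanation for why the two-step conversion above changes what file reports is that the UTF-16 step writes a byte order mark (BOM), which then survives into the UTF-8 output. Whether that happens depends on the iconv implementation, so treat it as an assumption. If a BOM is all that is wanted, a minimal sketch that prepends it directly could look like this (file names are hypothetical):

```shell
workdir=$(mktemp -d)
printf 'plain ASCII content\n' > "$workdir/plain.txt"

# EF BB BF (octal 357 273 277) is the UTF-8 encoding of U+FEFF, the byte order mark.
{ printf '\357\273\277'; cat "$workdir/plain.txt"; } > "$workdir/with-bom.txt"

# Show the first three bytes of the result as hex, with whitespace stripped.
first_bytes=$(od -An -tx1 -N3 "$workdir/with-bom.txt" | tr -d ' \n')
echo "first three bytes: $first_bytes"
```

This prints efbbbf as the first three bytes. Be careful with PHP sources: anything before the opening <?php tag, including a BOM, is sent as output, so a BOM in a .php file can cause problems of its own.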

I think Ned's got the core of the problem -- your files are not actually ASCII. Try:

iconv -f ISO-8859-1 -t UTF-8 file.php > file-utf8.php

I'm just guessing that you're actually using ISO-8859-1; it is popular for most Western European languages.
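To see what that conversion does at the byte level, here is a small sketch, assuming iconv and od are available. It takes the single byte 0xE9, which is "é" in ISO-8859-1, and prints its bytes before and after conversion.

```shell
# \351 is octal for the byte 0xE9.
latin1_hex=$(printf '\351' | od -An -tx1 | tr -d ' \n')
utf8_hex=$(printf '\351' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-1 bytes: $latin1_hex"
echo "UTF-8 bytes:      $utf8_hex"
```

The output is e9 for the ISO-8859-1 input and c3a9 for the UTF-8 result: one byte becomes two. This is the difference that a US-ASCII to UTF-8 conversion can never produce, because ASCII has no bytes above 127.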

There is no difference between US-ASCII and UTF-8 for ASCII-only content, so there is no need to reconvert it. But here is a little hint if you have trouble with special characters while re-encoding.

Append //TRANSLIT to the target charset (the -t parameter). The iconv documentation describes this suffix for the output encoding, not the source.

Example:

iconv -f ISO-8859-1 -t UTF-8//TRANSLIT filename.sql > utf8-filename.sql

This helped me with strange types of quotes, which always broke the re-encoding process.

Comments
  • can you remember where your problem came from? I am having a similar issue
  • @DrogoNevets Don't remember exactly but I think it has to do with working with UTF8 in PHP and to/from the DB... utf8_encode, utf8_decode, etc... Or more in depth: toptal.com/php/a-utf-8-primer-for-php-and-mysql stackoverflow.com/questions/279170/utf-8-all-the-way-through
  • To do the opposite (utf8 to ASCII), see How to remove accents and turn letters into "plain" ASCII characters?.
  • Indeed, file only looks at the first few kb of a file to produce its verdict.
  • Thanks for your feedback, I updated my answer to attempt to be more helpful. ;)
  • I added the missing links, though I was not sure if I guessed the last one correctly.
  • (Tempted to also fix the useless cat but I'll leave it to yourself.)
  • Excellent explanation. This should be the top answer. I have the exact scenario that you described here.
  • Nope. It didn't do the trick.. I've tried it but anyway, if I run $ file --mime file.php I get file.php: text/x-php charset=us-ascii... So I presume my files are actually ASCII encoded?
  • file won't inspect an entire file; try moving the strings to the top of the file, perhaps in a comment block.
  • Another option to see if you've got an ascii file is to run a script like this Ruby program: File.open("file.php").each_char {|c| puts c if c.ord > 127}. (I picked Ruby because I knew how to write this quickly; any other similar language would be similarly easy.)
  • According to Smultron my files are Unicode (UTF-8) encoded... So Ned is right indeed. US-ASCII is a subset of UTF-8. Then my problem should come from something else (thing is I am not dealing with the non-ASCII strings inside the php file BUT am receiving them over the internet: I'm scraping a webpage...). Thanks for your time!
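
The byte check from the Ruby comment above can also be done with standard shell tools, which avoids relying on file only sampling the start of a file. A minimal sketch, assuming tr and wc are available; the sample file is made up for the demonstration:

```shell
workdir=$(mktemp -d)

# Sample file with exactly one non-ASCII byte (0xE9, written as octal \351).
printf 'abc\351def\n' > "$workdir/sample.php"

# Delete every byte in the ASCII range (octal 000-177) and count what is left.
# LC_ALL=C makes tr work on raw bytes rather than locale-dependent characters.
non_ascii_bytes=$(LC_ALL=C tr -d '\000-\177' < "$workdir/sample.php" | wc -c | tr -d ' ')

echo "non-ASCII bytes: $non_ascii_bytes"
```

For this sample the count is 1. A count of 0 on a real file means the whole file, not just its first few kilobytes, is plain ASCII and therefore already valid UTF-8.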