Effective way to find any file's Encoding

Yes, this is a frequently asked question, but the topic is still vague to me because I don't know much about it.

I would like a very precise way to find a file's encoding, as precise as Notepad++ is.

The StreamReader.CurrentEncoding property rarely returns the correct text file encoding for me. I've had greater success determining a file's endianness, by analyzing its byte order mark (BOM):

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when detection of the text file's endianness fails.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read the BOM
    var bom = new byte[4];
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        file.Read(bom, 0, 4);
    }

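    // Analyze the BOM. The UTF-32LE check (FF FE 00 00) must come before
    // the UTF-16LE check (FF FE), because the shorter signature is a prefix of it.
    if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
    if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    if (bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; //UTF-32LE
    if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; //UTF-16LE
    if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; //UTF-16BE
    if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); //UTF-32BE
    return Encoding.ASCII;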
}

As a side note, you may want to modify the last line of this method to return Encoding.Default instead, so the encoding for the OS's current ANSI code page is returned by default.

The way is simple: if all bytes are in the range 0x00-0x7F, then ASCII, UTF-8 and Latin-1 all decode the file identically. But if you read a non-ASCII file as UTF-8 and special (replacement) characters show up, the file is probably not UTF-8, so try reading it as Latin-1.
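To see this in practice, here is a minimal sketch that decodes the same bytes with all three encodings. The class and method names are made up for illustration. Note that the replacement character U+FFFD can also appear legitimately in a real UTF-8 file, so this is only a hint, not proof.

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // True when ASCII, UTF-8 and Latin-1 all decode the bytes to the same string.
    public static bool DecodesIdentically(byte[] bytes)
    {
        string ascii = Encoding.ASCII.GetString(bytes);
        string utf8 = Encoding.UTF8.GetString(bytes);
        string latin1 = Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
        return ascii == utf8 && utf8 == latin1;
    }

    // True when decoding as UTF-8 produced the replacement character U+FFFD,
    // which is what shows up as a "special character" for invalid byte sequences.
    public static bool Utf8ShowsReplacementChar(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes).IndexOf('\uFFFD') >= 0;
    }
}
```

For example, the bytes 63 61 66 E9 ("café" saved as Latin-1) make Utf8ShowsReplacementChar return true, while pure ASCII bytes make DecodesIdentically return true.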

The following code works fine for me, using the StreamReader class:

  using (var reader = new StreamReader(fileName, defaultEncodingIfNoBom, true))
  {
      reader.Peek(); // you need this!
      var encoding = reader.CurrentEncoding;
  }

The trick is to use the Peek call, otherwise, .NET has not done anything (and it hasn't read the preamble, the BOM). Of course, if you use any other ReadXXX call before checking the encoding, it works too.

If the file has no BOM, then the defaultEncodingIfNoBom encoding will be used. There are also StreamReader constructor overloads without the encoding parameter (in that case UTF-8 is used as the fallback), but I recommend explicitly defining what you consider the default encoding in your context.

I have tested this successfully with files with BOM for UTF8, UTF16/Unicode (LE & BE) and UTF32 (LE & BE). It does not work for UTF7.
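A check like this can be reproduced in memory, without test files. The following is only a sketch: BomSniffer and DetectFromBom are hypothetical names, and it assumes the StreamReader(Stream, Encoding, bool, int, bool) constructor (available since .NET 4.5) so that the caller's stream stays open.

```csharp
using System.IO;
using System.Text;

public static class BomSniffer
{
    // Returns the encoding StreamReader settles on after inspecting the BOM,
    // or the given fallback when the stream carries no BOM.
    public static Encoding DetectFromBom(Stream stream, Encoding fallback)
    {
        // detectEncodingFromByteOrderMarks: true, bufferSize: 1024, leaveOpen: true
        using (var reader = new StreamReader(stream, fallback, true, 1024, true))
        {
            reader.Peek(); // forces the reader to fill its buffer and look at the BOM
            return reader.CurrentEncoding;
        }
    }
}
```

Comparing CodePage values (65001 for UTF-8, 1200 for UTF-16LE) is safer than comparing Encoding instances, since StreamReader may create its own encoding objects. The stream position has moved after the call, so rewind it before reading the content.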

I'd try the following steps:

1) Check if there is a Byte Order Mark

2) Check if the file is valid UTF8

3) Use the local "ANSI" codepage (ANSI as Microsoft defines it)

Step 2 works because most non-ASCII sequences in code pages other than UTF8 are not valid UTF8.
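The reasoning behind step 2 can be checked on a byte array with a strict decoder. This is a minimal sketch, not a complete detector: Utf8Validator and IsValidUtf8 are illustrative names, and it relies on UTF8Encoding with throwOnInvalidBytes set to true.

```csharp
using System;
using System.Text;

public static class Utf8Validator
{
    // Strict UTF-8 decoder: no BOM emitted, throws on invalid byte sequences.
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // True when the bytes form a valid UTF-8 sequence.
    public static bool IsValidUtf8(byte[] bytes)
    {
        try
        {
            Strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

For instance, "café" encoded as UTF-8 passes, while the ANSI bytes 63 61 66 E9 fail because E9 announces a multi-byte sequence that never follows. The word "most" matters: a few ANSI byte pairs (such as C3 A9) happen to be valid UTF-8, so the check can be fooled by short or unusual text.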

Check out UDE.

This is a C# port of the Mozilla Universal Charset Detector, and you can use it like this:

public static void Main(String[] args)
{
    string filename = args[0];
    using (FileStream fs = File.OpenRead(filename)) {
        Ude.CharsetDetector cdet = new Ude.CharsetDetector();
        cdet.Feed(fs);
        cdet.DataEnd();
        if (cdet.Charset != null) {
            Console.WriteLine("Charset: {0}, confidence: {1}", 
                 cdet.Charset, cdet.Confidence);
        } else {
            Console.WriteLine("Detection failed.");
        }
    }
}

Providing the implementation details for the steps proposed by @CodesInChaos:

1) Check if there is a Byte Order Mark

2) Check if the file is valid UTF8

3) Use the local "ANSI" codepage (ANSI as Microsoft defines it)

Step 2 works because most non-ASCII sequences in code pages other than UTF8 are not valid UTF8. https://stackoverflow.com/a/4522251/867248 explains the tactic in more detail.

using System;
using System.IO;
using System.Text;

// Using encoding from BOM or UTF8 if no BOM found,
// check if the file is valid, by reading all lines
// If decoding fails, use the local "ANSI" codepage

public string DetectFileEncoding(Stream fileStream)
{
    var Utf8EncodingVerifier = Encoding.GetEncoding("utf-8", new EncoderExceptionFallback(), new DecoderExceptionFallback());
    using (var reader = new StreamReader(fileStream, Utf8EncodingVerifier,
           detectEncodingFromByteOrderMarks: true, leaveOpen: true, bufferSize: 1024))
    {
        string detectedEncoding;
        try
        {
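            // Decode the whole file; invalid bytes throw because of DecoderExceptionFallback
            while (!reader.EndOfStream)
            {
                reader.ReadLine();
            }
            detectedEncoding = reader.CurrentEncoding.BodyName;
        }
        catch (DecoderFallbackException)
        {
            // Failed to decode the file using the BOM/UTF-8.
            // Assume it's local ANSI (ISO-8859-1 is used here as a stand-in)
            detectedEncoding = "ISO-8859-1";
        }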
        // Rewind the stream
        fileStream.Seek(0, SeekOrigin.Begin);
        return detectedEncoding;
   }
}


[Test]
public void Test1()
{
    Stream fs = File.OpenRead(@".\TestData\TextFile_ansi.csv");
    var detectedEncoding = DetectFileEncoding(fs);

    using (var reader = new StreamReader(fs, Encoding.GetEncoding(detectedEncoding)))
    {
       // Consume your file
        var line = reader.ReadLine();
        ...

Comments
  • possible duplicate of Java : How to determine the correct charset encoding of a stream
  • Which encodings? UTF-8 vs UTF-16, big vs little endian? Or are you referring to the old MSDos codepages, such as shift-JIS or Cyrillic etc?
  • Another possible duplicate: stackoverflow.com/questions/436220/…
  • @Oded: Quote "The getEncoding() method will return the encoding which was set up (read the JavaDoc) for the stream. It will not guess the encoding for you.".
  • For some background reading, joelonsoftware.com/articles/Unicode.html is a good read. If there is one thing you should know about text, it's that there is no such thing as plain text.
  • +1. This worked for me too (whereas detectEncodingFromByteOrderMarks did not). I used "new FileStream(filename, FileMode.Open, FileAccess.Read)" to avoid an IOException because the file is read-only.
  • UTF-8 files can be without BOM, in this case it will return ASCII incorrectly.
  • This answer is wrong. Looking at the reference source for StreamReader, that implementation is what more people will want. They make new encodings rather than using the existing Encoding.Unicode objects, so equality checks will fail (which might rarely happen anyway because, for instance, Encoding.UTF8 can return different objects), but it (1) doesn't use the really weird UTF-7 format, (2) defaults to UTF-8 if no BOM is found, and (3) can be overridden to use a different default encoding.
  • i had better success with new StreamReader(filename, true).CurrentEncoding
  • There is a fundamental error in the code; when you detect the big-endian UTF32 signature (00 00 FE FF), you return the system-provided Encoding.UTF32, which is a little-endian encoding (as noted here). And also, as noted by @Nyerguds, you still are not looking for UTF32LE, which has signature FF FE 00 00 (according to en.wikipedia.org/wiki/Byte_order_mark). As that user noted, because it is subsuming, that check must come before the 2-byte checks.
  • I get back what I set as the default encoding. Could I be missing something?
  • @DRAM - this can happen if the file has no BOM