How to return xml as UTF-8 instead of UTF-16

I am using a routine that serializes <T>. It works, but when downloaded to the browser I see a blank page. I can view the page source or open the download in a text editor and I see the xml, but it is in UTF-16 which I think is why browser pages show blank?

How do I modify my serializer routine to return UTF-8 instead of UTF-16?

The XML source returned:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfString xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <string>January</string>
  <string>February</string>
  <string>March</string>
  <string>April</string>
  <string>May</string>
  <string>June</string>
  <string>July</string>
  <string>August</string>
  <string>September</string>
  <string>October</string>
  <string>November</string>
  <string>December</string>
  <string />
</ArrayOfString>

An example call to the serializer:

DateTimeFormatInfo dateTimeFormatInfo = new DateTimeFormatInfo();
var months = dateTimeFormatInfo.MonthNames.ToList();

string SelectionId = "1234567890";

return new XmlResult<List<string>>(SelectionId)
{
    Data = months
};

The Serializer:

public class XmlResult<T> : ActionResult
{
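    // MM = month, HH = 24-hour clock, mm = minutes (lowercase "mm" alone would give minutes, not months)
    private string filename = DateTime.Now.ToString("ddMMyyyyHHmmss");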

    public T Data { private get; set; }

    public XmlResult(string selectionId = "")
    {
        if (selectionId != "")
        {
            filename = selectionId;
        }
    }

    public override void ExecuteResult(ControllerContext context)
    {
        HttpContextBase httpContextBase = context.HttpContext;
        httpContextBase.Response.Buffer = true;
        httpContextBase.Response.Clear();

        httpContextBase.Response.AddHeader("content-disposition", "attachment; filename=" + filename + ".xml");
        httpContextBase.Response.ContentType = "text/xml";

        using (StringWriter writer = new StringWriter())
        {
            XmlSerializer xml = new XmlSerializer(typeof(T));
            xml.Serialize(writer, Data);
            httpContextBase.Response.Write(writer);
        }
    }
}

Encoding of the Response

I am not very familiar with this part of the framework, but according to MSDN you can set the content encoding of an HttpResponse like this:

httpContextBase.Response.ContentEncoding = Encoding.UTF8;

Encoding as seen by the XmlSerializer

After reading your question again I see that this is the tough part. The problem lies in the use of the StringWriter. Because .NET strings are always stored as UTF-16 (citation needed ^^), the StringWriter reports UTF-16 as its encoding. Thus the XmlSerializer writes the XML declaration as

<?xml version="1.0" encoding="utf-16"?>
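
As a quick check of that explanation, the following sketch reads the encoding a plain StringWriter advertises. The class and method names are made up for illustration; only StringWriter.Encoding and Encoding.WebName come from the framework.

```csharp
using System.IO;

// Hypothetical helper, only for illustration.
public static class StringWriterEncodingDemo
{
    // Returns the name of the encoding that a default StringWriter reports.
    // XmlSerializer copies this name into the XML declaration.
    public static string DefaultEncodingName()
    {
        using (var writer = new StringWriter())
        {
            return writer.Encoding.WebName; // "utf-16"
        }
    }
}
```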

To work around that you can write into a MemoryStream like this:

using (MemoryStream stream = new MemoryStream())
using (StreamWriter writer = new StreamWriter(stream, Encoding.UTF8))
{
    XmlSerializer xml = new XmlSerializer(typeof(T));
    xml.Serialize(writer, Data);

    // Make sure everything buffered by the writer has reached the stream
    writer.Flush();

    // I am not 100% sure if this can be optimized
    httpContextBase.Response.BinaryWrite(stream.ToArray());
}

Other approaches

Another edit: I just noticed this SO answer linked by jtm001. In short, the solution there is to provide the XmlSerializer with a custom XmlWriter that is configured to use UTF-8 as its encoding.

Athari proposes to derive from the StringWriter and advertise the encoding as UTF8.

To my understanding both solutions should work as well. I think the take-away here is that you will need one kind of boilerplate code or another...
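
A minimal sketch of the custom XmlWriter approach could look like the following. It is not taken from the linked answer; the class name Utf8Xml and the method name Serialize are placeholders, and it assumes that writing straight to a stream is acceptable for your scenario.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper, only for illustration.
public static class Utf8Xml
{
    // UTF-8 without a byte order mark; the instance can be shared safely.
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    // Serializes data to the given stream. The XmlWriter takes the encoding
    // from the settings, so the declaration reads encoding="utf-8".
    public static void Serialize<T>(Stream output, T data)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = Utf8NoBom,
            Indent = true
        };

        // CloseOutput defaults to false, so the caller keeps ownership of the stream.
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            new XmlSerializer(typeof(T)).Serialize(writer, data);
        }
    }
}
```

Inside ExecuteResult from the question this could be called as Utf8Xml.Serialize(httpContextBase.Response.OutputStream, Data), which also avoids holding the whole document in an intermediate string.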

You can use a StringWriter that will force UTF8. Here is one way to do it:

public class Utf8StringWriter : StringWriter
{
    // Use UTF8 encoding but write no BOM to the wire
    public override Encoding Encoding
    {
         get { return new UTF8Encoding(false); } // in real code I'll cache this encoding.
    }
}

and then use the Utf8StringWriter writer in your code.

using (StringWriter writer = new Utf8StringWriter())
{
    XmlSerializer xml = new XmlSerializer(typeof(T));
    xml.Serialize(writer, Data);
    httpContextBase.Response.Write(writer);
}
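
The comment in the class above mentions caching the encoding. One way that might look is sketched below; CachedUtf8StringWriter and XmlText are hypothetical names, and the helper method only exists to show the resulting declaration.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical variant of Utf8StringWriter that reuses one encoding instance.
public sealed class CachedUtf8StringWriter : StringWriter
{
    // UTF-8 without a byte order mark, created once and shared.
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    public override Encoding Encoding
    {
        get { return Utf8NoBom; }
    }
}

// Hypothetical helper, only for illustration.
public static class XmlText
{
    // Returns XML text whose declaration says encoding="utf-8".
    public static string ToUtf8DeclaredXml<T>(T data)
    {
        using (var writer = new CachedUtf8StringWriter())
        {
            new XmlSerializer(typeof(T)).Serialize(writer, data);
            return writer.ToString();
        }
    }
}
```

Note that the returned string is still an ordinary in-memory .NET string; only the declaration changes. The bytes sent to the browser are produced by whatever Response.ContentEncoding is set to, so that should be UTF-8 as well.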

This answer is inspired by "Serializing an object as UTF-8 XML in .NET".

To serialize as UTF8 string:

    private string Serialize(MyData data)
    {
        XmlSerializer ser = new XmlSerializer(typeof(MyData));
        // UTF-8 without a byte order mark, so the returned string does not
        // start with a stray U+FEFF character
        Encoding utf8NoBom = new UTF8Encoding(false);
        // Using a MemoryStream to store the serialized data as a byte array,
        // which is "encoding-agnostic"
        using (MemoryStream ms = new MemoryStream())
        // Few options here, but remember to use a signature that allows you to
        // specify the encoding
        using (XmlTextWriter tw = new XmlTextWriter(ms, utf8NoBom))
        {
            tw.Formatting = Formatting.Indented;
            ser.Serialize(tw, data);
            // Push anything still buffered by the writer into the stream
            tw.Flush();
            // Now we get the serialized data as a string in the desired encoding
            return utf8NoBom.GetString(ms.ToArray());
        }
    }

To return it as XML on a web response, don't forget to set the response encoding:

    string xml = Serialize(data);
    Response.ContentType = "application/xml";
    Response.ContentEncoding = System.Text.Encoding.UTF8;
    Response.Output.Write(xml);

Comments
  • I think this article gives you what you are looking for: stackoverflow.com/questions/22453036/…
  • "it is in UTF-16 which I think is why browser pages show blank?" I see no reason to think that. Investigate your file, what encoding is it actually? Any BOM code at the start? etc.
  • The downside of this answer is that large XML responses are written entirely into memory, which can lead to unwanted memory consumption, and if a response exceeds 85KB it will go onto the large object heap. If that happens often, your app might start freezing during garbage collection.
  • ".NET uses the UTF-16 encoding ... to represent characters and strings" (the citation)
  • Is overriding the Encoding free of unwanted side effects? I am not aware of any negative implications this might have, but I'd have a bad feeling about it...
  • None that I'm aware of; I've used it in the past in many scenarios. But on a server we don't use a StringWriter for this scenario at all, because it double-buffers unnecessarily. This is what we do in MVC vNext (and similarly in Web API): github.com/aspnet/Mvc/blob/dev/src/Microsoft.AspNet.Mvc.Core/… github.com/aspnet/Mvc/blob/dev/src/Microsoft.AspNet.Mvc.Core/…
  • Yishai, NobodysNightmare's answer works to do what I needed. I tried his answer before seeing yours. Maybe you pointed me in the right direction also. Thank you for taking the time to try to help.
  • When using this technique, you need to implement a default constructor as well otherwise you will see errors.
  • @ITExpert thanks for the pointers. It would likely be even more helpful for other users if you could expand on the previous comment and explain why that is necessary, or what the error is.