Convert utf-8 XML document to utf-16 for inserting into SQL

I have an XML document that was created using UTF-8 encoding. I want to store that document in a SQL Server 2008 xml column, but I understand I need to convert it to UTF-16 in order to do that.

I've tried using XDocument to do this, but I'm not getting a valid XML result after the conversion. Here is the conversion I've tried (Utf8StringWriter is a small class that inherits from StringWriter and overrides Encoding):

XDocument xDoc = XDocument.Parse(utf8Xml);
StringWriter writer = new StringWriter();
XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings() 
                { Encoding = writer.Encoding, Indent = true });

xDoc.WriteTo(xml);

string utf16Xml = writer.ToString();

The data in utf16Xml is invalid, and when I try to insert it into the database I get the error:

{"XML parsing: line 1, character 38, unable to switch the encoding"}

However, the initial utf8Xml data is definitely valid and contains all the info I need.

UPDATE: The initial XML is obtained by using XmlSerializer (with a Utf8StringWriter class) to create the XML string from an existing object model (engine). The code for this is:

public static void Serialise<T>(T engine, ref StringWriter writer)
{
    XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings() { Encoding = writer.Encoding });

    XmlSerializer xs = new XmlSerializer(engine.GetType());

    xs.Serialize(xml, engine);
}

I have to leave this as it is, because that code is outside my control.

Before I even send the utf16Xml string to the failing database call, I can inspect it in the Visual Studio debugger. The XML viewer does not show the entire string and instead reports a "string literal was not closed" error.

The error is on the first line, XDocument xDoc = XDocument.Parse(utf8Xml);. Most likely you converted a UTF-8 stream into a string (utf8Xml), but the encoding declared inside that string is still utf-8, so the XML reader fails. If that is the case, then load the XML directly from the stream using Load instead of converting it to a string first.
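
As a rough illustration of the "load from the stream" suggestion, here is a minimal sketch that assumes the XML is available as raw bytes. The class and method names (XmlLoading, LoadFromBytes) are hypothetical and not part of the original posts. XDocument.Load(Stream) lets the XML reader work out the encoding from the bytes and the declaration itself.

```csharp
using System.IO;
using System.Xml.Linq;

public static class XmlLoading
{
    // Hypothetical helper: parses XML from raw bytes so that the reader
    // detects the encoding from the byte stream and the XML declaration.
    public static XDocument LoadFromBytes(byte[] xmlBytes)
    {
        using (var stream = new MemoryStream(xmlBytes))
        {
            return XDocument.Load(stream);
        }
    }
}
```

This only helps when the bytes (or the stream) are still available. If all you are given is an already decoded string, as the asker describes in the comments, XDocument.Parse is the matching call.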

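Set the encoding of the document to UTF-16 after you have parsed it from utf8Xml, and dispose the XmlWriter before reading the string so that its buffer is flushed:

XDocument xDoc = XDocument.Parse(utf8Xml);
// Declaration is null when the source XML has no <?xml ...?> line
if (xDoc.Declaration != null)
    xDoc.Declaration.Encoding = "utf-16";

StringWriter writer = new StringWriter();
// Disposing the XmlWriter flushes its buffer into the StringWriter
using (XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings()
                { Encoding = writer.Encoding, Indent = true }))
{
    xDoc.WriteTo(xml);
}

string utf16Xml = writer.ToString();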
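
A more compact variant of the same idea, as a sketch only: XDocument.Save(TextWriter) writes the declaration using the encoding reported by the TextWriter, and a plain StringWriter reports UTF-16. The class and method names (XmlReencoding, ToUtf16Declared) are hypothetical and not from the original posts.

```csharp
using System.IO;
using System.Xml.Linq;

public static class XmlReencoding
{
    // Hypothetical helper: re-serialises an XML string so that its
    // declaration reads encoding="utf-16", matching a .NET string.
    public static string ToUtf16Declared(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        using (var writer = new StringWriter())
        {
            // Save completes and flushes its output before returning,
            // so the StringWriter holds the whole document here.
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

Calling XmlReencoding.ToUtf16Declared(utf8Xml) should give a string whose declaration says utf-16, which is the form the answers above aim for before inserting into the xml column.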

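Here's what I had to do to make it work. This just converts the XML to utf-16:

string getUtf16Xml(System.Xml.XmlDocument xmlDoc)
{
    System.Xml.Linq.XDocument xDoc = System.Xml.Linq.XDocument.Parse(xmlDoc.OuterXml);

    // Declaration is null when the source has no <?xml ...?> line
    if (xDoc.Declaration != null)
        xDoc.Declaration.Encoding = "utf-16";

    // Note: XDocument.ToString() does not emit the XML declaration
    return xDoc.ToString();
}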

Then I can save the results to the DB.

Comments
  • Thanks for the comment. I actually get given the string from another method that used XmlSerializer to create the XML in the first place, so I don't have access to the stream itself.
  • So look at the first characters: there is likely an "encoding=...." attribute. If it is present and set to something other than UTF-16, that is your problem. I'd try to use XmlDocument.LoadXml in this case ...
  • I just noticed I had the wrong string writer specified in my example. I meant to use only StringWriter, as I want the XML in utf-16, not utf-8. Updated my question.
  • @dreza this line "xDoc.Declaration.Encoding = "utf-16";" should do the trick for you then :)