Importing a large (2.5 GB) XML file into SQL Server


Hi, I am trying to import a large XML file into a table on my SQL Server (2014).

I have used the code below for smaller files and thought it would be OK since this is a one-off. I kicked it off yesterday and the query was still running when I came into work today, so this is obviously the wrong route.

Here is the code:

CREATE TABLE files_index_bulk
(
Id INT IDENTITY PRIMARY KEY,
XMLData XML,
LoadedDateTime DATETIME
)


INSERT INTO files_index_bulk(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn, 2) AS BulkColumn, GETDATE() 
FROM OPENROWSET(BULK 'c:\scripts\icecat\files.index.xml', SINGLE_BLOB) AS x;


SELECT * FROM files_index_bulk

Can anyone point out another way of doing this, please? I've looked around at importing large files and it keeps coming back to using BULK, which I already am.

Thanks in advance.

Here is the table I want to pull all the data into:

USE [ICECATtesting]
GO

/****** Object:  Table [dbo].[files_index]    Script Date: 28/04/2017 20:10:44 
******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[files_index](
    [Product_ID] [int] NULL,
    [path] [varchar](100) NULL,
    [Updated] [varchar](50) NULL,
    [Quality] [varchar](50) NULL,
    [Supplier_id] [int] NULL,
    [Prod_ID] [varchar](1) NULL,
    [Catid] [int] NULL,
    [On_Market] [int] NULL,
    [Model_Name] [varchar](250) NULL,
    [Product_View] [int] NULL,
    [HighPic] [varchar](1) NULL,
    [HighPicSize] [int] NULL,
    [HighPicWidth] [int] NULL,
    [HighPicHeight] [int] NULL,
    [Date_Added] [varchar](150) NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

And here is a snippet of the XML file:

<ICECAT-interface xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://data.icecat.biz/xsd/files.index.xsd">
  <files.index Generated="20170427010009">
  <file path="export/level4/EN/11.xml" Product_ID="11" Updated="20170329110432" Quality="SUPPLIER" Supplier_id="2" Prod_ID="PS300E-03YNL-DU" Catid="151" On_Market="0" Model_Name="Satellite 3000-400" Product_View="587591" HighPic="" HighPicSize="0" HighPicWidth="0" HighPicHeight="0" Date_Added="20050627000000">
  </file>
  <file path="export/level4/EN/12.xml" Product_ID="12" Updated="20170329110432" Quality="ICECAT" Supplier_id="7" Prod_ID="91.42R01.32H" Catid="151" On_Market="0" Model_Name="TravelMate  740LF" Product_View="40042" HighPic="http://images.icecat.biz/img/norm/high/12-31699.jpg" HighPicSize="19384" HighPicWidth="170" HighPicHeight="192" Date_Added="20050627000000">
  </file>
  <file path="export/level4/EN/13.xml" Product_ID="13" Updated="20170329110432" Quality="SUPPLIER" Supplier_id="2" Prod_ID="PP722E-H390W-NL" Catid="151" On_Market="0" Model_Name="Portégé 7220CT / NW2" Product_View="37021" HighPic="http://images.icecat.biz/img/norm/high/13-31699.jpg" HighPicSize="27152" HighPicWidth="280" HighPicHeight="280" Date_Added="20050627000000">
  </file>

The max size of an XML column value in SQL Server is 2GB. It will not be possible to import a 2.5GB file into a single XML column.

UPDATE

Since your underlying objective is to transform XML elements within the file into table rows, you don't need to stage the entire file contents into a single XML column. You can avoid the 2GB limitation, reduce memory requirements, and improve performance by shredding the XML in client code and using a bulk insert technique to insert batches of multiple rows.

The example PowerShell script below uses an XmlTextReader to avoid reading the entire XML into a DOM and uses SqlBulkCopy to insert batches of many rows at once. The combination of these techniques should allow you to insert millions of rows in minutes rather than hours. The same techniques can be implemented in a custom app or an SSIS script task.

I noticed that a couple of the table columns (Prod_ID and HighPic) are declared as varchar(1), yet the XML attribute values contain many characters. You'll need to either expand the length of those columns or transform the source values.
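Before widening the columns, it may help to measure how long the attribute values actually get. The following is a minimal sketch in Python (not part of the original answer, which uses PowerShell); the function name `max_attribute_lengths` and the inline sample are made up for illustration, and it assumes the rows are `file` elements as in the question's snippet.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def max_attribute_lengths(source, row_tag="file"):
    """Stream the XML and return {attribute name: longest value length}."""
    longest = defaultdict(int)
    open_elements = []  # stack of elements whose end tag has not been seen yet
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elements.append(elem)
            continue
        open_elements.pop()
        if elem.tag == row_tag:
            for name, value in elem.attrib.items():
                longest[name] = max(longest[name], len(value))
            # Detach the finished row from its parent so memory stays flat.
            if open_elements:
                open_elements[-1].remove(elem)
    return dict(longest)


# Hypothetical two-row sample shaped like the question's file.
sample = io.BytesIO(
    b'<ICECAT-interface><files.index>'
    b'<file Prod_ID="PS300E-03YNL-DU" HighPic=""></file>'
    b'<file Prod_ID="91.42R01.32H" HighPic="http://images.icecat.biz/img/norm/high/12-31699.jpg"></file>'
    b'</files.index></ICECAT-interface>'
)
widths = max_attribute_lengths(sample)
print(widths)
```

For the real file you would pass the file path instead of the in-memory sample, then size each varchar column at or above the reported length.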

[String]$global:connectionString = "Data Source=YourServer;Initial Catalog=YourDatabase;Integrated Security=SSPI";
[System.Data.DataTable]$global:dt = New-Object System.Data.DataTable;
[System.Xml.XmlTextReader]$global:xmlReader = New-Object System.Xml.XmlTextReader("C:\FilesToImport\files.xml");
[Int32]$global:batchSize = 10000;

Function Add-FileRow() {
    $newRow = $dt.NewRow();
    $null = $dt.Rows.Add($newRow);
    $newRow["Product_ID"] = $global:xmlReader.GetAttribute("Product_ID");
    $newRow["path"] = $global:xmlReader.GetAttribute("path");
    $newRow["Updated"] = $global:xmlReader.GetAttribute("Updated");
    $newRow["Quality"] = $global:xmlReader.GetAttribute("Quality");
    $newRow["Supplier_id"] = $global:xmlReader.GetAttribute("Supplier_id");
    $newRow["Prod_ID"] = $global:xmlReader.GetAttribute("Prod_ID");
    $newRow["Catid"] = $global:xmlReader.GetAttribute("Catid");
    $newRow["On_Market"] = $global:xmlReader.GetAttribute("On_Market");
    $newRow["Model_Name"] = $global:xmlReader.GetAttribute("Model_Name");
    $newRow["Product_View"] = $global:xmlReader.GetAttribute("Product_View");
    $newRow["HighPic"] = $global:xmlReader.GetAttribute("HighPic");
    $newRow["HighPicSize"] = $global:xmlReader.GetAttribute("HighPicSize");
    $newRow["HighPicWidth"] = $global:xmlReader.GetAttribute("HighPicWidth");
    $newRow["HighPicHeight"] = $global:xmlReader.GetAttribute("HighPicHeight");
    $newRow["Date_Added"] = $global:xmlReader.GetAttribute("Date_Added");
}

try
{

    # init data table schema
    $da = New-Object System.Data.SqlClient.SqlDataAdapter("SELECT * FROM dbo.files_index WHERE 0 = 1;", $global:connectionString);
    $null = $da.Fill($global:dt);
    $bcp = New-Object System.Data.SqlClient.SqlBulkCopy($global:connectionString);
    $bcp.DestinationTableName = "dbo.files_index";

    $recordCount = 0;

    while($xmlReader.Read() -eq $true)
    {

        if(($xmlReader.NodeType -eq [System.Xml.XmlNodeType]::Element) -and ($xmlReader.Name -eq "file"))
        {
            # Add-FileRow declares no parameters; it reads from $global:xmlReader
            Add-FileRow;
            $recordCount += 1;
            if(($recordCount % $global:batchSize) -eq 0) 
            {
                $bcp.WriteToServer($dt);
                $dt.Rows.Clear();
                Write-Host "$recordCount file elements processed so far";
            }
        }

    }

    if($dt.Rows.Count -gt 0)
    {
        $bcp.WriteToServer($dt);
    }

    $bcp.Close();
    $xmlReader.Close();

    Write-Host "$recordCount file elements imported";

}
catch
{
    throw;
}
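For readers who don't use PowerShell, the same stream-and-batch pattern can be sketched in Python. This is only an illustration under assumptions: `iter_rows`, `import_in_batches`, and the `write_batch` callback are hypothetical names, and the callback stands in for whatever bulk insert your database driver offers (the script above uses SqlBulkCopy for that step).

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source, row_tag="file"):
    """Yield one attribute dict per row element without building a full DOM."""
    open_elements = []  # stack of elements whose end tag has not been seen yet
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elements.append(elem)
            continue
        open_elements.pop()
        if elem.tag == row_tag:
            yield dict(elem.attrib)
            # Detach the finished row from its parent so memory stays flat.
            if open_elements:
                open_elements[-1].remove(elem)


def import_in_batches(source, write_batch, batch_size=10000):
    """Call write_batch(rows) for each full batch and for the final partial one."""
    batch, total = [], 0
    for row in iter_rows(source):
        batch.append(row)
        total += 1
        if len(batch) == batch_size:
            write_batch(batch)
            batch = []  # start a new list; the old one now belongs to the callback
    if batch:
        write_batch(batch)
    return total


# Hypothetical three-row sample; batches.append stands in for a bulk insert.
sample = io.BytesIO(
    b'<ICECAT-interface><files.index>'
    b'<file Product_ID="11"/><file Product_ID="12"/><file Product_ID="13"/>'
    b'</files.index></ICECAT-interface>'
)
batches = []
total = import_in_batches(sample, batches.append, batch_size=2)
print(total, [len(b) for b in batches])
```

As in the PowerShell version, the final partial batch is flushed after the loop, so the row count does not have to be a multiple of the batch size.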


Try this. It's just another method that I have used for some time (PHP with PDO; note that the ON DUPLICATE KEY UPDATE clause is MySQL syntax). It's pretty fast (could be faster). I pull a huge XML database from a gaming company every night. This is how I get it and import it.

 $xml  = new XMLReader();            
 $xml->open($xml_file); // file is your xml file you want to parse
 while($xml->read() && $xml->name != 'game') { ; } // get past the header to your first record (game in my case)

while($xml->name == 'game') { // now while we are in this record               
                $element        = new SimpleXMLElement($xml->readOuterXML());
                $gameRec        = $this->createGameRecord($element, $os); // this is my function to reduce some clutter - and I use it elsewhere too

                /* this looks confusing, but it is not. There are over 20 fields, and instead of typing them all out, I just made a string. */
                $sql = "INSERT INTO $table (";
                foreach($gameRec as $field=>$game){
                $sql .= " $field,";
                }
                $sql = rtrim($sql, ",");
                $sql .=") values (";

                foreach($gameRec as $field=>$game) {
                    $sql .= " :$field,";               
                }
                $sql = rtrim($sql,",");
                $sql .= ") ON DUPLICATE KEY UPDATE "; // online game doesn't have a gamerank - not my choice LOL, so I adjust that for here

                switch ($os) {
                    case 'pc' : $sql .= "gamerank = ".$gameRec['gamerank']        ; break;
                    case 'mac': $sql .= "gamerank = ".$gameRec['gamerank']        ; break;
                    case 'pl' : $sql .= "playercount = ".$gameRec['playercount']  ; break;
                    case 'og' :
                        $playercount = $this->getPlayerCount($gameRec['gameid']);
                        $sql .= "playercount = ".$playercount['playercount']  ;
                        break;

                }


                try {

                    $stmt = $this->connect()->prepare($sql);
                    $stmt->execute($gameRec);

                } catch (PDOException $e) {// Kludge

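                    // print_r() needs true as its second argument to return the dump as a string instead of printing it
                    echo 'os: '.$os.'<br/>table: '.$table.'<br/>XML LINK: '.$comprehensive_xml.'<br/>Current Record:<br/><pre>'.print_r($gameRec, true).'</pre><br/>'.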
                    'SQL: '.$sql.'<br/>';
                    die('Line:33<br/>Function: pullBFG()<BR/>Cannot add game record <br/>'.$e->getMessage());

                }

                // VERY IMPORTANT: do not forget these 2 lines, or the loop never advances and runs forever (it locks up your system after a bit)
                $xml->next('game');
                unset($element);
            }// while there are games

This should get you started. Obviously, adjust the "game" to your xml records. Trim out the fat I have here.
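The loop above rebuilds and prepares the INSERT for every record. As a rough sketch of the alternative (build the statement once from the record's keys, then run the whole batch in one transaction), here is the idea in Python. Assumptions to note: Python's built-in sqlite3 stands in for PDO/MySQL, `build_insert`, the `games` table, and the sample records are made up, and the ON DUPLICATE KEY UPDATE part is left out.

```python
import sqlite3


def build_insert(table, record):
    """Build an INSERT with one named placeholder per key in the record.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted code, never from the XML itself.
    """
    columns = ", ".join(record)
    placeholders = ", ".join(f":{name}" for name in record)
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"


# In-memory database used only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (gameid TEXT, gamename TEXT, gamerank INTEGER)")

records = [
    {"gameid": "1", "gamename": "Example One", "gamerank": 5},
    {"gameid": "2", "gamename": "Example Two", "gamerank": 99999},
]

sql = build_insert("games", records[0])  # built once, reused for every row
with conn:  # one transaction for the whole batch
    conn.executemany(sql, records)

row_count = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
print(sql)
print(row_count)
```

Values always travel as bound parameters here; only the column list is assembled as text.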

Here is createGameRecord($element, $type='pc'). Basically it turns the element into an array to use elsewhere, which makes it easier to add to the db with a single line, as seen above: $stmt->execute($gameRec), where $gameRec was returned from this function. PDO binds each key of the array to the matching named placeholder when the INSERT runs. delHardReturns() is another of my functions that gets rid of hard returns (\r, \n, etc.), which seemed to mess up the SQL. I think SQL has a function for that, but I have not pursued it. Hope you find this useful.

private function createGameRecord($element, $type='pc') {
            if( ($type == 'pc') || ($type == 'og') ) { // player count is handled separately
                $game = array(
                    'gamename'                  => strval($element->gamename),
                    'gameid'                    => strval($element->gameid),                
                    'genreid'                   => strval($element->genreid),
                    'allgenreid'                => strval($element->allgenreid),
                    'shortdesc'                 => $this->delHardReturns(strval($element->shortdesc)),
                    'meddesc'                   => $this->delHardReturns(strval($element->meddesc)),
                    'bullet1'                   => $this->delHardReturns(strval($element->bullet1)),
                    'bullet2'                   => $this->delHardReturns(strval($element->bullet2)),
                    'bullet3'                   => $this->delHardReturns(strval($element->bullet3)),
                    'bullet4'                   => $this->delHardReturns(strval($element->bullet4)),
                    'bullet5'                   => $this->delHardReturns(strval($element->bullet5)),
                    'longdesc'                  => $this->delHardReturns(strval($element->longdesc)),
                    'foldername'                => strval($element->foldername),
                    'hasdownload'               => strval($element->hasdownload),
                    'hasdwfeature'              => strval($element->hasdwfeature),                             
                    'releasedate'               => strval($element->releasedate)

                );

                if($type === 'pc')  {

                    $game['hasvideo']           = strval($element->hasvideo);
                    $game['hasflash']           = strval($element->hasflash);
                    $game['price']              = strval($element->price); 
                    $game['gamerank']           = strval($element->gamerank);
                    $game['gamesize']           = strval($element->gamesize);
                    $game['macgameid']          = strval($element->macgameid);
                    $game['family']             = strval($element->family);
                    $game['familyid']           = strval($element->familyid);
                    $game['productid']          = strval($element->productid);
                    $game['pc_sysreqos']        = strval($element->systemreq->pc->sysreqos);
                    $game['pc_sysreqmhz']       = strval($element->systemreq->pc->sysreqmhz);
                    $game['pc_sysreqmem']       = strval($element->systemreq->pc->sysreqmem);
                    $game['pc_sysreqhd']        = strval($element->systemreq->pc->sysreqhd);

                    if(empty($game['gamerank'])) $game['gamerank'] = 99999;

                    $game['gamesize'] = $this->readableBytes((int)$game['gamesize']);  


                }// dealing with PC type

                if($type === 'og') {
                    $game['onlineiframeheight']              = strval($element->onlineiframeheight);
                    $game['onlineiframewidth']              = strval($element->onlineiframewidth); 

                }

                $game['releasedate']            = substr($game['releasedate'],0,10);

            } else {// any other type (e.g. 'pl'): only the player count is needed

                $game['playercount']            = strval($element->playercount);
                $game['gameid']                 = strval($element->gameid);
            }// end else (type = pl)


            return $game;
        }


Updated: much faster. I did some research, and while my post above shows one (slow) method, I found one that works much faster, at least for me. I am posting it as a new answer because it is completely different from my previous post.

LOAD XML LOCAL INFILE 'path/to/file.xml' INTO TABLE tablename ROWS IDENTIFIED BY '<xml-identifier>'

Example

<students>
    <student>
       <name>john doe</name>
          <boringfields>bla bla bla......</boringfields>
    </student>
</students>

Then, the MySQL command would be:

LOAD XML LOCAL INFILE 'path/to/students.xml' INTO TABLE tablename ROWS IDENTIFIED BY '<student>'

The ROWS IDENTIFIED BY value must be in single quotes and include the angle brackets. When I switched to this method, I went from roughly 12 minutes to roughly 30 seconds!

One tip that worked for me: run DELETE FROM tablename first, otherwise the load just appends to the rows already in the table.
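Before running the load, it can be worth checking that the tag you plan to pass to ROWS IDENTIFIED BY really is the repeating element. Below is a small Python sketch for that; it is not MySQL's own logic, just a preview helper, and `preview_rows` is a made-up name.

```python
import io
import xml.etree.ElementTree as ET


def preview_rows(source, row_tag, limit=3):
    """Return (match_count, first few rows) for elements named row_tag."""
    count, preview = 0, []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != row_tag:
            continue
        count += 1
        if len(preview) < limit:
            # Collect attributes and child-element text as candidate column values.
            row = dict(elem.attrib)
            row.update({child.tag: (child.text or "").strip() for child in elem})
            preview.append(row)
        # Drop the element's children and text once it has been inspected.
        elem.clear()
    return count, preview


# The students example from above, as an in-memory sample.
sample = io.BytesIO(
    b"<students><student><name>john doe</name>"
    b"<boringfields>bla bla bla</boringfields></student></students>"
)
count, preview = preview_rows(sample, "student")
print(count, preview)
```

If the count comes back as zero, the tag name (or its case) does not match the file and the load would import nothing useful.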

Ref: https://dev.mysql.com/doc/refman/5.5/en/load-xml.html


Comments
  • You could write your own program that uses a SAX parser to insert the records. It might not be much faster, but at least you could add some kind of progress meter into it, so you know it is working correctly.
  • I will look that up. I was hoping to keep it all within sql if possible. there must be a way of doing this gradually rather than loading it all into memory at once.
  • What's the point of storing 2.5GB of text data into a single cell? That's an entire database's worth of data.
  • Have you tried executing the query without SINGLE_BLOB?
  • Thank you Dan, what command would you use to skip the importing it to a single column then getting it into a table (as that is my desired result)?
  • @JohnSpencer, your desired result is unclear to me. Is the table in your question a staging table, and is your desired result to shred each XML value into multiple rows to insert into another table? If so, your best bet would be to parse the XML in a client app or PS script (maybe with an XmlReader or similar) rather than using a staging table. Maybe we can help if you provide the final table and sample XML.
  • Hi Dan, thanks for looking. I have updated the question to show my destination table: I would like the data from the XML file files.index.xml to go into [dbo].[files_index]. I have a daily file with the same structure. As I'm new to XML, I just pull it into a temp table and then put the data into its own table (same structure as above). This works fine, but because this file is so big my method does not work.
  • Thank you for all that code, I really appreciate it, but I'm getting this error when running the PowerShell file: At C:\Scripts\test.ps1:64 char:53 + Write-Host "$recordCount file elements imported "; + ~~ The string is missing the terminator: ". At C:\Scripts\test.ps1:70 char:3 + }) + ~ Missing closing ')' in expression. + CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException + FullyQualifiedErrorId : TerminatorExpectedAtEndOfString
  • @JohnSpencer, I copy/pasted the code into a PowerShell ISE window, changed the connection string, and it ran successfully. There are 68 lines in the script but I see line 70 referenced in the error message. Could there be some extraneous lines in your version?
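The SAX idea from the first comment (parse as a stream and report progress as you go) could look roughly like this in Python. It is a sketch under assumptions: `FileRowHandler` and its `on_row` callback are hypothetical names, the callback stands in for whatever inserts the row, and the tiny `report_every` value is only there so the sample prints something.

```python
import io
import sys
import xml.sax


class FileRowHandler(xml.sax.ContentHandler):
    """Pass each <file> element's attributes to on_row and report progress."""

    def __init__(self, on_row, report_every=100000):
        super().__init__()
        self.on_row = on_row
        self.report_every = report_every
        self.count = 0

    def startElement(self, name, attrs):
        if name != "file":
            return
        # All data lives in attributes, so the start tag is enough.
        self.on_row(dict(attrs.items()))
        self.count += 1
        if self.count % self.report_every == 0:
            print(f"{self.count} file elements processed so far", file=sys.stderr)


# Hypothetical three-row sample; rows.append stands in for a database insert.
rows = []
handler = FileRowHandler(on_row=rows.append, report_every=2)
sample = io.BytesIO(
    b'<files.index><file Product_ID="11"/><file Product_ID="12"/>'
    b'<file Product_ID="13"/></files.index>'
)
xml.sax.parse(sample, handler)
print(handler.count)
```

Because SAX never builds a tree, memory use stays flat regardless of file size, and the progress line shows the import is still moving.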