How to convert HTML to PDF using iTextSharp

itextsharp html to pdf .net core
converting html table to pdf table using itextsharp in c#
itext7 html to pdf c#
how to convert html to pdf with image tags using itextsharp
itextsharp html to pdf with css styles c#
convert html to pdf using itext xmlworker c#
convert html to word using itextsharp
java code to convert pdf to html using itext

I want to convert the below HTML to PDF using iTextSharp but don't know where to start:

<style>
.headline{font-size:200%}
</style>
<p>
  This <em>is </em>
  <span class="headline" style="text-decoration: underline;">some</span>
  <strong>sample<em> text</em></strong>
  <span style="color: red;">!!!</span>
</p>

First, HTML and PDF are not related although they were created around the same time. HTML is intended to convey higher level information such as paragraphs and tables. Although there are methods to control it, it is ultimately up to the browser to draw these higher level concepts. PDF is intended to convey documents and the documents must "look" the same wherever they are rendered.

In an HTML document you might have a paragraph that's 100% wide and depending on the width of your monitor it might take 2 lines or 10 lines and when you print it it might be 7 lines and when you look at it on your phone it might take 20 lines. A PDF file, however, must be independent of the rendering device, so regardless of your screen size it must always render exactly the same.

Because of the musts above, PDF doesn't support abstract things like "tables" or "paragraphs". There are three basic things that PDF supports: text, lines/shapes and images. (There are other things like annotations and movies but I'm trying to keep it simple here.) In a PDF you don't say "here's a paragraph, browser do your thing!". Instead you say, "draw this text at this exact X,Y location using this exact font and don't worry, I've previously calculated the width of the text so I know it will all fit on this line". You also don't say "here's a table" but instead you say "draw this text at this exact location and then draw a rectangle at this other exact location that I've previously calculated so I know it will appear to be around the text".

Second, iText and iTextSharp parse HTML and CSS. That's it. ASP.Net, MVC, Razor, Struts, Spring, etc, are all HTML frameworks but iText/iTextSharp is 100% unaware of them. Same with DataGridViews, Repeaters, Templates, Views, etc. which are all framework-specific abstractions. It is your responsibility to get the HTML from your choice of framework, iText won't help you. If you get an exception saying The document has no pages or you think that "iText isn't parsing my HTML" it is almost definite that you don't actually have HTML, you only think you do.

Third, the built-in class that's been around for years is the HTMLWorker however this has been replaced with XMLWorker (Java / .Net). Zero work is being done on HTMLWorker which doesn't support CSS files and has only limited support for the most basic CSS properties and actually breaks on certain tags. If you do not see the HTML attribute or CSS property and value in this file then it probably isn't supported by HTMLWorker. XMLWorker can be more complicated sometimes but those complications also make it more extensible.

Below is C# code that shows how to parse HTML tags into iText abstractions that get automatically added to the document that you are working on. C# and Java are very similar so it should be relatively easy to convert this. Example #1 uses the built-in HTMLWorker to parse the HTML string. Since only inline styles are supported the class="headline" gets ignored but everything else should actually work. Example #2 is the same as the first except it uses XMLWorker instead. Example #3 also parses the simple CSS example.

//Create a byte array that will eventually hold our final PDF
Byte[] bytes;

//Boilerplate iTextSharp setup here
//Create a stream that we can write to, in this case a MemoryStream
using (var ms = new MemoryStream()) {

    //Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
    using (var doc = new Document()) {

        //Create a writer that's bound to our PDF abstraction and our stream
        using (var writer = PdfWriter.GetInstance(doc, ms)) {

            //Open the document for writing
            doc.Open();

            //Our sample HTML and CSS
            var example_html = @"<p>This <em>is </em><span class=""headline"" style=""text-decoration: underline;"">some</span> <strong>sample <em> text</em></strong><span style=""color: red;"">!!!</span></p>";
            var example_css = @".headline{font-size:200%}";

            /**************************************************
             * Example #1                                     *
             *                                                *
             * Use the built-in HTMLWorker to parse the HTML. *
             * Only inline CSS is supported.                  *
             * ************************************************/

            //Create a new HTMLWorker bound to our document
            using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc)) {

                //HTMLWorker doesn't read a string directly but instead needs a TextReader (which StringReader subclasses)
                using (var sr = new StringReader(example_html)) {

                    //Parse the HTML
                    htmlWorker.Parse(sr);
                }
            }

            /**************************************************
             * Example #2                                     *
             *                                                *
             * Use the XMLWorker to parse the HTML.           *
             * Only inline CSS and absolutely linked          *
             * CSS is supported                               *
             * ************************************************/

            //XMLWorker also reads from a TextReader and not directly from a string
            using (var srHtml = new StringReader(example_html)) {

                //Parse the HTML
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
            }

            /**************************************************
             * Example #3                                     *
             *                                                *
             * Use the XMLWorker to parse HTML and CSS        *
             * ************************************************/

            //In order to read CSS as a string we need to switch to a different constructor
            //that takes Streams instead of TextReaders.
            //Below we convert the strings into UTF8 byte array and wrap those in MemoryStreams
            using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_css))) {
                using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_html))) {

                    //Parse the HTML
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                }
            }


            doc.Close();
        }
    }

    //After all of the PDF "stuff" above is done and closed but **before** we
    //close the MemoryStream, grab all of the active bytes from the stream
    bytes = ms.ToArray();
}

//Now we just need to do something with those bytes.
//Here I'm writing them to disk but if you were in ASP.Net you might Response.BinaryWrite() them.
//You could also write the bytes to a database in a varbinary() column (but please don't) or you
//could pass them to another function for further PDF processing.
var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf");
System.IO.File.WriteAllBytes(testFile, bytes);

2017's update

There are good news for HTML-to-PDF demands. As this answer showed, the W3C standard css-break-3 will solve the problem... It is a Candidate Recommendation with plan to turn into definitive Recommendation this year, after tests.

As not-so-standard there are solutions, with plugins for C#, as showed by print-css.rocks.

Convert HTML String To PDF Via iTextSharp Library And Download, In this article, we will see how to convert HTML strings to PDF by using a third party PDF generation library. iTextSharp, StringBuilder  Find Converting Html To Pdf. Explore Other Results at ConsumerSearch.com

@Chris Haas has explained very well how to use itextSharp to convert HTML to PDF, very helpful my add is: By using HtmlTextWriter I put html tags inside HTML table + inline CSS i got my PDF as I wanted without using XMLWorker . Edit: adding sample code: ASPX page:

<asp:Panel runat="server" ID="PendingOrdersPanel">
 <!-- to be shown on PDF-->
 <table style="border-spacing: 0;border-collapse: collapse;width:100%;display:none;" >
 <tr><td><img src="abc.com/webimages/logo1.png" style="display: none;" width="230" /></td></tr>
<tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr>
 <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr>
 <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla</td></tr>
<tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla</td></tr>
<tr style="line-height:10px;height:10px;"><td style="display:none;font-size:11px;color:#10466E;padding:0px;text-align:center;"><i>blablabla</i> Pending orders report<br /></td></tr>
 </table>
<asp:GridView runat="server" ID="PendingOrdersGV" RowStyle-Wrap="false" AllowPaging="true" PageSize="10" Width="100%" CssClass="Grid" AlternatingRowStyle-CssClass="alt" AutoGenerateColumns="false"
   PagerStyle-CssClass="pgr" HeaderStyle-ForeColor="White" PagerStyle-HorizontalAlign="Center" HeaderStyle-HorizontalAlign="Center" RowStyle-HorizontalAlign="Center" DataKeyNames="Document#" 
      OnPageIndexChanging="PendingOrdersGV_PageIndexChanging" OnRowDataBound="PendingOrdersGV_RowDataBound" OnRowCommand="PendingOrdersGV_RowCommand">
   <EmptyDataTemplate><div style="text-align:center;">no records found</div></EmptyDataTemplate>
    <Columns>                                           
     <asp:ButtonField CommandName="PendingOrders_Details" DataTextField="Document#" HeaderText="Document #" SortExpression="Document#" ItemStyle-ForeColor="Black" ItemStyle-Font-Underline="true"/>
      <asp:BoundField DataField="Order#" HeaderText="order #" SortExpression="Order#"/>
     <asp:BoundField DataField="Order Date" HeaderText="Order Date" SortExpression="Order Date" DataFormatString="{0:d}"></asp:BoundField> 
    <asp:BoundField DataField="Status" HeaderText="Status" SortExpression="Status"></asp:BoundField>
    <asp:BoundField DataField="Amount" HeaderText="Amount" SortExpression="Amount" DataFormatString="{0:C2}"></asp:BoundField> 
   </Columns>
    </asp:GridView>
</asp:Panel>

C# code:

protected void PendingOrdersPDF_Click(object sender, EventArgs e)
{
    if (PendingOrdersGV.Rows.Count > 0)
    {
        //to allow paging=false & change style.
        PendingOrdersGV.HeaderStyle.ForeColor = System.Drawing.Color.Black;
        PendingOrdersGV.BorderColor = Color.Gray;
        PendingOrdersGV.Font.Name = "Tahoma";
        PendingOrdersGV.DataSource = clsBP.get_PendingOrders(lbl_BP_Id.Text);
        PendingOrdersGV.AllowPaging = false;
        PendingOrdersGV.Columns[0].Visible = false; //export won't work if there's a link in the gridview
        PendingOrdersGV.DataBind();

        //to PDF code --Sam
        string attachment = "attachment; filename=report.pdf";
        Response.ClearContent();
        Response.AddHeader("content-disposition", attachment);
        Response.ContentType = "application/pdf";
        StringWriter stw = new StringWriter();
        HtmlTextWriter htextw = new HtmlTextWriter(stw);
        htextw.AddStyleAttribute("font-size", "8pt");
        htextw.AddStyleAttribute("color", "Grey");

        PendingOrdersPanel.RenderControl(htextw); //Name of the Panel
        Document document = new Document();
        document = new Document(PageSize.A4, 5, 5, 15, 5);
        FontFactory.GetFont("Tahoma", 50, iTextSharp.text.BaseColor.BLUE);
        PdfWriter.GetInstance(document, Response.OutputStream);
        document.Open();

        StringReader str = new StringReader(stw.ToString());
        HTMLWorker htmlworker = new HTMLWorker(document);
        htmlworker.Parse(str);

        document.Close();
        Response.Write(document);
    }
}

of course include iTextSharp Refrences to cs file

using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.html.simpleparser;
using iTextSharp.tool.xml;

Hope this helps! Thank you

Convert html file to pdf using iTextSharp, In this blog, we will see how we can convert HTML files into PDF files using itextsharp. Find Convert Html To Pdf now. Visit & Look for More Results!

As of 2018, there is also iText7 (A next iteration of old iTextSharp library) and its HTML to PDF package available: itext7.pdfhtml

Usage is straightforward:

HtmlConverter.ConvertToPdf(
    new FileInfo(@"Path\to\Html\File.html"),
    new FileInfo(@"Path\to\Pdf\File.pdf")
);

Method has many more overloads.

Update: iText* family of products has dual licensing model: free for open source, paid for commercial use.

MVC iTextSharp Example: Convert HTML to PDF using iTextSharp , Here Mudassar Ahmed Khan has explained with an example, how to use the iTextSharp HTML to PDF conversion library in ASP.Net MVC  Free Document Converter Software. Convert html, doc, docx, pdf & more. --

I use the following code to create PDF

protected void CreatePDF(Stream stream)
        {
            using (var document = new Document(PageSize.A4, 40, 40, 40, 30))
            {
                var writer = PdfWriter.GetInstance(document, stream);
                writer.PageEvent = new ITextEvents();
                document.Open();

                // instantiate custom tag processor and add to `HtmlPipelineContext`.
                var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
                tagProcessorFactory.AddProcessor(
                    new TableProcessor(),
                    new string[] { HTML.Tag.TABLE }
                );

                //Register Fonts.
                XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
                fontProvider.Register(HttpContext.Current.Server.MapPath("~/Content/Fonts/GothamRounded-Medium.ttf"), "Gotham Rounded Medium");
                CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);

                var htmlPipelineContext = new HtmlPipelineContext(cssAppliers);
                htmlPipelineContext.SetTagFactory(tagProcessorFactory);

                var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
                var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);

                // get an ICssResolver and add the custom CSS
                var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
                cssResolver.AddCss(CSSSource, "utf-8", true);
                var cssResolverPipeline = new CssResolverPipeline(
                    cssResolver, htmlPipeline
                );

                var worker = new XMLWorker(cssResolverPipeline, true);
                var parser = new XMLParser(worker);
                using (var stringReader = new StringReader(HTMLSource))
                {
                    parser.Parse(stringReader);
                    document.Close();
                    HttpContext.Current.Response.ContentType = "application /pdf";
                    if (base.View)
                        HttpContext.Current.Response.AddHeader("content-disposition", "inline;filename=\"" + OutputFileName + ".pdf\"");
                    else
                        HttpContext.Current.Response.AddHeader("content-disposition", "attachment;filename=\"" + OutputFileName + ".pdf\"");
                    HttpContext.Current.Response.Cache.SetCacheability(HttpCacheability.NoCache);
                    HttpContext.Current.Response.WriteFile(OutputPath);
                    HttpContext.Current.Response.End();
                }
            }
        }

Convert HTML string to PDF using ItextSharp, Hey! Why not try Google it will give you plenty of article to learn how to convert HTML string to PDF using ITextSharp whatever you can start  Firstly, we can convert the string of data to PDF by using Popular Library for rendering PDF in ItextSharp. Secondly, we can download / save the converted PDF by using HTTP Response Class which provides response to client and contains information about response in the form of headers and other piece of necessary information.

Here's the link I used as a guide. Hope this helps!

Converting HTML to PDF using ITextSharp

protected void Page_Load(object sender, EventArgs e)
    {
        try
        {
            string strHtml = string.Empty;
            //HTML File path -http://aspnettutorialonline.blogspot.com/
            string htmlFileName = Server.MapPath("~") + "\\files\\" + "ConvertHTMLToPDF.htm";
            //pdf file path. -http://aspnettutorialonline.blogspot.com/
            string pdfFileName = Request.PhysicalApplicationPath + "\\files\\" + "ConvertHTMLToPDF.pdf";

            //reading html code from html file
            FileStream fsHTMLDocument = new FileStream(htmlFileName, FileMode.Open, FileAccess.Read);
            StreamReader srHTMLDocument = new StreamReader(fsHTMLDocument);
            strHtml = srHTMLDocument.ReadToEnd();
            srHTMLDocument.Close();

            strHtml = strHtml.Replace("\r\n", "");
            strHtml = strHtml.Replace("\0", "");

            CreatePDFFromHTMLFile(strHtml, pdfFileName);

            Response.Write("pdf creation successfully with password -http://aspnettutorialonline.blogspot.com/");
        }
        catch (Exception ex)
        {
            Response.Write(ex.Message);
        }
    }
    public void CreatePDFFromHTMLFile(string HtmlStream, string FileName)
    {
        try
        {
            object TargetFile = FileName;
            string ModifiedFileName = string.Empty;
            string FinalFileName = string.Empty;

            /* To add a Password to PDF -http://aspnettutorialonline.blogspot.com/ */
            TestPDF.HtmlToPdfBuilder builder = new TestPDF.HtmlToPdfBuilder(iTextSharp.text.PageSize.A4);
            TestPDF.HtmlPdfPage first = builder.AddPage();
            first.AppendHtml(HtmlStream);
            byte[] file = builder.RenderPdf();
            File.WriteAllBytes(TargetFile.ToString(), file);

            iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(TargetFile.ToString());
            ModifiedFileName = TargetFile.ToString();
            ModifiedFileName = ModifiedFileName.Insert(ModifiedFileName.Length - 4, "1");

            string password = "password";
            iTextSharp.text.pdf.PdfEncryptor.Encrypt(reader, new FileStream(ModifiedFileName, FileMode.Append), iTextSharp.text.pdf.PdfWriter.STRENGTH128BITS, password, "", iTextSharp.text.pdf.PdfWriter.AllowPrinting);
            //http://aspnettutorialonline.blogspot.com/
            reader.Close();
            if (File.Exists(TargetFile.ToString()))
                File.Delete(TargetFile.ToString());
            FinalFileName = ModifiedFileName.Remove(ModifiedFileName.Length - 5, 1);
            File.Copy(ModifiedFileName, FinalFileName);
            if (File.Exists(ModifiedFileName))
                File.Delete(ModifiedFileName);

        }
        catch (Exception ex)
        {
            throw ex;
        }
    }

You can download the sample file. Just place the html you want to convert in the files folder and run. It will automatically generate the pdf file and place it in the same folder. But in your case, you can specify your html path in the htmlFileName variable.

Convert HTML to pdf in asp.net c# using itextSharp, In this article, I will show you how to convert html to pdf in c# using iTextSharp. For that, you need to Download the iTextSharp PDF library and  Then the same HTML will be converted to PDF file using the iTextSharp HTML to PDF conversion library and then later the PDF file is downloaded using iTextSharp XMLWorkerHelper library in ASP.Net MVC Razor. Download Free Files API. Download Free Files API. In this article I will explain with an example, how to use the iTextSharp HTML to PDF

ASP.NET : How to Generate PDF from HTML with iTextSharp, We can get iTextSharp reference using package manager, we just need to execute following command to download and add reference. PM>  The HTML string will be exported and downloaded as PDF file using iTextSharp XMLWorkerHelper library in ASP.Net with C# and VB.Net. TAGs: ASP.Net, iTextSharp, HTML Here Mudassar Ahmed Khan has explained with an example, how to export HTML string to PDF file using iTextSharp in ASP.Net with C# and VB.Net.

Convert HTML to PDF using iTextSharp, i hve no idea about using itextsharp..m trying a lot by searching google but not getting satisfying option.anyone please provide some gud  If you are converting html to pdf on the html server side you can use Rotativa : Install-Package Rotativa This is based on wkhtmltopdf but it has better css support than iTextSharp has and is very simple to integrate with MVC (which is mostly used) as you can simply return the view as pdf: public ActionResult GetPdf() { //

HTML to PDF using iTextSharp Library In ASP.NET, Our article about How to convert HTML to PDF using iTextSharp Library In ASP.​NET. We will show you how to Export HTML DIV contents to  Quite a long answer, but taking a look at questions here at SO tagged html, pdf, and itextsharp, as of this writing (2016-02-23) there are 776 results against 4,063 total tagged itextsharp - that's 19%.

Comments
  • Very nice example.Thanks.
  • The code declares a "new Document()" and comments that this Document type is an "iTextSharp Document." This reference should be namespaced completely as "iTextSharp.text.Document()". The project where I'm using iTextSharp already had a Document class and I had to dig through the iTextSharp namespace to correct the reference.
  • the line with "iTextSharp.text.html.simpleparser.HTMLWorker(doc))" says 'htmlWorker' is obsolete with 5.5.10. What should this be changed to?
  • iTextSharp.tool namespace give me an error that is not exists, and I also get iTextSharp.text.html.simpleparser.HTMLWorker(doc)) is obsolete Version (5.5.8.0)
  • In case anyone is looking for solution to iTextSharp.tool, you have to execute NuGet command: Install-Package itextsharp.xmlworker
  • This code works properly. but i cannot view the pdf file when it is downloaded. what am i doing wrong?
  • if this code works for you, you should be able to see your PDF after downloading. I would suggest that you post a question with your code to review and see where's the error. Also try to run your code from different browsers and see if you would be able to view the PDF or not.
  • @rst Updated. Other answers mention iText library too (without noting its commercial nature), — You might want to nitpick them as well...
  • I dont nitpick.
  • Ah, ok. It looks like some TestPDF.HtmlToPdfBuilder utility class is used here to do the actual conversion. [...] I just downloaded it. It turns out to essentially be a wrapper for the iTextSharp HTMLWorker class which meanwhile has been deprecated / obsoleted.
  • what is the TestPDF in the CreatePDFFromHTMLFile() method