How to add an altChunk element to a XWPFDocument using Apache POI

I would like to add HTML as an altChunk to a DOCX file using Apache POI. I know that doc4jx can do this with a simpler API but for technical reasons I need to use Apache POI.

Using the CT classes to do low level stuff with the xml is a little tricky. I can create an altChunk with following code:

import java.io.File;
import java.io.FileOutputStream;

import javax.xml.namespace.QName;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.impl.values.XmlComplexContentImpl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDocument1;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTBodyImpl;

public class AltChunkTest {
    public static void main(String[] args) throws Exception  {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("AltChunk below:");
        QName ALTCHUNK =  new QName ( "http://schemas.openxmlformats.org/wordprocessingml/2006/main" ,  "altChunk" ) ; 
        CTDocument1 ctDoc = doc.getDocument() ; 
        CTBodyImpl ctBody =  (CTBodyImpl) ctDoc. getBody(); 
        XmlComplexContentImpl xcci =  ( XmlComplexContentImpl ) ctBody.get_store().add_element_user(ALTCHUNK); 
        // what's need to now add "<b>Hello World!</b>"
        FileOutputStream out = new FileOutputStream(new File("test.docx"));
        doc.write(out);
    }
}

But how do I add the html content to 'xcci' it now?

In Office Open XML for Word (*.docx) the altChunk provides a method for using pure HTML to describe document parts.

Two important notes about altChunk:

First: It is used only for importing content. If you open the document using Word and save it, the newly saved document will not contain the alternative format content part, nor the altChunk markup that references it. Word saves all imported content as default Office Open XML elements.

Second: Most applications except Word which are able reading *.docx too will not reading the altChunk content at all. For example Libreoffice or OpenOffice Writer will not reading the altChunk content as well as apache poi will not reading the altChunk content when opening a *.docx file.

How is altChunk implemented in the *.docx ZIP file structure?

There are /word/*.html files in the *.docx ZIP file. Those are referenced by Id in /word/document.xml as <w:altChunk r:id="htmlDoc1"/> for example. The relation between the Ids and the /word/*.html files are given in /word/_rels/document.xml.rels as <Relationship Id="htmlDoc1" Target="htmlDoc1.html" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk"/> for example.

So we need at first POIXMLDocumentParts for the /word/*.html files and POIXMLRelations for the relation between the Ids and the /word/*.html files. Following code provides that by having a wrapper class which extends POIXMLDocumentPart for the /word/htmlDoc#.html files in the *.docx ZIP archive. This also provides methods for manipulating the HTML. Also it provides a method for creating the /word/htmlDoc#.html files in the *.docx ZIP archive and creating relations to it.

Code:

import java.io.*;

import org.apache.poi.*;
import org.apache.poi.ooxml.*;
import org.apache.poi.openxml4j.opc.*;

import org.apache.poi.xwpf.usermodel.*;

public class CreateWordWithHTMLaltChunk {

 //a method for creating the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive  
 //String id will be htmlDoc#.
 private static MyXWPFHtmlDocument createHtmlDoc(XWPFDocument document, String id) throws Exception {
  OPCPackage oPCPackage = document.getPackage();
  PackagePartName partName = PackagingURIHelper.createPartName("/word/" + id + ".html");
  PackagePart part = oPCPackage.createPart(partName, "text/html");
  MyXWPFHtmlDocument myXWPFHtmlDocument = new MyXWPFHtmlDocument(part, id);
  document.addRelation(myXWPFHtmlDocument.getId(), new XWPFHtmlRelation(), myXWPFHtmlDocument);
  return myXWPFHtmlDocument;
 }

 public static void main(String[] args) throws Exception {

  XWPFDocument document = new XWPFDocument();

  XWPFParagraph paragraph;
  XWPFRun run;
  MyXWPFHtmlDocument myXWPFHtmlDocument;

  paragraph = document.createParagraph();
  run = paragraph.createRun();
  run.setText("Default paragraph followed by first HTML chunk.");

  myXWPFHtmlDocument = createHtmlDoc(document, "htmlDoc1");
  myXWPFHtmlDocument.setHtml(myXWPFHtmlDocument.getHtml().replace("<body></body>",
   "<body><p>Simple <b>HTML</b> <i>formatted</i> <u>text</u></p></body>"));
  document.getDocument().getBody().addNewAltChunk().setId(myXWPFHtmlDocument.getId());  

  paragraph = document.createParagraph();
  run = paragraph.createRun();
  run.setText("Default paragraph followed by second HTML chunk.");

  myXWPFHtmlDocument = createHtmlDoc(document, "htmlDoc2");
  myXWPFHtmlDocument.setHtml(myXWPFHtmlDocument.getHtml().replace("<body></body>",
   "<body>" +
   "<table>"+
   "<caption>A table></caption>" +
   "<tr><th>Name</th><th>Date</th><th>Amount</th></tr>" +
   "<tr><td>John Doe</td><td>2018-12-01</td><td>1,234.56</td></tr>" +
   "</table>" +
   "</body>"
   ));
  document.getDocument().getBody().addNewAltChunk().setId(myXWPFHtmlDocument.getId());  

  FileOutputStream out = new FileOutputStream("CreateWordWithHTMLaltChunk.docx");
  document.write(out);
  out.close();
  document.close();

 }

 //a wrapper class for the  htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
 //provides methods for manipulating the HTML
 //TODO: We should *not* using String methods for manipulating HTML!
 private static class MyXWPFHtmlDocument extends POIXMLDocumentPart {

  private String html;
  private String id;

  private MyXWPFHtmlDocument(PackagePart part, String id) throws Exception {
   super(part);
   this.html = "<!DOCTYPE html><html><head><style></style><title>HTML import</title></head><body></body>";
   this.id = id;
  }

  private String getId() {
   return id;
  }

  private String getHtml() {
   return html;
  }

  private void setHtml(String html) {
   this.html = html;
  }

  @Override
  protected void commit() throws IOException {
   PackagePart part = getPackagePart();
   OutputStream out = part.getOutputStream();
   Writer writer = new OutputStreamWriter(out, "UTF-8");
   writer.write(html);
   writer.close();
   out.close();
  }

 }

 //the XWPFRelation for /word/htmlDoc#.html
 private final static class XWPFHtmlRelation extends POIXMLRelation {
  private XWPFHtmlRelation() {
   super(
    "text/html", 
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk", 
    "/word/htmlDoc#.html");
  }
 }
}

Note: Because of using altChunk this code needs the full jar of all of the schemas ooxml-schemas-*.jar as mentioned in apache poi faq-N10025.

Result:

Re: Merge Word docx into another Word dox via XWPFDocument , And my solution is in using AltChunk element of OOXML format. Here is my topic about adding AltChunk throught the Apache POI library  Add a new paragraph at position of the cursor. The cursor must be on the XmlCursor.TokenType.START tag of an subelement of the documents body. When this method is done, the cursor passed as parameter points to the XmlCursor.TokenType.END of the newly inserted paragraph.

Based on Axel Richter's answer, I replaced the call to CTBody.addNewAltChunk() with CTBodyImpl.get_store().add_element_user(QName) which eliminates the added 15MB dependency on ooxml-schemas. Since this is being used in a desktop app, we are trying to keep the app size as small as possible. In case it may be of help to anyone else:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

import javax.xml.namespace.QName;

import org.apache.poi.ooxml.POIXMLDocumentPart;
import org.apache.poi.ooxml.POIXMLRelation;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackagePartName;
import org.apache.poi.openxml4j.opc.PackagingURIHelper;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.xmlbeans.SimpleValue;
import org.apache.xmlbeans.impl.values.XmlComplexContentImpl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTBodyImpl;

public class AltChunkTest {
    public static void main(String[] args) throws Exception  {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("AltChunk below:");
        addHtml(doc,"chunk1","<!DOCTYPE html><html><head><style></style><title></title></head><body><b>Hello World!</b></body></html>");
        FileOutputStream out = new FileOutputStream(new File("test.docx"));
        doc.write(out);
    }

    static void addHtml(XWPFDocument doc, String id,String html) throws Exception {
        OPCPackage oPCPackage = doc.getPackage();
        PackagePartName partName = PackagingURIHelper.createPartName("/word/" + id + ".html");
        PackagePart part = oPCPackage.createPart(partName, "text/html");
        class HtmlRelation extends POIXMLRelation {
            private HtmlRelation() {
                super(  "text/html",
                        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk",
                        "/word/htmlDoc#.html");
            }
        }
        class HtmlDocumentPart extends POIXMLDocumentPart {
            private HtmlDocumentPart(PackagePart part) throws Exception {
                super(part);
            }

            @Override
            protected void commit() throws IOException {
                try (OutputStream out = part.getOutputStream()) {
                    try (Writer writer = new OutputStreamWriter(out, "UTF-8")) {
                        writer.write(html);
                    }
                }
            }
        };
        HtmlDocumentPart documentPart = new HtmlDocumentPart(part);
        doc.addRelation(id, new HtmlRelation(), documentPart);
        CTBodyImpl b = (CTBodyImpl) doc.getDocument().getBody();
        QName ALTCHUNK = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "altChunk");
        XmlComplexContentImpl altchunk = (XmlComplexContentImpl) b.get_store().add_element_user(ALTCHUNK);
        QName ID = new QName("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "id");
        SimpleValue target = (SimpleValue)altchunk.get_store().add_attribute_user(ID);
        target.setStringValue(id);
    }
}

How to set the global font of word file via Apache poi?, I'm a newbie of apache poi, and am using poi to write some data to docx file. getResourceAsStream("empty.docx"); XWPFDocument document = new XWPFDocument(input); In the empty.docx can I prevent the document from resetting the font when adding a AltChunk? ShowTextAligned(​officialUseCanvas, Element. [ Natty] java How to add an altChunk element to a XWPFDocument using Apache POI By: Felipe 1.5; [ Natty ] c++ Why no push/pop in front of vector? By: Ori 1.0 ;

This feature is ok in poi-ooxml 4.0.0, where the class POIXMLDocumentPart and POIXMLRelation are in the package org.apache.poi.ooxml.*

import org.apache.poi.ooxml.POIXMLDocumentPart;
import org.apache.poi.ooxml.POIXMLRelation;

But how we can use in poi-ooxml 3.9, where the class are little different and in the org.apache.poi.*

import org.apache.poi.POIXMLDocumentPart;
import org.apache.poi.POIXMLRelation;

Add Border to Paragraph in Word Using POI, Classes / Interfaces Used. XWPFDocument. org.apache.poi.xwpf.usermodel.​XWPFDocument is used to create the MS Word Docs in the .docx formats. XWPFRun is used to add/change the text in a specified element/region. For basic text extraction, make use of org.apache.poi.xwpf.extractor.XWPFWordExtractor. It accepts an input stream or a XWPFDocument. The getText() method can be used to get the text from all the paragraphs, along with tables, headers etc.

Add Paragraph to Word Doc Using POI, In the last post we saw how to Create Word Doc file in Java Using POI. Now in this post we XWPFDocument is used to create the MS Word Docs in the .docx formats. XWPFRun is used to add/change the text in a specified element/region​. 5 Creating custom color styles for an XSSFWorkbook in Apache POI 4.0 Sep 16 '18 1 How to add an altChunk element to a XWPFDocument using Apache POI Dec 2 '18 1 JavaFx Snapshot of content on hidden Tab Dec 2 '18

Apache POI Word - Paragraph, Apache POI Word - Paragraph - In this chapter you will learn how to create a new XWPFDocument(); //Create a blank spreadsheet XWPFParagraph paragraph = document. You can enter the text or any object element, using Run. Using  Apache POI is a Java library for working with the various file formats based on the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). This tutorial focuses on the support of Apache POI for Microsoft Word, the most commonly used Office file format. It walks through steps needed to format and generate an MS

XSSF header image. Hello everyone, i want to place an image in the background header and footer in a XSSFSheet. I read the documentation which states that the parameter *&G* is to be used, but not