Export HTML to Word using docx4j in Lucee CFML

Exporting plain HTML as a Word document is easy. CFML has a built-in tag, <cfcontent />, that can serve HTML as a Word document when the content needs no customization. For plain formatting it is essentially paste and go, as shown in the example below.

 

However, when the document needs a header or footer, CFML has no built-in tag for the job. So I used the docx4j Java library, which worked best for embedding HTML content, together with a header or footer, in a generated Word document.

 


 

Now, let us take a closer look.

 


<cfheader name="Content-Disposition" value="inline; filename=ExportMsWord.doc" charset="utf-8">

<cfcontent type="application/msword; charset=utf-8">

<h2 align="center">
    <span>Download Word Document</span>
</h2>
<table cellspacing="0" cellpadding="0" border="1">
    <thead>
        <tr>
            <th>
                Name
            </th>
            <th>
                Email
            </th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>
                ABC
            </td>
            <td>
                abc@gmail.com
            </td>
        </tr>
    </tbody>
</table>

 

At this stage, everything works like a charm. Next, I use the docx4j Java library to embed header or footer data in the Word documents being exported.

 

What is docx4j?

 

docx4j is an open source (ASLv2) Java library for creating and manipulating Microsoft Open XML (Word docx, PowerPoint pptx, and Excel xlsx) files. It can read and write MS Word documents.

You have to download the docx4j files from the docx4j website.

 

I have implemented this library in Lucee CFML. It also works with ColdFusion.

 

How to install the docx4j library in Lucee?

  1. Put docx4j.jar, together with all of its dependency files, into “WEB-INF\lucee\lib”.
  2. Restart the Lucee service to load the docx4j library, and that’s it.
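
A quick way to confirm that the restart picked up the jars is to try loading one docx4j class. This is only a minimal sketch: it assumes that createObject() raises an error when the class is not on the classpath, and the output messages are made up for illustration.

```cfml
<cfscript>
// Try to load a docx4j class; this is expected to throw if the jars were not picked up
try {
    createObject("java", "org.docx4j.openpackaging.packages.WordprocessingMLPackage");
    writeOutput("docx4j loaded");
} catch (any e) {
    // Illustrative message only; e.message carries the engine's own error text
    writeOutput("docx4j not available: " & e.message);
}
</cfscript>
```

If the second message appears, it is worth checking that the dependency jars, and not only docx4j.jar, are in the lib folder.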

 


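<cfscript>

// Note: "var" is dropped so the snippet also runs at template level; add it back inside a function.

// Java classes used for the conversion (docx4j and its dependencies must be on the classpath)
FileUtils = createObject("java", "org.apache.commons.io.FileUtils");
WordprocessingMLPackage = createObject("java", "org.docx4j.openpackaging.packages.WordprocessingMLPackage");
NumberingDefinitionsPart = createObject("java", "org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart");

// Source HTML file, and the base location used to resolve relative references in it
inputfilepath = expandPath("/test.html");
baseURL = expandPath("/");

// Read the HTML file into a string
inputFile = createObject("java", "java.io.File").init(inputfilepath);
stringFromFile = FileUtils.readFileToString(inputFile, "UTF-8");

unescaped = stringFromFile;

// Load the existing Word document that the HTML content is appended to
wordMLPackage = WordprocessingMLPackage.load(createObject("java", "java.io.File").init(expandPath("/SampleDocument.docx")));

// Add a numbering part with the default numbering definitions
ndp = NumberingDefinitionsPart.init();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();

// Assumption: the original snippet never creates the importer; XHTMLImporterImpl
// from docx4j-ImportXHTML (docx4j 3.x) is used here
XHTMLImporter = createObject("java", "org.docx4j.convert.in.xhtml.XHTMLImporterImpl").init(wordMLPackage);
XHTMLImporter.setHyperlinkStyle("Hyperlink");
wordMLPackage.getMainDocumentPart().getContent().addAll(XHTMLImporter.convert(unescaped, baseURL));

// Write the result to disk
wordMLPackage.save(createObject("java", "java.io.File").init(expandPath("/ExportMsWord.docx")));

</cfscript>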
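
The script above only saves ExportMsWord.docx to disk. To hand the file to the browser, as in the first example, the same two tags can be pointed at the saved file. This is a sketch rather than part of the original code: it assumes the file was written to the web root under that name, and it uses the standard MIME type for .docx files.

```cfml
<!--- Offer the generated file as a download; the path matches the save() call above --->
<cfheader name="Content-Disposition" value="attachment; filename=ExportMsWord.docx">

<!--- deletefile="false" keeps the file on disk after it has been sent --->
<cfcontent type="application/vnd.openxmlformats-officedocument.wordprocessingml.document" file="#expandPath('/ExportMsWord.docx')#" deletefile="false">
```

Setting deletefile="true" instead would remove the generated file once it has been served, which may suit one-off exports.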

 

docx4j worked well for exporting HTML to Word. It is mostly used for docx manipulation, but it can also handle pptx and xlsx files.