Vikas Patel Feb,08 2017
Timeout is a well know issue not only with POI spreadsheet, but it applies to process which consumes resources. As the data grows, you'll need more CPU power and more memory. If you are using MS SQL or MY SQL, then you must have created a query with paging concept. Limiting number of records per request will help you to get rid of the timeout issues.
Our client runs a daily report for their business. As their business grow, report data also grows and this leads a timeout issue. On database side, we were already using paginated query and the query was fine tuned with database indexing. Our report runs that query and collect all data to be exported in excel file. We are using ColdFusion POI utility to create an excel file. POI provides numbers of features for formatting with custom tag approach.
We faced timeout issue with around 200 columns and more that 1000 rows. We thought POI is taking much time to create an excel file. So we thought we can use HTML table when we have a large amount of data. But since we had used custom tags in POI, we thought we can also create another custom tag library for generating HTML codes only. This way we will not have to change anything that we wrote for creating an excel file.
We created custom tags for document, sheet, column, rows and cell, which all just create string for HTML code. We spent half day for creating a library. Since HTML cannot have more than one sheet, our custom tag was actually creating different files for each sheet and once all files are ready we create a zip file. This way we thought we found a way to create an excel file that can support large number of rows. But when we tested it we discovered that it didn't help us.
We created an HTML file with the help of custom tag. We thought this would be a minimalist change existing code. After testing we realized that it is the custom tag that takes much amount of time. When we developed our custom tag, we also found another scope called CALLER scope. I think this can be separate story to tell, but for now let's come back to our main issue of timeout. We thought since custom tag maintains a scope for each base tags and may be this can take much amount of memory and may lead to java heap space issues.
We started thinking of different approaches. The first pops up csv in our mind. csv can be created in batch. For each page we just need to append data to main csv file. So it is definitely powerful in term of speed. But it is ugly. You can't format the data. So we eliminated this option.
Second option is to use the original cfspreadsheet tag. I am not a big fan of it. In fact it is not supported in Lucee. However we can use POI extension. So this time we run some test before we implement any changes. Surprisingly we found good performance on Lucee but we faced memory out of issues and timeout issues in ColdFusion. Our server is still running on CF9. We are actually waiting for Lucee 5 release. We thought if we can use custom cfspreadsheet extension in ColdFusion. But the extension has code syntax that do not supported in ColdFusion so we thought we should not spend time to make extension compatible with ColdFusion since we already have a plan to migrate in Lucee. This is how eliminated this option as well.
Also I wanted to mention the pros on cons for existing POI approach. POI is easy to read with tag based approach. But actual POI only supports 2003 format which is .xls file. However we've modified POI utility code to create .xlsx file. You can find it on github here. The only issue with POI is that it can't handle large amount of data.
We may use actual java code to create excel file, but this approach may not be an easy since our ColdFusion code needs to interact with java code and this may not be a time saving approach. We also thought about Google Sheet API or MS office API. As of now MS office isn't providing any api. Google does provide a REST api but I think they will shutdown to promote their google app script. With the help of Google App Script, we can create rich excel file since it provides tons of features. We only need to make a bridge between ColdFusion and Google App Script. As per my thinking if we can produce a JSON file containing all the data as well as formatting specification, we can have the Google app script to read the JSON file and create an excel file based on the formatting specified in JSON file. This can be our next micro project!
Down the line, we had to make a daily report in working condition as soon as possible. To give a quick solution we converted each cell in td tr format and made an HTML export. This is the quickest solution. Its fast. the only downside is that we can only create .xls file which has a limitation of 256 columns. It does support background colors, row height, column width etc. but it can't freeze any row or column. Personally I do not like this approach but we'll explore possibilities with Google App Script.