How to load a large xlsx file with Apache POI?


I have a large .xlsx file (141 MB, containing 293,413 rows with 62 columns each) on which I need to perform some operations.

I am having problems with loading this file (OutOfMemoryError), as POI has a large memory footprint on XSSF (xlsx) workbooks.

This SO question is similar, and the solution presented is to increase the VM's allocated/maximum memory.

It seems to work for that kind of file size (9 MB), but for me it simply doesn't work, even if I allocate all available system memory. (Well, that's no surprise considering my file is over 15 times larger.)

I'd like to know if there is any way to load the workbook so that it won't consume all the memory, and yet without basing the processing on (going into) the XSSF's underlying XML. (In other words, keeping a pure POI solution.)

If there isn't, though, you are welcome to say so ("There isn't.") and point me towards an "XML" solution.


I was in a similar situation in a webserver environment. Uploads were typically ~150k rows, and it wouldn't have been acceptable for a single request to consume a ton of memory. The Apache POI Streaming API works well for this, but it requires a total redesign of your read logic. I already had a bunch of read logic using the standard API that I didn't want to redo, so I wrote this instead: https://github.com/monitorjbl/excel-streaming-reader

It's not entirely a drop-in replacement for the standard XSSFWorkbook class, but if you're just iterating through rows it behaves similarly:

import com.monitorjbl.xlsx.StreamingReader;

InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
StreamingReader reader = StreamingReader.builder()
        .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
        .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
        .sheetIndex(0)        // index of sheet to use (defaults to 0)
        .read(is);            // InputStream or File for XLSX file (required)

for (Row r : reader) {
  for (Cell c : r) {
    System.out.println(c.getStringCellValue());
  }
}     

There are some caveats to using it; due to the way XLSX sheets are structured, not all data is available in the current window of the stream. However, if you're just trying to read simple data out from the cells, it works pretty well for that.


Memory usage can be improved by opening the workbook from a File instead of an InputStream. (A streaming API is better still, but the streaming APIs have limitations; see http://poi.apache.org/spreadsheet/index.html)

So instead of

Workbook workbook = WorkbookFactory.create(inputStream);

do

Workbook workbook = WorkbookFactory.create(new File("yourfile.xlsx"));

This is according to the POI quick guide: http://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream

Files vs InputStreams

"When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx XSSFWorkbook, the Workbook can be loaded from either a File or an InputStream. Using a File object allows for lower memory consumption, while an InputStream requires more memory as it has to buffer the whole file."


The Excel support in Apache POI, HSSF and XSSF, supports 3 different modes.

One is a full, DOM-Like in-memory "UserModel", which supports both reading and writing. Using the common SS (SpreadSheet) interfaces, you can code for both HSSF (.xls) and XSSF (.xlsx) basically transparently. However, it needs lots of memory.

POI also supports a streaming read-only way to process the files, the EventModel. This is much more low-level than the UserModel, and gets you very close to the file format. For HSSF (.xls) you get a stream of records, and optionally some help with handling them (missing cells, format tracking etc). For XSSF (.xlsx) you get streams of SAX events from the different parts of the file, with help to get the right part of the file and also easy processing of common but small bits of the file.

For XSSF (.xlsx) only, POI also supports a write-only streaming write, suitable for low level but low memory writing. It largely just supports new files though (certain kinds of append are possible). There is no HSSF equivalent, and due to back-and-forth byte offsets and index offsets in many records it would be pretty hard to do...

For your specific case, as described in your clarifying comments, I think you'll want to use the XSSF EventModel code. See the POI documentation to get started, then try looking at these three classes in POI and Tika which use it for more details.
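To make the event model less abstract, here is a minimal sketch of the kind of SAX stream it gives you for an .xlsx sheet. It deliberately uses only the JDK's zip and SAX classes, not POI's own event-model classes, so it illustrates the idea and is not POI's API. The names RawSheetStream and RowConsumer are made up for this sketch. It also assumes that the caller knows the sheet's entry path inside the package (commonly something like xl/worksheets/sheet1.xml, but that is not guaranteed) and that element names are unprefixed.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams one sheet part of an .xlsx row by row.
class RawSheetStream {

    /** Hypothetical callback, invoked once per <row> element. */
    interface RowConsumer {
        void accept(int rowNumber, List<String> rawValues);
    }

    static void stream(String xlsxPath, String sheetEntry, RowConsumer consumer) throws Exception {
        // An .xlsx file is a zip package; open it without extracting everything
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            ZipEntry entry = zip.getEntry(sheetEntry);
            if (entry == null) {
                throw new IllegalArgumentException("No such entry: " + sheetEntry);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                    private final List<String> row = new ArrayList<>();
                    private final StringBuilder text = new StringBuilder();
                    private boolean inValue;
                    private int rowNumber;

                    @Override
                    public void startElement(String uri, String localName, String qName, Attributes attrs) {
                        if ("row".equals(qName)) {
                            row.clear();
                            // "r" holds the 1-based row number when present
                            String r = attrs.getValue("r");
                            rowNumber = (r != null) ? Integer.parseInt(r) : rowNumber + 1;
                        } else if ("v".equals(qName)) {
                            inValue = true;
                            text.setLength(0);
                        }
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // May be called several times for one <v>, so accumulate
                        if (inValue) {
                            text.append(ch, start, length);
                        }
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        if ("v".equals(qName)) {
                            row.add(text.toString());
                            inValue = false;
                        } else if ("row".equals(qName)) {
                            // Hand over a copy; only one row is held in memory at a time
                            consumer.accept(rowNumber, new ArrayList<>(row));
                        }
                    }
                });
            }
        }
    }
}
```

A call such as RawSheetStream.stream("/path/to/workbook.xlsx", "xl/worksheets/sheet1.xml", (n, values) -> System.out.println(n + " " + values)) would print the raw contents of each row's v elements. For cells marked t="s" those raw values are indexes into the shared strings part, and locating the right parts and resolving such values is the sort of help POI's event-model classes add on top.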


POI now includes an API for these cases: SXSSF (http://poi.apache.org/spreadsheet/index.html). It does not load everything into memory, so it could allow you to handle such a file.

Note: I have read that SXSSF works as a writing API. Loading should be done using XSSF from a File rather than an InputStream (to avoid loading the whole file into memory).


Check this post, where I show how to use a SAX parser to process an XLSX file.

https://stackoverflow.com/a/44969009/4587961

In short, I extended org.xml.sax.helpers.DefaultHandler, which processes the XML structure of XLSX files. It is an event-based (SAX) parser.

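import java.util.LinkedList;
import java.util.List;

import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SheetHandler extends DefaultHandler {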

    private static final String ROW_EVENT = "row";
    private static final String CELL_EVENT = "c";

    private SharedStringsTable sst;
    private String lastContents;
    private boolean nextIsString;

    private List<String> cellCache = new LinkedList<>();
    private List<String[]> rowCache = new LinkedList<>();

    private SheetHandler(SharedStringsTable sst) {
        this.sst = sst;
    }

    public void startElement(String uri, String localName, String name,
                             Attributes attributes) throws SAXException {
        // c => cell
        if (CELL_EVENT.equals(name)) {
            String cellType = attributes.getValue("t");
            if(cellType != null && cellType.equals("s")) {
                nextIsString = true;
            } else {
                nextIsString = false;
            }
        } else if (ROW_EVENT.equals(name)) {
            if (!cellCache.isEmpty()) {
                rowCache.add(cellCache.toArray(new String[cellCache.size()]));
            }
            cellCache.clear();
        }

        // Clear contents cache
        lastContents = "";
    }

    public void endElement(String uri, String localName, String name)
            throws SAXException {
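        // Process the last contents as required.
        // Do now, as characters() may be called more than once
        if(nextIsString) {
            int idx = Integer.parseInt(lastContents);
            // getEntryAt() is the POI 3.x call; newer releases offer getItemAt(idx) instead
            lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
            nextIsString = false;
        }

        // v => contents of a cell
        // Output after we've seen the string contents
        if(name.equals("v")) {
            cellCache.add(lastContents);
        } else if (ROW_EVENT.equals(name)) {
            // Flush on </row> so that the final row of the sheet is not lost
            if (!cellCache.isEmpty()) {
                rowCache.add(cellCache.toArray(new String[cellCache.size()]));
            }
            cellCache.clear();
        }
    }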

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        lastContents += new String(ch, start, length);
    }

    public List<String[]> getRowCache() {
        return rowCache;
    }
}

And then I parse the XML representing the XLSX file:

private List<String []> processFirstSheet(String filename) throws Exception {
    // try-with-resources closes the package even if parsing fails
    try (OPCPackage pkg = OPCPackage.open(filename, PackageAccess.READ)) {
        XSSFReader r = new XSSFReader(pkg);
        SharedStringsTable sst = r.getSharedStringsTable();

        SheetHandler handler = new SheetHandler(sst);
        XMLReader parser = fetchSheetParser(handler);
        Iterator<InputStream> sheetIterator = r.getSheetsData();

        if (!sheetIterator.hasNext()) {
            return Collections.emptyList();
        }

        // Only the first sheet is parsed; its stream is closed once parsing finishes
        try (InputStream sheetInputStream = new BufferedInputStream(sheetIterator.next())) {
            parser.parse(new InputSource(sheetInputStream));
        }
        return handler.getRowCache();
    }
}

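public XMLReader fetchSheetParser(ContentHandler handler)
        throws SAXException, ParserConfigurationException {
    // Obtain the XMLReader from the JDK's javax.xml.parsers.SAXParserFactory
    XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    parser.setContentHandler(handler);
    return parser;
}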
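One caveat with the handler above: it appends a value only when it sees a v element, and blank cells are typically left out of the sheet XML altogether, so a position in cellCache does not necessarily match the cell's real column. Each c element normally carries an r attribute such as "C7", which can be used to restore the position. Below is a small sketch of that conversion. The class and method names are hypothetical, and it assumes plain references without $ signs. POI's own org.apache.poi.ss.util.CellReference can do the same parsing if you prefer to stay within the library.

```java
import java.util.List;

// Hypothetical helper for mapping cell references to column positions.
final class CellRefs {

    private CellRefs() {
    }

    /** Returns the zero-based column index of a reference such as "C7". */
    static int columnIndex(String cellRef) {
        int col = 0;
        for (int i = 0; i < cellRef.length(); i++) {
            char ch = Character.toUpperCase(cellRef.charAt(i));
            if (ch < 'A' || ch > 'Z') {
                break; // reached the row digits
            }
            // Column letters are a base-26 number without a zero digit
            col = col * 26 + (ch - 'A' + 1);
        }
        if (col == 0) {
            throw new IllegalArgumentException("No column letters in: " + cellRef);
        }
        return col - 1;
    }

    /** Stores value at its real column position, padding skipped cells with null. */
    static void put(List<String> row, String cellRef, String value) {
        int col = columnIndex(cellRef);
        while (row.size() <= col) {
            row.add(null);
        }
        row.set(col, value);
    }
}
```

In the handler, this would mean remembering attributes.getValue("r") when a c element starts and calling something like CellRefs.put(cellCache, ref, lastContents) in place of cellCache.add(lastContents).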
