Using Hibernate's ScrollableResults to slowly read 90 million records

I simply need to read every row of a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows, and they are pretty big, so it seemed like the following would be appropriate:

ScrollableResults results = session.createQuery("SELECT person FROM Person person")
            .setReadOnly(true).setCacheable(false).scroll(ScrollMode.FORWARD_ONLY);
while (results.next()) {
    storeInFile(results.get()[0]);
}
results.close();

The problem is that the above tries to load all 90 million rows into RAM before entering the while loop, which exhausts my memory with OutOfMemoryError: Java heap space exceptions :(.

So I guess ScrollableResults isn't what I was looking for? What is the proper way to handle this? I don't mind if this while loop takes days (though I'd rather it didn't).

I guess the only other way to handle this is to use setFirstResult and setMaxResults to iterate through the results and just use regular Hibernate results instead of ScrollableResults. That feels inefficient, though, and it will start taking a ridiculously long time once I'm calling setFirstResult on the 89 millionth row...

UPDATE: setFirstResult/setMaxResults doesn't work; as I feared, it takes an unusably long time to reach the higher offsets. There must be a solution here! Isn't this a pretty standard procedure? I'm willing to forgo Hibernate and use JDBC or whatever it takes.

UPDATE 2: the solution I've come up with, which works OK but not great, is basically of the form:

select * from person where id > <offset> and <other_conditions> limit 1

Because I have other conditions, it's still not as fast as I'd like, even with all of them in an index, so I'm still open to other suggestions.

Using setFirstResult and setMaxResults is the only option I'm aware of.

Traditionally, a scrollable result set only transfers rows to the client as they are needed. Unfortunately, by default MySQL Connector/J fakes it: it executes the entire query and transfers the whole result to the client, so the driver holds the entire result set in RAM and drip-feeds it to you (as your out-of-memory errors show). You had the right idea; it's just a shortcoming of the MySQL Java driver.

I found no way around this, so I went with loading large chunks using the regular setFirstResult/setMaxResults methods. Sorry to be the bearer of bad news.

Just make sure to use a stateless session so there's no session-level cache, dirty tracking, etc.

EDIT:

Your UPDATE 2 is the best you're going to get unless you move away from MySQL Connector/J, though there's no reason you can't raise the limit on the query. Provided you have enough RAM to hold the index, this should be a fairly cheap operation. I'd modify it slightly: grab a batch at a time, and use the highest id of that batch to fetch the next one.

Note: this only works if other_conditions use equality (no range conditions allowed) and id is the last column of the index.

select * 
from person 
where id > <max_id_of_last_batch> and <other_conditions> 
order by id asc  
limit <batch_size>
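For readers willing to drop to plain JDBC, here is a minimal sketch of that batch loop. It assumes a person table with a positive numeric id and a name column, and it leaves <other_conditions> out; Row, BatchSource, JdbcBatchSource, and KeysetReader are hypothetical names. The paging logic is kept separate from JDBC so it can be exercised without a database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical row holder; only id is needed to drive the paging.
record Row(long id, String name) {}

// Returns up to batchSize rows with id > afterId, ordered by id ascending.
interface BatchSource {
    List<Row> fetchAfter(long afterId, int batchSize) throws SQLException;
}

// JDBC-backed source. Table and column names are assumptions; equality-only
// <other_conditions> would go into the WHERE clause alongside "id > ?".
final class JdbcBatchSource implements BatchSource {
    private static final String SQL =
            "SELECT id, name FROM person WHERE id > ? ORDER BY id ASC LIMIT ?";
    private final Connection conn;

    JdbcBatchSource(Connection conn) {
        this.conn = conn;
    }

    @Override
    public List<Row> fetchAfter(long afterId, int batchSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setLong(1, afterId);
            ps.setInt(2, batchSize);
            List<Row> rows = new ArrayList<>(batchSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(new Row(rs.getLong("id"), rs.getString("name")));
                }
            }
            return rows;
        }
    }
}

final class KeysetReader {
    private KeysetReader() {}

    // Reads every row one batch at a time; returns the number of rows handed to sink.
    static long readAll(BatchSource source, int batchSize, Consumer<Row> sink) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long lastId = 0L; // assumes ids are positive, e.g. AUTO_INCREMENT starting at 1
        long total = 0;
        while (true) {
            List<Row> batch = source.fetchAfter(lastId, batchSize);
            batch.forEach(sink);
            total += batch.size();
            if (batch.size() < batchSize) {
                return total; // a short (or empty) batch means nothing is left
            }
            // Rows are ordered by id, so the last one carries the highest id.
            lastId = batch.get(batch.size() - 1).id();
        }
    }
}
```

Each query seeks directly to the next id through the index instead of stepping over offset rows, so it should not slow down as the read progresses the way setFirstResult does.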

You should be able to use a ScrollableResults, though it requires a few magic incantations to get it working with MySQL. I wrote up my findings in a blog post (http://www.numerati.com/2012/06/26/reading-large-result-sets-with-hibernate-and-mysql/), but I'll summarize here:

"The [JDBC] documentation says:

To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

This can be done using the Query interface (this should work for Criteria as well) in version 3.2+ of the Hibernate API:

Query query = session.createQuery(hql); // hql is your HQL query string
query.setReadOnly(true);
// MIN_VALUE gives hint to JDBC driver to stream results
query.setFetchSize(Integer.MIN_VALUE);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
try {
    // iterate over results
    while (results.next()) {
        Object row = results.get();
        // process row then release reference
        // you may need to evict() as well
    }
} finally {
    results.close();
}

This allows you to stream over the result set; however, Hibernate will still cache results in the Session, so you’ll need to call session.evict() or session.clear() every so often. If you are only reading data, you might consider using a StatelessSession, though you should read its documentation beforehand."
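The advice to call session.evict() or session.clear() every so often can be made concrete with a simple counter. This is generic Java, not Hibernate API: ChunkedProcessor is a hypothetical helper, the Iterator stands in for something like your own adapter over ScrollableResults, and the clearCache callback is where something like session::clear would go.

```java
import java.util.Iterator;
import java.util.function.Consumer;

final class ChunkedProcessor {
    private ChunkedProcessor() {}

    /**
     * Hands every element to handler and runs clearCache after each clearEvery
     * elements, so a session-level cache never accumulates more than clearEvery
     * entities. Returns the number of elements processed.
     */
    static <T> long process(Iterator<T> rows, int clearEvery,
                            Consumer<? super T> handler, Runnable clearCache) {
        if (clearEvery <= 0) {
            throw new IllegalArgumentException("clearEvery must be positive");
        }
        long processed = 0;
        while (rows.hasNext()) {
            handler.accept(rows.next());
            processed++;
            if (processed % clearEvery == 0) {
                clearCache.run(); // e.g. session.clear() when reading with a regular Session
            }
        }
        return processed;
    }
}
```

A smaller interval keeps the first-level cache smaller at the cost of more frequent clear calls.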

Set the fetch size on the query to a suitable value, as shown below.

Also, when caching is not required, it may be better to use a StatelessSession.

ScrollableResults results = session.createQuery("SELECT person FROM Person person")
        .setReadOnly(true)
        .setFetchSize(1000) // <<--- the important part
        .setCacheable(false)
        .scroll(ScrollMode.FORWARD_ONLY);

With MySQL, the fetch size must be Integer.MIN_VALUE; otherwise it won't work.

That value is taken directly from the official reference: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
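At the plain JDBC level, the same combination looks roughly like this. It is a minimal sketch assuming a MySQL Connector/J connection; StreamingReader and RowHandler are hypothetical names, and the SQL is whatever you pass in. Per the Connector/J implementation notes, all rows of a streamed result set must be read (or the result set closed) before any other statement is issued on the same connection.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class StreamingReader {
    // Hypothetical per-row callback; ResultSet getters may throw SQLException.
    interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    private StreamingReader() {}

    /** Streams every row of sql to handler and returns the number of rows seen. */
    static long streamAll(Connection conn, String sql, RowHandler handler) throws SQLException {
        // Forward-only + read-only + fetch size Integer.MIN_VALUE is the combination
        // Connector/J documents for row-by-row streaming instead of buffering everything.
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE);
            long count = 0;
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    handler.handle(rs); // read columns here; don't keep the ResultSet
                    count++;
                }
            }
            return count;
        }
    }
}
```

If the handler needs to run other queries while the stream is open, opening a second Connection for them is one way to live within that restriction.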

Actually, you could have gotten what you wanted (low-memory scrollable results with MySQL) if you had used the answer mentioned here:

Streaming large result sets with MySQL

Note that you will have problems with Hibernate lazy-loading because it will throw an exception on any queries performed before the scroll is finished.

Comments
  • You may be able to partition your data so you don't have to read as much at a time, ref: stackoverflow.com/questions/8325745/…
  • Using a StatelessSession is an especially nice tip!
  • setFirstResult and setMaxResults are not a viable option. I was right in my guess that they would be unusably slow. Maybe that works for tiny tables, but it very quickly takes way too long. You can test this in the MySQL console by simply running "select * from anything limit 1 offset 3000000". That might take 30 minutes...
  • Running "select * from geoplanet_locations limit 1 offset 1900000;" against the YAHOO Geoplanet dataset (5 million rows) returns in 1.34 seconds. If you have enough RAM to keep the index in RAM, then I think your 30-minute numbers are way off. Funnily enough, "select * from geoplanet_locations where id > 56047142 limit 10;" returns in essentially no time (the regular client just reports 0.00).
  • @Michael How did you find out that the MySQL connector fakes the scrolling? Is it documented somewhere? I'm asking because I'd like to use the scroll feature with NHibernate, and I'm using the MySQL connector for .NET, so I'd like to check whether the MySQL .NET connector also fakes it, or whether that depends on the version.
  • Does anyone know whether the MySQL connector still fakes the scroll?
  • Why would you call Session#flush() with a read-only session? Are you sure you didn't mean Session#evict(row) or Session#clear(), which would help keep the level-1 cache size under control?
  • (for followers, the code example used to mention flush but now mentions evict or clear)
  • This is the way to go. See javaquirks.blogspot.dk/2007/12/mysql-streaming-result-set.html for additional reference.
  • So are you guys saying that for MySQL you should use Integer.MIN_VALUE, but for Oracle or others you should set the fetch size to a reasonable number?