Why would you prefer Java 8 Stream API instead of direct hibernate/sql queries when working with the DB

Recently I have seen a lot of code in a few projects that uses streams to filter objects, like:

library.stream()
       .map(book -> book.getAuthor())
       .filter(author -> author.getAge() >= 50)
       .map(Author::getSurname)
       .map(String::toUpperCase)
       .distinct()
       .limit(15)
       .collect(toList());

Are there any advantages to using this instead of a direct HQL/SQL query to the database that returns already-filtered results?

Isn't the second approach much faster?

If the data originally comes from a DB, it is better to do the filtering in the DB rather than fetching everything and filtering locally.

First, database management systems are good at filtering; it is part of their main job, so they are optimized for it. Filtering can also be sped up by using indexes.

Second, fetching and transmitting many records and unmarshalling them into objects, only to throw most of them away during local filtering, is a waste of bandwidth and computing resources.
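For comparison, here is a rough sketch of what pushing the question's pipeline into the database could look like with plain JDBC. The table and column names (`book`, `author`, `author_id`, `surname`, `age`) are assumptions made for illustration, not something taken from the question, and the row cap uses `Statement.setMaxRows` only because the syntax for limiting rows differs between SQL dialects.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class SurnameQuery {
    // Hypothetical schema: book(author_id), author(id, surname, age).
    // The join, filter, upper-casing and de-duplication all run in the database.
    static final String SQL =
            "SELECT DISTINCT UPPER(a.surname) "
          + "FROM book b JOIN author a ON a.id = b.author_id "
          + "WHERE a.age >= ?";

    // Returns at most maxRows surnames; only those rows cross the wire.
    static List<String> surnames(Connection conn, int minAge, int maxRows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setInt(1, minAge);
            ps.setMaxRows(maxRows); // JDBC-level cap on the number of rows returned
            try (ResultSet rs = ps.executeQuery()) {
                List<String> result = new ArrayList<>();
                while (rs.next()) {
                    result.add(rs.getString(1));
                }
                return result;
            }
        }
    }
}
```

A call such as `SurnameQuery.surnames(conn, 50, 15)` would then stand in for the whole stream pipeline, assuming `conn` is an open connection to a database with that schema.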

At first glance: streams can be made to run in parallel just by changing the code to use parallelStream(). (Disclaimer: whether simply changing the stream type still produces correct results depends on the specific context; but yes, it can be that easy.)

Then: streams "invite" you to use lambda expressions. Those in turn lead to the use of invokedynamic bytecode instructions, sometimes gaining performance advantages over the "old-school" way of writing such code. (To clear up a misunderstanding: invokedynamic is a property of lambdas, not streams!)
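To make the disclaimer concrete, here is a small sketch (class and method names are made up for illustration). The first method stays correct in parallel because `collect` assembles the result without shared mutable state; the second mutates a plain `ArrayList` from several threads, so it may lose elements or throw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelCaveat {
    // Safe in parallel: no shared mutable state, and toList() keeps encounter order.
    static List<Integer> squaresSafe(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    // NOT safe in parallel: ArrayList is not thread-safe, so concurrent add()
    // calls race with each other. Shown only as a counter-example.
    static List<Integer> squaresUnsafe(int n) {
        List<Integer> out = new ArrayList<>();
        IntStream.range(0, n).parallel().forEach(i -> out.add(i * i));
        return out;
    }
}
```

Both methods differ from their sequential versions only by the `.parallel()` call, which is why the switch looks so easy and why it still needs a review of what the lambdas touch.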

These would be reasons to prefer "stream" solutions nowadays (from a general point of view).

Beyond that: it really depends ... let's have a look at your example input. This looks like it deals with ordinary Java POJOs that already reside in memory, within some sort of collection. Processing such objects directly in memory would definitely be faster than going to some out-of-process database to do the work there!

But, of course: if the calls above, like book.getAuthor(), do a "deep dive" and actually talk to an underlying database, then chances are that "doing the whole thing in a single query" gives you better performance.
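The "deep dive" case can be illustrated without a real database. The sketch below is a simulation under stated assumptions: `roundTrips` is a made-up counter, and `getAuthor()` pretends to be an uncached, lazily loaded association that costs one query per call. A real ORM may cache or batch such loads, so treat the numbers as the worst case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class RoundTripDemo {
    // Counts simulated database calls; a real ORM would issue SQL here.
    static int roundTrips = 0;

    static class Author {
        final String surname;
        final int age;
        Author(String surname, int age) { this.surname = surname; this.age = age; }
    }

    static class Book {
        private final int authorId;
        Book(int authorId) { this.authorId = authorId; }

        // Simulates an uncached lazy association: every call costs one round trip.
        Author getAuthor() {
            roundTrips++;
            return new Author("Surname" + authorId, 40 + authorId);
        }
    }

    // Simulates loading every book with a single query.
    static List<Book> loadAllBooks(int count) {
        roundTrips++;
        List<Book> books = new ArrayList<>();
        for (int id = 0; id < count; id++) {
            books.add(new Book(id));
        }
        return books;
    }

    static List<String> surnamesViaStream(int bookCount) {
        return loadAllBooks(bookCount).stream()
                .map(Book::getAuthor)               // one simulated query per book
                .filter(author -> author.age >= 50)
                .map(author -> author.surname)
                .collect(Collectors.toList());
    }
}
```

With 100 books this simulation counts 101 round trips (one for the list plus one per book), whereas a single joined query would need one.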

The first thing to realize is that you can't tell from this code alone what statement is issued against the database. It might very well be that all the filtering, limiting and mapping is collected, and upon the invocation of collect all that information is used to construct a matching SQL statement (or whatever query language is used), which is then sent to the database.
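A toy version of such a lazily evaluated, stream-like API might look like the following. Everything here (`LazyQuery`, `from`, `filter`, `toSql`) is invented for illustration and does not correspond to any particular library; the point is only that the chained calls record what was asked for and nothing is built until the terminal step.

```java
import java.util.ArrayList;
import java.util.List;

class LazyQuery {
    private final String table;
    private final List<String> conditions = new ArrayList<>();
    private int limit = -1; // -1 means "no limit requested"

    private LazyQuery(String table) { this.table = table; }

    static LazyQuery from(String table) { return new LazyQuery(table); }

    // Intermediate step: only records the condition, nothing is executed.
    // The condition is passed as data (column, operator, value), not as a lambda,
    // so the builder can inspect it. A real builder would validate column and op.
    LazyQuery filter(String column, String op, int value) {
        conditions.add(column + " " + op + " " + value);
        return this;
    }

    // Intermediate step: only records the limit.
    LazyQuery limit(int n) {
        this.limit = n;
        return this;
    }

    // Terminal step: everything recorded so far becomes one SQL statement.
    String toSql() {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        if (limit >= 0) {
            sql.append(" LIMIT ").append(limit);
        }
        return sql.toString();
    }
}
```

Here `LazyQuery.from("author").filter("age", ">=", 50).limit(15).toSql()` yields `SELECT * FROM author WHERE age >= 50 LIMIT 15`, so a call chain that reads like local filtering can still end up as a single database query.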

With this in mind, there are many reasons why stream-like APIs are used.

  1. It is hip. Streams and lambdas are still rather new to most Java developers, so they feel cool when they use them.

  2. If something like what is described in the first paragraph is used, it actually creates a nice DSL for constructing your query statements. Scala's Slick and .NET LINQ were early examples I know about, although I assume somebody built something like it in LISP long before I was born.

  3. The streams might be reactive streams that encapsulate a non-blocking API. Such APIs are really nice because they don't force you to block resources like threads while you wait for results, but using them requires either tons of callbacks or a much nicer stream-based API to process the results.

  4. They are nicer to read than imperative code. Maybe the processing done in the stream can't [easily/by the author] be done with SQL. So the alternatives aren't SQL vs Java (or whatever language you are using), but imperative Java vs "functional" Java. The latter often reads nicer.

So there are good reasons to use such an API.
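To illustrate the readability point about imperative versus "functional" Java, here is the question's pipeline next to one possible loop-based equivalent. The `Author` and `Book` classes and the sample data are minimal stand-ins assumed for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class StyleComparison {
    static class Author {
        private final String surname;
        private final int age;
        Author(String surname, int age) { this.surname = surname; this.age = age; }
        String getSurname() { return surname; }
        int getAge() { return age; }
    }

    static class Book {
        private final Author author;
        Book(Author author) { this.author = author; }
        Author getAuthor() { return author; }
    }

    // "Functional" style, as in the question.
    static List<String> functional(List<Book> library) {
        return library.stream()
                .map(Book::getAuthor)
                .filter(author -> author.getAge() >= 50)
                .map(Author::getSurname)
                .map(String::toUpperCase)
                .distinct()
                .limit(15)
                .collect(Collectors.toList());
    }

    // Imperative style: same result, but the bookkeeping is spelled out.
    static List<String> imperative(List<Book> library) {
        Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (Book book : library) {
            Author author = book.getAuthor();
            if (author.getAge() >= 50) {
                seen.add(author.getSurname().toUpperCase());
                if (seen.size() == 15) {
                    break; // corresponds to limit(15)
                }
            }
        }
        return new ArrayList<>(seen);
    }

    // Made-up sample data for trying both versions.
    static List<Book> sample() {
        return Arrays.asList(
                new Book(new Author("Adams", 60)),
                new Book(new Author("Baker", 45)),
                new Book(new Author("adams", 55)),
                new Book(new Author("Clark", 50)));
    }
}
```

Both methods return `[ADAMS, CLARK]` for the sample data; which one reads better is partly a matter of taste and familiarity.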

With all that said: It is, in almost all cases, a bad idea to do any sorting/filtering and the like in your application, when you can offload it to the database. The only exception I can currently think of is when you can skip the whole roundtrip to the database, because you already have the result locally (e.g. in a cache).

Unless measured and proven for a specific scenario, either could be good or equally bad. The reason you usually want to push these kinds of queries to the database is (among other things):

A DB can handle much larger data than your Java process.

Queries in a database can use indexes (making them much faster).

On the other hand, if your data is small, using a Stream the way you did is effective. Such a Stream pipeline is very readable (once you speak Streams well enough).

Well, your question should ideally be - Is it better to do reduction / filtering operations in the DB or fetch all records and do it in Java using Streams?

The answer isn't straightforward and any stats that give a "concrete" answer will not generalize to all cases.

The operations you are talking about are better done in the DB itself, because that is what DBs are designed for, very fast handling of data. Of course usually in case of relational databases, there will be some "book-keeping and locks" being used to ensure that independent transactions don't end up making the data inconsistent, but even with that, DBs do a pretty good job in filtering data, especially large data sets.

One case where I would prefer filtering data in Java code rather than in the DB is when you need to extract several different features from the same data. For example, right now you are getting only the author's surname. If you also wanted all books written by the author, the ages of the authors, the children of the author, the place of birth and so on, then it makes sense to fetch a single "read-only" copy from the DB and use parallel streams to derive the different pieces of information from the same data set.
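A minimal sketch of that idea, assuming an `Author` type with a few made-up fields and a `fetchOnce()` method that stands in for the single database read:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MultiView {
    static class Author {
        final String surname;
        final int age;
        final String birthplace;
        Author(String surname, int age, String birthplace) {
            this.surname = surname;
            this.age = age;
            this.birthplace = birthplace;
        }
    }

    // Stands in for one read-only fetch from the database (hypothetical data).
    static List<Author> fetchOnce() {
        return Collections.unmodifiableList(Arrays.asList(
                new Author("Adams", 60, "Oslo"),
                new Author("Baker", 45, "Lima"),
                new Author("Clark", 60, "Oslo")));
    }

    // View 1: average age of all authors.
    static double averageAge(List<Author> authors) {
        return authors.parallelStream().mapToInt(a -> a.age).average().orElse(0);
    }

    // View 2: number of authors per place of birth.
    static Map<String, Long> countByBirthplace(List<Author> authors) {
        return authors.parallelStream()
                .collect(Collectors.groupingBy(a -> a.birthplace, Collectors.counting()));
    }

    // View 3: surnames, in the order they were fetched.
    static List<String> surnames(List<Author> authors) {
        return authors.parallelStream().map(a -> a.surname).collect(Collectors.toList());
    }
}
```

All three views are computed from the same in-memory list, so the database is hit once. On a list this small, `parallelStream()` brings no benefit over `stream()`; it is used here only to mirror the text.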

Comments
  • One is maintainability. While HQL/SQL queries may often be faster, typos in these queries may only be caught at runtime. E.g. author.getAge() > t0 will most likely be flagged by the compiler, whereas the same typo could slip into production when using a query (AGE > t0). Also, some people may be familiar with the methods above and can use them to build complicated things. If things change, it's easy to rewrite in place. Building an equivalent query may be hard, esp. without SQL experience. One additional thing: what if everything's in memory anyway?
  • Actually, it depends: there are libraries that take Java 8 streams and convert them to SQL in the end. So without knowing which libraries are in use, it is hard to tell.
  • @Eugene, because a "fetch all, filter later" operation wastes system resources on every entity that the filter discards: we have forced the database and our mapping framework to do more work than our needs require. This should be explained in the answer though.
  • @Eugene, agree with you completely, this answer should be expanded with more explanation.
  • One interesting possibility is that the call chain above is actually a fluent builder that lazily constructs a query, IOW it could be that it is actually #1 and #2: using the stream API, but doing everything in the DB. You can't tell from the snippet in the question without knowing the implementations of those methods.
  • @JörgWMittag, that said, a query builder would work in, for example, .NET with LINQ, but not in Java: there is no way in plain Java to tell what a lambda does, especially the part with author.getAge() > 50, where no query builder can examine and gain access to the > 50 part. That is available only at compile time, basically, where the query builder would see the uninterpreted Java text. An interpretable query would have been possible if that line looked like author.getAge().greaterThan(50) or something.
  • @JörgWMittag, I wouldn't say they are broken per se. Java just doesn't allow you to override them, which can be seen as a good thing as well (because it relieves you of the mental exercise of guessing what that operator will do when executed, whether or not it was overridden at any point, whether it has side effects, and what performance it has for any given input).
  • @JörgWMittag, well, operators aren't truly methods because they have a notion of precedence which methods don't, at least in Java. There are languages which did away with operator precedence to reinforce that "an operator is a method with a funny name", and that move in my eyes (and many others') violated the principle of least astonishment. We are unlikely to reach a definite agreement on this, so yes, that would be a matter to discuss at another time.
  • @JörgWMittag But remember that some libraries do do weird stuff involving parsing bytecode. It's not impossible or even implausible.
  • Note that some of my Java processes handle gigabytes of data. Streams, when used correctly, are exactly suited to handling huge data volumes. The idea is to set up a pipeline that processes and writes data as it comes in. And where you run aground on complexity, you can switch over to something like Apache Spark, which evolves this idea further into RDDs, not to mention how you can harness multiple servers as a single distributed computational engine ...