Thursday, December 27, 2012

Hibernate setMaxResults on join tables


The problem:
Let’s say I have a class: Author that has children of multiple books. I would like to retrieve the authors and books, but only want to retrieve the first 10 authors.
The hibernate api for this is to use the setMaxResults function.
The naïve thought would be that if I set the setMaxResults to 10, I will get 10 authors, but this is not true. Since all hibernate commands in the end are sql statements, what the setMaxResults does is to add a LIMIT/TOP (depending on the database) to the sql statement. What then happens is if you have a join in your query like we do, you will not get 10 authors, but actually you will get 10 books (maybe only 5 authors).

Solution, Option One:
The simple solution for this is to create two sql statements. One that will retrieve the first 10 authors id’s according to the criteria you want (including on the books). Then add the id’s to a basic search and fetch the books with the authors.

Solution, Option Two:
If you need to create the join and fetch already in the query stage, but you only want to retrieve the first 10 records that have unique authors. The next solution I propose for this is to use a cursor. Using a standard query will return all records from the db to the application. Using a cursor you can iterate over the records without bringing them all to the client. Cursors lock the db for more time, but give the option to iterate over the records without retrieving them all. To do this you call .scroll() on the query object or criteria and get a ScrollableResults object. If you are only iterating of the list use the ScrollMode.FORWARD_ONLY option to use less memory and speed the process.
Note: you need to make sure that your database supports this, since some databases like mysql fake it and bring the whole data to the client (http://stackoverflow.com/questions/2826319/using-hibernates-scrollableresults-to-slowly-read-90-million-records)

Pagination:
Some queries use a pagination scheme to get data. The standard way is to call setFirstResult.
In the first solution, you need to add the setFirstResult to the query on the id’s, and not the query with the fetch.
In the second solution this won’t work since we are back to the problem that we are counting the rows by ourselves so the setFirstResult won’t work either in hibernate query or scroll.

Dialect solution:
Currently solutions to this issue is on the dialect of the specific database. For example sql server has a row count feature where you can define the field that the sql will add the row count field (a column that has an integer of count according to another field- http://msdn.microsoft.com/en-us/library/ms186734.aspx). Then on this field you can use the standard hibernate setMaxResults and setFirstResult.