Pagination — Offset vs Cursor in MySQL
What is Pagination?
With the amount of information held by organizations, it is impractical to retrieve all of it with a single API call. Breaking similar content into a sequence of data sets which are called pages is pagination or paging. A real-life example is to think of an ecommerce site such as Amazon or eBay, where the products are split into a sequence of pages. The two mainstream types of pagination are offset based pagination and cursor-based pagination.
Offset Based Pagination
Offset based pagination is one of the most widely used methods of pagination where clients provide the number of results they want per page (called the limit) along with the offset. The API will skip the number of items indicated in the offset and return items from that point up to the specified limit. It is the simplest way to achieve pagination, basically skipping an offset amount of data from the front. Offset pagination is generally used with static data.
When using offset based pagination, there are two important parameters which the client needs to provide:
- Offset: The starting point of the list of items / How many records need to be skipped
- Limit (aka page size): Number of records per page.
Assume you have a thousand products, and you wish to display 10 per page;
- First page: Offset: 0, Limit: 10.
- Second page: Offset: 10, Limit: 10.
- Third page: Offset: 20, Limit: 10…………….and so on.
Cursor Based Pagination
Uses a cursor instead of an offset value. Cursor based pagination returns a pointer to a specific item in the dataset in its response. On a subsequent request when that pointer is presented to the API, the server returns the results from that pointer onwards. The field used for the cursor must be a unique and sequential column, based on which the dataset will be ordered. Cursor based pagination is used when dealing with real time data.
When using cursor-based pagination, there are also two important parameters the client needs to provide. Which are, the limit and the cursor. The very first request will not include a cursor. But the subsequent request will carry the cursor of the last record retrieved from the first request. In the subsequent request we ask the database to provide records which are greater than (or less than, which will be discussed later) the cursor, up to the limit.
Assume you have a thousand products, all having unique names (therefore the name can act as the cursor) and we want to display 10 per page;
- First page: Cursor: Null, Limit: 10. Assume the last (10th) record fetched has a name of “Bells”.
- Second page: Cursor: fetch records whose name > “Bells”, Limit: 10. Assume the last record (20th) has the name “Cactus”.
- Third page: Cursor: fetch records whose name > “Cactus”, Limit: 10………and so on.
Why you should switch to cursor-based pagination
One of the main drawbacks of offset based pagination is when using it with large datasets. As the offset increases the further you go within the dataset, the database still has to read up to offset, then count rows from disk, before discarding the offset and returning only the count rows. For example; if your dataset has 1 million records, split into pages with a page size of 100 records per page, then to get the final page (page 10,000 -> containing records 999,900–1,000,000), the database would still need to read records from 0 to 999,900 and discard them. The higher the offset, the slower the load times.
Secondly, if the dataset changes at a high frequency (real time data), the items returned become unreliable, potentially skipping or duplicating results. An offset acts as an index which requires the data to be static, if not, data may be skipped or returned twice, whereas a cursor acts as a pointer to where the last request left off.
Implementing the infinite scroll feature (where you keep on scrolling to the bottom for new posts to appear. Ex: Reddit, Facebook, Twitter) on the front end is better with cursor pagination, although it can be implemented with offset pagination.
Why you shouldn’t switch to cursor-based pagination
Offset pagination is very simple to implement. Although cursor pagination is not that difficult either, it is definitely more work than offset pagination, requiring WHERE and ORDER BY clauses. Additionally, your use case may have static data, not requiring a cursor implementation.
The cursor must be a unique attribute which can be sequentially ordered, and your data will always be sorted by this attribute. This creates issues in sorting data by other attributes. If you have a requirement to sort, then cursors aren’t the best option. Although it can still be done, the overhead might negate the performance benefits.
With offset pagination, users can arbitrarily jump pages as they require. For example, if the page size is 10 records, then the 10th page will have an offset of 90, so jumping to the 10th page is possible. However, this is not possible with cursor-based pagination as the cursor is only retrieved with the request of the previous page. Therefore, the pages must be traveled sequentially.
- Offset based site: Amazon product pages has page numbers which we can use to skip pages.
- Cursor based site: GitHub Stargazer site only has next and previous buttons.
Query Cost Evaluation
Below is a brief comparison of offset pagination with cursor-based pagination in MySQL.
Note:
- UM_USER_NAME is selected as the cursor as it is unique and can be sequentially ordered.
- Attributes of the UM_USER table:
- The table has two indexes. One on the UM_USER_NAME attribute and another on UM_USER_ID.
- The UM_USER table contains a total of 11,879 records.
1. Get UM_USER_NAME and UM_USER_ID. Offset: 0, Limit:10.
- Using offset based pagination
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UGROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME ASC LIMIT 10 OFFSET 0;
- Using cursor-based pagination
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UGROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME ASC LIMIT 10;
2. Get UM_USER_NAME and UM_USER_ID. Offset: 5000, Limit:10.
- Using offset based pagination
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UGROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME ASC LIMIT 10 OFFSET 5000;
- Using cursor-based pagination
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UWHERE U.UM_USER_NAME > ‘Elijah_998’GROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME ASC LIMIT 10;
Note: Elijah_998 is the cursor of the last record of the previous request.
3. Get UM_USER_NAME and UM_USER_ID. Offset: 11850, Limit:10.
- Using offset based pagination
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UGROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME ASC LIMIT 10 OFFSET 11850;
- Using cursor-based pagination
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UWHERE U.UM_USER_NAME > ‘Lakmal_925’GROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME ASC LIMIT 10;
Summary
Observations
- As expected, when the offset value increases, query performance is higher when using cursors. Although in this scenario we are unable to observe a picture-perfect increase (such as when moving from offset 5000 to 10,000, there is no increase in cost of the offset query cost). This is because of the way that the MySQL query optimizer selects the query plan and because 5000 to 10,000 isn’t a massive difference in records for the database. But we can clearly see a better performance with cursor-based pagination than offset pagination. And this performance gap only becomes wider with a larger dataset. In this example, I have only entered 11,879 records, but in a database with millions of entries, the performance implications will be more profound.
- The smaller the dataset, the closer the performance difference between the two types.
ORDER BY ASC AND ORDER BY DESC
In the above example, we have ordered the data by ASC. But the dataset may also be ordered by DESC. How would this affect performance?
1. Get UM_USER_NAME and UM_USER_ID. Offset: 0, Limit:10.
- Using cursor-based pagination — ASC
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UGROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME ASC LIMIT 10;
- Using cursor-based pagination — DESC
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UWHERE U.UM_USER_NAME < ‘Achala_107’GROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME DESC LIMIT 10;
2. Get UM_USER_NAME and UM_USER_ID. Offset: 11,850, Limit:10.
- Using cursor-based pagination — ASC
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UWHERE U.UM_USER_NAME > ‘Lakmal_925’GROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME ASC LIMIT 10;
- Using cursor-based pagination — DESC
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UWHERE U.UM_USER_NAME < ‘Lakmal_935’GROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME DESC LIMIT 10;
Observations
- ORDER BY ASC is more efficient when we are retrieving records towards the end of the dataset. ORDER BY DESC is more efficient when retrieving records at the start of the dataset but fails to maintain the natural order of the output. Obviously, the records will be from bottom to top when ordered by DESC.
- But if you wish to order the result in ASC, we simply need to cover the code in;
WITH result AS( SELECT U.UM_USER_ID, U.UM_USER_NAME FROM UM_USER U WHERE U.UM_TENANT_ID = -1234 AND U.UM_USER_NAME < ‘Achala_107’ GROUP BY U.UM_USER_NAME, U.UM_USER_ID ORDER BY UM_USER_NAME DESC LIMIT 10;)SELECT * from result ORDER BY UM_USER_NAME ASC;
Although this allows us to get an ASC order output it doesn’t allow for cursor-based pagination as the next cursor cannot be identified from the response. Therefore, if you want to order by DESC, all your pages from start to end need to be in descending order.
Previous Page
So far, we’ve looked at how to go from one page to the next, but cursor-based pagination also supports going to the previous page.
- When requesting the next page, we ask the database to send us the records which are GREATER > than the cursor (in our case the UM_USER_NAME) of the LAST record of the retrieved result set.
- When requesting the previous page, we ask the database to send us the records which are LESS < than the cursor of the FIRST record of the retrieved result set.
Given below are the first 20 results of a database. Assume we want each page to display 5 records.
Starting Page 1: No cursor. Limit = 5.
Next Page 2: Cursor: Achala_101. Limit 5.
Query:
SELECT U.UM_USER_ID, U.UM_USER_NAMEFROM UM_USER UWHERE U.UM_USER_NAME > ‘Achala_101’GROUP BY U.UM_USER_NAME, U.UM_USER_IDORDER BY UM_USER_NAME ASC LIMIT 5;
Now assume we are on page 2 and wish to go back to page 1.
The cursor will now be the UM_USER_NAME of the first record of the retrieved result set.
Previous Page 1: Cursor: Achala_102. Limit 5.
Query:
WITH results AS( SELECT U.UM_USER_ID, U.UM_USER_NAME FROM UM_USER U WHERE U.UM_USER_NAME < ‘Achala_102’ GROUP BY U.UM_USER_NAME, U.UM_USER_ID ORDER BY UM_USER_NAME DESC LIMIT 5)SELECT * from results ORDER BY UM_USER_NAME ASC;
We cannot order the inner query by ASC because the results would be inconsistent with the required page. As the inner query is retrieving the results in DESC, cover the code as above to order it in ascending order.
Alright. That‘s all folks. I hope this can help you to get a better idea on which approach you should take for your application. Thanks for reading.
References
Web API Pagination: Offset based vs Cursor Based
The best database pagination technique is…
These two articles from Megan Chang are must reads:
Is offset pagination dead? Why cursor pagination is taking over