Slow pagination for composite primary keys #1669

@coding-chimp

We have several tables with composite primary keys (shop_id, id). We noticed that the queries gh-ost uses to determine the next chunk can sometimes run very slowly. Here's an example:

explain analyze select `shop_id`, `id`
    -> from `orders`
    -> where ((`shop_id` > 12345) or (((`shop_id` = 12345)) AND (`id` > 23456))) and ((`shop_id` < 65432) or (((`shop_id` = 65432)) AND (`id` < 98765)) or ((`shop_id` = 65432) and (`id` = 98765)))
    -> order by `shop_id` asc, `id` asc
    -> limit 1 offset 999\G
*************************** 1. row ***************************
EXPLAIN: -> Limit/Offset: 1/999 row(s)  (cost=142 rows=1) (actual time=7785..7785 rows=1 loops=1)
    -> Filter: (((orders.shop_id > 12345) or ((orders.shop_id = 12345) and (orders.id > 23456))) and ((orders.shop_id < 65432) or ((orders.shop_id = 65432) and (orders.id < 98765)) or ((orders.id = 98765) and (orders.shop_id = 65432))))  (cost=142 rows=1352) (actual time=7785..7785 rows=1000 loops=1)
        -> Covering index scan on orders using PRIMARY  (cost=142 rows=1352) (actual time=0.0621..6706 rows=21.4e+6 loops=1)

1 row in set (7.79 sec)

We tried simplifying the query with a row constructor expression, but that did not change the performance:

explain analyze select `shop_id`, `id`
    -> from `orders`
    -> where (`shop_id`, `id`) > (12345, 23456) and (`shop_id`, `id`) <= (65432, 98765)
    -> order by `shop_id` asc, `id` asc
    -> limit 1 offset 999\G
*************************** 1. row ***************************
EXPLAIN: -> Limit/Offset: 1/999 row(s)  (cost=45.4 rows=1) (actual time=7520..7520 rows=1 loops=1)
    -> Filter: (((orders.shop_id,orders.id) > (12345,23456)) and ((orders.shop_id,orders.id) <= (65432,98765)))  (cost=45.4 rows=1000) (actual time=7520..7520 rows=1000 loops=1)
        -> Covering index scan on orders using PRIMARY  (cost=45.4 rows=1000) (actual time=0.0616..6661 rows=21.4e+6 loops=1)

1 row in set (7.52 sec)

We asked one of the robots, and it told us the following:

MySQL's range optimizer has a fundamental limitation: it cannot produce a composite B-tree tuple seek for an open range on a multi-column key. The storage engine (InnoDB) can seek to any composite key position (X, Y) — but the SQL layer doesn't generate that access path for expressions like (shop_id, id) > (X, Y).

What MySQL does instead: it resolves the range access as shop_id >= 12345 (first column only), then applies the id > 23456 condition as a post-scan filter. If shop 12345 has, say, 20M orders where most have id ≤ 23456, MySQL scans all 20M of them and discards the non-matching ones.
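
If that reading is right, the first query is effectively executed as something like this (a hypothetical rewrite using the example values above, not actual optimizer output):

select shop_id, id
from orders
where shop_id >= 12345  -- range access uses only the first key column
order by shop_id asc, id asc;
-- the remaining composite conditions, e.g. ((shop_id = 12345) and (id > 23456)),
-- are then evaluated as a post-scan filter on every row the range scan returns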

I'm not certain the robot's explanation is 100% correct, but it matches what we're seeing. So we came up with a UNION query to work around this limitation:

EXPLAIN ANALYZE SELECT shop_id, id FROM (
    ->   -- 1. Start boundary shop: composite key seek to (start_shop, start_id)
    ->   (SELECT shop_id, id
    ->     FROM orders
    ->     WHERE shop_id = 12345 AND id > 23456
    ->     ORDER BY shop_id ASC, id ASC
    ->     LIMIT 1000)
    ->
    ->   UNION ALL
    ->
    ->   -- 2. Inner shops: efficient range on first column only
    ->   (SELECT shop_id, id
    ->     FROM orders
    ->     WHERE shop_id > 12345 AND shop_id < 65432
    ->     ORDER BY shop_id ASC, id ASC
    ->     LIMIT 1000)
    ->
    ->   UNION ALL
    ->
    ->   -- 3. End boundary shop: composite key scan within (end_shop, ≤ end_id)
    ->   (SELECT shop_id, id
    ->     FROM orders
    ->     WHERE shop_id = 65432 AND id <= 98765
    ->     ORDER BY shop_id ASC, id ASC
    ->     LIMIT 1000)
    -> ) t
    -> ORDER BY shop_id ASC, id ASC
    -> LIMIT 1 OFFSET 999\G
*************************** 1. row ***************************
EXPLAIN: -> Limit/Offset: 1/999 row(s)  (cost=38.7e+6..38.7e+6 rows=1) (actual time=29.4..29.4 rows=1 loops=1)
    -> Sort: t.shop_id, t.id, limit input to 1000 row(s) per chunk  (cost=38.7e+6..38.7e+6 rows=1000) (actual time=29.3..29.4 rows=1000 loops=1)
        -> Table scan on t  (cost=38.7e+6..38.7e+6 rows=3000) (actual time=29..29.2 rows=3000 loops=1)
            -> Union all materialize  (cost=38.7e+6..38.7e+6 rows=3000) (actual time=29..29 rows=3000 loops=1)
                -> Limit: 1000 row(s)  (cost=25.1e+6 rows=1000) (actual time=1.77..1.92 rows=1000 loops=1)
                    -> Filter: ((orders.shop_id = 12345) and (orders.id > 23456))  (cost=25.1e+6 rows=55.2e+6) (actual time=1.77..1.87 rows=1000 loops=1)
                        -> Covering index range scan on orders using PRIMARY over (shop_id = 12345 AND 23456 < id)  (cost=25.1e+6 rows=55.2e+6) (actual time=1.77..1.81 rows=1000 loops=1)
                -> Limit: 1000 row(s)  (cost=13.6e+6 rows=1000) (actual time=1.71..1.86 rows=1000 loops=1)
                    -> Filter: ((orders.shop_id > 12345) and (orders.shop_id < 65432))  (cost=13.6e+6 rows=55.2e+6) (actual time=1.71..1.81 rows=1000 loops=1)
                        -> Covering index range scan on orders using PRIMARY over (12345 < shop_id < 65432)  (cost=13.6e+6 rows=55.2e+6) (actual time=1.71..1.75 rows=1000 loops=1)
                -> Limit: 1000 row(s)  (cost=16041 rows=1000) (actual time=24.9..25 rows=1000 loops=1)
                    -> Sort: orders.id, limit input to 1000 row(s) per chunk  (cost=16041 rows=62777) (actual time=24.9..25 rows=1000 loops=1)
                        -> Filter: ((orders.shop_id = 65432) and (orders.id <= 98765))  (cost=0.256..16041 rows=62777) (actual time=0.0091..20.6 rows=97777 loops=1)
                            -> Covering index skip scan on orders using index_orders_on_shop_id_and_mor_app_id over shop_id = 65432, id <= 98765  (cost=0.256..16041 rows=62777) (actual time=0.00863..14.8 rows=97777 loops=1)

1 row in set (0.03 sec)

Each subquery can use the index efficiently, and each one returns at most chunk_size rows, so the outer sort materializes at most 3x chunk_size rows (3,000 here), compared to the 21.4 million rows scanned in our example. And this example isn't even the worst we've seen: in some cases the query has taken up to 16 minutes.

We'd be looking to introduce this query only for migrations where gh-ost uses two columns to paginate the table, since the query grows considerably more complex with each additional column. We could even limit it to numeric columns.
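
For reference, here is the parameterized shape we have in mind for the two-column case (the :placeholder names are just for illustration, not gh-ost's actual bind variables):

SELECT shop_id, id FROM (
  -- 1. start boundary shop: rows after (:start_shop, :start_id)
  (SELECT shop_id, id FROM orders
    WHERE shop_id = :start_shop AND id > :start_id
    ORDER BY shop_id ASC, id ASC
    LIMIT :chunk_size)
  UNION ALL
  -- 2. inner shops: range on the first column only
  (SELECT shop_id, id FROM orders
    WHERE shop_id > :start_shop AND shop_id < :end_shop
    ORDER BY shop_id ASC, id ASC
    LIMIT :chunk_size)
  UNION ALL
  -- 3. end boundary shop: rows up to (:end_shop, :end_id)
  (SELECT shop_id, id FROM orders
    WHERE shop_id = :end_shop AND id <= :end_id
    ORDER BY shop_id ASC, id ASC
    LIMIT :chunk_size)
) t
ORDER BY shop_id ASC, id ASC
LIMIT 1 OFFSET :offset;  -- :offset = chunk_size - 1, precomputed by the caller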

We'd love to hear your thoughts before we open a PR. Thanks!
