Feature Description
Today, the vtgate planner falls back to a full Scatter for two query shapes that could in principle be served by consulting the lookup vindex(es) already defined in the VSchema:
- WHERE a = 'x' OR b = 'y', where a and b each have their own lookup vindex.
- WHERE a BETWEEN 'x' AND 'y' (or a >= 'x' AND a <= 'y'), where a has a lookup vindex. (non - sequential index )
For workloads with many shards, scatter is significantly more expensive than necessary in both cases — we have all the information at planning/execution time to narrow the routing.
Proposed behavior
(1) OR across vindex-backed columns
For WHERE a = 'x' OR b = 'y' where both columns are vindex-backed:
- Resolve KSIDs for a = 'x' via its vindex.
- Resolve KSIDs for b = 'y' via its vindex.
- Union the KSID sets.
- Route to only the shards covering that union; dedupe rows at vtgate.
(2) Range predicates on lookup-vindex-backed columns
For WHERE a BETWEEN 'x' AND 'y':
- Range-scan the lookup table for entries with from BETWEEN 'x' AND 'y'.
- Collect the KSIDs, route only to shards intersecting them.
- Optionally: bail out to scatter if the lookup-side result set exceeds a configurable threshold, so the optimization can't pathologically degrade.
For sequential/ordered primary vindexes the same logic can apply directly to the keyspace-id range without a lookup-table read.
Use Case(s)
We are using vitess with a very high number of shards and wanted to minimize the number of scatter. OR and RANGE queries are very genuine use cases but using vitess version 23, vexplain shows that the queries are going to scatter.
Feature Description
Today, the vtgate planner falls back to a full Scatter for two query shapes that could in principle be served by consulting the lookup vindex(es) already defined in the VSchema:
For workloads with many shards, scatter is significantly more expensive than necessary in both cases — we have all the information at planning/execution time to narrow the routing.
Proposed behavior
(1) OR across vindex-backed columns
For WHERE a = 'x' OR b = 'y' where both columns are vindex-backed:
(2) Range predicates on lookup-vindex-backed columns
For WHERE a BETWEEN 'x' AND 'y':
For sequential/ordered primary vindexes the same logic can apply directly to the keyspace-id range without a lookup-table read.
Use Case(s)
We are using vitess with a very high number of shards and wanted to minimize the number of scatter. OR and RANGE queries are very genuine use cases but using vitess version 23, vexplain shows that the queries are going to scatter.