Optimizer, Indexes, EXPLAIN
The optimizer translates declarative SQL into a physical plan. Your job is to make the intent clear, provide useful statistics and access paths, and verify the chosen plan.
Mental Model
Section titled “Mental Model”flowchart LR Query[SQL text] --> Parse[Parse and validate] Parse --> Logical[Logical algebra] Logical --> Rewrite[Semantic rewrites] Rewrite --> Cost[Cost-based search] Cost --> Plan[Physical plan] Stats[Stats and constraints] --> Cost Indexes[Indexes / partitions / files] --> Cost Plan --> Execute[Execution]
What Cost-Based Optimizers Estimate
Section titled “What Cost-Based Optimizers Estimate”| Estimate | Why it matters |
|---|---|
| table cardinality | full scan cost and join order |
| predicate selectivity | whether an index/filter is useful |
| join cardinality | memory, spills, join order, broadcast decisions |
| row width | network, memory, disk I/O |
| ordering | whether sorts can be avoided |
| distribution | skew and partitioning in distributed engines |
Bad estimates produce bad plans.
Access Paths
Section titled “Access Paths”Common physical access paths:
- full table scan
- partition-pruned scan
- index seek/range scan
- index-only scan
- bitmap index scan
- file metadata skip / data skipping
Generic principle:
The engine can only avoid work when the storage layout, index, partitioning, or file metadata aligns with the predicate.
Sargability
Section titled “Sargability”A predicate is sargable when the engine can use an access path efficiently.
Less friendly:
WHERE DATE(order_ts) = DATE '2026-01-01'
Better:
WHERE order_ts >= TIMESTAMP '2026-01-01 00:00:00'
AND order_ts < TIMESTAMP '2026-01-02 00:00:00'
Avoid wrapping indexed/filter columns in functions when a range predicate expresses the same logic.
Composite Index Order
Section titled “Composite Index Order”For a conceptual index on (customer_id, order_ts), this is useful:
WHERE customer_id = 42
AND order_ts >= TIMESTAMP '2026-01-01 00:00:00'
This may be less useful:
WHERE order_ts >= TIMESTAMP '2026-01-01 00:00:00'
Leading columns matter in many row-store indexes. Column stores, data skipping, clustered files, and distributed systems have different mechanics, but the same idea remains: layout must match filters.
Join Algorithms
Section titled “Join Algorithms”| Join | Good when | Risk |
|---|---|---|
| nested loop | outer side small, inner side indexed | catastrophic if both sides large |
| hash join | equality join, enough memory | spills if build side too large |
| merge join | both sides sorted | sort cost if not already ordered |
| broadcast join | one side small in distributed execution | executor memory blowup if estimate wrong |
| shuffle hash/sort join | large distributed inputs | network and skew dominate |
Reading EXPLAIN
Section titled “Reading EXPLAIN”When you see a plan, ask:
- Which table is read first?
- Are filters pushed down?
- Are partitions/files pruned?
- What join order did the optimizer choose?
- Which join algorithm is used?
- Are row estimates close to actuals?
- Are there sorts, exchanges, repartitions, or spills?
- Is the output cardinality plausible at every step?
Generic pseudo-plan:
HashAggregate group by customer_id
HashJoin orders.customer_id = customers.customer_id
Filter orders.status = 'paid'
Scan orders partitions=2026-01
Scan customers
Strong read:
The paid filter is applied before the join, which is good. I would check whether the order-date predicate prunes partitions, whether customer scan is dimension-sized, and whether the post-join cardinality matches expected paid order rows.
Statistics and Constraints
Section titled “Statistics and Constraints”Optimizers need facts.
- row counts
- distinct counts
- histograms or frequency stats
- NULL fraction
- min/max metadata
- primary/foreign key constraints
- partition/file stats
If estimates are wrong, fixes may include updated stats, better predicates, materialized intermediates, constraints, data layout changes, or query rewrites.
Index Tradeoffs
Section titled “Index Tradeoffs”Indexes speed reads but cost writes and storage.
Good index candidates:
- high-selectivity filters
- frequent joins
- stable dimensions
- common order-by/limit access paths
- uniqueness constraints
Bad index candidates:
- low-selectivity booleans alone
- write-heavy ephemeral columns
- columns rarely filtered or joined
- every column “just in case”
Column Stores and Lakehouse Layouts
Section titled “Column Stores and Lakehouse Layouts”In columnar analytics engines, performance often comes from:
- reading only needed columns
- partition pruning
- file-level min/max skipping
- clustering/sorting/Z-order-like layouts
- compression
- vectorized execution
- avoiding tiny files
The equivalent of an “index conversation” may be a partitioning, clustering, file-size, and compaction conversation.
Practice
Section titled “Practice”1. A query filters WHERE DATE(order_ts) = DATE '2026-06-01' and scans the whole table. What rewrite do you propose?
Use a half-open timestamp range.
WHERE order_ts >= TIMESTAMP '2026-06-01 00:00:00'
AND order_ts < TIMESTAMP '2026-06-02 00:00:00'
This can use indexes, partition pruning, or file metadata on order_ts more effectively than wrapping the column in a function.
2. EXPLAIN shows estimated rows 1,000 but actual rows 200,000,000 after a join. What do you suspect?
Stale or missing stats, correlated predicates, many-to-many join amplification, missing uniqueness constraints, data skew, NULL-heavy join keys, or predicates that the optimizer cannot estimate well. Validate join cardinality and update stats before forcing hints.
3. Why can an index make a query slower?
If the predicate is not selective, many random index lookups can be slower than a sequential scan. Indexes also add write overhead and may mislead the optimizer when stats are bad.
4. What is the staff answer to "add an index"?
First define the query pattern, selectivity, write cost, storage cost, freshness of stats, and whether the engine is row-store, column-store, or distributed. Then choose an access path or layout that improves the dominant workload, and verify with EXPLAIN plus runtime metrics.