Grouping and Windowing
Grouping collapses rows. Windowing annotates rows without collapsing them.
Aggregation Changes Grain
Section titled “Aggregation Changes Grain”SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
Output grain: one row per customer.
Every selected non-aggregate expression must be functionally determined by the group keys. If your database allows otherwise, treat it as unsafe unless you can prove determinism.
Conditional Aggregation
Section titled “Conditional Aggregation”SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END) AS paid_orders,
SUM(CASE WHEN status = 'paid' THEN total_amount ELSE 0 END) AS paid_revenue
FROM orders
GROUP BY customer_id;
This is portable and often clearer than multiple joins.
Window Functions Preserve Grain
Section titled “Window Functions Preserve Grain”SELECT
order_id,
customer_id,
order_ts,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_ts DESC, order_id DESC
) AS order_recency_rank
FROM orders;
Output grain: one row per order.
Ranking Semantics
Section titled “Ranking Semantics”| Function | Ties | Gaps |
|---|---|---|
ROW_NUMBER | breaks ties arbitrarily unless order is deterministic | no gaps |
RANK | same rank for ties | gaps |
DENSE_RANK | same rank for ties | no gaps |
Always include deterministic tie breakers when you need reproducible output.
Running Aggregates
Section titled “Running Aggregates”SELECT
customer_id,
order_id,
order_ts,
total_amount,
SUM(total_amount) OVER (
PARTITION BY customer_id
ORDER BY order_ts, order_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS lifetime_revenue_so_far
FROM orders
WHERE status = 'paid';
Use explicit window frames. Defaults vary and can surprise you, especially with duplicate order keys.
First and Latest Rows
Section titled “First and Latest Rows”WITH ranked AS (
SELECT
o.*,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_ts DESC, order_id DESC
) AS rn
FROM orders o
)
SELECT *
FROM ranked
WHERE rn = 1;
Practice
Section titled “Practice”1. Find each customer's first paid order.
WITH ranked AS (
SELECT
o.*,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_ts ASC, order_id ASC
) AS rn
FROM orders o
WHERE status = 'paid'
)
SELECT customer_id, order_id, order_ts, total_amount
FROM ranked
WHERE rn = 1;
Tie breaker order_id makes the result deterministic.
2. Compute daily revenue and 7-day rolling revenue.
Date truncation syntax varies by engine; assume order_date is already present or derive it with your engine’s date function.
WITH daily AS (
SELECT
order_date,
SUM(total_amount) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY order_date
)
SELECT
order_date,
revenue,
SUM(revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS revenue_7d
FROM daily;
This is seven rows, not necessarily seven calendar days if dates are missing. For strict calendar windows, join to a date spine first.
3. What is the difference between ROWS and RANGE window frames?
ROWS counts physical rows relative to the current row. RANGE groups peers by order-key value and uses value ranges. With duplicate timestamps or prices, RANGE can include more rows than expected. Use explicit ROWS when you need row-count behavior.
4. Top 3 products by revenue per category.
WITH product_revenue AS (
SELECT
p.category,
oi.product_id,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p
ON p.product_id = oi.product_id
GROUP BY p.category, oi.product_id
),
ranked AS (
SELECT
*,
DENSE_RANK() OVER (
PARTITION BY category
ORDER BY revenue DESC
) AS revenue_rank
FROM product_revenue
)
SELECT category, product_id, revenue
FROM ranked
WHERE revenue_rank <= 3;
Use DENSE_RANK if tied products should all be included. Use ROW_NUMBER if exactly three rows per category are required.