Query Order, NULLs, Joins
SQL is written top-down but logically processed in a different order.
Logical Order
Section titled “Logical Order”Conceptually:
FROMJOINandONWHEREGROUP BY- aggregates
HAVING- window functions
SELECTDISTINCTORDER BYLIMIT/ fetch
Optimizers can reorder physical operations when semantics allow it, but this logical order explains visibility and NULL behavior.
WHERE vs HAVING
Section titled “WHERE vs HAVING”WHERE filters rows before aggregation.
SELECT customer_id, COUNT(*) AS paid_orders
FROM orders
WHERE status = 'paid'
GROUP BY customer_id;
HAVING filters groups after aggregation.
SELECT customer_id, COUNT(*) AS paid_orders
FROM orders
WHERE status = 'paid'
GROUP BY customer_id
HAVING COUNT(*) >= 3;
Three-Valued Logic
Section titled “Three-Valued Logic”Comparisons can be TRUE, FALSE, or UNKNOWN. WHERE keeps only TRUE.
-- Does not return rows where country is NULL.
SELECT *
FROM customers
WHERE country <> 'US';
If you want non-US including missing:
SELECT *
FROM customers
WHERE country <> 'US' OR country IS NULL;
NOT IN Trap
Section titled “NOT IN Trap”NOT IN behaves badly when the subquery can return NULL.
-- Risky if banned_users.user_id contains NULL.
SELECT user_id
FROM customers
WHERE user_id NOT IN (SELECT user_id FROM banned_users);
Prefer NOT EXISTS.
SELECT c.user_id
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM banned_users b
WHERE b.user_id = c.user_id
);
LEFT JOIN Filter Trap
Section titled “LEFT JOIN Filter Trap”A WHERE condition on the right table can turn a left join into an inner join.
-- Drops customers with no orders.
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
ON o.customer_id = c.customer_id
WHERE o.status = 'paid';
Move the condition into ON if you want to preserve all customers.
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
ON o.customer_id = c.customer_id
AND o.status = 'paid';
Semi and Anti Joins
Section titled “Semi and Anti Joins”Use EXISTS when you only need existence.
SELECT c.customer_id
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.status = 'paid'
);
This avoids row multiplication and often enables efficient semijoin plans.
Practice
Section titled “Practice”1. Return customers with no paid orders.
Use an anti join.
SELECT c.customer_id
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.status = 'paid'
);
This handles duplicates in orders and avoids NOT IN/NULL issues.
2. Why does COUNT(*) differ from COUNT(column)?
COUNT(*) counts rows. COUNT(column) counts non-NULL values in that column.
SELECT
COUNT(*) AS rows,
COUNT(country) AS rows_with_country
FROM customers;
3. Preserve all customers and count their paid orders, including zero.
Filter paid orders before or inside the join, then count right-side keys.
SELECT
c.customer_id,
COUNT(o.order_id) AS paid_orders
FROM customers c
LEFT JOIN orders o
ON o.customer_id = c.customer_id
AND o.status = 'paid'
GROUP BY c.customer_id;
4. Why is WHERE o.status = 'paid' wrong after a LEFT JOIN when counting zero-order customers?
Customers without matching orders have NULL in o.status. The WHERE predicate removes those rows, so they disappear instead of showing count zero.