Skip to content

Modeling and Temporal SQL

Staff-level SQL interviews often hide a modeling problem inside a query problem.

TypeGrainExamples
transaction factone row per business eventorder, payment, shipment
periodic snapshot factone row per entity-periodaccount balance by day
accumulating snapshotone row per process instanceorder lifecycle milestones
dimensionone row per entity or entity-versioncustomer, product, store

The right model makes the query obvious. The wrong model forces every query to re-derive business meaning.

Type 2 dimension pattern:

customer_dim(
	customer_sk,
	customer_id,
	country,
	plan,
	valid_from,
	valid_to,
	is_current
)

Join facts to the dimension version valid at event time.

SELECT
	o.order_id,
	o.order_ts,
	d.plan,
	o.total_amount
FROM orders o
JOIN customer_dim d
	ON d.customer_id = o.customer_id
	AND o.order_ts >= d.valid_from
	AND o.order_ts < d.valid_to;

Use half-open intervals: [valid_from, valid_to). They avoid boundary overlap.

Late events force a choice:

  • recompute impacted partitions
  • maintain correction tables
  • use watermarking and accept bounded lateness
  • publish both provisional and finalized metrics

Interview phrase:

I would separate event-time correctness from processing-time availability, then define a watermark and a backfill policy.

Use a date spine when you need zero rows to appear as zero metrics.

SELECT
	d.calendar_date,
	COUNT(o.order_id) AS paid_orders
FROM date_spine d
LEFT JOIN orders o
	ON o.order_date = d.calendar_date
	AND o.status = 'paid'
GROUP BY d.calendar_date;

Event log:

  • append-only
  • good for audits and replay
  • harder for current state

Snapshot:

  • easier current-state querying
  • can hide history
  • needs freshness and reconciliation checks

Staff answer: keep immutable events as source of truth and derived snapshots for serving, with tests that reconcile the two.

1. A product's category can change over time. How do you compute revenue by category as of the order date?

Use a type 2 product dimension and join by validity interval.

SELECT
	p.category,
	SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN orders o
	ON o.order_id = oi.order_id
JOIN product_dim p
	ON p.product_id = oi.product_id
	AND o.order_ts >= p.valid_from
	AND o.order_ts < p.valid_to
WHERE o.status = 'paid'
GROUP BY p.category;

Do not join only to the current product category unless the question explicitly wants current taxonomy.

2. How do you detect overlapping SCD intervals?
WITH ordered AS (
	SELECT
		customer_id,
		valid_from,
		valid_to,
		LAG(valid_to) OVER (
			PARTITION BY customer_id
			ORDER BY valid_from
		) AS prev_valid_to
	FROM customer_dim
)
SELECT *
FROM ordered
WHERE prev_valid_to > valid_from;

This finds intervals where the previous row ends after the current row starts.

3. Why is using processing date for business metrics risky?

Processing date measures when the pipeline saw the data, not when the business event happened. Late data, retries, backfills, and outages can distort event-time metrics. Use processing date for pipeline operations; use event date for business reporting unless explicitly stated otherwise.