SQL-driven retention analysis of a UK online retailer — from cohort analysis and churn detection to RFM segmentation and customer lifetime value, with a Tableau Public dashboard for stakeholder insights.
An online gift retailer based in the UK sells across 40 countries but has limited visibility into customer retention patterns. High single-purchase rates, unclear churn drivers, and no customer segmentation make it difficult to prioritise marketing spend or identify at-risk customers. The goal: build a comprehensive SQL-based retention analysis that turns raw transaction data into actionable customer intelligence.
Nearly two-thirds of customers make a single purchase and never return. Yet the remaining 35% who do return generate over 80% of total revenue. This stark concentration makes repeat-purchase conversion the highest-leverage growth opportunity.
Cohort analysis reveals that only about 1 in 4 customers active in a given month returns the following month. Early-stage retention (months 1–3) shows the steepest drop, suggesting the onboarding window is critical for long-term retention.
CLV decile analysis shows extreme value concentration at the top. These "Champions" average 10+ orders and £2,000+ in lifetime spend — they're worth dedicated loyalty programmes and personalised outreach.
By scoring customers on Recency, Frequency, and Monetary value, 8 distinct segments emerge. "At Risk" customers — high-value buyers who haven't purchased recently — represent recoverable revenue. "New Customers" with high recency but low frequency are ripe for nurture campaigns.
All queries are written in PostgreSQL syntax, showcasing intermediate-to-advanced SQL patterns:
-- Cohort retention with window functions
WITH customer_cohort AS (
SELECT customer_id,
DATE_TRUNC('month', MIN(invoice_date)) AS cohort_date
FROM transactions
GROUP BY customer_id
)
SELECT cohort_month, month_offset,
COUNT(DISTINCT customer_id) AS customers,
ROUND(
COUNT(DISTINCT customer_id) * 100.0 /
FIRST_VALUE(COUNT(DISTINCT customer_id))
OVER (PARTITION BY cohort_month
ORDER BY month_offset), 1
) AS retention_pct
...
Key patterns include CTEs for multi-step logic, NTILE() for quintile/decile scoring, INTERVAL-based self-joins for retention calculation, EXTRACT with AGE for precise month offsets, and ::NUMERIC casting for controlled rounding.
This project pushed me into CTEs, window functions (NTILE, FIRST_VALUE), and self-joins — the patterns that turn SQL from a data retrieval tool into an analytics engine.
Cohort analysis, churn detection, and RFM segmentation aren't isolated metrics — they're interconnected views of the same customer lifecycle. Designing queries that build on each other revealed this clearly.
Writing queries in PostgreSQL syntax while executing locally on SQLite taught me the practical differences: date functions, type casting, and window function support vary significantly between dialects.
Structuring SQL outputs as Tableau-ready CSVs forced me to think about downstream consumption — column naming, granularity, and pre-computed metrics that make dashboard building efficient.
Browse the SQL queries, view the Tableau dashboard, or clone the repository.