E-commerce Customer Retention

SQL-driven retention analysis of a UK online retailer — from cohort analysis and churn detection to RFM segmentation and customer lifetime value, with a Tableau Public dashboard for stakeholder insights.

Type
Personal Project
Dataset
407,664 transactions (4,312 customers)
Revenue
~£8.8M analysed
Tools
PostgreSQL, Python, Tableau

The Problem

An online gift retailer based in the UK sells across 40 countries but has limited visibility into customer retention patterns. High single-purchase rates, unclear churn drivers, and no customer segmentation make it difficult to prioritise marketing spend or identify at-risk customers. The goal: build a comprehensive SQL-based retention analysis that turns raw transaction data into actionable customer intelligence.

Results at a Glance

7
SQL Analysis Modules
14
Tableau-Ready Exports
8
RFM Customer Segments
4,312
Customers Profiled

Key Findings

65% are one-time buyers — but repeat customers drive the business

Nearly two-thirds of customers make a single purchase and never return. Yet the remaining 35% who do return generate over 80% of total revenue. This stark concentration makes repeat-purchase conversion the highest-leverage growth opportunity.

Month-over-month retention averages ~25%

Cohort analysis reveals that only about 1 in 4 customers active in a given month returns the following month. Early-stage retention (months 1–3) shows the steepest drop, suggesting the onboarding window is critical for long-term retention.

Top 10% of customers contribute ~50% of revenue

CLV decile analysis shows extreme value concentration at the top. These "Champions" average 10+ orders and £2,000+ in lifetime spend — they're worth dedicated loyalty programmes and personalised outreach.

RFM segmentation reveals actionable customer groups

By scoring customers on Recency, Frequency, and Monetary value, 8 distinct segments emerge. "At Risk" customers — high-value buyers who haven't purchased recently — represent recoverable revenue. "New Customers" with high recency but low frequency are ripe for nurture campaigns.

Analysis Approach

1
Business Overview
Established baselines: total revenue, monthly trends, geographic breakdown. UK drives 83% of revenue across 40 countries.
2
Cohort Analysis
Grouped customers by first-purchase month and tracked 12-month retention curves using DATE_TRUNC and AGE functions.
3
Retention Rates
Calculated month-over-month retention via self-joins with INTERVAL. Segmented customers by purchase frequency.
4
Churn Detection
Classified Active / At Risk / Churned using recency thresholds (30/90 days). Quantified revenue at risk per segment.
5
Customer Lifetime Value
Estimated projected annual CLV (AOV × frequency × 12). Built revenue deciles with NTILE(10) to show value concentration.
6
RFM Segmentation
Scored R/F/M on 1–5 scale using NTILE window functions. Mapped to 8 business segments (Champions → Lost).

SQL Techniques Used

All queries are written in PostgreSQL syntax, showcasing intermediate-to-advanced SQL patterns:

-- Cohort retention with window functions WITH customer_cohort AS ( SELECT customer_id, DATE_TRUNC('month', MIN(invoice_date)) AS cohort_date FROM transactions GROUP BY customer_id ) SELECT cohort_month, month_offset, COUNT(DISTINCT customer_id) AS customers, ROUND( COUNT(DISTINCT customer_id) * 100.0 / FIRST_VALUE(COUNT(DISTINCT customer_id)) OVER (PARTITION BY cohort_month ORDER BY month_offset), 1 ) AS retention_pct ...

Key patterns include CTEs for multi-step logic, NTILE() for quintile/decile scoring, INTERVAL-based self-joins for retention calculation, EXTRACT with AGE for precise month offsets, and ::NUMERIC casting for controlled rounding.

Technical Stack

PostgreSQL Python 3.10+ pandas SQLite Tableau Public openpyxl

What I Learned

SQL Beyond SELECT *

This project pushed me into CTEs, window functions (NTILE, FIRST_VALUE), and self-joins — the patterns that turn SQL from a data retrieval tool into an analytics engine.

Retention Is a System

Cohort analysis, churn detection, and RFM segmentation aren't isolated metrics — they're interconnected views of the same customer lifecycle. Designing queries that build on each other revealed this clearly.

PostgreSQL vs SQLite

Writing queries in PostgreSQL syntax while executing locally on SQLite taught me the practical differences: date functions, type casting, and window function support vary significantly between dialects.

Analysis-to-Dashboard Pipeline

Structuring SQL outputs as Tableau-ready CSVs forced me to think about downstream consumption — column naming, granularity, and pre-computed metrics that make dashboard building efficient.

Explore the Full Analysis

Browse the SQL queries, view the Tableau dashboard, or clone the repository.