[Data Sprint] A machine learning system that predicts Singapore HDB resale flat prices using structural, locational, and temporal features — enabling WOW! Real Estate Agency to provide data-driven pricing recommendations with an interactive calculator for buyers and sellers.
WOW! Real Estate Agency operates across Singapore's HDB resale market, the largest public housing market in Southeast Asia, with over 80% of the population living in HDB flats. Agents need accurate price estimates to advise buyers on fair market value and sellers on optimal listing prices. Manual valuations are inconsistent and slow. The challenge: build a model that predicts resale prices to within ~8% error across 26 towns, 7 flat types, and a decade of market dynamics.
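The ~8% target can be read as a bound on mean absolute percentage error (MAPE). The write-up does not name the metric, so treat this as one plausible interpretation. A minimal sketch, with made-up prices used purely for illustration:

```python
def mean_absolute_percentage_error(actual, predicted):
    """Return the mean of |actual - predicted| / actual, as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / a for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical resale prices in SGD (not taken from the project data).
actual_prices = [400_000, 500_000, 650_000]
predicted_prices = [380_000, 540_000, 650_000]

mape = mean_absolute_percentage_error(actual_prices, predicted_prices)
meets_target = mape <= 8.0  # the ~8% goal stated above
```

Because MAPE is relative, a SGD 20K miss on a SGD 400K flat weighs more than the same miss on a SGD 650K flat, which suits a market with a wide price range.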
SHAP analysis confirms floor area as the strongest predictor, with each additional square metre adding approximately SGD 3,800 to the resale price. This structural relationship holds consistently across all towns and flat types, making it the single most important feature for valuation.
Central Area commands the highest premium at ~SGD 120K above average, while estates like Sembawang and Woodlands sit at the lower end. The mature vs non-mature estate distinction alone accounts for SGD 40-60K of this gap, driven by established amenities, school proximity, and transport connectivity.
Flats with leases starting in the 2000s command 20-30% premiums over 1980s-era flats. With Singapore's 99-year leasehold model, every additional year of remaining lease adds ~SGD 1,200 in value — critical intelligence for buyers weighing older flats in prime locations vs newer flats in developing towns.
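Remaining lease follows directly from the 99-year model. A minimal sketch of that feature, assuming the data carries a lease commencement year and a sale year (the function and parameter names here are illustrative, not taken from the project code):

```python
LEASE_YEARS = 99  # length of an HDB lease, as described above


def remaining_lease_years(lease_commence_year, sale_year):
    """Years of lease left at the time of sale, floored at zero."""
    elapsed = sale_year - lease_commence_year
    if elapsed < 0:
        raise ValueError("sale year precedes lease commencement")
    return max(LEASE_YEARS - elapsed, 0)


# Two hypothetical flats sold in the same year.
older_flat = remaining_lease_years(1985, 2021)  # 63
newer_flat = remaining_lease_years(2005, 2021)  # 83
```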
Each 100 metres closer to an MRT station adds approximately SGD 1,500 to resale value. Flats within 500m of an MRT interchange station show even stronger premiums, reflecting Singapore's transit-oriented development pattern.
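A distance-to-MRT feature like this can be derived from coordinates with the haversine formula. The sketch below assumes each flat and station has a latitude and longitude. The helper names and the sample coordinates are hypothetical, and the project may have computed distances differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_station_distance_m(flat, stations):
    """Distance in metres from a flat to the closest station in the list."""
    return min(haversine_m(flat[0], flat[1], lat, lon) for lat, lon in stations)


# Hypothetical coordinates, for illustration only.
flat = (1.3500, 103.8500)
stations = [(1.3530, 103.8500), (1.3700, 103.8900)]

distance_m = nearest_station_distance_m(flat, stations)
within_500m = distance_m <= 500
```

The `within_500m` flag mirrors the 500m threshold mentioned above and lets a tree model pick up a step change in price in addition to the continuous distance.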
Singapore's HDB market has unique dynamics (99-year leases, mature vs non-mature estates, MRT-driven development) that require domain knowledge to encode effectively. Generic feature engineering misses these signals.
The amenity proximity columns taught me that blank values can carry real meaning — "no mall within 500m" is information, not a gap. Treating it as missing data would have biased the model.
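The point about blanks can be sketched as a small encoding step: a blank amenity count is mapped to 0 ("none nearby") instead of being dropped or mean-imputed. The column name `mall_within_500m` is an assumption made for this example.

```python
import math


def encode_amenity_count(value):
    """Map a blank amenity count to 0; pass real counts through as integers."""
    if value is None or value == "":
        return 0
    if isinstance(value, float) and math.isnan(value):
        return 0  # NaN is how a blank cell often arrives after loading a CSV
    return int(value)


# Hypothetical rows: one with malls nearby, three recorded as blank.
rows = [
    {"mall_within_500m": 2},
    {"mall_within_500m": None},
    {"mall_within_500m": ""},
    {"mall_within_500m": float("nan")},
]

encoded = [encode_amenity_count(row["mall_within_500m"]) for row in rows]
# encoded == [2, 0, 0, 0]
```

Mean imputation would instead assign the blank rows the average count of flats that do have a mall nearby, erasing exactly the signal the blank was carrying.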
For a real estate agency, knowing that "LightGBM predicts SGD 450K" isn't enough. SHAP values explain why — enabling agents to justify valuations to clients with data-driven reasoning.
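SHAP values are additive: a model's baseline (expected) output plus the per-feature contributions equals the prediction for that flat. The sketch below turns such a breakdown into a client-facing summary. The baseline and contribution figures are invented placeholders chosen to sum to the SGD 450K example, not output from the project's model.

```python
def explain_prediction(base_value, contributions):
    """Return the prediction and a ranked, human-readable breakdown of it."""
    prediction = base_value + sum(contributions.values())
    # Largest absolute contribution first, so the main drivers lead the summary.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = [
        f"Predicted price: SGD {prediction:,.0f}",
        f"Baseline (average flat): SGD {base_value:,.0f}",
    ]
    for name, value in ranked:
        lines.append(f"  {name}: {value:+,.0f}")
    return prediction, lines


# Placeholder values for illustration only.
base_value = 480_000
contributions = {
    "floor_area_sqm": -25_000,
    "town": 15_000,
    "remaining_lease": -12_000,
    "mrt_distance": -8_000,
}

prediction, summary = explain_prediction(base_value, contributions)
print("\n".join(summary))
```

In practice the baseline and contributions would come from a SHAP explainer run on the trained model. Only the formatting step is shown here.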
Building the Streamlit calculator forced me to think beyond the notebook — how does the model get deployed? What inputs do end users need? This full-stack perspective strengthened the entire project.
View the complete notebook, try the interactive calculator, or browse the source code.