2026

SoleMate

A data-driven running shoe recommendation engine that matches runners with the best shoes from 300+ lab-tested options using nearest-neighbor search in biomechanical feature space.

Machine Learning, D3.js, Next.js, Recommendation Systems

Overview

Most running shoe recommendations come from editorial reviews or brand marketing. SoleMate takes a different approach: it treats shoe selection as a nearest-neighbor search in biomechanical feature space. The app scrapes lab-test data for 300+ running shoes from RunRepeat, covering eight measurements (weight, stack height, drop, midsole softness, flexibility, width, energy return, and traction), and matches runners to shoes based on their actual running profile rather than on what a reviewer liked. Users answer questions about their mileage, pace, terrain, injury history, and comfort preferences. The system assigns each user a blend of archetypes using sigmoid-smoothed affinities rather than a single label, then scores every shoe against that blended profile. The result is a ranked list of recommendations, each with an explainability breakdown showing its top contributing factors.
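To make "sigmoid-smoothed affinities" concrete, here is a minimal sketch of one way such blending could work. The archetype names, midpoints, and steepness values are hypothetical and chosen only for illustration; the app's real archetypes and inputs are not described here.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical archetypes keyed on weekly mileage (km). A negative steepness
# means affinity falls as mileage rises; a positive one means it rises.
ARCHETYPES = {
    "casual": {"midpoint": 20.0, "steepness": -0.2},
    "high_mileage": {"midpoint": 50.0, "steepness": 0.2},
}


def blended_affinities(weekly_km: float) -> dict[str, float]:
    """Return archetype affinities that sum to 1 instead of one hard label."""
    raw = {
        name: sigmoid(p["steepness"] * (weekly_km - p["midpoint"]))
        for name, p in ARCHETYPES.items()
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


if __name__ == "__main__":
    for km in (10, 35, 80):
        print(km, blended_affinities(km))
```

Because the sigmoid changes gradually, a runner near the boundary between two archetypes gets a mix of both, and small changes in the answers produce small changes in the blend rather than a sudden flip.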

Challenge

Shoe recommendation has a few tricky properties. The feature space is modest (8 dimensions), but the interactions matter: a heavy, cushioned shoe might be perfect for a long-distance runner with knee issues but terrible for a trail sprinter. Hard clustering doesn't work because shoes blend categories: a "daily trainer" might also be a decent long-run shoe. Users also don't know what they want in technical terms; they know "my knees hurt" or "I want something light," not "I need 28mm stack height with 60% energy return."
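The point about interactions can be made concrete by adding pairwise product terms to the raw features. The sketch below is an assumption about how that could be done, not the project's actual code. With 8 features there are 28 distinct pairs, so keeping the originals and adding the products gives 36 columns, which matches the expanded dimensionality described under Approach. The feature keys are hypothetical spellings of the eight measurements listed in the Overview, and values are assumed to be normalized to [0, 1].

```python
from itertools import combinations

# Hypothetical keys for the eight lab measurements.
FEATURES = [
    "weight",
    "stack_height",
    "drop",
    "midsole_softness",
    "flexibility",
    "width",
    "energy_return",
    "traction",
]


def expand_with_interactions(row: dict[str, float]) -> dict[str, float]:
    """Keep the original features and add one product term per feature pair."""
    expanded = {name: row[name] for name in FEATURES}
    for a, b in combinations(FEATURES, 2):
        expanded[f"{a}*{b}"] = row[a] * row[b]
    return expanded


if __name__ == "__main__":
    shoe = {name: 0.5 for name in FEATURES}
    shoe["weight"] = 0.9        # heavy
    shoe["stack_height"] = 0.8  # well cushioned
    print(len(expand_with_interactions(shoe)))
```

A term such as `weight*stack_height` is large only when a shoe is both heavy and tall, which lets a distance-based match tell "heavy and cushioned" apart from "heavy" or "cushioned" alone.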

Approach

The pipeline has several stages. A Python scraper collects shoe data from RunRepeat. A preparation step then normalizes the features with quantile transforms rather than min-max scaling, so that outliers do not compress the rest of the range, and expands the 8D feature space to 36D using polynomial interaction terms. UMAP projects this space into 2D for visualization, and a Gaussian Mixture Model assigns soft cluster memberships, so each shoe gets probabilities across categories rather than a single hard label.

On the frontend, an interactive D3.js scatter plot lets users explore the full shoe space, with zooming and filtering by brand. A pulsing marker shows where the user's preferences land, with nearby recommendations highlighted.

The scoring algorithm uses a weighted L1 similarity with archetype-specific feature weights, terrain gating, and price dampening. A rotation builder uses greedy optimization to suggest complementary multi-shoe sets that maximize role coverage while minimizing redundancy.
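The scoring step can be sketched roughly as follows. This is an illustrative reading of "weighted L1 similarity with terrain gating and price dampening", not the app's actual formula. The `Shoe` fields, the hard terrain filter, and the linear price penalty with a 0.5 floor are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Shoe:
    name: str
    features: dict[str, float]  # assumed normalized to [0, 1]
    terrain: str                # e.g. "road" or "trail"
    price: float


def score_shoe(
    shoe: Shoe,
    target: dict[str, float],
    weights: dict[str, float],
    terrain: str,
    budget: float,
) -> tuple[float, dict[str, float]]:
    """Return (score, per-feature contributions) for one shoe."""
    # Terrain gating (assumed here to be a hard filter).
    if shoe.terrain != terrain:
        return 0.0, {}

    # Weighted L1 similarity: 1 minus the absolute gap per feature,
    # weighted and normalized so a perfect match sums to 1.
    total_weight = sum(weights.values())
    contributions = {
        f: w * (1.0 - abs(shoe.features[f] - target[f])) / total_weight
        for f, w in weights.items()
    }
    similarity = sum(contributions.values())

    # Price dampening (assumed): scale down shoes over budget, never below 0.5.
    overshoot = max(0.0, shoe.price - budget) / budget
    damping = max(0.5, 1.0 - 0.5 * overshoot)
    return similarity * damping, contributions


def top_factors(contributions: dict[str, float], k: int = 3) -> list[str]:
    """Names of the k features that contributed most to the score."""
    return sorted(contributions, key=contributions.get, reverse=True)[:k]
```

Returning the per-feature contributions alongside the score is one simple way to support an explainability breakdown: the largest entries are the "top contributing factors" for that shoe.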

The Shoe Space explorer, a UMAP projection of 300+ shoes: every dot is a shoe, color-coded by cluster, with your position and recommendation zone highlighted

Impact

The whole thing runs in the browser with no server, so anyone can use it. Every recommendation comes with a plain-language explanation of why that shoe was picked, not just a score. The rotation builder is probably the most useful piece: instead of buying one shoe and hoping for the best, it suggests a complementary set that covers your different running needs.
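As a rough illustration of the greedy idea behind the rotation builder, the sketch below repeatedly picks the shoe that adds the most new role coverage, minus a penalty for overlapping with what the rotation already covers. The role names, suitability scores, and penalty value are hypothetical, and the real objective may differ.

```python
def marginal_gain(
    scores: dict[str, float], covered: dict[str, float], penalty: float
) -> float:
    """New coverage a shoe adds, minus a penalty for redundant coverage."""
    added = sum(max(0.0, s - covered[role]) for role, s in scores.items())
    overlap = sum(min(s, covered[role]) for role, s in scores.items())
    return added - penalty * overlap


def build_rotation(
    candidates: dict[str, dict[str, float]], size: int = 3, penalty: float = 0.2
) -> list[str]:
    """Greedily choose up to `size` shoes; stop early if nothing helps."""
    roles = {role for scores in candidates.values() for role in scores}
    covered = {role: 0.0 for role in roles}
    remaining = dict(candidates)
    rotation: list[str] = []
    while remaining and len(rotation) < size:
        best = max(
            remaining,
            key=lambda name: marginal_gain(remaining[name], covered, penalty),
        )
        if marginal_gain(remaining[best], covered, penalty) <= 0:
            break  # every remaining shoe is mostly redundant
        for role, s in remaining.pop(best).items():
            covered[role] = max(covered[role], s)
        rotation.append(best)
    return rotation


if __name__ == "__main__":
    # Hypothetical role-suitability scores in [0, 1].
    shoes = {
        "daily_trainer": {"easy": 0.9, "long": 0.6, "speed": 0.3},
        "daily_trainer_b": {"easy": 0.85, "long": 0.6, "speed": 0.3},
        "racer": {"easy": 0.2, "long": 0.4, "speed": 0.95},
        "max_cushion": {"easy": 0.7, "long": 0.95, "speed": 0.1},
    }
    print(build_rotation(shoes, size=4))
```

In this toy data the second daily trainer is never chosen, even when four slots are available, because it covers nothing the first one does not. Greedy selection is not guaranteed to find the best possible set, but it is cheap enough to run interactively.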

Side-by-side comparison dashboard: spider charts overlaying three shoes, per-feature bars, and biggest differences

Recommendations page: ranked shoe matches with match scores, expert verdicts, and "why this works" breakdowns

Links

GitHub → Live App →