2026
SoleMate
A data-driven running shoe recommendation engine that matches runners with the best shoes from 300+ lab-tested options using nearest-neighbor search in biomechanical feature space.

Overview
Most running shoe recommendations come from editorial reviews or brand marketing. SoleMate takes a different approach: it treats shoe selection as a nearest-neighbor search in biomechanical feature space. It draws on lab-tested data for 300+ running shoes scraped from RunRepeat (weight, stack height, drop, midsole softness, flexibility, width, energy return, traction), builds a feature space from those eight measurements, and matches runners to shoes based on their actual running profile rather than on what a reviewer liked. Users answer questions about their mileage, pace, terrain, injury history, and comfort preferences. The system then assigns each user a blend of runner archetypes using sigmoid-smoothed affinities, so a profile can belong partly to several archetypes instead of exactly one, and scores every shoe against that blended profile. The result is a ranked list of recommendations, each with an explainability breakdown showing the top contributing factors.
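To make the idea of sigmoid-smoothed affinities concrete, here is a minimal sketch of how questionnaire answers could be turned into a blend of archetypes. The archetype names, answer keys, midpoints, and steepness values are all hypothetical and chosen for illustration; they are not SoleMate's actual rules.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical rules: archetype -> (answer key, midpoint, steepness).
# A negative steepness means lower answers give a higher affinity.
ARCHETYPE_RULES = {
    "long_distance": ("weekly_km", 50.0, 0.15),
    "speed": ("pace_min_per_km", 4.5, -3.0),
    "trail": ("trail_fraction", 0.5, 8.0),
}


def blended_archetypes(answers: dict[str, float]) -> dict[str, float]:
    """Return archetype affinities that sum to 1.

    Each raw affinity is a sigmoid of the distance between the answer and
    the archetype's midpoint, so affinities change smoothly around the
    midpoint instead of flipping at a hard threshold.
    """
    raw = {
        name: sigmoid(steepness * (answers[key] - midpoint))
        for name, (key, midpoint, steepness) in ARCHETYPE_RULES.items()
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# A high-mileage, slower, mostly-road runner leans toward "long_distance".
profile = blended_archetypes(
    {"weekly_km": 80.0, "pace_min_per_km": 6.0, "trail_fraction": 0.1}
)
```

Because the output is a set of weights rather than a single label, a runner who sits near a midpoint contributes to more than one archetype when shoes are scored.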
Challenge
Shoe recommendation has a few tricky properties. The feature space is modest (8 dimensions) but the interactions matter: a heavy, cushioned shoe might be perfect for a long-distance runner with knee issues but terrible for a trail sprinter. Hard clustering doesn't work because shoes blend categories: a "daily trainer" might also be a decent long-run shoe. Users also don't know what they want in technical terms; they know "my knees hurt" or "I want something light," not "I need 28mm stack height with 60% energy return."
Approach
The pipeline has several stages, starting with data preparation. A Python scraper collects shoe data from RunRepeat. A preparation step then normalizes features with quantile transforms (chosen over min-max scaling because they are robust to outliers) and expands the 8D feature space to 36D using polynomial interaction terms. UMAP projects this space into 2D for visualization, and a Gaussian Mixture Model assigns soft cluster memberships, so each shoe gets probabilities across categories rather than a single hard label.

On the frontend, an interactive D3.js scatter plot lets users explore the full shoe space, zooming and filtering by brand. A pulsing marker shows where the user's preferences land, with nearby recommendations highlighted.

The scoring algorithm uses weighted L1 similarity with archetype-specific feature weights, terrain gating, and price dampening. A rotation builder uses greedy optimization to suggest complementary multi-shoe sets that maximize role coverage while minimizing redundancy.
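The feature expansion and the scoring step can be sketched in a few lines of plain Python. The 36D figure is consistent with keeping the 8 original features and adding their 28 pairwise products, which is what this sketch assumes. The function names, the hard terrain gate, and the exact form of the price dampening are illustrative assumptions, not SoleMate's actual implementation.

```python
from itertools import combinations


def expand_interactions(features: list[float]) -> list[float]:
    """Keep the original features and append every pairwise product.

    For 8 inputs this yields 8 + C(8, 2) = 8 + 28 = 36 values.
    """
    return list(features) + [a * b for a, b in combinations(features, 2)]


def weighted_l1_similarity(
    user: list[float], shoe: list[float], weights: list[float]
) -> float:
    """Return 1.0 for identical vectors, lower as weighted L1 distance grows.

    Assumes every feature is already normalized to [0, 1], so the result
    also stays in [0, 1].
    """
    distance = sum(w * abs(u - s) for w, u, s in zip(weights, user, shoe))
    return 1.0 - distance / sum(weights)


def score_shoe(
    user: list[float],
    shoe: list[float],
    weights: list[float],
    *,
    terrain_ok: bool,
    price: float,
    budget: float,
    dampening: float = 0.5,  # hypothetical strength of the price penalty
) -> float:
    """Combine similarity with a terrain gate and a soft price penalty."""
    if not terrain_ok:
        return 0.0  # terrain gating: a mismatched shoe is excluded outright
    similarity = weighted_l1_similarity(user, shoe, weights)
    overshoot = max(0.0, price - budget) / budget
    # Price dampening: shrink the score as the price exceeds the budget.
    return similarity / (1.0 + dampening * overshoot)


expanded = expand_interactions([0.5] * 8)
print(len(expanded))  # 36
```

In this sketch the weights would come from the user's archetype blend, so the same shoe can score differently for a runner who prioritizes cushioning and one who prioritizes low weight.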

Impact
The whole thing runs in the browser with no server, so anyone can use it. Every recommendation comes with a plain-language explanation of why that shoe was picked, not just a score. The rotation builder is probably the most useful piece: instead of buying one shoe and hoping for the best, it suggests a complementary set that covers your different running needs.
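As a rough illustration of how a greedy rotation builder can trade role coverage against redundancy, the sketch below adds one shoe at a time, always picking the shoe with the best marginal gain. The role names, suitability scores, and the `redundancy_penalty` parameter are hypothetical and are not taken from SoleMate itself.

```python
def build_rotation(
    shoes: dict[str, dict[str, float]],
    size: int = 3,
    redundancy_penalty: float = 0.5,
) -> list[str]:
    """Greedily pick up to `size` shoes.

    `shoes` maps a shoe name to its suitability per role, each in [0, 1].
    A candidate earns credit for improving on the best score already held
    for a role and is penalized for duplicating coverage.
    """
    rotation: list[str] = []
    best_per_role: dict[str, float] = {}

    def marginal_gain(name: str) -> float:
        roles = shoes[name]
        gain = sum(
            max(0.0, score - best_per_role.get(role, 0.0))
            for role, score in roles.items()
        )
        overlap = sum(
            min(score, best_per_role.get(role, 0.0))
            for role, score in roles.items()
        )
        return gain - redundancy_penalty * overlap

    while len(rotation) < size:
        candidates = [name for name in shoes if name not in rotation]
        if not candidates:
            break
        pick = max(candidates, key=marginal_gain)
        if marginal_gain(pick) <= 0.0:
            break  # no remaining shoe adds anything new
        rotation.append(pick)
        for role, score in shoes[pick].items():
            best_per_role[role] = max(best_per_role.get(role, 0.0), score)
    return rotation


# Hypothetical catalog: two near-identical daily trainers, a racer, a trail shoe.
catalog = {
    "Daily A": {"easy": 0.9, "long": 0.7, "tempo": 0.3},
    "Daily B": {"easy": 0.85, "long": 0.7, "tempo": 0.3},
    "Racer": {"tempo": 0.9, "race": 0.95},
    "Trail": {"trail": 0.9, "easy": 0.4},
}
rotation = build_rotation(catalog, size=3)
```

Greedy selection does not guarantee the best possible set, but it is cheap enough to run on every preference change, and here it skips the second daily trainer because it adds nothing the first one does not already cover.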

