Model Validation | Project NERVE

CRITICAL Precision: 95.6% (Extended, 3mo grace)
HIGH Precision: 93.8% (Extended, 3mo grace)
Combined Precision: 92.8% (All tiers, extended)
Persistent Precision: 70.9% (Negative every month)
Transition Recall: 75.2% (Caught deteriorating)

ℹ Definitions

Precision: Of lockers flagged by the model, what % actually turned/stayed negative? Higher = fewer false positives.
Extended (3mo grace): A prediction counts as correct if the locker is negative at exit month OR any of the next 3 months.
Persistent Precision: Flagged AND negative every single month from the exit onward. Strictest measure.
Transition Recall: Of lockers that genuinely went from positive to persistently negative, what % did the model catch?
Walk-forward: Train on 6 months of history, predict at exit month, validate against outcome. No future data leakage.
R²: Coefficient of determination of the OLS slope fit, i.e. the share of month-to-month variance that the fitted trend line explains. Higher = more confident the trend is real, not noise.
Dual-path: Model classifies via EITHER slope deterioration OR maintenance cost spikes. Covers both failure modes.
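
The R² and Dual-path definitions can be illustrated with a minimal sketch. It assumes each locker has a short monthly series of some net figure whose sign decides positive/negative, plus a monthly maintenance cost series. The function names, the thresholds (`slope_cut`, `r2_min`, `spike_ratio`) and the spike rule are hypothetical and chosen only for illustration; they are not the model's actual parameters.

```python
from statistics import mean


def ols_slope_r2(values):
    """Fit y = a + b*t by ordinary least squares over t = 0..n-1.

    Returns (slope, r_squared).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    t = range(n)
    t_bar, y_bar = mean(t), mean(values)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    ss_res = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, values))
    ss_tot = sum((yi - y_bar) ** 2 for yi in values)
    # A perfectly flat series has no variance to explain; report 0.0.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, r2


def dual_path_flag(monthly_values, maint_costs,
                   slope_cut=-5.0, r2_min=0.5, spike_ratio=2.0):
    """Flag a locker via EITHER path (all thresholds are hypothetical).

    Path 1: a sufficiently steep, well-fitted downward slope.
    Path 2: the latest maintenance cost is a spike over the earlier mean.
    """
    slope, r2 = ols_slope_r2(monthly_values)
    slope_path = slope <= slope_cut and r2 >= r2_min
    baseline = mean(maint_costs[:-1])
    maint_path = baseline > 0 and maint_costs[-1] >= spike_ratio * baseline
    return slope_path or maint_path


if __name__ == "__main__":
    print(ols_slope_r2([100, 90, 80, 70, 60, 50]))
    print(dual_path_flag([50] * 6, [10, 10, 10, 10, 10, 30]))
```

Keeping the two paths as separate booleans joined by `or` mirrors the "EITHER ... OR" wording above: a locker with a flat trend can still be flagged on maintenance cost alone.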

Validation Methodology

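Quarterly walk-forward (5 windows): Train on a 6-month history, then validate against the exit month plus a 3-month grace period. The model never sees future data during training. Each window is trained and scored separately, though consecutive windows start 3 months apart and therefore share part of their history.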
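
As a rough sketch of how such a window schedule could be generated: the helper below steps a 6-month window forward one quarter at a time. The start month, the last available data month (May 2026, assumed here only because it would explain a shorter final window) and the function names are illustrative assumptions, not the project's code.

```python
def month_add(year, month, k):
    """Return (year, month) shifted forward by k months."""
    idx = year * 12 + (month - 1) + k
    return idx // 12, idx % 12 + 1


def walk_forward_windows(first=(2025, 1), last_data=(2026, 5),
                         length=6, step=3, count=5):
    """Build `count` training windows of `length` months, `step` months apart.

    A window that would run past `last_data` is truncated to it.
    """
    windows = []
    for i in range(count):
        start = month_add(first[0], first[1], i * step)
        end = month_add(start[0], start[1], length - 1)
        # Tuples compare as (year, month), so min() clips to the last data month.
        windows.append((start, min(end, last_data)))
    return windows


if __name__ == "__main__":
    for start, end in walk_forward_windows():
        print(f"{start[0]}-{start[1]:02d} .. {end[0]}-{end[1]:02d}")
```

With the defaults above, each window begins three months after the previous one, so adjacent windows overlap by three months.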

Precision Definitions

• Strict: Flagged AND negative at exit month only
• Extended (3mo grace): Flagged AND negative at exit OR any of next 3 months
• Persistent: Flagged AND negative EVERY month from exit onward
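
The three precision variants differ only in which months after exit are checked. A minimal sketch, assuming each flagged locker is represented by a list of booleans (`True` = negative) starting at the exit month; the data layout and function name are hypothetical:

```python
def precision_variants(flagged_outcomes, grace=3):
    """Compute strict, extended and persistent precision for flagged lockers.

    flagged_outcomes: one list per flagged locker; element 0 is the exit
    month, later elements are the following months (True = negative).
    """
    n = len(flagged_outcomes)
    if n == 0:
        return None
    # Strict: negative at the exit month only.
    strict = sum(o[0] for o in flagged_outcomes) / n
    # Extended: negative at exit OR in any of the next `grace` months.
    extended = sum(any(o[:grace + 1]) for o in flagged_outcomes) / n
    # Persistent: negative in EVERY observed month from exit onward.
    persistent = sum(all(o) for o in flagged_outcomes) / n
    return {"strict": strict, "extended": extended, "persistent": persistent}


if __name__ == "__main__":
    outcomes = [
        [True, True, True, True],      # counts under all three definitions
        [False, True, False, False],   # rescued by the grace period only
        [False, False, False, False],  # false positive everywhere
        [True, False, True, True],     # strict and extended, not persistent
    ]
    print(precision_variants(outcomes))
```

By construction, persistent ≤ strict ≤ extended for any input, which matches the ordering of the headline figures.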

Recall Definitions

• Transition recall: Of lockers that went from positive → negative (persistent), what % did we flag?
• Retention recall: Of already-negative lockers that stayed negative, what % did we flag?
• Overall recall: Of all lockers negative at exit, what % did we flag? (~37%; the model targets deterioration, not cataloging)
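
The recall variants share one formula (flagged lockers in a group, divided by the size of that group) and differ only in how the group is selected. The sketch below assumes a simple per-locker record; the field names are hypothetical, and "stayed negative" is read here as persistently negative, which is an interpretation rather than something the definitions state.

```python
from dataclasses import dataclass


@dataclass
class Locker:
    negative_before: bool        # negative before the exit month
    negative_at_exit: bool       # negative at the exit month
    persistently_negative: bool  # negative every month from exit onward
    flagged: bool                # flagged by the model


def _recall(group):
    """Share of a group that was flagged, or None for an empty group."""
    group = list(group)
    return sum(l.flagged for l in group) / len(group) if group else None


def recall_variants(lockers):
    return {
        # Positive before exit, persistently negative afterwards.
        "transition": _recall(
            l for l in lockers
            if not l.negative_before and l.persistently_negative),
        # Already negative and stayed negative.
        "retention": _recall(
            l for l in lockers
            if l.negative_before and l.persistently_negative),
        # Every locker that is negative at exit, regardless of history.
        "overall": _recall(l for l in lockers if l.negative_at_exit),
    }


if __name__ == "__main__":
    sample = [
        Locker(False, True, True, True),
        Locker(False, True, True, False),
        Locker(True, True, True, True),
        Locker(True, True, False, False),
        Locker(False, False, False, True),
    ]
    print(recall_variants(sample))
```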

Quarterly Backtest Results

Window	CRITICAL (ext)	HIGH (ext)	ALL (ext)	Persistent Precision	Transition Recall
Jan-Jun25	91.1%	91.4%	89.9%	63.6%	80.2%
Apr-Sep25	98.1%	96.2%	95.6%	64.9%	82.2%
Jul-Dec25	97.9%	97.2%	96.6%	62.5%	75.0%
Oct25-Mar26	95.9%	93.9%	92.5%	74.2%	83.3%
Jan-May26	94.9%	90.5%	89.4%	89.4%	55.5%

Average: CRITICAL 95.6% | HIGH 93.8% | ALL 92.8%
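
The averages appear to be simple unweighted means over the five windows. The short check below recomputes them from the table rows; the column keys are labels invented for this sketch. If windows contain different numbers of flagged lockers, a count-weighted average could differ, but those counts are not shown here.

```python
from statistics import mean

# Per-window figures copied from the Quarterly Backtest Results table (percent).
backtest = {
    "Jan-Jun25":   (91.1, 91.4, 89.9, 63.6, 80.2),
    "Apr-Sep25":   (98.1, 96.2, 95.6, 64.9, 82.2),
    "Jul-Dec25":   (97.9, 97.2, 96.6, 62.5, 75.0),
    "Oct25-Mar26": (95.9, 93.9, 92.5, 74.2, 83.3),
    "Jan-May26":   (94.9, 90.5, 89.4, 89.4, 55.5),
}
columns = ("critical_ext", "high_ext", "all_ext",
           "persistent_precision", "transition_recall")

# Unweighted mean of each column, rounded to one decimal place.
averages = {
    name: round(mean(row[i] for row in backtest.values()), 1)
    for i, name in enumerate(columns)
}

if __name__ == "__main__":
    for name, value in averages.items():
        print(f"{name}: {value}%")
```

The same calculation also reproduces the Persistent Precision and Transition Recall headline values, which the Average line does not list.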

Miss Analysis

Understanding why the model misses helps calibrate expectations and identify areas for improvement.

Primary Miss Causes

• 46% - Maintenance spikes: One-off high-cost repairs that don't show as slope changes. Now partially captured via maintenance path.
• 23% - Seasonal rescues: Holiday volume temporarily pushes declining lockers positive. Structural limitation of any trend-based predictor.
• 18% - New lockers: Fewer than 4 months of history, too little for slope estimation; the model expects 6 monthly data points.
• 13% - Abrupt external events: Store closure, construction, partner dispute. No leading indicator available.
