Machine Learning Models
Overview
BetIntel uses an ensemble of ML models to generate football predictions across several betting markets (1X2, Over/Under 2.5, BTTS, Double Chance, Asian Handicap).
Available Models
PRODUCTION_V3_GBM (Recommended)
The recommended model: a weighted blend of PRODUCTION_V3 and the GBM Ensemble.
| Metric | Value |
|---|---|
| Accuracy 1X2 | 62-65% |
| Over/Under 2.5 | 64% |
| BTTS | 52% |
| RPS | 0.160 |
| Features | 140+ |
Composition:
- 55% PRODUCTION_V3 (stability)
- 45% GBM Ensemble (precision)
```python
# Combination weights
PRODUCTION_WEIGHT = 0.55
GBM_WEIGHT = 0.45
combined_probs = PRODUCTION_WEIGHT * prod_probs + GBM_WEIGHT * gbm_probs
```
GBM Ensemble
Ensemble of gradient boosting machines:
| Model | RPS | Weight |
|---|---|---|
| LightGBM | 0.1612 | 50% |
| CatBoost | 0.1609 | 50% |
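As a rough illustration of how the two 50% weights in the table could feed into the 55/45 blend, the sketch below averages two stand-in probability vectors and then mixes the result with a stand-in PRODUCTION_V3 vector. The helper name `blend` and every input number are hypothetical and not taken from the BetIntel codebase; plain Python lists are used so the snippet has no dependencies.

```python
def blend(vectors, weights):
    """Weighted average of equally sized probability vectors, renormalized to sum to 1."""
    if len(vectors) != len(weights):
        raise ValueError("need exactly one weight per vector")
    size = len(vectors[0])
    mixed = [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(size)]
    total = sum(mixed)
    return [p / total for p in mixed]

# Hypothetical 1X2 outputs in (home, draw, away) order
lightgbm_probs = [0.50, 0.27, 0.23]
catboost_probs = [0.46, 0.29, 0.25]
prod_probs = [0.44, 0.30, 0.26]

# Step 1: equal-weight average of the two boosting models
gbm_probs = blend([lightgbm_probs, catboost_probs], [0.5, 0.5])
# Step 2: 55/45 blend of the base model and the GBM ensemble
combined_probs = blend([prod_probs, gbm_probs], [0.55, 0.45])
```

Renormalizing after each step is a defensive choice: it has no effect when the inputs already sum to 1 and guards against rounding drift when they do not.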
Files:
- models/advanced/lightgbm_optimized.pkl
- models/advanced/catboost_optimized.pkl
- models/advanced/best_params.json (feature columns)
PRODUCTION_V3
Base model with good stability.
| Metric | Value |
|---|---|
| Accuracy 1X2 | 60.1% |
| Over/Under 2.5 | 64.1% |
| BTTS | 52.4% |
| RPS | ~0.18 |
Files:
- models/ensemble_v3.pkl - 1X2 predictions
- models/over25_model.pkl - Over/Under
- models/btts_model.pkl - BTTS
Pi-Rating System
Dynamic rating system that tracks separate home and away ratings for each team. Its authors report that it outperforms Elo ratings for football.
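```python
class PiRatingSystem:
    def __init__(self, initial_rating=0.0, lambda_decay=0.035):
        self.initial_rating = initial_rating  # rating assigned to unseen teams
        self.lambda_decay = lambda_decay      # learning rate for rating updates
        self.home_ratings = {}                # team -> rating when playing at home
        self.away_ratings = {}                # team -> rating when playing away
```
Based on: Constantinou & Fenton (2013) - "Determining the level of ability of football teams by dynamic ratings"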
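To make the idea of separate home and away ratings concrete, here is a minimal sketch of one rating update, written from a reading of the cited paper rather than from the BetIntel source. The class name `PiRatingSketch`, the method names, the cross-venue rate `gamma=0.7`, and the constants `b=10` and `c=3` are all assumptions introduced for illustration; only `initial_rating` and `lambda_decay=0.035` appear in the snippet above.

```python
import math

class PiRatingSketch:
    """Illustrative pi-rating update; not the BetIntel implementation."""

    def __init__(self, initial_rating=0.0, lambda_decay=0.035, gamma=0.7):
        self.initial_rating = initial_rating
        self.lambda_decay = lambda_decay  # learning rate for the venue that was played
        self.gamma = gamma                # assumed carry-over rate to the other venue
        self.home_ratings = {}
        self.away_ratings = {}

    @staticmethod
    def _expected_goal_diff(rating, b=10.0, c=3.0):
        # Map a rating to an expected goal difference, keeping the rating's sign.
        return math.copysign(b ** (abs(rating) / c) - 1.0, rating)

    def predict_goal_diff(self, home, away):
        rating_home = self.home_ratings.get(home, self.initial_rating)
        rating_away = self.away_ratings.get(away, self.initial_rating)
        return self._expected_goal_diff(rating_home) - self._expected_goal_diff(rating_away)

    def update(self, home, away, home_goals, away_goals, c=3.0):
        observed = home_goals - away_goals
        predicted = self.predict_goal_diff(home, away)
        # Diminishing reward: large score errors move the ratings less than linearly.
        psi = c * math.log10(1.0 + abs(observed - predicted))
        direction = 1.0 if observed > predicted else -1.0
        step = direction * psi * self.lambda_decay

        old_home_at_home = self.home_ratings.get(home, self.initial_rating)
        old_home_at_away = self.away_ratings.get(home, self.initial_rating)
        old_away_at_away = self.away_ratings.get(away, self.initial_rating)
        old_away_at_home = self.home_ratings.get(away, self.initial_rating)

        # The home side moves with the error, the away side against it.
        self.home_ratings[home] = old_home_at_home + step
        self.away_ratings[away] = old_away_at_away - step
        # Each team's rating at the other venue follows by a fraction gamma.
        self.away_ratings[home] = old_home_at_away + step * self.gamma
        self.home_ratings[away] = old_away_at_home - step * self.gamma

ratings = PiRatingSketch()
ratings.update("Team A", "Team B", home_goals=2, away_goals=0)
```

After this single 2-0 home win between two unrated teams, Team A's home rating rises, Team B's away rating falls by the same amount, and `predict_goal_diff("Team A", "Team B")` becomes positive.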
Prediction Pipeline
```
      Match Data
           │
           ▼
┌─────────────────────┐
│  Feature Extractor  │  140+ features
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│  PRODUCTION_V3      │  Base predictions
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│  GBM Predictor      │  LightGBM + CatBoost
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│  Combine (55/45)    │  Weighted average
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│  Calibration        │  Temperature scaling
└─────────────────────┘
           │
           ▼
   Final Prediction
```
Supported Markets
1X2 (Match Result)
- prob_home_win: Home win probability
- prob_draw: Draw probability
- prob_away_win: Away win probability
Over/Under 2.5
- prob_over_25: Probability of more than 2.5 goals
- prob_under_25: Probability of fewer than 2.5 goals
BTTS (Both Teams To Score)
- prob_btts_yes: Both teams score
- prob_btts_no: At least one team doesn't score
Double Chance
- prob_1x: Home or draw
- prob_x2: Draw or away
- prob_12: Home or away
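Each Double Chance outcome covers two of the three 1X2 results, so its probability is the sum of the two matching 1X2 probabilities. The sketch below shows that arithmetic; the function name `double_chance` and the input values are hypothetical, and whether BetIntel derives these fields this way or predicts them separately is an assumption.

```python
def double_chance(prob_home_win, prob_draw, prob_away_win):
    """Derive Double Chance probabilities from a 1X2 distribution."""
    total = prob_home_win + prob_draw + prob_away_win
    if abs(total - 1.0) > 1e-6:
        raise ValueError("1X2 probabilities must sum to 1")
    return {
        "prob_1x": prob_home_win + prob_draw,      # home win or draw
        "prob_x2": prob_draw + prob_away_win,      # draw or away win
        "prob_12": prob_home_win + prob_away_win,  # home win or away win
    }

# Hypothetical 1X2 prediction
markets = double_chance(0.45, 0.28, 0.27)
```

The three values always sum to 2, because every 1X2 result is counted in exactly two of the Double Chance outcomes.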
Asian Handicap
- asian_handicap_line: Handicap line (e.g., -0.5, -1.0)
- asian_handicap_home_odds: Home odds
- asian_handicap_away_odds: Away odds
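Unlike the other markets, the Asian Handicap fields are odds rather than probabilities. Assuming the two odds fields are decimal odds (the format is not stated here), a common way to read them is to invert each price and normalize away the bookmaker margin. The function name `implied_probabilities` and the sample prices are hypothetical.

```python
def implied_probabilities(home_odds, away_odds):
    """Convert a pair of decimal odds into margin-free implied probabilities."""
    if home_odds <= 1.0 or away_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    raw_home = 1.0 / home_odds
    raw_away = 1.0 / away_odds
    overround = raw_home + raw_away  # exceeds 1 when the prices include a margin
    return raw_home / overround, raw_away / overround, overround - 1.0

# Hypothetical prices for a -0.5 line
p_home, p_away, margin = implied_probabilities(1.90, 1.95)
```

On whole-number lines such as -1.0 a push (stake refunded) is possible, so the two normalized values are best read as probabilities conditional on the bet not being refunded.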
Evaluation Metrics
RPS (Ranked Probability Score)
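Measures how close the predicted probabilities are to the observed result, taking the ordering of the outcomes (home, draw, away) into account. Lower is better.
```python
import numpy as np

def rps(probs, actual):
    """
    probs: predicted probabilities in outcome order (home, draw, away)
    actual: one-hot vector of the observed outcome
    RPS = 0: perfect prediction
    RPS = 1: worst possible prediction
    """
    cum_probs = np.cumsum(probs)
    cum_actual = np.cumsum(actual)
    # Divide by (number of outcomes - 1) so the score is bounded by 1;
    # the last cumulative term is always 0 and adds nothing to the sum.
    return np.sum((cum_probs - cum_actual) ** 2) / (len(probs) - 1)
```
| Benchmark | RPS |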
|---|---|
| Random (33/33/33) | ~0.22 |
| SOTA literature | 0.195 |
| BetIntel | 0.160 |
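To give a feel for the scale of these numbers, the sketch below scores a uniform 1/3-1/3-1/3 forecast against each possible result. It uses the conventional RPS normalization (sum of squared cumulative differences divided by the number of outcomes minus one) and only the standard library; the variable names are illustrative.

```python
from itertools import accumulate

def rps(probs, actual):
    """Ranked Probability Score for one match; outcomes ordered home, draw, away."""
    cum_probs = list(accumulate(probs))
    cum_actual = list(accumulate(actual))
    squared = [(p - a) ** 2 for p, a in zip(cum_probs, cum_actual)]
    return sum(squared) / (len(probs) - 1)

uniform = [1 / 3, 1 / 3, 1 / 3]
rps_home = rps(uniform, [1, 0, 0])  # home win observed
rps_draw = rps(uniform, [0, 1, 0])  # draw observed
rps_away = rps(uniform, [0, 0, 1])  # away win observed
```

A uniform forecast scores 5/18 (about 0.278) on a home or away win and 1/9 (about 0.111) on a draw, so its average over a set of matches depends on the share of draws in that set.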
Accuracy
Percentage of matches in which the outcome with the highest predicted probability matches the actual 1X2 result.
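A minimal sketch of that calculation, assuming each prediction is a (home, draw, away) tuple and each result is encoded as the index 0, 1, or 2. The function name `accuracy_1x2` and the sample data are hypothetical.

```python
def accuracy_1x2(predictions, results):
    """Share of matches where the most probable outcome equals the actual one."""
    if not predictions or len(predictions) != len(results):
        raise ValueError("predictions and results must be non-empty and equally long")
    correct = 0
    for probs, outcome in zip(predictions, results):
        predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
        if predicted == outcome:
            correct += 1
    return correct / len(predictions)

# Hypothetical predictions and results (0 = home, 1 = draw, 2 = away)
predictions = [(0.55, 0.25, 0.20), (0.30, 0.32, 0.38), (0.40, 0.35, 0.25), (0.20, 0.30, 0.50)]
results = [0, 1, 0, 2]
acc = accuracy_1x2(predictions, results)
```

Here three of the four top picks are right, giving 0.75. Accuracy ignores how confident each pick was, which is why it is reported alongside RPS rather than instead of it.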
Calibration
Verifies that predicted probabilities match actual frequencies.
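One simple way to run such a check is a reliability table: group predictions into probability bins and compare the average predicted probability with the observed frequency in each bin. The sketch below does this for a single binary outcome; the function name `reliability_table`, the bin count, and the sample data are hypothetical.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Return (bin_low, bin_high, count, mean_predicted, observed_frequency) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # a probability of 1.0 goes in the last bin
        bins[index].append((p, y))
    rows = []
    for index, members in enumerate(bins):
        if not members:
            continue
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((index / n_bins, (index + 1) / n_bins, len(members), mean_predicted, observed))
    return rows

# Hypothetical home-win probabilities and whether the home team won (1) or not (0)
probs = [0.15, 0.18, 0.45, 0.48, 0.52, 0.55, 0.85, 0.88]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
table = reliability_table(probs, outcomes, n_bins=5)
```

In a well-calibrated model the last two columns are close in every bin. With this toy data the middle bin has a mean prediction of 0.5 and an observed frequency of 0.5, while the outer bins are too small to say much.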
Training
Dataset
- 5+ seasons of historical data
- 5 top European leagues (Serie A, Premier League, La Liga, Bundesliga, Ligue 1)
- ~10,000+ matches for training
Cross-Validation
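```python
from sklearn.model_selection import TimeSeriesSplit

# X and y are assumed to be NumPy arrays sorted by match date
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(X):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    # Train on the earlier matches and evaluate on the later ones
```
Hyperparameter Tuning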
```python
# LightGBM best params
params = {
    'n_estimators': 500,
    'max_depth': 6,
    'learning_rate': 0.05,
    'num_leaves': 31,
    'feature_fraction': 0.8,
    # Note: bagging_fraction only takes effect when bagging_freq is set above 0
    'bagging_fraction': 0.8,
}
```
Future Development
- Error Analysis Notebook - Error pattern analysis
- Probability Calibration - Fix for low confidence
- League-Specific Models - Per-league models