Machine Learning Models

Overview

BetIntel uses an ensemble of ML models to produce probabilistic predictions for football matches.

Available Models

PRODUCTION_V3_GBM (Recommended)

The recommended model: a weighted blend of PRODUCTION_V3 and the GBM Ensemble.

Metric	Value
Accuracy 1X2	62-65%
Over/Under 2.5	64%
BTTS	52%
RPS	0.160
Features	140+

Composition:

55% PRODUCTION_V3 (stability)
45% GBM Ensemble (precision)

python

# Combination weights
PRODUCTION_WEIGHT = 0.55
GBM_WEIGHT = 0.45

combined_probs = PRODUCTION_WEIGHT * prod_probs + GBM_WEIGHT * gbm_probs

GBM Ensemble

Ensemble of gradient boosting machines:

Model	RPS	Weight
LightGBM	0.1612	50%
CatBoost	0.1609	50%

Files:

models/advanced/lightgbm_optimized.pkl
models/advanced/catboost_optimized.pkl
models/advanced/best_params.json (feature columns)
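
The two weighting levels described above (50/50 inside the GBM Ensemble, then 55/45 between PRODUCTION_V3 and the ensemble) can be sketched in plain Python as follows. This is a minimal illustration, not the project's code: the helper names `weighted_average` and `blend_1x2` and the sample probabilities are hypothetical, and the renormalisation step is an added safeguard that the source does not mention.

```python
# Weights taken from the composition list and the GBM Ensemble table.
PRODUCTION_WEIGHT = 0.55
GBM_WEIGHT = 0.45
LIGHTGBM_WEIGHT = 0.50
CATBOOST_WEIGHT = 0.50


def weighted_average(prob_vectors, weights):
    """Blend probability vectors element-wise, then renormalise to sum to 1."""
    blended = [
        sum(w * p[i] for p, w in zip(prob_vectors, weights))
        for i in range(len(prob_vectors[0]))
    ]
    total = sum(blended)
    return [p / total for p in blended]


def blend_1x2(prod_probs, lgbm_probs, catboost_probs):
    """Hypothetical helper: combine the three model outputs for one match."""
    gbm_probs = weighted_average(
        [lgbm_probs, catboost_probs], [LIGHTGBM_WEIGHT, CATBOOST_WEIGHT]
    )
    return weighted_average([prod_probs, gbm_probs], [PRODUCTION_WEIGHT, GBM_WEIGHT])


# Made-up [home, draw, away] probabilities for a single match.
combined = blend_1x2([0.50, 0.28, 0.22], [0.46, 0.27, 0.27], [0.48, 0.26, 0.26])
```

Because every input vector already sums to 1 and the weights at each level sum to 1, the renormalisation only guards against rounding drift.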

PRODUCTION_V3

Base model with good stability.

Metric	Value
Accuracy 1X2	60.1%
Over/Under 2.5	64.1%
BTTS	52.4%
RPS	~0.18

Files:

models/ensemble_v3.pkl - 1X2 predictions
models/over25_model.pkl - Over/Under
models/btts_model.pkl - BTTS

Pi-Rating System

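A dynamic rating system that keeps separate home and away ratings for each team. Its authors report that it outperforms Elo-based ratings for football.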

python

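class PiRatingSystem:
    def __init__(self, initial_rating=0.0, lambda_decay=0.035):
        # Store the default so teams not yet seen can start from it
        self.initial_rating = initial_rating
        self.lambda_decay = lambda_decay
        self.home_ratings = {}  # team -> rating when playing at home
        self.away_ratings = {}  # team -> rating when playing away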

Based on: Constantinou & Fenton (2012) - "Determining the level of ability of football teams by dynamic ratings"
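
To make the rating mechanics concrete, here is a simplified sketch of a single pi-rating update, loosely following the cited paper. It is not the BetIntel implementation. Several things are assumptions: the class name `PiRatingSketch`, the `update` method, the `gamma` carry-over parameter (0.7), the constants `base=10` and `c=3`, and the reading of `lambda_decay` as the paper's learning rate.

```python
import math


class PiRatingSketch:
    """Illustrative pi-rating update; names and constants are assumptions."""

    def __init__(self, initial_rating=0.0, lambda_decay=0.035, gamma=0.7):
        self.initial_rating = initial_rating
        self.lambda_decay = lambda_decay  # assumed: learning rate at the venue played
        self.gamma = gamma                # assumed: carry-over to the other venue
        self.home_ratings = {}
        self.away_ratings = {}

    @staticmethod
    def _expected_goal_diff(rating, base=10.0, c=3.0):
        # Map a rating to an expected goal difference against an average team.
        magnitude = base ** (abs(rating) / c) - 1.0
        return math.copysign(magnitude, rating)

    def update(self, home, away, home_goals, away_goals):
        r_home = self.home_ratings.get(home, self.initial_rating)
        r_away = self.away_ratings.get(away, self.initial_rating)

        expected = self._expected_goal_diff(r_home) - self._expected_goal_diff(r_away)
        observed = home_goals - away_goals
        error = abs(observed - expected)
        # Log damping gives diminishing weight to large score discrepancies.
        psi = 3.0 * math.log10(1.0 + error)
        direction = 1.0 if observed > expected else -1.0

        delta_home = direction * psi * self.lambda_decay
        delta_away = -direction * psi * self.lambda_decay

        self.home_ratings[home] = r_home + delta_home
        self.away_ratings[away] = r_away + delta_away
        # Propagate part of each change to the team's rating at the other venue.
        self.away_ratings[home] = (
            self.away_ratings.get(home, self.initial_rating) + self.gamma * delta_home
        )
        self.home_ratings[away] = (
            self.home_ratings.get(away, self.initial_rating) + self.gamma * delta_away
        )


system = PiRatingSketch()
system.update("Team A", "Team B", home_goals=2, away_goals=0)
```

After this one result, Team A's home rating rises and Team B's away rating falls by the same amount, while each team's rating at the other venue moves by `gamma` times that change.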

Prediction Pipeline

Match Data
    │
    ▼
┌─────────────────────┐
│  Feature Extractor  │  140+ features
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│   PRODUCTION_V3     │  Base predictions
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│   GBM Predictor     │  LightGBM + CatBoost
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│   Combine (55/45)   │  Weighted average
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│   Calibration       │  Temperature scaling
└─────────────────────┘
    │
    ▼
Final Prediction
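
The final calibration stage is labelled temperature scaling. A minimal sketch of that technique, applied to an already-blended probability vector, is shown below. The function name `temperature_scale` and the sample values are hypothetical, and the source does not say how the temperature is fitted or what value is used.

```python
import math


def temperature_scale(probs, temperature):
    """Rescale a probability vector: T > 1 flattens it, T < 1 sharpens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Treat log-probabilities as logits; clip to avoid log(0).
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


raw = [0.60, 0.25, 0.15]                # made-up [home, draw, away] probabilities
unchanged = temperature_scale(raw, 1.0)  # T = 1 leaves the vector as it is
softened = temperature_scale(raw, 2.0)   # T = 2 pulls it toward uniform
```

In practice the temperature would typically be chosen on a held-out set by minimising a proper scoring rule such as log loss.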

Supported Markets

1X2 (Match Result)

prob_home_win: Home win probability
prob_draw: Draw probability
prob_away_win: Away win probability

Over/Under 2.5

prob_over_25: Probability of more than 2.5 total goals (3 or more)
prob_under_25: Probability of fewer than 2.5 total goals (2 or fewer)

BTTS (Both Teams To Score)

prob_btts_yes: Both teams score
prob_btts_no: At least one team does not score

Double Chance

prob_1x: Home or draw
prob_x2: Draw or away
prob_12: Home or away
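
Since the three 1X2 outcomes are mutually exclusive, each Double Chance probability is the sum of two of them. The sketch below shows that arithmetic. Whether BetIntel derives these fields this way or predicts them separately is not stated in the source, and the function name `double_chance` is hypothetical.

```python
def double_chance(prob_home_win, prob_draw, prob_away_win):
    """Derive Double Chance probabilities from 1X2 probabilities."""
    return {
        "prob_1x": prob_home_win + prob_draw,      # home win or draw
        "prob_x2": prob_draw + prob_away_win,      # draw or away win
        "prob_12": prob_home_win + prob_away_win,  # home win or away win
    }


markets = double_chance(0.50, 0.30, 0.20)
```

Each 1X2 outcome is counted in exactly two of the three markets, so the three values always sum to 2.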

Asian Handicap

asian_handicap_line: Handicap line (e.g., -0.5, -1.0)
asian_handicap_home_odds: Home odds
asian_handicap_away_odds: Away odds
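
The Asian Handicap fields are odds rather than probabilities. The sketch below converts a pair of odds to implied probabilities. It assumes decimal odds, which the source does not confirm. It removes the bookmaker margin by proportional normalisation, one of several common methods, and ignores stake refunds on whole-number lines. The function name `implied_probabilities` is hypothetical.

```python
def implied_probabilities(home_odds, away_odds):
    """Convert two-way decimal odds to margin-free probabilities."""
    raw_home = 1.0 / home_odds
    raw_away = 1.0 / away_odds
    overround = raw_home + raw_away  # exceeds 1 when the bookmaker takes a margin
    margin = overround - 1.0
    return raw_home / overround, raw_away / overround, margin


# Made-up odds for a -0.5 line.
p_home, p_away, margin = implied_probabilities(1.90, 1.90)
```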

Evaluation Metrics

RPS (Ranked Probability Score)

Measures the accuracy of probabilistic forecasts over ordered outcomes (home win, draw, away win). Lower is better.

python

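import numpy as np

def rps(probs, actual):
    """
    Ranked Probability Score for one match.

    probs:  predicted probabilities in outcome order [home, draw, away]
    actual: one-hot vector of the observed outcome
    RPS = 0: perfect prediction
    RPS = 1: worst possible prediction
    """
    cum_probs = np.cumsum(probs)
    cum_actual = np.cumsum(actual)
    # The last cumulative term is always 0, so normalise by r - 1, not r
    return np.sum((cum_probs - cum_actual) ** 2) / (len(probs) - 1)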

Benchmark	RPS
Random (33/33/33)	~0.22
SOTA literature	0.195
BetIntel	0.160
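
As a worked check on the benchmark figures, the sketch below computes RPS for a uniform forecast using the standard normalisation by r - 1 (r = 3 outcomes), without NumPy. The function names and the three-match sample are illustrative only.

```python
def rps(probs, outcome_index):
    """RPS for one match; probs are ordered [home, draw, away]."""
    cum_prob = 0.0
    cum_actual = 0.0
    total = 0.0
    # The final cumulative term is always 0, so only the first r - 1 are summed.
    for i in range(len(probs) - 1):
        cum_prob += probs[i]
        cum_actual += 1.0 if i == outcome_index else 0.0
        total += (cum_prob - cum_actual) ** 2
    return total / (len(probs) - 1)


def mean_rps(forecasts, outcomes):
    """Average RPS over a set of matches."""
    return sum(rps(p, o) for p, o in zip(forecasts, outcomes)) / len(outcomes)


uniform = [1 / 3, 1 / 3, 1 / 3]
print(round(rps(uniform, 0), 4))                         # home win: 0.2778
print(round(rps(uniform, 1), 4))                         # draw: 0.1111
print(round(mean_rps([uniform] * 3, [0, 1, 2]), 4))      # one of each: 0.2222
```

A uniform forecast scores about 0.22 when the three outcomes are equally frequent, which agrees with the "Random" row in the table. The exact value on real data depends on the share of draws.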

Accuracy

Percentage of matches in which the highest-probability 1X2 outcome matches the actual result.
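
A minimal sketch of this metric, assuming the predicted outcome is the one with the highest probability (ties resolve to the first such outcome). The function name `accuracy_1x2` and the sample data are hypothetical.

```python
def accuracy_1x2(forecasts, outcomes):
    """Share of matches where the most probable outcome is the actual one.

    forecasts: list of [home, draw, away] probability lists
    outcomes:  list of outcome indices (0 = home, 1 = draw, 2 = away)
    """
    hits = 0
    for probs, outcome in zip(forecasts, outcomes):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == outcome:
            hits += 1
    return hits / len(outcomes)


sample_forecasts = [[0.50, 0.30, 0.20], [0.20, 0.30, 0.50], [0.40, 0.35, 0.25]]
sample_outcomes = [0, 2, 1]  # the third match ended in a draw, predicted as home
score = accuracy_1x2(sample_forecasts, sample_outcomes)
```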

Calibration

Verifies that predicted probabilities match actual frequencies.
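
One common way to run this check is a reliability table: group predictions into probability bins and compare the mean predicted probability with the observed frequency in each bin. The sketch below does this for a single binary event such as "home win". The function name, the equal-width binning, and the sample data are assumptions, not the project's method.

```python
def reliability_table(predicted, occurred, n_bins=5):
    """Return (mean predicted, observed frequency, count) per non-empty bin.

    predicted: probabilities assigned to an event
    occurred:  1 if the event happened, else 0
    """
    bins = [[] for _ in range(n_bins)]
    for p, hit in zip(predicted, occurred):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, hit))

    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(hit for _, hit in members) / len(members)
        rows.append((mean_pred, observed, len(members)))
    return rows


rows = reliability_table([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1])
```

For a well-calibrated model the first two values in each row are close. Large gaps in the high-probability bins indicate overconfidence.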

Training

Dataset

5+ seasons of historical data
5 top European leagues (Serie A, Premier League, La Liga, Bundesliga, Ligue 1)
10,000+ matches for training

Cross-Validation

python

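from sklearn.model_selection import TimeSeriesSplit

# X is assumed to be a NumPy array with rows in chronological order,
# so every fold validates on matches played after its training matches
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(X):
    X_train, X_val = X[train_idx], X[val_idx]
    # Train on X_train and evaluate on X_val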
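
To show what a time-ordered split produces, the sketch below generates expanding-window folds in plain Python. It is intended to mirror the default behaviour of `TimeSeriesSplit` (no gap, no training-size cap), but it is an illustration rather than a replacement. The function name `expanding_window_splits` is hypothetical.

```python
def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train_indices, val_indices); validation always follows training."""
    fold_size = n_samples // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        # The training window grows by one fold each iteration.
        train_end = n_samples - (n_splits - k + 1) * fold_size
        train = list(range(train_end))
        val = list(range(train_end, train_end + fold_size))
        yield train, val


splits = list(expanding_window_splits(12, n_splits=5))
for train, val in splits:
    print(f"train 0..{train[-1]}  validate {val[0]}..{val[-1]}")
```

No fold ever trains on a match played after its validation matches, which avoids leaking future results into training.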

Hyperparameter Tuning

python

# LightGBM best params
params = {
    'n_estimators': 500,
    'max_depth': 6,
    'learning_rate': 0.05,
    'num_leaves': 31,
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,  # only takes effect when bagging_freq > 0
}

Future Development

Error Analysis Notebook - Error pattern analysis
Probability Calibration - Fix for low confidence
League-Specific Models - Per-league models
