Machine Learning Models

Overview

BetIntel uses an ensemble of ML models to generate probabilistic football predictions.

Available Models

The most advanced model, combining PRODUCTION_V3 with the GBM Ensemble.

Metric            Value
Accuracy 1X2      62-65%
Over/Under 2.5    64%
BTTS              52%
RPS               0.160
Features          140+

Composition:

  • 55% PRODUCTION_V3 (stability)
  • 45% GBM Ensemble (precision)
python
# Combination weights
PRODUCTION_WEIGHT = 0.55
GBM_WEIGHT = 0.45

combined_probs = PRODUCTION_WEIGHT * prod_probs + GBM_WEIGHT * gbm_probs
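
To make the weighting concrete, here is a minimal, self-contained sketch of the same weighted average applied to NumPy arrays. The function name `combine_probs`, the defensive renormalisation step, and the example probabilities are assumptions for illustration and are not taken from the BetIntel code base.

```python
import numpy as np

PRODUCTION_WEIGHT = 0.55
GBM_WEIGHT = 0.45

def combine_probs(prod_probs, gbm_probs):
    """Weighted average of two (n_matches, 3) probability arrays."""
    prod_probs = np.asarray(prod_probs, dtype=float)
    gbm_probs = np.asarray(gbm_probs, dtype=float)
    combined = PRODUCTION_WEIGHT * prod_probs + GBM_WEIGHT * gbm_probs
    # Renormalise defensively in case an input row does not sum exactly to 1.
    return combined / combined.sum(axis=-1, keepdims=True)

# Hypothetical probabilities for one match: [home, draw, away]
prod_probs = np.array([[0.50, 0.30, 0.20]])
gbm_probs = np.array([[0.60, 0.25, 0.15]])
combined_probs = combine_probs(prod_probs, gbm_probs)
print(combined_probs)
```

Because the two weights sum to 1 and each input row sums to 1, the combined row already sums to 1; the renormalisation only matters when an upstream model returns slightly unnormalised output.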

GBM Ensemble

Ensemble of gradient boosting machines:

Model       RPS       Weight
LightGBM    0.1612    50%
CatBoost    0.1609    50%

Files:

  • models/advanced/lightgbm_optimized.pkl
  • models/advanced/catboost_optimized.pkl
  • models/advanced/best_params.json (feature columns)
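
The 50/50 weighting above could be applied roughly as follows. This is a sketch under assumptions: it relies only on each fitted model exposing `predict_proba` (which both LightGBM's and CatBoost's scikit-learn style classifiers do), and `gbm_ensemble_proba` and `_FixedProbaModel` are hypothetical names. The stand-in class replaces the real pickled models so that the example runs without them.

```python
import numpy as np

def gbm_ensemble_proba(models, X, weights=None):
    """Weighted average of predict_proba outputs from several fitted classifiers."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)  # equal weights, e.g. 50/50
    probs = [w * np.asarray(m.predict_proba(X)) for m, w in zip(models, weights)]
    return np.sum(probs, axis=0)


class _FixedProbaModel:
    """Stand-in for a fitted model; returns the same probabilities for every row."""

    def __init__(self, row):
        self.row = np.asarray(row, dtype=float)

    def predict_proba(self, X):
        return np.tile(self.row, (len(X), 1))


X_demo = np.zeros((2, 4))  # two hypothetical matches, four dummy features
lightgbm_like = _FixedProbaModel([0.50, 0.30, 0.20])
catboost_like = _FixedProbaModel([0.40, 0.30, 0.30])
ensemble_probs = gbm_ensemble_proba([lightgbm_like, catboost_like], X_demo)
print(ensemble_probs)
```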

PRODUCTION_V3

Base model with good stability.

Metric            Value
Accuracy 1X2      60.1%
Over/Under 2.5    64.1%
BTTS              52.4%
RPS               ~0.18

Files:

  • models/ensemble_v3.pkl - 1X2 predictions
  • models/over25_model.pkl - Over/Under
  • models/btts_model.pkl - BTTS

Pi-Rating System

Dynamic rating system for football, reported by its authors to outperform Elo-based ratings.

python
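class PiRatingSystem:
    def __init__(self, initial_rating=0.0, lambda_decay=0.035):
        self.initial_rating = initial_rating  # rating given to teams not seen before
        self.lambda_decay = lambda_decay
        self.home_ratings = {}  # team -> rating when playing at home
        self.away_ratings = {}  # team -> rating when playing away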

Based on: Constantinou & Fenton (2012) - "Determining the level of ability of football teams by dynamic ratings"
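
For orientation, the sketch below shows one way a single-match pi-rating update can look, following my reading of the Constantinou & Fenton paper. It is not the BetIntel implementation. The constants `b=10`, `c=3` and `gamma=0.7` come from that reading of the paper rather than from this documentation, and treating `lambda_decay` as the paper's learning rate is an assumption. All function names are hypothetical.

```python
import math

def expected_goal_diff(rating, b=10.0, c=3.0):
    """Map a rating to an expected goal difference against an average team."""
    magnitude = b ** (abs(rating) / c) - 1.0
    return math.copysign(magnitude, rating)

def pi_rating_update(home, away, goals_home, goals_away,
                     lam=0.035, gamma=0.7, c=3.0):
    """Return updated rating dicts for the home and away team after one match.

    `home` and `away` are dicts with keys "home" and "away" holding each
    team's home-ground and away-ground rating.
    """
    predicted = expected_goal_diff(home["home"]) - expected_goal_diff(away["away"])
    observed = goals_home - goals_away
    error = abs(observed - predicted)
    psi = c * math.log10(1.0 + error)  # large errors are rewarded with diminishing returns
    # The home side gains when it beats the prediction; the away side moves the other way.
    if observed > predicted:
        direction = 1.0
    elif observed < predicted:
        direction = -1.0
    else:
        direction = 0.0
    delta_home = direction * psi * lam
    delta_away = -direction * psi * lam
    # gamma carries part of each update over to the team's other-ground rating.
    new_home = {"home": home["home"] + delta_home,
                "away": home["away"] + gamma * delta_home}
    new_away = {"away": away["away"] + delta_away,
                "home": away["home"] + gamma * delta_away}
    return new_home, new_away

# Two unrated teams; the home side wins 2-0.
new_home, new_away = pi_rating_update({"home": 0.0, "away": 0.0},
                                      {"home": 0.0, "away": 0.0}, 2, 0)
print(new_home, new_away)
```

Keeping separate home and away ratings lets the system capture teams whose home advantage differs from the league average, which a single Elo number cannot express.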

Prediction Pipeline

Match Data
           │
           ▼
┌─────────────────────┐
│  Feature Extractor  │  140+ features
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│   PRODUCTION_V3     │  Base predictions
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│   GBM Predictor     │  LightGBM + CatBoost
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│   Combine (55/45)   │  Weighted average
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│   Calibration       │  Temperature scaling
└─────────────────────┘
           │
           ▼
Final Prediction
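
The final calibration stage is described as temperature scaling. A minimal sketch of that technique applied to already-combined probabilities is shown below. The function name `temperature_scale` and the value `T = 1.5` are illustrative assumptions; in practice the temperature is usually fitted on a held-out validation set by minimising log loss, and how BetIntel fits it is not documented here.

```python
import numpy as np

def temperature_scale(probs, temperature):
    """Rescale probabilities as softmax(log(p) / T); T > 1 flattens, T < 1 sharpens."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    logits = np.log(probs) / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    scaled = np.exp(logits)
    return scaled / scaled.sum(axis=-1, keepdims=True)

combined = np.array([0.545, 0.2775, 0.1775])  # hypothetical combined 1X2 probabilities
calibrated = temperature_scale(combined, temperature=1.5)  # T = 1.5 is illustrative only
print(calibrated)
```

Temperature scaling never changes which outcome is most likely, so accuracy is unaffected; only the confidence attached to each outcome moves.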

Supported Markets

1X2 (Match Result)

  • prob_home_win: Home win probability
  • prob_draw: Draw probability
  • prob_away_win: Away win probability

Over/Under 2.5

  • prob_over_25: Probability of more than 2.5 total goals
  • prob_under_25: Probability of fewer than 2.5 total goals

BTTS (Both Teams To Score)

  • prob_btts_yes: Both teams score
  • prob_btts_no: At least one team does not score

Double Chance

  • prob_1x: Home or draw
  • prob_x2: Draw or away
  • prob_12: Home or away
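
Since each Double Chance outcome is the union of two mutually exclusive 1X2 outcomes, these fields can be derived by simple addition. The helper below is a sketch of that arithmetic; whether BetIntel computes the fields this way is an assumption, and `double_chance` is a hypothetical name.

```python
def double_chance(prob_home_win, prob_draw, prob_away_win):
    """Derive Double Chance probabilities from 1X2 probabilities."""
    total = prob_home_win + prob_draw + prob_away_win
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"1X2 probabilities must sum to 1, got {total}")
    return {
        "prob_1x": prob_home_win + prob_draw,
        "prob_x2": prob_draw + prob_away_win,
        "prob_12": prob_home_win + prob_away_win,
    }

markets = double_chance(0.545, 0.2775, 0.1775)  # hypothetical 1X2 probabilities
print(markets)
```

The three Double Chance probabilities always sum to 2, because every 1X2 outcome is counted in exactly two of them.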

Asian Handicap

  • asian_handicap_line: Handicap line (e.g., -0.5, -1.0)
  • asian_handicap_home_odds: Home odds
  • asian_handicap_away_odds: Away odds

Evaluation Metrics

RPS (Ranked Probability Score)

Measures the quality of probabilistic forecasts over ordered outcomes (home win, draw, away win). Lower is better.

python
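import numpy as np

def rps(probs, actual):
    """
    RPS = 0: perfect prediction
    RPS = 1: worst possible prediction
    """
    cum_probs = np.cumsum(probs)
    cum_actual = np.cumsum(actual)
    # Normalise by (r - 1): the last cumulative term is always zero,
    # so np.mean would cap the score at 2/3 for three outcomes.
    return np.sum((cum_probs - cum_actual) ** 2) / (len(probs) - 1)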

Benchmark            RPS
Random (33/33/33)    ~0.22
SOTA literature      0.195
BetIntel             0.160
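
The random baseline of roughly 0.22 can be reproduced with a short worked example. The sketch assumes the usual normalisation by r - 1 for r ordered outcomes and assumes that the three results are equally frequent, which real leagues only approximate. `rps_normalised` is a hypothetical name.

```python
import numpy as np

def rps_normalised(probs, actual):
    """RPS with the usual 1 / (r - 1) normalisation for r ordered outcomes."""
    cum_probs = np.cumsum(probs)
    cum_actual = np.cumsum(actual)
    return np.sum((cum_probs - cum_actual) ** 2) / (len(probs) - 1)

uniform = np.array([1 / 3, 1 / 3, 1 / 3])
outcomes = np.eye(3)  # rows: home win, draw, away win as one-hot vectors
scores = [rps_normalised(uniform, outcome) for outcome in outcomes]
baseline = float(np.mean(scores))  # assumes the three outcomes are equally frequent
print(scores, baseline)
```

A uniform forecast scores 5/18 on a home or away win and 1/9 on a draw, which averages to 2/9, or about 0.222.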

Accuracy

Percentage of correct predictions for the 1X2 market.
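
A minimal sketch of this metric, assuming the predicted result is the class with the highest probability and that outcomes are encoded as 0 = home win, 1 = draw, 2 = away win. The encoding, the function name `accuracy_1x2`, and the sample data are all assumptions for illustration.

```python
import numpy as np

def accuracy_1x2(probs, outcomes):
    """Share of matches where the most probable class equals the actual result."""
    predicted = np.argmax(np.asarray(probs), axis=1)
    return float(np.mean(predicted == np.asarray(outcomes)))

# Hypothetical predictions for four matches: [home, draw, away]
probs = np.array([
    [0.55, 0.25, 0.20],
    [0.30, 0.30, 0.40],
    [0.45, 0.30, 0.25],
    [0.20, 0.35, 0.45],
])
outcomes = np.array([0, 2, 1, 2])  # actual results
acc = accuracy_1x2(probs, outcomes)
print(acc)
```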

Calibration

Verifies that predicted probabilities match actual frequencies.
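
One common way to run this check is a reliability table: group predictions into probability bins and compare the mean predicted probability in each bin with the observed hit rate. The sketch below also condenses the table into an expected calibration error (ECE). The bin count, the function names, and the sample data are assumptions; the documentation does not say which calibration diagnostic BetIntel uses.

```python
import numpy as np

def reliability_table(pred_probs, hits, n_bins=5):
    """Bin predictions by probability and compare mean prediction with hit rate."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    hits = np.asarray(hits, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin; clip so that p == 1.0 falls in the last bin.
    bin_ids = np.clip(np.digitize(pred_probs, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            rows.append((edges[b], edges[b + 1], int(mask.sum()),
                         float(pred_probs[mask].mean()), float(hits[mask].mean())))
    return rows

def expected_calibration_error(pred_probs, hits, n_bins=5):
    """Count-weighted mean gap between predicted probability and hit rate."""
    rows = reliability_table(pred_probs, hits, n_bins)
    total = sum(count for _, _, count, _, _ in rows)
    return sum(count / total * abs(mean_p - rate)
               for _, _, count, mean_p, rate in rows)

# Hypothetical data: predicted probability of an event and whether it happened.
demo_probs = [0.1, 0.1, 0.7, 0.7, 0.7, 0.7]
demo_hits = [0, 0, 1, 1, 1, 0]
table = reliability_table(demo_probs, demo_hits)
ece = expected_calibration_error(demo_probs, demo_hits)
print(table, ece)
```

A well-calibrated model has hit rates close to the mean predicted probability in every populated bin, giving an ECE near zero.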

Training

Dataset

  • 5+ seasons of historical data
  • 5 top European leagues (Serie A, Premier League, La Liga, Bundesliga, Ligue 1)
  • ~10,000+ matches for training

Cross-Validation

python
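from sklearn.model_selection import TimeSeriesSplit

# X must be sorted chronologically so each fold validates on later matches.
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, val_idx in tscv.split(X):
    # Assumes X is a NumPy array; use X.iloc[...] for a pandas DataFrame.
    X_train, X_val = X[train_idx], X[val_idx]
    # Train on X_train and evaluate on X_val here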
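
To show what the splitter actually produces, the sketch below runs `TimeSeriesSplit` on a small hypothetical feature matrix and records the index ranges of each fold. The matrix size and contents are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical feature matrix: 12 matches in chronological order, 3 features each.
X = np.arange(36).reshape(12, 3)

tscv = TimeSeriesSplit(n_splits=5)
folds = []
for train_idx, val_idx in tscv.split(X):
    # Every training index precedes every validation index, so no future data leaks in.
    folds.append((train_idx.tolist(), val_idx.tolist()))
    print(f"train 0..{train_idx[-1]}  ->  validate {val_idx[0]}..{val_idx[-1]}")
```

The training window grows with each fold while the validation window always lies strictly after it, which mirrors how the model is used in production: trained on past matches, applied to future ones.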

Hyperparameter Tuning

python
# LightGBM best params
params = {
    'n_estimators': 500,
    'max_depth': 6,
    'learning_rate': 0.05,
    'num_leaves': 31,
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,  # only takes effect when bagging_freq > 0
}

Future Development

  1. Error Analysis Notebook - Analysis of recurring error patterns
  2. Probability Calibration - Fix for low confidence
  3. League-Specific Models - Separate models per league

Released under the MIT License.