https://github.com/torvaney/mezzala
Models for estimating football (soccer) team-strength
dixon-coles poisson-regression soccer soccer-analytics team-strength
- Host: GitHub
- URL: https://github.com/torvaney/mezzala
- Owner: Torvaney
- License: apache-2.0
- Created: 2021-05-14T10:20:22.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-10-19T22:08:57.000Z (over 4 years ago)
- Last Synced: 2025-05-24T05:55:21.338Z (about 1 year ago)
- Topics: dixon-coles, poisson-regression, soccer, soccer-analytics, team-strength
- Language: Python
- Homepage: https://torvaney.github.io/mezzala/
- Size: 280 KB
- Stars: 36
- Watchers: 4
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Mezzala
> Models for estimating football (soccer) team-strength
## Install
`pip install mezzala`
## How to use
```python
import mezzala
```
To fit a Dixon-Coles team-strength model, we first need some data:
```python
import itertools
import json
import urllib.request
# Use 2016/17 Premier League data from the openfootball repo
url = 'https://raw.githubusercontent.com/openfootball/football.json/master/2016-17/en.1.json'
response = urllib.request.urlopen(url)
data_raw = json.loads(response.read())
# Reshape the data to just get the matches
data = list(itertools.chain(*[d['matches'] for d in data_raw['rounds']]))
data[0:3]
```
    [{'date': '2016-08-13',
      'team1': 'Hull City AFC',
      'team2': 'Leicester City FC',
      'score': {'ft': [2, 1]}},
     {'date': '2016-08-13',
      'team1': 'Everton FC',
      'team2': 'Tottenham Hotspur FC',
      'score': {'ft': [1, 1]}},
     {'date': '2016-08-13',
      'team1': 'Crystal Palace FC',
      'team2': 'West Bromwich Albion FC',
      'score': {'ft': [0, 1]}}]
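The reshaping step can be checked without a network connection. The sketch below builds a small stand-in for `data_raw` with the same `rounds` and `matches` nesting that the code above assumes, reusing the three matches printed above (split across two rounds only to exercise the flattening). It uses `itertools.chain.from_iterable`, which gives the same result as `itertools.chain(*...)` without unpacking every round into function arguments.

```python
import itertools

# Stand-in for `data_raw`: the three matches printed above, split across
# two rounds purely to exercise the flattening (not the real round layout)
data_raw = {
    'rounds': [
        {'matches': [
            {'date': '2016-08-13', 'team1': 'Hull City AFC',
             'team2': 'Leicester City FC', 'score': {'ft': [2, 1]}},
            {'date': '2016-08-13', 'team1': 'Everton FC',
             'team2': 'Tottenham Hotspur FC', 'score': {'ft': [1, 1]}},
        ]},
        {'matches': [
            {'date': '2016-08-13', 'team1': 'Crystal Palace FC',
             'team2': 'West Bromwich Albion FC', 'score': {'ft': [0, 1]}},
        ]},
    ]
}

# Concatenate each round's list of matches into one flat list
data = list(itertools.chain.from_iterable(r['matches'] for r in data_raw['rounds']))

print(len(data))         # 3
print(data[0]['team1'])  # Hull City AFC
```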
### Fitting a model
To fit a model with mezzala, you need to create an "adapter". Adapters are used to connect a model to a data source.
Because our data is a list of dicts, we are going to use a `KeyAdapter`.
```python
adapter = mezzala.KeyAdapter(       # `KeyAdapter` = datum['...']
    home_team='team1',
    away_team='team2',
    home_goals=['score', 'ft', 0],  # Get nested fields with lists of fields
    away_goals=['score', 'ft', 1],  # i.e. datum['score']['ft'][1]
)
# You'll never need to call the methods on an
# adapter directly, but just to show that it
# works as expected:
adapter.home_team(data[0])
```
    'Hull City AFC'
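The list-of-fields convention can be pictured as a chain of lookups applied one after another. The helper below is a hypothetical illustration of that idea, not mezzala's internals: a single key is treated as a one-step path, and a list is applied key by key (or index by index).

```python
from functools import reduce

def get_path(datum, path):
    """Hypothetical helper: resolve a key, or a list of keys/indices, against nested data."""
    keys = path if isinstance(path, list) else [path]
    # Apply each key in turn, i.e. datum[k0][k1][k2]...
    return reduce(lambda obj, key: obj[key], keys, datum)

match = {'date': '2016-08-13',
         'team1': 'Hull City AFC',
         'team2': 'Leicester City FC',
         'score': {'ft': [2, 1]}}

print(get_path(match, 'team1'))             # Hull City AFC
print(get_path(match, ['score', 'ft', 0]))  # 2
print(get_path(match, ['score', 'ft', 1]))  # 1
```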
Once we have an adapter for our specific data source, we can fit the model:
```python
model = mezzala.DixonColes(adapter=adapter)
model.fit(data)
```
    DixonColes(adapter=KeyAdapter(home_goals=['score', 'ft', 0], away_goals=['score', 'ft', 1], home_team='team1', away_team='team2'), blocks=[TeamStrength(), BaseRate(), HomeAdvantage()]), weight=UniformWeight()
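For reference, the original Dixon and Coles (1997) model treats home and away goals as Poisson-distributed with rates `lam` and `mu`, and then adjusts the probabilities of the four low scorelines (0-0, 0-1, 1-0 and 1-1) with a dependence parameter `rho`. The sketch below computes a scoreline probability under that textbook formulation using only the standard library. The rate and `rho` values are placeholders, and the sketch makes no claim about how mezzala parameterises or estimates them.

```python
import math

def poisson_pmf(k, rate):
    """P(K = k) for a Poisson random variable with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def tau(x, y, lam, mu, rho):
    """Dixon-Coles low-score adjustment; equal to 1 for every other scoreline."""
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0

def scoreline_probability(x, y, lam, mu, rho):
    """P(home = x, away = y): independent Poissons times the tau adjustment."""
    return tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)

# Placeholder rates and dependence parameter, not fitted values
lam, mu, rho = 1.5, 1.1, -0.1
grid = [[scoreline_probability(x, y, lam, mu, rho) for y in range(11)]
        for x in range(11)]
print(round(sum(map(sum, grid)), 4))  # 1.0 (the grid stops at 10 goals each)
```

The four adjustments cancel out when summed, so the correction only moves probability between the low scorelines and leaves the total unchanged.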
### Making predictions
By default, you only need to supply the home and away teams to get predictions. These should be supplied in the same format as the training data.
`DixonColes` has two methods for making predictions:
* `predict_one` - for predicting a single match
* `predict` - for predicting multiple matches
```python
match_to_predict = {
    'team1': 'Manchester City FC',
    'team2': 'Swansea City FC',
}
scorelines = model.predict_one(match_to_predict)
scorelines[0:5]
```
    [ScorelinePrediction(home_goals=0, away_goals=0, probability=0.023625049697587167),
     ScorelinePrediction(home_goals=0, away_goals=1, probability=0.012682094432376022),
     ScorelinePrediction(home_goals=0, away_goals=2, probability=0.00623268833779594),
     ScorelinePrediction(home_goals=0, away_goals=3, probability=0.0016251514235046444),
     ScorelinePrediction(home_goals=0, away_goals=4, probability=0.00031781436109636405)]
Each of these methods returns predictions in the form of `ScorelinePrediction`s.
* `predict_one` returns a list of `ScorelinePrediction`s
* `predict` returns one such list for each predicted match (i.e. a list of lists)
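Because `predict` returns one list per match, per-match summaries are a loop over the outer list. The sketch below uses a stand-in `namedtuple` with the same three fields shown in the `ScorelinePrediction` repr above, so it runs without mezzala installed; the two matches' probabilities are small hand-made numbers for illustration, not model output.

```python
from collections import namedtuple

# Stand-in with the same fields as the ScorelinePrediction repr shown above
ScorelinePrediction = namedtuple(
    'ScorelinePrediction', ['home_goals', 'away_goals', 'probability'])

def most_likely_scorelines(predictions):
    """For a list of lists (one inner list per match), return each match's modal scoreline."""
    return [max(match, key=lambda s: s.probability) for match in predictions]

def expected_goals(scorelines):
    """Expected (home, away) goals implied by one match's scoreline probabilities."""
    home = sum(s.home_goals * s.probability for s in scorelines)
    away = sum(s.away_goals * s.probability for s in scorelines)
    return home, away

# Illustrative numbers only, shaped like the output of `predict`
predictions = [
    [ScorelinePrediction(0, 0, 0.2), ScorelinePrediction(1, 0, 0.5),
     ScorelinePrediction(0, 1, 0.3)],
    [ScorelinePrediction(0, 0, 0.6), ScorelinePrediction(1, 1, 0.4)],
]

for match, best in zip(predictions, most_likely_scorelines(predictions)):
    print(best, expected_goals(match))
```

Expected goals computed this way are only as complete as the scoreline list: if the list stops at some maximum number of goals, the result slightly underestimates the true expectation.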
However, it can sometimes be more useful to have predictions in the form of match _outcomes_. Mezzala exposes the `scorelines_to_outcomes` function for this purpose:
```python
mezzala.scorelines_to_outcomes(scorelines)
```
    {Outcomes('Home win'): OutcomePrediction(outcome=Outcomes('Home win'), probability=0.8255103334702835),
     Outcomes('Draw'): OutcomePrediction(outcome=Outcomes('Draw'), probability=0.11615659853961693),
     Outcomes('Away win'): OutcomePrediction(outcome=Outcomes('Away win'), probability=0.058333067990098304)}
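Converting scorelines to outcomes amounts, in principle, to summing scoreline probabilities by result. The sketch below is an illustrative re-implementation of that idea, not mezzala's `scorelines_to_outcomes` (which returns `Outcomes` and `OutcomePrediction` objects rather than plain strings and floats). It uses a stand-in `namedtuple` so it runs without mezzala, and feeds in the five rows printed earlier, which are only the start of the full scoreline list, so the totals are partial.

```python
from collections import namedtuple

# Stand-in with the same fields as the ScorelinePrediction repr shown above
ScorelinePrediction = namedtuple(
    'ScorelinePrediction', ['home_goals', 'away_goals', 'probability'])

def outcome_probabilities(scorelines):
    """Sum scoreline probabilities into home-win / draw / away-win totals."""
    totals = {'Home win': 0.0, 'Draw': 0.0, 'Away win': 0.0}
    for s in scorelines:
        if s.home_goals > s.away_goals:
            totals['Home win'] += s.probability
        elif s.home_goals == s.away_goals:
            totals['Draw'] += s.probability
        else:
            totals['Away win'] += s.probability
    return totals

# The five rows printed earlier: only the start of the full list,
# so these totals are partial and do not sum to 1
first_five = [
    ScorelinePrediction(0, 0, 0.023625049697587167),
    ScorelinePrediction(0, 1, 0.012682094432376022),
    ScorelinePrediction(0, 2, 0.00623268833779594),
    ScorelinePrediction(0, 3, 0.0016251514235046444),
    ScorelinePrediction(0, 4, 0.00031781436109636405),
]
print(outcome_probabilities(first_five))
```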
### Extending the model
It's possible to fit more sophisticated models with mezzala using **weights** and **model blocks**.
#### Weights
You can weight individual data points by supplying a function (or callable) to the `weight` argument to `DixonColes`:
```python
mezzala.DixonColes(
    adapter=adapter,
    # By default, all data points are weighted equally,
    # which is equivalent to:
    weight=lambda x: 1
)
```
    DixonColes(adapter=KeyAdapter(home_goals=['score', 'ft', 0], away_goals=['score', 'ft', 1], home_team='team1', away_team='team2'), blocks=[TeamStrength(), BaseRate(), HomeAdvantage()]), weight=<function <lambda> at 0x123067488>
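Conceptually, a weight scales each match's contribution to the objective being maximised. The sketch below shows a weighted log-likelihood for independent Poisson home and away goals. It leaves out the Dixon-Coles low-score adjustment for brevity, and `rates` is a hypothetical placeholder for whatever the fitted model produces, so this illustrates weighted likelihood in general rather than mezzala's exact objective.

```python
import math

def poisson_logpmf(k, rate):
    """log P(K = k) for a Poisson random variable with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def weighted_log_likelihood(matches, rates, weight):
    """Sum over matches of weight(match) * log P(observed full-time score).

    `rates(match) -> (home_rate, away_rate)` is a hypothetical stand-in for
    the model; home and away goals are treated as independent here.
    """
    total = 0.0
    for match in matches:
        home_rate, away_rate = rates(match)
        home_goals, away_goals = match['score']['ft']
        total += weight(match) * (poisson_logpmf(home_goals, home_rate)
                                  + poisson_logpmf(away_goals, away_rate))
    return total

def constant_rates(match):
    """Placeholder rates, not fitted values."""
    return 1.5, 1.1

matches = [
    {'team1': 'Hull City AFC', 'team2': 'Leicester City FC', 'score': {'ft': [2, 1]}},
    {'team1': 'Everton FC', 'team2': 'Tottenham Hotspur FC', 'score': {'ft': [1, 1]}},
]

uniform = weighted_log_likelihood(matches, constant_rates, weight=lambda x: 1)
doubled = weighted_log_likelihood(matches, constant_rates, weight=lambda x: 2)
print(uniform, doubled)  # doubling every weight doubles the objective
```

Because multiplying every weight by the same positive constant only rescales the objective, it is the relative weights between matches that affect the fitted parameters.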
Mezzala also provides an `ExponentialWeight` for the purpose of time-discounting:
```python
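mezzala.DixonColes(
    adapter=adapter,
    weight=mezzala.ExponentialWeight(
        epsilon=-0.0065,  # Decay rate
        # Assumes each datum has a 'days_ago' field; the example
        # data above only has a 'date', so this would need adding first
        key=lambda x: x['days_ago']
    )
)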
```
    DixonColes(adapter=KeyAdapter(home_goals=['score', 'ft', 0], away_goals=['score', 'ft', 1], home_team='team1', away_team='team2'), blocks=[TeamStrength(), BaseRate(), HomeAdvantage()]), weight=ExponentialWeight(epsilon=-0.0065, key=<function <lambda> at 0x122f938c8>)
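The example data has a `date` field but no `days_ago`, so one option is to derive it before fitting. The sketch below adds a `days_ago` key measured from a reference date of your choosing (the date used here is a placeholder), then computes `exp(epsilon * days_ago)`. That exponential form is an assumption about what `ExponentialWeight` applies, not something this README confirms. Under that form, `epsilon=-0.0065` halves a match's weight roughly every ln(2) / 0.0065, or about 107 days.

```python
import datetime
import math

def add_days_ago(matches, reference_date):
    """Return copies of `matches` with a 'days_ago' key counted from reference_date."""
    out = []
    for m in matches:
        played = datetime.date.fromisoformat(m['date'])  # Python 3.7+
        out.append({**m, 'days_ago': (reference_date - played).days})
    return out

def exponential_weight(days_ago, epsilon=-0.0065):
    """Assumed decay form: exp(epsilon * days_ago), so 1.0 for a match played today."""
    return math.exp(epsilon * days_ago)

matches = [{'date': '2016-08-13', 'team1': 'Hull City AFC',
            'team2': 'Leicester City FC', 'score': {'ft': [2, 1]}}]
reference = datetime.date(2017, 5, 21)  # placeholder "today"

dated = add_days_ago(matches, reference)
print(dated[0]['days_ago'])                       # 281
print(exponential_weight(dated[0]['days_ago']))
print(round(math.log(2) / 0.0065, 1))             # half-life in days: 106.6
```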
#### Model blocks
Model "blocks" define the calculation and estimation of home and away goalscoring rates.
```python
mezzala.DixonColes(
    adapter=adapter,
    # By default, team strength, a base rate and home advantage
    # are estimated, which is equivalent to:
    blocks=[
        mezzala.blocks.HomeAdvantage(),
        mezzala.blocks.TeamStrength(),
        mezzala.blocks.BaseRate(),  # Adds "average goalscoring rate" as a distinct parameter
    ]
)
```
    DixonColes(adapter=KeyAdapter(home_goals=['score', 'ft', 0], away_goals=['score', 'ft', 1], home_team='team1', away_team='team2'), blocks=[TeamStrength(), HomeAdvantage(), BaseRate()]), weight=UniformWeight()
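Written out, and assuming a log-linear parameterisation (the README does not spell out mezzala's exact form or sign conventions), the three default blocks would combine into the home scoring rate $\lambda_{ij}$ and away scoring rate $\mu_{ij}$ for home team $i$ against away team $j$ roughly as follows, with base rate $b$, attack ratings $a$, defence ratings $d$ and home advantage $h$:

$$
\log \lambda_{ij} = b + a_i + d_j + h, \qquad \log \mu_{ij} = b + a_j + d_i
$$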
To add custom parameters (e.g. a per-league home advantage), you need to add further model blocks.
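One way to picture blocks is as independent terms that each add to the log of the home and away scoring rates, so a per-league home advantage becomes one more term keyed on a league field. Everything below is a hypothetical sketch: the block functions, the parameter names, the sign convention and the `league` key are illustrative assumptions rather than mezzala's block interface, and the parameter values are placeholders, not fitted estimates.

```python
import math

# Placeholder parameters for illustration only (not fitted values).
# Sign convention (an assumption): a higher 'defence' value means conceding more.
params = {
    'base_rate': 0.3,
    'attack': {'Hull City AFC': -0.1, 'Leicester City FC': 0.05},
    'defence': {'Hull City AFC': 0.15, 'Leicester City FC': 0.0},
    'home_advantage': {'en.1': 0.25, 'en.2': 0.2},  # hypothetical per-league values
}

def base_rate(match, p):
    """Shared average log scoring rate for both sides."""
    return p['base_rate'], p['base_rate']

def team_strength(match, p):
    """Attack of the scoring side plus defence of the conceding side."""
    home, away = match['team1'], match['team2']
    return (p['attack'][home] + p['defence'][away],
            p['attack'][away] + p['defence'][home])

def league_home_advantage(match, p):
    """Boost only the home side's rate, using its league's own parameter."""
    return p['home_advantage'][match['league']], 0.0

def scoring_rates(match, blocks, p):
    """Add up each block's (home, away) log-rate terms, then exponentiate."""
    terms = [block(match, p) for block in blocks]
    log_home = sum(t[0] for t in terms)
    log_away = sum(t[1] for t in terms)
    return math.exp(log_home), math.exp(log_away)

# 'league' is a hypothetical field added for this example
match = {'team1': 'Hull City AFC', 'team2': 'Leicester City FC', 'league': 'en.1'}
with_league = scoring_rates(match, [base_rate, team_strength, league_home_advantage], params)
without = scoring_rates(match, [base_rate, team_strength], params)
print(with_league, without)
```

Keeping each term in its own function means a new effect can be added or removed by editing the list of blocks, without touching the others.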