https://github.com/raphaelsenn/playervectors
Implementation of the paper "Player Vectors: Characterizing Soccer Players Playing Style from Match Event Streams".
- Host: GitHub
- URL: https://github.com/raphaelsenn/playervectors
- Owner: raphaelsenn
- License: mit
- Created: 2024-12-20T18:35:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-08-05T18:11:33.000Z (10 months ago)
- Last Synced: 2025-12-26T20:15:40.103Z (5 months ago)
- Topics: data-science
- Language: Python
- Homepage:
- Size: 70.2 MB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# playervectors
Implementation of [Player Vectors: Characterizing Soccer Players Playing Style from Match Event Streams](https://ecmlpkdd2019.org/downloads/paper/701.pdf) in Python.
## Install
```bash
pip install playervectors
```
## Usage
### Expected Format for `df_events` (SPADL Format)
The `df_events` DataFrame used in `PlayerVectors.fit()` must follow the **SPADL format**, with the following required column names:
| Column Name | Description |
|-------------|------------|
| **player_id** | Unique identifier for the player. |
| **action_type** | Type of action or event (e.g., shot, pass, cross, dribble). |
| **x_start** | X-coordinate where the action starts. |
| **y_start** | Y-coordinate where the action starts. |
| **x_end** | X-coordinate where the action ends. |
| **y_end** | Y-coordinate where the action ends. |
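If your column names differ, pass a mapping through the `column_names` argument, i.e. `PlayerVectors.fit(column_names=new_column_names)`, where each expected name maps to your own column name: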
```python
new_column_names = {
    'player_id': 'your_player_id',
    'action_type': 'your_action_type',
    'x_start': 'your_x_start',
    'y_start': 'your_y_start',
    'x_end': 'your_x_end',
    'y_end': 'your_y_end'
}
```
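Alternatively, assuming `fit()` only needs the columns to carry the expected names, you could rename them up front with pandas. In this sketch the helper `to_expected_columns` and the `your_*` columns are hypothetical; it inverts the mapping and calls `DataFrame.rename`:

```python
import pandas as pd

# Expected name -> name used in your own data (the 'your_*' names are hypothetical)
new_column_names = {
    'player_id': 'your_player_id',
    'action_type': 'your_action_type',
    'x_start': 'your_x_start',
    'y_start': 'your_y_start',
    'x_end': 'your_x_end',
    'y_end': 'your_y_end',
}


def to_expected_columns(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Return a copy of df with your column names renamed to the expected ones."""
    inverse = {yours: expected for expected, yours in mapping.items()}
    return df.rename(columns=inverse)


# Toy event stream with one made-up pass
df_raw = pd.DataFrame({
    'your_player_id': [1],
    'your_action_type': ['pass'],
    'your_x_start': [30.0],
    'your_y_start': [20.0],
    'your_x_end': [45.0],
    'your_y_end': [25.0],
})
df_events = to_expected_columns(df_raw, new_column_names)
```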
### Fitting PlayerVectors
Build **18**-component **PlayerVectors** from the action types **shot**, **cross**, **dribble** and **pass**, using **4**, **4**, **5** and **5** components respectively:
```python
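from playervectors import PlayerVectors

# df_events, minutes_played and player_names must already exist
# (see the expected format above and the parameter table below)
pvs = PlayerVectors(
    grid=(50, 50),                                 # heatmap grid size
    actions=['shot', 'cross', 'dribble', 'pass'],  # selected action types
    components=[4, 4, 5, 5]                        # components per action type (4 + 4 + 5 + 5 = 18)
)
pvs.fit(
    df_events=df_events,
    minutes_played=minutes_played,
    player_names=player_names
)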
```
| Parameter | Description |
|-------------|------------|
| **df_events** | Event stream data in SPADL format (see above). |
| **minutes_played** | A dictionary mapping each player_id to the total minutes that player played across all matches in df_events. |
| **player_names** | A dictionary mapping each player_id to the player's name. |
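`minutes_played` and `player_names` are plain dictionaries keyed by `player_id`. A minimal sketch of building them with pandas from hypothetical `appearances` and `players` tables (these tables and their column names are assumptions, not part of this package):

```python
import pandas as pd

# Hypothetical appearance records: one row per player per match
appearances = pd.DataFrame({
    'player_id': [1, 1, 2],
    'match_id': [10, 11, 10],
    'minutes': [90, 45, 90],
})
# Hypothetical player table
players = pd.DataFrame({
    'player_id': [1, 2],
    'player_name': ['Player A', 'Player B'],
})

# player_id -> total minutes played across all matches covered by df_events
minutes_played = appearances.groupby('player_id')['minutes'].sum().to_dict()

# player_id -> player name
player_names = dict(zip(players['player_id'], players['player_name']))
```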
### Plotting Principal Components
```python
import matplotlib.pyplot as plt
pvs.plot_principle_components()
plt.show()
```

Output of `pvs.plot_principle_components()`.
### Plotting Weight Distribution
```python
import matplotlib.pyplot as plt
pvs.plot_distribution()
plt.show()
```

Output of `pvs.plot_distribution()`.
### Plotting Weights of a Player
```python
import matplotlib.pyplot as plt
# Wyscout ID (wy_id) of Kevin De Bruyne (central midfielder)
pvs.plot_weights(player_id=38021)
plt.show()
```

Output of `pvs.plot_weights(player_id=38021)`.
## Building Player Vectors
### 1. Selecting Relevant Action Types
Let $k_t$ be the number of components used to compress the heatmaps of action type $t$.
According to the paper, choosing $k_t = 4, 4, 5$ and $5$ for $t$ = shot, cross, dribble and pass, respectively, is the minimal choice that explains 70% of the variance in the heatmaps of each action type $t$.
This setting was found empirically to work well because players' positions vary strongly across their actions (see Challenge 1 in Section 2 of the paper).
Ignoring 30% of the variance lets us summarize a player's playing style by his dominant regions on the field rather than modeling every position on the field he ever occupied.
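The 70% rule can be sketched as picking the smallest $k$ whose cumulative explained variance reaches the threshold. Since NMF has no built-in explained-variance ratio, this sketch uses the PCA ratio (from an SVD of the centered matrix) as a stand-in; that choice, the one-column-per-player layout and the helper names are assumptions rather than the paper's or this package's exact procedure:

```python
import numpy as np


def explained_variance_ratio(M):
    """PCA explained-variance ratio of M, where each column is one player."""
    # Center every heatmap cell (row) across players
    centered = M - M.mean(axis=1, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)  # sorted descending
    variance = singular_values ** 2
    return variance / variance.sum()


def minimal_components(ratio, threshold=0.7):
    """Smallest k whose first k components explain at least `threshold` of the variance."""
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratio))


# Toy ratios: 0.4 + 0.2 < 0.7 but 0.4 + 0.2 + 0.15 >= 0.7, so k = 3
k_toy = minimal_components(np.array([0.4, 0.2, 0.15, 0.1, 0.1, 0.05]))

# Random non-negative "heatmaps": 30 cells x 8 players
ratio = explained_variance_ratio(np.random.default_rng(0).random((30, 8)))
k_random = minimal_components(ratio)
```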
### 2. Constructing Heatmaps
#### 2.1 Counting

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD
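Counting overlays a grid on the pitch and counts how many of a player's actions of one type fall into each cell. A minimal sketch with `np.histogram2d`, assuming the action's start location is used and a 105 x 68 pitch (the usual SPADL convention; adjust `length` and `width` if your coordinates use another range, e.g. 0 to 100). `count_heatmap` is a hypothetical name:

```python
import numpy as np


def count_heatmap(x, y, grid=(50, 50), length=105.0, width=68.0):
    """Count actions per grid cell.

    x, y: action coordinates (e.g. x_start, y_start of one player's shots).
    length, width: assumed pitch size.
    Points outside the pitch are dropped by np.histogram2d.
    """
    counts, _, _ = np.histogram2d(
        x, y, bins=grid, range=[[0.0, length], [0.0, width]]
    )
    return counts  # counts[i, j]: number of actions in x-bin i and y-bin j


# Three made-up actions: both corners and one near the centre circle
heatmap = count_heatmap(np.array([0.0, 105.0, 52.0]), np.array([0.0, 68.0, 30.0]))
```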
#### 2.2 Normalizing

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD
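Normalizing makes players with different playing time comparable. A minimal sketch, assuming the counts are rescaled to a per-90-minutes rate using the player's entry in `minutes_played`; the function name `normalize_per_90` is hypothetical:

```python
import numpy as np


def normalize_per_90(counts, minutes_played):
    """Rescale raw action counts to counts per 90 minutes of play (assumed convention)."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return np.asarray(counts, dtype=float) * (90.0 / minutes_played)


# A player with 180 minutes: every count is halved
per90 = normalize_per_90(np.array([[2, 4], [0, 6]]), minutes_played=180)
```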
#### 2.3 Smoothing

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD
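Smoothing spreads each count over neighboring cells so that actions in nearby cells reinforce each other. A minimal sketch, assuming a Gaussian blur (the `sigma` value and the function names are hypothetical), written in plain NumPy as a separable 1D convolution along rows and then columns, with zero padding at the borders:

```python
import numpy as np


def gaussian_kernel_1d(sigma, truncate=3.0):
    """Normalized 1D Gaussian kernel covering +/- truncate * sigma cells."""
    radius = int(truncate * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_heatmap(heatmap, sigma=1.0):
    """Gaussian-blur a 2D heatmap; the grid must be at least as large as the kernel."""
    kernel = gaussian_kernel_1d(sigma)
    if len(kernel) > min(heatmap.shape):
        raise ValueError("grid is smaller than the Gaussian kernel")
    # mode="same" keeps the original length along each axis
    blur = lambda v: np.convolve(v, kernel, mode="same")
    rows_blurred = np.apply_along_axis(blur, 1, heatmap)
    return np.apply_along_axis(blur, 0, rows_blurred)


# A single action in the middle of a 21 x 21 grid
spike = np.zeros((21, 21))
spike[10, 10] = 1.0
smoothed = smooth_heatmap(spike, sigma=1.0)
```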
### 3. Compressing Heatmaps to Vectors
#### 3.1 Reshaping

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD
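Reshaping turns each $m \times n$ heatmap into a vector of length $m \cdot n$ so that heatmaps can be stacked into a matrix. A minimal sketch (the helper name is hypothetical); what matters is that every heatmap is flattened in the same order:

```python
import numpy as np


def reshape_heatmap(heatmap):
    """Flatten an m x n heatmap into a vector of length m * n (row-major order)."""
    return np.asarray(heatmap).reshape(-1)


heatmap = np.arange(6).reshape(2, 3)
vector = reshape_heatmap(heatmap)  # [0, 1, 2, 3, 4, 5]
```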
#### 3.2 Constructing the matrix M

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD
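A minimal sketch of building $M$ for one action type, assuming $M$ has one column per player (so $M$ is $(m \cdot n) \times$ number of players). `build_matrix` and the toy heatmaps are hypothetical:

```python
import numpy as np


def build_matrix(heatmaps, player_ids):
    """Stack flattened heatmaps of one action type into M, one column per player."""
    return np.column_stack([np.asarray(heatmaps[p]).reshape(-1) for p in player_ids])


# Toy 50 x 50 heatmaps for three made-up player ids
heatmaps = {
    7: np.ones((50, 50)),
    9: np.zeros((50, 50)),
    11: np.full((50, 50), 2.0),
}
M = build_matrix(heatmaps, [7, 9, 11])
```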
#### 3.3 Compressing matrix M with non-negative matrix factorization (NMF)

Source: Tom Decroos and Jesse Davis, September 19th, 2019 ECMLPKDD
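NMF approximates $M \approx WH$ with non-negative $W$ and $H$: the $k_t$ columns of $W$ act as basis heatmaps and each column of $H$ is one player's compressed vector for that action type. The sketch below uses Lee and Seung's multiplicative updates in plain NumPy for illustration; it is not necessarily the solver this package uses, and `nmf` is a hypothetical name:

```python
import numpy as np


def nmf(M, k, n_iter=300, eps=1e-10, seed=0):
    """Factor a non-negative matrix M (cells x players) as M ~ W @ H.

    Lee & Seung multiplicative updates for the Frobenius loss.
    W: (cells x k) basis heatmaps, H: (k x players) compressed player vectors.
    """
    if np.any(M < 0):
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    W = rng.random((M.shape[0], k))
    H = rng.random((k, M.shape[1]))
    for _ in range(n_iter):
        # Multiplying by non-negative ratios keeps W and H non-negative
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy matrix: 20 heatmap cells x 6 players, compressed to k = 3 components
M_toy = np.random.default_rng(1).random((20, 6))
W1, H1 = nmf(M_toy, 3, n_iter=1)
W, H = nmf(M_toy, 3)
error_short = np.linalg.norm(M_toy - W1 @ H1)
error_long = np.linalg.norm(M_toy - W @ H)
```

scikit-learn's `NMF` class is a common off-the-shelf alternative: with $M$ shaped (cells × players), `fit_transform(M)` returns $W$ and the `components_` attribute holds $H$.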
### 4. Assembling Player Vectors
The player vector $v$ of a player $p$ is the concatenation of $p$'s compressed vectors
for the relevant action types; with 4, 4, 5 and 5 components for shot, cross, dribble and pass, $v$ has 18 entries.
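A minimal sketch of the concatenation, assuming each action type's compressed vectors are stored as the columns of its $H$ matrix ($k_t \times$ players) with the same player order for every action type; `assemble_player_vector` and the toy matrices are hypothetical:

```python
import numpy as np


def assemble_player_vector(compressed, actions, column):
    """Concatenate one player's compressed vectors across action types.

    compressed[action]: H matrix (k_t x players) for that action type.
    column: the player's column index, assumed identical in every H.
    """
    return np.concatenate([compressed[a][:, column] for a in actions])


actions = ['shot', 'cross', 'dribble', 'pass']
components = [4, 4, 5, 5]
# Toy H matrices for 3 players, filled with the action's index for easy checking
compressed = {
    a: np.full((k, 3), float(i)) for i, (a, k) in enumerate(zip(actions, components))
}
player_vector = assemble_player_vector(compressed, actions, column=1)
```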
## Detailed Algorithm

## Running `demo.ipynb`
#### 1. Download this [Dataset](https://www.kaggle.com/datasets/aleespinosa/soccer-match-event-dataset) on Kaggle
#### 2. Create a folder named `data` in the root of this repository
```bash
mkdir data
```
#### 3. Copy all .csv files from the dataset into the folder `data`
#### 4. Run notebook `demo.ipynb`
## About the Dataset
This dataset contains European football team stats.
Only teams of the Premier League, Ligue 1, Bundesliga, Serie A and La Liga are listed.
All credit goes to Luca Pappalardo and Emmanuele Massucco.
[https://www.kaggle.com/datasets/aleespinosa/soccer-match-event-dataset](https://www.kaggle.com/datasets/aleespinosa/soccer-match-event-dataset)
## Citations
```bibtex
@inproceedings{ecmlpkdd2019,
  title     = {Player Vectors: Characterizing Soccer Players' Playing Style from Match Event Streams},
  author    = {Tom Decroos and Jesse Davis},
  booktitle = {ECML PKDD 2019},
  year      = {2019}
}
```