https://github.com/paulj1989/player-similarities
Using FB Ref player data to measure player similarity within positions, using clustering methods
cluster-analysis dimensionality-reduction football positions python soccer sports-analytics
- Host: GitHub
- URL: https://github.com/paulj1989/player-similarities
- Owner: Paulj1989
- License: mit
- Created: 2020-12-23T05:14:24.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-11-26T11:10:15.000Z (over 1 year ago)
- Last Synced: 2025-10-14T20:04:14.315Z (8 months ago)
- Topics: cluster-analysis, dimensionality-reduction, football, positions, python, soccer, sports-analytics
- Language: Jupyter Notebook
- Homepage:
- Size: 6.16 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Player Roles/Types & Similarities
Using [FB Ref](https://fbref.com/) player data to measure player roles/types and identify similar players within positions, using clustering and nearest neighbors algorithms.
## Contents
- [Requirements](#requirements)
- [Project Plans](#project-plans)
  - [TODOs](#todos)
    - [Clustering](#clustering)
    - [Nearest Neighbors](#nearest-neighbors)
- [License](#license)
- [Contact](#contact)
## Requirements
This project is managed in a virtual environment, using pipenv. All packages and their dependencies can be found in Pipfile and Pipfile.lock. To create a pipenv environment and install all the packages needed to run the code in this repository, run the following in a terminal:
````bash
# install pipenv
pip install pipenv
# navigate to the repository directory
cd ~/path/to/player-similarities
# install virtual environment and dependencies
pipenv install
````
The packages required are:
- pandas
- ipykernel
- matplotlib
- yellowbrick
- scikit-learn
The code for the project lives in two notebooks, which must be run in order. Run the clustering notebook first to compute the clustering models, then run the similarities notebook, which uses the nearest neighbors algorithm to compute player similarities.
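The run order can be illustrated with a minimal sketch. It is not the notebooks' actual code: the player labels, the three feature columns, the choice of k-means with three clusters, and the way cluster labels are attached to the neighbor results are all assumptions made for illustration, using made-up numbers rather than FB Ref data.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Hypothetical per-90 stats for six made-up players (not FB Ref data).
players = pd.DataFrame(
    {
        "shots": [3.1, 2.9, 0.4, 0.5, 1.2, 1.1],
        "key_passes": [1.0, 1.2, 0.3, 0.2, 2.5, 2.7],
        "tackles": [0.5, 0.4, 2.8, 3.0, 1.5, 1.4],
    },
    index=["A", "B", "C", "D", "E", "F"],
)

# Stage 1 (clustering): scale the features, then assign each player a cluster.
scaled = StandardScaler().fit_transform(players)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
players["cluster"] = kmeans.fit_predict(scaled)

# Stage 2 (similarities): nearest neighbors on the same scaled features.
nn = NearestNeighbors(n_neighbors=2).fit(scaled)
distances, indices = nn.kneighbors(scaled)

# Closest matches for player "A" (row 0), with their cluster labels.
target = 0
result = players.iloc[indices[target]][["cluster"]].assign(distance=distances[target])
print(result)
```

In this sketch the first row of `result` is player "A" itself at distance 0, because the target is one of the fitted points.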
## Project Plans
This project is still in development.
### TODOs
#### Clustering
- [ ] Consider lasso & weighted k-means feature selection
- [ ] Look at clustering for defenders & goalkeepers
- [ ] Think about features needed for goalkeepers
#### Nearest Neighbors
- [ ] Exclude the target player from the output when identifying similar players
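One possible way to approach the nearest neighbors TODO is sketched below, assuming the similarities are computed with scikit-learn's `NearestNeighbors`. The function `similar_players`, its parameters, and the toy feature matrix are hypothetical and are not taken from the notebooks.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def similar_players(features, names, target, k=3):
    """Return the k players closest to `target`, excluding the target itself."""
    features = np.asarray(features, dtype=float)
    names = list(names)
    target_idx = names.index(target)
    # Ask for one extra neighbor, since the target is among the fitted points.
    n_query = min(k + 1, len(names))
    nn = NearestNeighbors(n_neighbors=n_query).fit(features)
    _, idx = nn.kneighbors(features[target_idx].reshape(1, -1))
    # Filter by position rather than dropping the first hit: with duplicate
    # rows the target is not guaranteed to be returned first.
    return [names[i] for i in idx[0] if i != target_idx][:k]


# Toy data: four hypothetical players described by two features.
toy_features = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [5.0, 5.0]])
toy_names = ["A", "B", "C", "D"]
matches = similar_players(toy_features, toy_names, "A", k=2)
print(matches)  # ['B', 'C']
```

When the query set is the same as the fitted data, calling `kneighbors()` with no argument is an alternative, since scikit-learn then does not count each point as its own neighbor.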
## License
The data for this project is provided by [FB Ref](https://fbref.com/). The code used to train the clustering and nearest neighbors algorithms is licensed under the [MIT license](LICENSE.md).
## Contact
If you have any questions or comments, feel free to contact [me](https://github.com/paulj1989) by [email](mailto:paul@paulrjohnson.net), on [Twitter](https://twitter.com/paul_johnson89), or in the [repository discussions](https://github.com/Paulj1989/player-similarity-clusters/discussions).