https://github.com/jeremias-ventura/la-liga-match-predictor
A machine learning project predicting La Liga match outcomes for the 24/25 season using data from the last five seasons
https://github.com/jeremias-ventura/la-liga-match-predictor
machine-learning python random-forest soccer sports-analytics
Last synced: 8 months ago
JSON representation
A machine learning project predicting La Liga match outcomes for the 24/25 season using data from the last five seasons
- Host: GitHub
- URL: https://github.com/jeremias-ventura/la-liga-match-predictor
- Owner: Jeremias-Ventura
- Created: 2025-01-09T07:12:16.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-02-06T06:31:17.000Z (10 months ago)
- Last Synced: 2025-04-09T19:11:42.551Z (8 months ago)
- Topics: machine-learning, python, random-forest, soccer, sports-analytics
- Language: Jupyter Notebook
- Homepage:
- Size: 8.15 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# **La Liga Match Predictor**
---
This project predicts the outcomes of La Liga 24/25 matches based on data from the past five seasons (this specific code is for matchdays 19 and 20). Built using JupyterLab and Python, it uses machine learning with scikit-learn's `RandomForestClassifier` to train a predictive model.
---
## **Overview**
1. **Scraping**: Match data is scraped from [Fbref](https://fbref.com/en/comps/12/La-Liga-Stats) using `requests`, `BeautifulSoup`, and `pandas`.
2. **Preprocessing**: : The raw data is cleaned, transformed, and enhanced with rolling averages and other features for training.
3. **Training**: A `RandomForestClassifier` is trained on historical match data to learn patterns and predict outcomes.
4. **Prediction**: The trained model predicts outcomes for Matchdays 19 and 20 of the 24/25 season, using features such as team form, venue, and other factors.
---
## **Future Plans**
This is a **WORK IN PROGRESS**. While this project currently focuses on Matchdays 19 and 20, I plan to expand predictions for the remainder of the 24/25 season and so on. Planned improvements and features include:
### Upcoming Predictions
- **Remainder of the Season**: As new matches are played, I plan to update the dataset and generate predictions for future matchdays.
### Enhancements
As the project grows, I plan to implement the following enhancements to improve the predictions and functionality:
- **Model Improvements**:
- Incorporating new features, such as:
- **Expected Statistics**: Scraping and integrating expected goals (xG), assists (xA), etc., to help predictions.
- **Player Statistics**: Adding player data to capture individual impacts on match outcomes.
- **Score Predictions**: Extending the model to predict match scores rather than just match outcomes.
- **Visualization**:
- Developing visual tools to better interpret and display results, including:
- Dashboards showing match probabilities, predicted outcomes, and predicted scores.
- Charts highlighting trends in team and player performance across the season.
- **Automation**:
- Streamlining the workflow to automatically:
- Scrape new match data after each gameweek.
- Update rolling averages and other features for predictions.
- Retrain the model periodically as new data becomes available.
---
## **Documentation of Updates**
Progress and changes will be documented in an `UPDATES.md` file in the repository. This file will include:
- **Prediction Updates**: Matchdays and outcomes added to the model.
- **Feature Enhancements**: New features or improvements made to the training process.
---
## **Note to Readers**
This project is a **WORK IN PROGRESS**, and the current model has been designed and validated for Matchdays 19 and 20. As the 24/25 season progresses, I will continue to make predictions, improve the model, and share updates here.
> **Stay Tuned!** Check the `UPDATES.md` file for the latest developments, including new predictions and features.