https://github.com/soham2002/premier-league-predictor
A Random Forest-based machine learning model forecasting English Premier League match outcomes, enhancing precision through rolling averages and dynamic retraining for combined home and away predictions.
https://github.com/soham2002/premier-league-predictor
Last synced: 15 days ago
JSON representation
A Random Forest-based machine learning model forecasting English Premier League match outcomes, enhancing precision through rolling averages and dynamic retraining for combined home and away predictions.
- Host: GitHub
- URL: https://github.com/soham2002/premier-league-predictor
- Owner: soham2002
- Created: 2024-01-21T13:37:52.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-01-25T20:00:22.000Z (over 2 years ago)
- Last Synced: 2025-03-03T01:18:00.018Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 62.5 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# English Premier League Match Outcome Forecasting with Random Forest
## Overview
This repository contains a machine learning model based on the Random Forest algorithm for forecasting English Premier League match outcomes. The model utilizes various features, including rolling averages and dynamic retraining, to enhance precision in predicting both home and away outcomes.
## Project Structure
The project is organized into different components:
1. **Data Collection and Preprocessing:** The `data` folder contains scripts for collecting and preprocessing the EPL match data. To scrape the data, run the `scrape.py` script. Otherwise, use the `matches.csv` file directly to train and test the model by running the `prediction_model.ipynb` file.
2. **Feature Engineering:** The `features` folder includes tools for creating relevant features for the machine learning model. This involves calculating rolling averages and other dynamic features that capture the teams' recent performances.
3. **Model Training:** The `models` folder contains the main machine learning model implemented using the Random Forest algorithm.
4. **Prediction:** The `predict` folder provides utilities for making predictions on new match data using the trained model.
5. **Evaluation:** The `evaluation` folder includes scripts and notebooks for evaluating the model's performance on historical data. The evaluation process helps fine-tune the model and understand its strengths and weaknesses.
6. **Dynamic Retraining:** The `dynamic_retraining` folder contains scripts for implementing dynamic retraining. This involves updating the model periodically with new data to ensure that it stays relevant and accurate over time.
## Requirements
- Python 3.x
- Required Python packages are listed in the `requirements.txt` file. Install them using:
```bash
pip install -r requirements.txt
```
## Usage
1. **Data Collection and Preprocessing:**
- To scrape EPL match data, run the `scrape.py` script in the `data` folder.
- Alternatively, use the pre-existing `matches.csv` file to train and test the model.
2. **Feature Engineering:**
- Utilize tools in the `features` folder to generate relevant features for the machine learning model.
3. **Model Training:**
- Train the Random Forest model using the implementation in the `models` folder.
4. **Prediction:**
- Use utilities in the `predict` folder for making predictions for a specific set of matches.
5. **Evaluation:**
- Utilize scripts and notebooks in the `evaluation` folder to assess the model's performance on historical data.
6. **Dynamic Retraining:**
- Periodically implement dynamic retraining strategies from the `dynamic_retraining` folder to update the model with new data.
## Contributing
If you would like to contribute to this project, please follow the standard GitHub flow: fork the repository, create a branch, make your changes, and submit a pull request.