https://github.com/soham2002/premier-league-predictor

A Random Forest-based machine learning model forecasting English Premier League match outcomes, enhancing precision through rolling averages and dynamic retraining for combined home and away predictions.

Host: GitHub
URL: https://github.com/soham2002/premier-league-predictor
Owner: soham2002
Created: 2024-01-21T13:37:52.000Z (over 2 years ago)
Default Branch: master
Last Pushed: 2024-01-25T20:00:22.000Z (over 2 years ago)
Last Synced: 2025-03-03T01:18:00.018Z (over 1 year ago)
Language: Jupyter Notebook
Size: 62.5 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# English Premier League Match Outcome Forecasting with Random Forest

## Overview

This repository contains a Random Forest model for forecasting English Premier League match outcomes. It uses rolling averages of each team's recent results as features and retrains periodically on new data, with the aim of improving precision for both home and away predictions.
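To make the rolling-average idea concrete, here is a minimal sketch using pandas. The column names (`team`, `date`, `gf`), the window size, and the helper `add_rolling_averages` are assumptions for illustration and are not taken from the repository. Each row only sees matches played before it, so the feature does not leak the result being predicted.

```python
import pandas as pd


def add_rolling_averages(matches, cols, window=3):
    """Add per-team means of `cols` over the previous `window` matches.

    Hypothetical helper: column names and window size are assumptions.
    """
    matches = matches.sort_values(["team", "date"]).copy()
    new_cols = [f"{col}_rolling" for col in cols]
    for col, new_col in zip(cols, new_cols):
        # shift(1) drops the current match from its own window, so the
        # average only reflects results known before kick-off.
        matches[new_col] = matches.groupby("team")[col].transform(
            lambda s: s.shift(1).rolling(window).mean()
        )
    # Early matches without a full window of history have no feature value.
    return matches.dropna(subset=new_cols).reset_index(drop=True)


# Tiny made-up example: goals scored ("gf") by two teams over four matches.
demo = pd.DataFrame({
    "team": ["A"] * 4 + ["B"] * 4,
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"] * 2
    ),
    "gf": [1, 2, 3, 4, 0, 0, 2, 2],
})
features = add_rolling_averages(demo, ["gf"], window=2)
print(features[["team", "date", "gf", "gf_rolling"]])
```

With a window of 2, the first two matches of each team are dropped because they lack enough history.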

## Project Structure

The project is organized into different components:

1. **Data Collection and Preprocessing:** The `data` folder contains scripts for collecting and preprocessing the EPL match data. To scrape fresh data, run the `scrape.py` script. Otherwise, use the provided `matches.csv` file directly and train and test the model by running the `prediction_model.ipynb` notebook.

2. **Feature Engineering:** The `features` folder includes tools for creating relevant features for the machine learning model. This involves calculating rolling averages and other dynamic features that capture the teams' recent performances.

3. **Model Training:** The `models` folder contains the main machine learning model implemented using the Random Forest algorithm.

4. **Prediction:** The `predict` folder provides utilities for making predictions on new match data using the trained model.

5. **Evaluation:** The `evaluation` folder includes scripts and notebooks for evaluating the model's performance on historical data. The evaluation process helps fine-tune the model and understand its strengths and weaknesses.

6. **Dynamic Retraining:** The `dynamic_retraining` folder contains scripts for implementing dynamic retraining. This involves refitting the model periodically on new match data so that its predictions reflect teams' current form.
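One common way to realise the dynamic retraining described above is walk-forward retraining: refit on every match before a cutoff date, predict the matches up to the next cutoff, then move the cutoff forward. The sketch below assumes scikit-learn's `RandomForestClassifier` and uses synthetic data. The function `walk_forward_predict`, the column names (`date`, `form`, `win`), and the hyperparameters are hypothetical and may differ from what the repository's scripts do.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def walk_forward_predict(df, predictors, target, cutoffs):
    """Refit before each cutoff and predict the window up to the next one.

    Hypothetical helper; not the repository's implementation.
    """
    df = df.sort_values("date")
    chunks = []
    for start, end in zip(cutoffs[:-1], cutoffs[1:]):
        train = df[df["date"] < start]
        test = df[(df["date"] >= start) & (df["date"] < end)]
        if train.empty or test.empty:
            continue
        # A fresh model per window, trained only on matches already played.
        model = RandomForestClassifier(
            n_estimators=50, min_samples_split=10, random_state=1
        )
        model.fit(train[predictors], train[target])
        chunks.append(pd.DataFrame(
            {
                "actual": test[target].to_numpy(),
                "predicted": model.predict(test[predictors]),
            },
            index=test.index,
        ))
    if not chunks:
        return pd.DataFrame(columns=["actual", "predicted"])
    return pd.concat(chunks)


# Synthetic stand-in for match data: one row per day, one numeric feature.
rng = np.random.default_rng(0)
dates = pd.date_range("2023-08-12", periods=60, freq="D")
form = rng.normal(size=60)
demo = pd.DataFrame({"date": dates, "form": form, "win": (form > 0).astype(int)})

# Retrain three times; each model predicts the following ten matches.
cutoffs = [dates[30], dates[40], dates[50], dates[-1] + pd.Timedelta(days=1)]
predictions = walk_forward_predict(demo, ["form"], "win", cutoffs)
print(predictions.tail())
```

Retraining from scratch at each cutoff is simple but slow on large datasets; a longer interval between cutoffs trades freshness for speed.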

## Requirements

- Python 3.x
- Required Python packages are listed in the `requirements.txt` file. Install them using:

```bash
pip install -r requirements.txt
```

## Usage

1. **Data Collection and Preprocessing:**
   - To scrape EPL match data, run the `scrape.py` script in the `data` folder.
   - Alternatively, use the pre-existing `matches.csv` file to train and test the model.

2. **Feature Engineering:**
   - Use the tools in the `features` folder to generate features for the machine learning model.

3. **Model Training:**
   - Train the Random Forest model using the implementation in the `models` folder.

4. **Prediction:**
   - Use the utilities in the `predict` folder to make predictions for a specific set of matches.

5. **Evaluation:**
   - Use the scripts and notebooks in the `evaluation` folder to assess the model's performance on historical data.

6. **Dynamic Retraining:**
   - Periodically rerun the scripts in the `dynamic_retraining` folder to update the model with new data.
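The training and evaluation steps above can be sketched roughly as follows, assuming scikit-learn and a date-based split so that the model is only tested on matches played after its training data. The data here is synthetic, and the function `train_and_evaluate`, the feature names (`gf_rolling`, `venue_code`), the target column `win`, and the hyperparameters are assumptions rather than the repository's actual interface.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score


def train_and_evaluate(df, predictors, target, split_date):
    """Fit on matches before `split_date`, score on matches from then on.

    Hypothetical helper; not the repository's implementation.
    """
    train = df[df["date"] < split_date]
    test = df[df["date"] >= split_date]
    model = RandomForestClassifier(
        n_estimators=100, min_samples_split=10, random_state=1
    )
    model.fit(train[predictors], train[target])
    predicted = model.predict(test[predictors])
    metrics = {
        "accuracy": accuracy_score(test[target], predicted),
        # Precision: of the matches predicted as wins, how many were wins.
        "precision": precision_score(test[target], predicted, zero_division=0),
    }
    return model, metrics


# Synthetic stand-in for a prepared feature table.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "date": pd.date_range("2023-08-01", periods=n, freq="D"),
    "gf_rolling": rng.normal(size=n),
    "venue_code": rng.integers(0, 2, size=n),
})
demo["win"] = (demo["gf_rolling"] + 0.5 * demo["venue_code"] > 0.25).astype(int)

model, metrics = train_and_evaluate(
    demo, ["gf_rolling", "venue_code"], "win", pd.Timestamp("2024-01-01")
)
print(metrics)
```

A chronological split is used instead of a random one because shuffling would let the model train on matches played after the ones it is tested on.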

## Contributing

If you would like to contribute to this project, please follow the standard GitHub flow: fork the repository, create a branch, make your changes, and submit a pull request.
