Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nafisalawalidris/recipe-site-traffic-prediction
Recipe Site Traffic Prediction: Utilising machine learning to forecast high traffic recipes on a recipe website. Improve user engagement and traffic with data-driven decisions.
data-driven-decisions data-science forecast high-traffic machine-learning python recipe-site traffic-prediction user-engagement
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/nafisalawalidris/recipe-site-traffic-prediction
- Owner: nafisalawalidris
- Created: 2023-08-05T21:55:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-08-05T22:42:41.000Z (over 1 year ago)
- Last Synced: 2024-01-27T00:45:47.415Z (11 months ago)
- Topics: data-driven-decisions, data-science, forecast, high-traffic, machine-learning, python, recipe-site, traffic-prediction, user-engagement
- Language: Python
- Homepage:
- Size: 31.3 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Recipe Site Traffic Prediction
Welcome to the Recipe Site Traffic Prediction project repository. This project aims to predict high traffic recipes on a recipe website using machine learning techniques. The predictions will help the website's product manager make data-driven decisions to improve user engagement and overall traffic on the website.
## Project Overview
The goal of this project is to build a predictive model that can accurately identify recipes that are likely to generate high traffic. By leveraging historical data on recipe features and traffic patterns, we can train a machine learning model to predict the likelihood of a recipe being popular and generating high traffic.
## Data Validation and Cleaning
The dataset contains 947 rows and 8 columns. To ensure data integrity and consistency, we performed data validation and cleaning steps for each column:
- Handled missing values in the "calories," "carbohydrate," "sugar," and "protein" columns by filling them with the mean of their respective groups based on "category" and "servings".
- Unified the "Chicken Breast" category with the "Chicken" category to maintain consistency in categories.
- Unified extra values "4 as a snack" and "6 as a snack" in the "servings" column with "4" and "6," respectively, and changed the column type to integer.
- Replaced null values in the "high_traffic" column with "Low" to ensure all recipes have a traffic label.
## Exploratory Analysis
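
The cleaning steps listed under Data Validation and Cleaning can be sketched in pandas as follows. This is a minimal illustration on a tiny made-up table, not the repository's actual code: the function name `clean_recipes`, the sample rows, and the `"High"` label value are assumptions, and the servings and category values are normalised before the group means are computed so that the groups are consistent.

```python
import numpy as np
import pandas as pd

# Nutrient columns named in the README.
NUTRIENTS = ["calories", "carbohydrate", "sugar", "protein"]


def clean_recipes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the cleaning steps described above."""
    out = df.copy()
    # Merge the "Chicken Breast" category into "Chicken".
    out["category"] = out["category"].replace({"Chicken Breast": "Chicken"})
    # "4 as a snack" -> 4, "6 as a snack" -> 6: keep only the leading number.
    out["servings"] = (
        out["servings"].astype(str).str.extract(r"(\d+)", expand=False).astype(int)
    )
    # Fill missing nutrient values with the mean of their (category, servings) group.
    # A group with no observed values at all would stay missing.
    group_means = out.groupby(["category", "servings"])[NUTRIENTS].transform("mean")
    out[NUTRIENTS] = out[NUTRIENTS].fillna(group_means)
    # Recipes without a traffic label are treated as "Low".
    out["high_traffic"] = out["high_traffic"].fillna("Low")
    return out


# Made-up rows for illustration only; "High" is an assumed label value.
raw = pd.DataFrame({
    "category": ["Chicken Breast", "Chicken", "Dessert", "Dessert"],
    "servings": ["4", "4", "6 as a snack", "6"],
    "calories": [100.0, np.nan, 300.0, np.nan],
    "carbohydrate": [10.0, np.nan, 30.0, np.nan],
    "sugar": [1.0, np.nan, 3.0, np.nan],
    "protein": [20.0, np.nan, 5.0, np.nan],
    "high_traffic": ["High", None, None, "High"],
})
cleaned = clean_recipes(raw)
print(cleaned)
```

Filling with a group mean rather than a global mean keeps, for example, dessert and chicken recipes from pulling each other's nutrient estimates toward a shared average.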
We conducted exploratory data analysis to gain insights into the data and answer specific questions:
- Explored two different types of graphics showing single variables, such as histograms and box plots.
- Created graphics showing two or more variables, like scatter plots and bar charts.
- Identified key findings, such as the most popular recipe categories and the correlation between recipe attributes and traffic.
## Model Development
We built two models to predict high traffic recipes:
- Fitted a baseline Logistic Regression model to establish a benchmark.
- Fitted a comparison Linear Support Vector Classification (SVC) model to evaluate its performance against the baseline.
## Model Evaluation
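
The two classifiers described under Model Development could be fitted and scored on a held-out split roughly as sketched below. The sketch runs on synthetic data from `make_classification`, so the numbers it prints are not the project's results; the scaling step, the 80/20 split, the feature count, the hyperparameters, and the `evaluate` helper are all assumptions rather than the repository's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the cleaned recipe features (947 rows, like the dataset;
# the number of features is arbitrary). Label 1 plays the role of "high traffic".
X, y = make_classification(n_samples=947, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Baseline and comparison models, each with feature scaling in front.
models = {
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(max_iter=10000)),
}


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit a model and return the four metrics reported in the README."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "accuracy": accuracy_score(y_test, pred),
    }


scores = {
    name: evaluate(model, X_train, y_train, X_test, y_test)
    for name, model in models.items()
}
for name, metrics in scores.items():
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

Scoring both models with the same function on the same split keeps the comparison like for like.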
Based on evaluation metrics, the Logistic Regression model outperforms the Linear SVC model in predicting high traffic recipes. The Logistic Regression model achieves a Precision of 0.82, Recall of 0.80, and F1 Score of 0.81, while the Linear SVC model achieves a Precision of 0.80, Recall of 0.77, and F1 Score of 0.79.
## Metric for Monitoring
The primary business goal is to predict recipes with high traffic accurately. The chosen metric for monitoring is the accuracy of predictions for high traffic recipes. The Logistic Regression model achieves an accuracy of 76%, indicating a better-performing model compared to the Linear SVC model, which has an accuracy of 74%.
## Recommendations
Based on the analysis and model evaluation, we recommend the following actions to the business:
- Deploy the Logistic Regression model into production to predict high traffic recipes in real-time.
- Collect additional data, such as time to make, cost per serving, ingredients, and site duration time, to improve model performance.
- Implement feature engineering techniques to create more meaningful features and increase the number of categories.
- Monitor the accuracy of predictions regularly to ensure the model's performance remains consistent.
- Investigate further to identify other factors influencing recipe popularity and traffic.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contributing
Feel free to contribute to this project by creating pull requests or raising issues in the repository. Your contributions are valuable and appreciated!
## Get Started
To get started with the project, follow these steps:
1. Clone the repository: `git clone https://github.com/nafisalawalidris/recipe-site-traffic-prediction.git`
2. Install the required dependencies: `pip install -r requirements.txt`
3. Run the project: `python main.py`
## Project Team
This project was completed by:
- Nafisa Lawal Idris