https://github.com/lkethridge/integrated_project_2
Integrated Project 2 from TripleTen
- Host: GitHub
- URL: https://github.com/lkethridge/integrated_project_2
- Owner: LKEthridge
- License: cc0-1.0
- Created: 2025-01-21T06:00:30.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-22T16:00:17.000Z (over 1 year ago)
- Last Synced: 2025-03-20T15:14:01.310Z (over 1 year ago)
- Topics: anomaly-detection, cross-validation, data-analytics, data-cleaning-and-preprocessing, data-science, feature-engineering, gold-recovery, machine-learning, metal-purification, model-evaluation, pandas, portfolio-project, python, scikit-learn, smape, supervised-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 15 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Integrated_Project_2
## *This was an integrated skill project for TripleTen. 👩🏽‍💻*
This project developed a machine learning solution for predicting gold recovery at the rougher and final stages of ore processing, using datasets with over 80 parameters. A multi-output Random Forest regression model gave the most accurate predictions during training, and Linear Regression was a viable, less computationally intensive alternative. The models underperformed constant benchmarks on the test set, but they still demonstrate the potential for data-driven optimization of industrial processes.
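
The comparison against a constant benchmark can be illustrated with a small, dependency-free sketch. It assumes the evaluation metric is sMAPE (listed among the repository topics) in its common symmetric form, and that the constant benchmark predicts the training-set median; neither detail is confirmed by this README. All numbers are made up for illustration and are not results from the project.

```python
from statistics import median


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        denominator = (abs(actual) + abs(predicted)) / 2
        # Treat a 0/0 term as zero error instead of dividing by zero.
        if denominator != 0:
            total += abs(actual - predicted) / denominator
    return 100 * total / len(y_true)


# Made-up recovery percentages, for illustration only.
train_recovery = [82.0, 85.5, 79.0, 88.0, 84.0]
test_recovery = [80.0, 90.0, 86.0, 83.0]
model_predictions = [78.0, 91.5, 84.0, 85.0]  # hypothetical model output

# Assumed constant benchmark: always predict the training-set median.
baseline_predictions = [median(train_recovery)] * len(test_recovery)

model_score = smape(test_recovery, model_predictions)
baseline_score = smape(test_recovery, baseline_predictions)
print(f"model sMAPE:    {model_score:.2f}%")
print(f"baseline sMAPE: {baseline_score:.2f}%")
```

A lower sMAPE is better, so a model is only useful if its score falls below the baseline's. In the notebook itself the same calculation would more likely be vectorized with NumPy and wrapped with `make_scorer` for cross-validation.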
## Skills Highlighted
Python
👩🏽‍💻 Data Science
🤖 Machine Learning
🧪 Scikit Learn
Cross Validation
🐼 pandas
Data Analytics
Supervised Learning
Feature Engineering
🎯 Model Evaluation
🕵🏽‍♀️ Anomaly Detection
🧼 Data Cleaning and Preprocessing
## Installation & Usage
* This project uses pandas, NumPy, seaborn, and `matplotlib.pyplot`, along with `RandomForestRegressor`, `MultiOutputRegressor`, `LinearRegression`, `mean_squared_error`, `mean_absolute_error`, `make_scorer`, `shuffle`, `StandardScaler`, `SimpleImputer`, `cross_val_score`, `KFold`, and `RandomizedSearchCV`. It requires Python 3.9.6. One additional file, containing the full, unsplit test set, is not included in the repository because of upload limitations.