Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/abeusher/autogluon-test-iris
https://github.com/abeusher/autogluon-test-iris
Last synced: about 24 hours ago
JSON representation
- Host: GitHub
- URL: https://github.com/abeusher/autogluon-test-iris
- Owner: abeusher
- Created: 2024-06-12T12:47:23.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-06-13T11:18:20.000Z (5 months ago)
- Last Synced: 2024-06-13T19:00:45.053Z (5 months ago)
- Language: Python
- Size: 386 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# autogluon-test-iris
This repository includes two sample projects:
- /iris_project/ - and example of how to use the autogluon framework in python to create predictive models
- /baseball_project/ - a slightly more advanced example of how to use the autogluon framework to create predictive models___
#### Iris data set & code
I recommend starting with the iris_project. Make 5-minutes to read about [the iris data set on wikipedia.](https://en.wikipedia.org/wiki/Iris_flower_data_set) After that, briefly look over notes about the [iris data set on the UC Irvine website.](https://archive.ics.uci.edu/dataset/53/iris)
In the case of the iris data set, our objective is to create a predictive model so that given different quantitative characteristics about different iris flowers, we can predict which iris species a given observation is about. The inputs (x variables) also so-called independent variables are the measurements of (sepal length, sepal width, petal length, petal width). The output (y variable) also called "depedent" variable is the class/category name of the flower, in this specific case it should be one of these values [Iris Setosa, Iris Versicolour, or Iris Virginica].
The data for this sub-project is contained in /data/iris_train.csv and /data/iris_test.csv
The code uses the python-based [autogluon framework](https://auto.gluon.ai/stable/index.html) which simplifies making machine learning models like Random Forests.
#### Description of code in /iris_project/
**01_build_autogluon_model.py** - this file loads training data (line 6) used to build a model, test data (line 7) used to evaluate the new model, and creates a new predictive model (line 12). Autogluon automatically tries a bunch of different model types including two types of random forests, a neural network, and gradient boosted trees. More details about what the different predictive model types are is included in the markdown file [model_descriptions.md](model_descriptions.md).
**02_evaluate_model.py** - this file shows how to print out the performance of different model types and also creates a "model leaderboard" that ranks the quality of the different models produced.
**03_use_model_for_prediction.py** - this file shows how to use a trained predictive model to make a prediction for a single input value (e.g., if we have the quantitative attributes of a flower, predict which species/category of flower that it is).
**04_load_specific_model_and_predict.py** - this file is similar to (03_use_model_for_prediction.py) but on line 52 it configures the predictor object to use a specific named model (in this case 'RandomForestGini' which is defined on line 6).
#### baseball data set & code
Instead of data on flowers, this project includes attributes of different baseball players. There are multiple columns that are "dependent" values (Y variables) that relate to players having a specific position. Rich L can provide additional details on the data.
The data for this sub-project is contained in /data/raw_data.csv and /data/raw_data_test.csv
#### Description of code in /baseball_project/
The code is organized similar to the iris project.
**00_unzip_data.py** - this file unzips the compressed data found in /data/raw_data.csv.gz
**01_build_autogluon_model.py** - this file loads training data (line 6) used to build a model, and drops some columns that we do not want to be used for analysis (lines 9-10 and lines 13-15). Autogluon automatically tries a bunch of different model types including two types of random forests, a neural network, and gradient boosted trees. More details about what the different predictive model types are is included in the markdown file [model_descriptions.md](model_descriptions.md).
**02_evaluate_model.py** - this file shows how to print out the performance of different model types and also creates a "model leaderboard" that ranks the quality of the different models produced.
**03_use_model_for_prediction.py** - this file shows how to use a trained predictive model to make a prediction for a single input value (e.g., if we have the quantitative attributes of a flower, predict which species/category of flower that it is).
**04_load_specific_model_and_predict.py** - this file is similar to (03_use_model_for_prediction.py) but on line 52 it configures the predictor object to use a specific named model (in this case 'RandomForestGini' which is defined on line 6).
___
## installing autogluon pre-requisited on macos
Getting autogluon setup on a Mac is kind of a pain. I recommend running it on Linux or Windows if possible. But if you want to try out the code on a mac, here are some installation instructions that may help.
### Uninstall "LLVM OpenMP library" (aka "libomp") if it was previous installed
brew uninstall -f libomp### install specific version of libomp
wget https://raw.githubusercontent.com/Homebrew/homebrew-core/fb8323f2b170bd4ae97e1bac9bf3e2983af3fdb0/Formula/libomp.rb### install libomp
brew install libomp.rb### remove the installer script
rm libomp.rbSee also: https://auto.gluon.ai/stable/install.html
### installing autogluon
python3 -m pip install -r requirements.txt