https://github.com/hiejulia/ml-pipeline
End to end production ML pipeline
- Host: GitHub
- URL: https://github.com/hiejulia/ml-pipeline
- Owner: hiejulia
- Created: 2020-09-12T08:16:54.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-09-30T16:32:15.000Z (about 5 years ago)
- Last Synced: 2025-02-08T21:46:21.096Z (8 months ago)
- Language: Python
- Size: 122 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# ML pipeline project
- End to end ML pipeline
  - Data ingest
  - Data validation
  - Data processing
  - Model training
  - Model analysis and validation
  - Model deployment
- Data governance - security
- Beam & Airflow
- Kubeflow pipeline

# Install
- Python 3.6+
- TensorFlow 2
- TFX pipeline `pip install tfx`
- Airflow `pip install apache-airflow`
- Kubeflow
- Apache Beam `pip install apache-beam`

#### Data Ingestion
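
A minimal sketch of what an ingestion step might look like, assuming CSV input and a deterministic hash-based train/eval split. The function names, the `id` key column, and the 2:1 split ratio are illustrative assumptions, not taken from this repository; in a TFX pipeline an ExampleGen component such as `CsvExampleGen` would normally take this role.

```python
import csv
import hashlib
import io


def read_rows(csv_text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def assign_split(key, train_buckets=2, eval_buckets=1):
    """Route a record to 'train' or 'eval' by hashing its key (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % (train_buckets + eval_buckets)
    return "train" if bucket < train_buckets else "eval"


def ingest(csv_text, key_column="id"):
    """Read records and partition them into train and eval splits."""
    splits = {"train": [], "eval": []}
    for row in read_rows(csv_text):
        splits[assign_split(row[key_column])].append(row)
    return splits


if __name__ == "__main__":
    sample = "id,amount\n" + "\n".join(f"{i},{i * 1.5}" for i in range(30))
    result = ingest(sample)
    print(len(result["train"]), len(result["eval"]))
```

Hashing the key rather than sampling at random keeps a given record in the same split across reruns, which makes later comparisons between dataset versions meaningful.
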
#### Data Validation
- `pip install tensorflow-data-validation`
- Data anomalies
- Data schema
- Statistics comparison: new dataset vs. previous training dataset
- Missing values, correlation
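
The checks listed above (schema, missing values, comparison of a new dataset against the previous training dataset) can be sketched in plain Python. This is a stand-in for illustration only and does not use the `tensorflow-data-validation` API; all function names and the `max_missing_increase` threshold are hypothetical.

```python
def infer_schema(rows):
    """Infer a simple schema: column name -> set of Python type names seen."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            if value is not None:
                schema.setdefault(name, set()).add(type(value).__name__)
    return schema


def compute_stats(rows):
    """Per-column fraction of missing values and mean of numeric values."""
    columns = sorted({name for row in rows for name in row})
    stats = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        stats[name] = {
            "missing": 1 - len(present) / len(values),
            "mean": sum(numeric) / len(numeric) if numeric else None,
        }
    return stats


def find_anomalies(schema, train_stats, new_rows, max_missing_increase=0.1):
    """Compare a new dataset against the training schema and statistics."""
    anomalies = []
    new_schema = infer_schema(new_rows)
    new_stats = compute_stats(new_rows)
    for name in schema:
        if name not in new_stats:
            anomalies.append(f"{name}: column missing")
            continue
        unexpected = new_schema.get(name, set()) - schema[name]
        if unexpected:
            anomalies.append(f"{name}: unexpected type {sorted(unexpected)}")
        increase = new_stats[name]["missing"] - train_stats[name]["missing"]
        if increase > max_missing_increase:
            anomalies.append(f"{name}: missing values up by {increase:.0%}")
    for name in new_stats:
        if name not in schema:
            anomalies.append(f"{name}: column not in schema")
    return anomalies


if __name__ == "__main__":
    train = [{"age": 30, "city": "Oslo"}, {"age": 41, "city": "Rome"}]
    new = [{"age": "n/a", "city": None}, {"age": 25, "city": None}]
    print(find_anomalies(infer_schema(train), compute_stats(train), new))
```
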

#### Data Preprocessing
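
The normalization and scaling steps named in this section can be sketched in plain Python. This is a stand-in, not the repository's code: with `tensorflow-transform` the same ideas correspond to analyzers such as `tft.scale_to_z_score` and `tft.scale_to_0_1`. The function names below are hypothetical.

```python
import math


def fit_z_score(values):
    """Compute mean and population standard deviation from training data."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def apply_z_score(values, mean, std):
    """Standardize values with previously fitted statistics."""
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def fit_min_max(values):
    """Record the range of the training data."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values into [0, 1] relative to the fitted training range."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    train = [2.0, 4.0, 6.0, 8.0]
    mean, std = fit_z_score(train)
    lo, hi = fit_min_max(train)
    # Serving-time values reuse the statistics fitted on training data.
    print(apply_z_score([5.0], mean, std))
    print(apply_min_max([5.0], lo, hi))
```

Keeping `fit_*` separate from `apply_*` reflects the main design point: statistics are computed once over the training set and then reused unchanged at serving time, which avoids training/serving skew.
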
- `pip install tensorflow-transform`
- Apache Beam
- pandas, numpy
- normalize feature data
- Scale

#### Model Training - Analysis - Validation
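
Model validation in a pipeline like this usually means computing metrics overall and per data slice, then comparing a candidate model against the currently deployed one before promoting it. A minimal plain-Python sketch of that idea follows; it does not use the `tensorflow-model-analysis` API, and the function names and thresholds are hypothetical.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match their labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def accuracy_by_slice(examples, slice_key):
    """Accuracy computed separately for each value of one feature."""
    groups = {}
    for example in examples:
        groups.setdefault(example[slice_key], []).append(example)
    return {
        value: accuracy(
            [e["label"] for e in group],
            [e["prediction"] for e in group],
        )
        for value, group in groups.items()
    }


def should_promote(candidate_acc, baseline_acc,
                   min_accuracy=0.8, min_improvement=0.0):
    """Promote only if the candidate clears an absolute bar and does not regress."""
    clears_bar = candidate_acc >= min_accuracy
    no_regression = candidate_acc - baseline_acc >= min_improvement
    return clears_bar and no_regression


if __name__ == "__main__":
    examples = [
        {"country": "NO", "label": 1, "prediction": 1},
        {"country": "NO", "label": 0, "prediction": 1},
        {"country": "IT", "label": 1, "prediction": 1},
    ]
    print(accuracy_by_slice(examples, "country"))
    print(should_promote(candidate_acc=0.8, baseline_acc=0.75))
```
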
- `pip install tensorflow-model-analysis`

#### Model Deploy
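
For the TensorFlow Serving option in this section, clients typically call the REST predict endpoint with a JSON body of the form `{"instances": [...]}`. The sketch below builds such a request using only the standard library. The host, the model name `my_model`, and the input values are placeholders; port 8501 is the REST port conventionally exposed by the TensorFlow Serving Docker image.

```python
import json
import urllib.request


def build_predict_request(instances, model_name,
                          host="localhost", port=8501, version=None):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(instances, model_name, **kwargs):
    """Send the request; requires a running TensorFlow Serving instance."""
    request = build_predict_request(instances, model_name, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["predictions"]


if __name__ == "__main__":
    req = build_predict_request([[1.0, 2.0]], "my_model")
    print(req.full_url)
```
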
- API
- TensorFlow Serving
- Deploy to cloud
- Model optimization for deployment

#### Kubernetes for deployment
#### Beam & Airflow
#### Kubeflow

#### Model API
- API endpoint for serving ML model
- Persistent datastore (Redis) for storing model predictions
- Kafka integration to push model prediction results to a monitoring topic
- Build Docker image
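
The request flow described above (serve a prediction, persist it, publish it for monitoring) can be sketched with in-memory stand-ins. `InMemoryStore` and `InMemoryTopic` are hypothetical placeholders for a Redis client and a Kafka producer, `dummy_model` is not a real model, and the key format `prediction:<id>` is an assumption.

```python
import json
import time
import uuid
from collections import deque


class InMemoryStore:
    """Stand-in for a Redis client: simple key/value set and get."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class InMemoryTopic:
    """Stand-in for a Kafka producer writing to a monitoring topic."""

    def __init__(self):
        self.messages = deque()

    def send(self, message):
        self.messages.append(message)


def dummy_model(features):
    """Placeholder model: returns the mean of the input features."""
    return sum(features) / len(features)


def handle_predict(payload, model, store, topic):
    """Serve one prediction, persist it, and publish it for monitoring."""
    request_id = str(uuid.uuid4())
    prediction = model(payload["features"])
    record = json.dumps({
        "id": request_id,
        "features": payload["features"],
        "prediction": prediction,
        "ts": time.time(),
    })
    store.set(f"prediction:{request_id}", record)
    topic.send(record)
    return {"id": request_id, "prediction": prediction}


if __name__ == "__main__":
    store, topic = InMemoryStore(), InMemoryTopic()
    print(handle_predict({"features": [1.0, 2.0, 3.0]}, dummy_model, store, topic))
```

Passing the store and topic in as arguments keeps the handler independent of the concrete clients, so real Redis and Kafka connections can be swapped in without changing the handler logic.
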