{"id":17760893,"url":"https://github.com/satyampurwar/ml-engineering","last_synced_at":"2026-04-12T06:34:05.628Z","repository":{"id":259574559,"uuid":"878804550","full_name":"satyampurwar/ml-engineering","owner":"satyampurwar","description":"Developing and deploying machine learning models while adhering to engineering best practices.","archived":false,"fork":false,"pushed_at":"2024-10-26T09:58:23.000Z","size":13684,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-01T13:18:12.083Z","etag":null,"topics":["api","api-testing","conda-environment","configuration-management","docker-container","docker-image","dockerfile","jupyter-notebooks","logging","machine-learning-algorithms","mlflow","pytest","python","quality-assurance","reproducibility","shell-scripting","sphinx-documentation"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/satyampurwar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-26T06:38:44.000Z","updated_at":"2024-10-26T10:06:35.000Z","dependencies_parsed_at":"2024-10-26T23:05:59.918Z","dependency_job_id":null,"html_url":"https://github.com/satyampurwar/ml-engineering","commit_stats":null,"previous_names":["satyampurwar/ml-engineering"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/satyampurwar/ml-engineering","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/satyampurwar%2Fml-engineering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/satyampurwar%2Fml-engineering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/satyampurwar%2Fml-engineering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/satyampurwar%2Fml-engineering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/satyampurwar","download_url":"https://codeload.github.com/satyampurwar/ml-engineering/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/satyampurwar%2Fml-engineering/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31706764,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-12T06:22:27.080Z","status":"ssl_error","status_checked_at":"2026-04-12T06:21:52.710Z","response_time":58,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","api-testing","conda-environment","configuration-management","docker-container","docker-image","dockerfile","jupyter-notebooks","logging","machine-learning-algorithms","mlflow","pytest","python","quality-assurance","reproducibility","shell-scripting","sphinx-documentation"],"created_at":"2024-10-26T19:14:17.665Z","updated_at":"2026-04-12T06:34:05.598Z","avatar_url":"https://github.com/satyampurwar.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Machine Learning Engineering\n\nThis repository contains the code and documentation for developing and deploying machine learning models while adhering to engineering best practices.\n\n## Environment Setup\n\n### Virtual Environment\n\n- Navigate to the project directory:\n\n```bash\ncd \u003cbase\u003e/ml-engineering\n```\n\n- Create and activate the conda environment:\n\n```bash\nconda env create --file deploy/conda/linux_py312.yml\nconda activate mle\n```\n\n- Manage dependencies:\n  - Install additional dependencies using conda or pip as needed.\n  - Update environment file: `conda env export --name mle \u003e deploy/conda/linux_py312.yml`\n  - Deactivate environment: `conda deactivate`\n  - Remove environment (if necessary): `conda remove --name mle --all`\n\n## Development Workflow\n\n### Research \u0026 Development\n\n- Reference code: `\u003cbase\u003e/ml-engineering/reference/nonstandardcode`\n- Working notebooks: `\u003cbase\u003e/ml-engineering/notebooks/working`\n\n### Script Development\n\nScripts are derived from working notebooks in `\u003cbase\u003e/ml-engineering/notebooks/working`.\n\n### Setting PYTHONPATH\n\nEnsure the directory containing `housing_value` is in PYTHONPATH:\n\n```bash\nconda env config vars set PYTHONPATH=$(pwd)/src\nconda deactivate\nconda activate mle\necho $PYTHONPATH\n```\n\n### Integrated Features in Scripts\n\n- Argument Parsing: Uses `argparse` for command-line arguments.\n- Configuration Management: Implements `configparser` with `setup.cfg`.\n- Logging: Incorporates `logging` for execution tracking and debugging.\n\n### Code Quality Tools\n\nInstall required tools:\n\n```bash\nsudo apt install black isort flake8\n```\n\n| Tool   | Description     | Usage             |\n|--------|-----------------|-------------------|\n| Black  | Code formatter  | `black \u003cscript.py\u003e` |\n| isort  | Import sorter   | `isort \u003cscript.py\u003e` |\n| flake8 | Linter          | `flake8 \u003cscript.py\u003e` |\n\n**Note:** Configurations are specified in `setup.cfg` and `.vscode/settings.json` (for VS Code users).\n\n### Maintaining Code Quality\n\n```bash\nchmod +x shell/src_quality.sh\n./shell/src_quality.sh\n```\n\n### Script Execution\n\nView available options for each script using the `--help` flag:\n\n```bash\npython src/housing_value/ingest_data.py --help\npython src/housing_value/train.py --help\npython src/housing_value/score.py --help\n```\n\n## Testing \n\nInstall pytest:\n\n```bash\nsudo apt install python3-pytest\n```\n\n**Note:** Configurations are specified in `setup.cfg`.\n\nMaintain test code quality:\n\n```bash\nchmod +x shell/tests_quality.sh\n./shell/tests_quality.sh\n```\n\nRun tests:\n\n```bash\npytest\npytest \u003ctest_directory\u003e/\u003ctest.py\u003e\n```\n\n## Documentation\n\nUsing Sphinx for documentation generation.\n\n### Prerequisites\n\n1. Install the package:\n   - Option 1: Editable mode (dependent on pyproject.toml): produces egg-info folder.\n\n```bash\npip install -e .\n```\n\n   - Option 2: Build and install: produces egg-info folder as well as dist folder containing tar.gz and whl file.\n\n```bash\npython3 -m pip install --upgrade build\npython3 -m build\npip install dist/housing_value-0.0.0-py3-none-any.whl\n```\n\n2. Install Sphinx \u0026 Packages for building documentation:\n\n```bash\nsudo apt install python3-sphinx\npip install sphinx sphinx-rtd-theme matplotlib\npip install sphinxcontrib-napoleon\n```\n\n### Generating Documentation\n\n1. Navigate to the docs directory:\n\n```bash\ncd docs\n```\n\n2. Check configuration files:\n   - Make sure to create Makefile.\n\n3. Generate Sphinx project:\n\n```bash\nsphinx-quickstart\n```\n\n4. Update configuration files:\n   - Modify `source/conf.py` and `source/index.rst` as needed.\n   - Reference files are available in the `reference` directory.\n\n5. Generate API documentation:\n\n```bash\nsphinx-apidoc -o ./source ../src/housing_value\n```\n\n6. Update configuration files:\n   - Modify `source/housing_value.rst` and `source/index.rst` as needed.\n   - Reference files are available in the `reference` directory.\n\n7. Build HTML documentation:\n\n```bash\nmake clean\nmake html\n```\n\n8. Return to the project root:\n\n```bash\ncd ..\n```\n\n**Note:** The documentation file hierarchy in the `source` directory is: `index.rst \u003e modules.rst \u003e housing_value.rst`.\n\n## Application Packaging with MLflow\n\n**Note:** The file hierarchy for MLflow is structured as follows: `MLproject \u003e app.py`.\n\n1. **Maintaining Code Quality**\n\n```bash\nchmod +x shell/app_quality.sh\n./shell/app_quality.sh\n   ```\n\n2. **Tracking UI**: Launch the MLflow tracking server using the command.\n\n```bash\nmlflow server --backend-store-uri mlruns/ --default-artifact-root mlruns/ --host 127.0.0.1 --port 5000\n   ```\n\n3. **Run Experiment**: Execute an experiment to generate a model artifact with the following command.\n\n```bash\nmlflow run . -P \u003cparameters\u003e\n```\n\nThe optional parameter `split_size` defaults to `0.2`.\n\n4. **Python Version Management**: Install `pyenv` for managing Python versions and ensuring reproducibility, which facilitates selecting a specific Python version for the project.\n\n```bash\nchmod +x shell/pyenv.sh\n./shell/pyenv.sh\n```\n\n5. **Activate Conda Environment**: Activate the conda environment created during the experiment execution.\n\n6. **Dependency Installation**: Install the required dependency in activated environment.\n\n```bash\npip install virtualenv\n```\n\n7. **API Endpoint Generation**: Create an API endpoint to serve the model using -\n\n```bash\nmlflow models serve -m mlruns/\u003cexperiment_id\u003e/\u003crun_id\u003e/artifacts/model/ -h 127.0.0.1 -p 1234\n```\n\n8. **Testing API Endpoint**: Test the API endpoint from another terminal with the following formats.\n\n- **Datasplit Format**:\n\n```bash\ncurl -X POST -H \"Content-Type: application/json\" --data '{\"dataframe_split\": {\"columns\": [\"longitude\", \"latitude\", \"housing_median_age\", \"total_rooms\", \"total_bedrooms\", \"population\", \"households\", \"median_income\", \"ocean_proximity\"], \"data\": [[-118.39, 34.12, 29.0, 6447.0, 1012.0, 2184.0, 960.0, 8.2816, \"\u003c1H OCEAN\"]]}}' http://127.0.0.1:1234/invocations \n```\n\n- **Inputs/Instances Format**:\n\n```bash \ncurl -X POST -H \"Content-Type: application/json\" --data '{\"inputs\": [{\"longitude\": -118.39, \"latitude\": 34.12, \"housing_median_age\": 29.0, \"total_rooms\": 6447.0, \"total_bedrooms\": 1012.0, \"population\": 2184.0, \"households\": 960.0, \"median_income\": 8.2816, \"ocean_proximity\": \"\u003c1H OCEAN\"}]}' http://127.0.0.1:1234/invocations \n```\n\n### Deployment Readiness\n\nTo facilitate deployment, Docker images are created by aggregating necessary artifacts and configurations.\n\n1. **Artifact Aggregation:** \n\n- Copy model artifacts (`MLmodel` and `model.pkl`) from `mlruns/\u003cexperiment_id\u003e/\u003crun_id\u003e/artifacts/model` to `\u003cbase\u003e/ml-engineering/deploy/docker/mlruns`. Ensure unnecessary metadata is cleaned from the `MLmodel`.\n\n- Transfer the `requirements.txt` file from `mlruns/\u003cexperiment_id\u003e/\u003crun_id\u003e/artifacts/model` to `\u003cbase\u003e/ml-engineering/deploy/docker`.\n\n- Move the wheel file (`housing_value-0.0.0-py3-none-any.whl`) from the dist directory to `\u003cbase\u003e/ml-engineering/deploy/docker`.\n\n- Copy the `setup.cfg` from the project root to `\u003cbase\u003e/ml-engineering/deploy/docker`, ensuring it contains only data required for inference.\n\n2. **Script and Configuration Creation:**\n\n- Develop script `run.sh` to execute MLflow models serve command.\n\n- Create `.dockerignore` file to ignore copying files in WORKDIR of image/container.\n\n- Construct Dockerfile to package all components into a Docker image, ensuring efficient deployment and scalability.\n\n3. **Image Development:**\n   \n```bash \ncd deploy/docker \n```\n   \n- **Build With Root User:**\n   \n```bash \ndocker build . -t \u003cdockerhub_username\u003e/mle:rootuser -f Dockerfile.rootuser \n```\n   \n- **Build Without Root User for Security:** Enhance security by building an image that does not use the root user.\n   \n```bash \ndocker build . -t \u003cdockerhub_username\u003e/mle:nonrootuser -f Dockerfile.nonrootuser \n```\n   \n- **Use Buildkit for Multistage Builds:** Optimize your image size and build time using Docker Buildkit for multistage builds.\n   \n```bash \nDOCKER_BUILDKIT=1 docker build . -t \u003cdockerhub_username\u003e/mle:multistage -f Dockerfile.multistage \n```\n\n## Container Management\n\nThis section provides detailed instructions for containerizing your application using Docker and testing endpoints.\n\n### Starting and Testing a Container\n\n1. **Start the Container:** Use the following command to start a Docker container named `rootuser` and map port 8080 on your host to port 5000 in the container.\n   \n```bash \ndocker run -dit -p 8080:5000 --name rootuser \u003cdockerhub_username\u003e/mle:rootuser \n```\n\n2. **Test the Endpoint:** Verify that your application is running correctly by sending a POST request to the endpoint using curl.\n   \n```bash \ncurl -X POST -H \"Content-Type: application/json\" --data '{\"dataframe_split\": {\"columns\": [\"longitude\", \"latitude\", \"housing_median_age\", \"total_rooms\", \"total_bedrooms\", \"population\", \"households\", \"median_income\", \"ocean_proximity\"], \"data\": [[-118.39, 34.12, 29.0, 6447.0, 1012.0, 2184.0, 960.0, 8.2816, \"\u003c1H OCEAN\"]]}}' http://127.0.0.1:8080/invocations \n```\n\n### Managing Docker Images\n\n1. **Push Image to Docker Hub:** First, log in to Docker Hub and then push images.\n   \n```bash \ndocker login -u \u003cdockerhub_username\u003e\ndocker push \u003cdockerhub_username\u003e/mle:rootuser \ndocker push \u003cdockerhub_username\u003e/mle:nonrootuser \ndocker push \u003cdockerhub_username\u003e/mle:multistage \n```\n\n2. **List Images and Containers:** To view all Docker images and containers on system.\n   \n- **Images:** \n\n```bash \ndocker image ls \n```\n   \n- **Containers:** \n\n```bash \ndocker ps --all \n```\n\n3. **View Logs:** Access the logs of a running container.\n   \n```bash \ndocker logs \u003ccontainer_name\u003e \n```\n\n4. **Delete Containers and Images:** Remove a specific container or image using these commands:\n\n- **Containers:** \n\n```bash \ndocker rm -f \u003ccontainer_name\u003e \n```\n  \n- **Images:** \n\n```bash \ndocker rmi \u003cimage_name\u003e \n```\n\n### Retesting in a New Environment\n\nTo test your application in a new environment:\n\n1. **Pull Image from Docker Hub:**\n   \n```bash \ndocker pull \u003cdockerhub_username\u003e/mle:rootuser \n```\n\n2. **Start the Container Again:**\n   \n```bash \ndocker run -dit -p 8080:5000 --name rootuser \u003cdockerhub_username\u003e/mle:rootuser \n```\n\n3. **Re-test the Endpoint:** Use the same curl command as before to verify functionality.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsatyampurwar%2Fml-engineering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsatyampurwar%2Fml-engineering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsatyampurwar%2Fml-engineering/lists"}