{"id":23429889,"url":"https://github.com/chuvalniy/mlops-practices","last_synced_at":"2026-04-11T01:41:28.765Z","repository":{"id":197434063,"uuid":"695294198","full_name":"chuvalniy/mlops-practices","owner":"chuvalniy","description":"Target classification with MLOps practices (CI/CD, Docker, Cloud services, etc...)","archived":false,"fork":false,"pushed_at":"2024-01-30T08:55:54.000Z","size":1139,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-15T08:14:56.653Z","etag":null,"topics":["ci-cd","classifier","click","cloud-services","data-analysis-python","data-science","dvc","fastapi","git","jupyter-notebook","machine-learning","mlops","pandas","python","scripting","sklearn","unit-testing"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chuvalniy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-09-22T19:41:10.000Z","updated_at":"2024-01-28T04:09:26.000Z","dependencies_parsed_at":"2024-01-29T07:53:52.110Z","dependency_job_id":"23bb9d7b-a23d-4cfa-b3f5-01ddf5629061","html_url":"https://github.com/chuvalniy/mlops-practices","commit_stats":null,"previous_names":["chuvalniy/star-trek-script-generator","chuvalniy/mlops-practices"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chuvalniy%2Fmlops-practices","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chuvalniy%2Fmlops-practices/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/chuvalniy%2Fmlops-practices/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chuvalniy%2Fmlops-practices/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chuvalniy","download_url":"https://codeload.github.com/chuvalniy/mlops-practices/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248054195,"owners_count":21039952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ci-cd","classifier","click","cloud-services","data-analysis-python","data-science","dvc","fastapi","git","jupyter-notebook","machine-learning","mlops","pandas","python","scripting","sklearn","unit-testing"],"created_at":"2024-12-23T08:13:38.723Z","updated_at":"2026-04-11T01:41:22.350Z","avatar_url":"https://github.com/chuvalniy.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Overview\nA machine learning application that classifies people by bad habits based on medical indicators, built with extensive use of MLOps practices.\n\n## Installation\n### Prerequisites\nThe project follows a microservice architecture, so make sure Docker is installed on your computer.\n\n### Clone repository \u0026 install dependencies\n```sh\ngit clone https://github.com/chuvalniy/mlops-practices.git\ncd mlops-practices\npip install -r requirements.txt\n```\n### Create \u0026 update credentials\nCreate an *.env* file in your project root directory and copy the variables from the *.env-example* file. 
By default, *.env-example* contains the settings for running the project locally, so there is no need to update the credentials.\n\nThe next step is to create credentials for the S3 storage. Go to your user directory (e.g. C:\\Users\\MyUser) and create a folder called *.aws*. In this folder, create a file called *credentials* and put the following into it.\n```ini\n[default]\naws_access_key_id=minioadmin\naws_secret_access_key=minioadmin\naws_bucket_name=arts\n\n[admin]\naws_access_key_id=minioadmin\naws_secret_access_key=minioadmin\n```\n\nCaution: aws_bucket_name must match the value of **AWS_S3_BUCKET** in the *.env* file.\n\nAfter these steps you should have the file C:\\Users\\MyUser\\.aws\\credentials.\n\nThese are the default credentials for running the project locally without any changes to the *.env* file.\n\n### Run Docker\nNavigate to the project root directory and start the Docker containers.\n```sh\ndocker-compose up -d --build\n```\n### Create S3 Bucket in MinIO\nFor MLflow to store model artifacts in S3, a bucket must exist in the S3 storage.\n\nOpen the MinIO console; by default it is available at http://localhost:9001/.\n\nIn the console, open the Buckets tab, click Create new bucket, and name the bucket *arts*. The name must match the **AWS_S3_BUCKET** variable in the *.env* file.\n\n### Attention (Windows)\nThis step is only necessary if you intend to use your experiments to deploy an ML service in the future. 
The project should work without it, but the step below may solve some of the problems you run into.\n\nIf you want to serve MLflow models locally on your machine, you also have to set **MLFLOW_S3_ENDPOINT_URL** in your PowerShell session so that MLflow can connect to MinIO S3.\n```powershell\n$env:MLFLOW_S3_ENDPOINT_URL = \"http://localhost:9000\"\nmlflow models serve \n```\n\n## How to use\nIf you installed everything correctly, this step is simple.\n\n### Execute pipeline\nFrom the project's root directory, pull the DVC-tracked data.\n```sh\ndvc pull\n```\n\nRun the machine learning training pipeline.\n```sh\ndvc repro\n```\n### [Optional] Change model \u0026 tune hyperparameters\nYou can choose your own hyperparameters or change the model (Random Forest by default) by modifying the **train.py** file.\n```python\n# Define parameters and model.\nparams = {\n    \"max_depth\": 3,\n    \"n_estimators\": 100,\n    \"random_state\": RANDOM_STATE\n}\nmodel = RandomForestClassifier(**params)\n```\n\n## Documentation\nIn general, all the code is covered with docstrings and comments that explain what each component does, but some aspects are hard to describe in code. Below is a description of the architecture, the tech stack, and the data source.\n### Architecture\nThe application architecture is shown in [this diagram](docs/architecture.png).\n\n### Stack\nA more detailed description of each library used to create this application can be found [here](docs/stack.md).\n\n### Data\nThe training data can be found on [Kaggle](https://www.kaggle.com/datasets/sooyoungher/smoking-drinking-dataset). If you are interested in the exploratory data analysis, you can find it at this [link](notebooks/) in two Jupyter Notebooks.\n## Testing\nAlmost every function is covered by a unit test written with the [pytest](https://docs.pytest.org/en/stable/contents.html) and [Click](https://github.com/pallets/click) libraries.\n\nExecute the following command in your project directory to run the tests. 
\n\n```sh\npytest -v\n```\n\n## License\n// add\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchuvalniy%2Fmlops-practices","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchuvalniy%2Fmlops-practices","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchuvalniy%2Fmlops-practices/lists"}