{"id":19423328,"url":"https://github.com/ml-tooling/ml-project-template","last_synced_at":"2025-10-20T11:22:31.196Z","repository":{"id":98822160,"uuid":"197639389","full_name":"ml-tooling/ml-project-template","owner":"ml-tooling","description":"ML project template facilitating both research and production phases.","archived":false,"fork":false,"pushed_at":"2019-08-05T15:50:39.000Z","size":70,"stargazers_count":111,"open_issues_count":0,"forks_count":30,"subscribers_count":4,"default_branch":"develop","last_synced_at":"2025-09-12T03:12:47.378Z","etag":null,"topics":["docker","machine-learning","reproducibility"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ml-tooling.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-18T18:41:21.000Z","updated_at":"2025-08-11T21:21:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"c23bed83-2a9d-4bed-91c2-c8c3e82db7bc","html_url":"https://github.com/ml-tooling/ml-project-template","commit_stats":null,"previous_names":[],"tags_count":0,"template":true,"template_full_name":null,"purl":"pkg:github/ml-tooling/ml-project-template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-tooling%2Fml-project-template","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-tooling%2Fml-project-template/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-tooling%2Fml-project-template/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/ml-tooling%2Fml-project-template/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ml-tooling","download_url":"https://codeload.github.com/ml-tooling/ml-project-template/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-tooling%2Fml-project-template/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280081629,"owners_count":26268588,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-20T02:00:06.978Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","machine-learning","reproducibility"],"created_at":"2024-11-10T13:38:03.303Z","updated_at":"2025-10-20T11:22:31.132Z","avatar_url":"https://github.com/ml-tooling.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ML Project Template\n\nThis repository contains a template project that can be easily adapted for all kinds of Machine Learning tasks. \nTypically, solving such a task entails two main phases, _research_ and _production_, with very different focuses. 
The template intends to facilitate work on ML projects by guiding practitioners to adopt some best practices.\n\n[`research`](./research): exploratory data analyses, model prototyping and experiments are collected here in a structured way\n\n[`production`](./production): distilled utils lib, training job and inference service are implemented here\n\nIt is recommended to simply clone this repo and customize it to the specific use-case at hand.\n\n---\n\n## Repository Structure\n\n- **[research](./research)**: Scripts and Notebooks for experimentation.\n  - **[develop](./research/develop)** (Python): Experimental code to try out new ideas and experiments. Use Jupyter notebooks wherever you can. Naming convention: `YYYY-MM-DD_userid_short-description`. If you cannot use a notebook and have multiple scripts/files for an experiment, create a folder with the same naming convention. Each file should be handled by one person only.\n  - **[deliver](./research/deliver)** (Python): Refactored notebooks that contain valuable insights or results (e.g. visualizations, training runs). Notebooks should be refactored, documented, contain outputs, and use the following naming schema: `YYYY-MM-DD_short-description`. Notebooks in deliver should not be changed or rerun. If you want to rerun a deliver Notebook, please duplicate it into the develop folder.\n  - **[templates](./research/templates)** (Python): Refactored Notebooks that are reusable for a specific task (e.g. model training, data exploration). Notebooks should be refactored, documented, not contain any output, and use the following naming schema: `short-description`. If you want to use a template Notebook, duplicate it into the develop folder.\n- **[production](./production)**: The production-ready solution(s) composed of libraries, services, and jobs.\n  - **[python-utils-lib](./production/python-utils-lib)** (Python): Utility functions that are distilled from the research phase and used across multiple scripts. 
Should only contain refactored and tested Python scripts/modules. Installable via pip.\n  - **[training-job](./production/training-job)** (Python/Docker): Combines required data exports, preprocessing and training scripts into a Docker container. This makes results reproducible and the production model retrainable in _any_ environment.\n  - **[inference-service](./production/inference-service)** (Python/Docker): Docker container that provides the final model prediction capabilities via a REST API.\n\n## Naming Conventions\n\n### Code Artifacts\n\n- develop notebooks/scripts: `YYYY-MM-DD_userid_short-description`\n- deliver notebooks/scripts: `YYYY-MM-DD_short-description`\n- template notebooks/scripts: `short-description`\n- services: `-service` suffix\n- jobs: `-job` suffix\n- libraries: `-lib` suffix\n\n### Files\n\n`\u003cdataset-desc\u003e_\u003cpreprocessing-desc\u003e_\u003ctraining-desc\u003e.\u003cfiletype\u003e`\n\n#### Examples:\n\n- `blogs-metadata.csv`\n- `blogs-metadata_cl-rs_ft-vec.vectors`\n- `categories2blogs_cl-rs-lm_tfidf-lsvm.model.zip`\n- `categories2blogs-questions_cl-rs-lm_tfidf-lsvm.model.zip`\n\n#### Name Identifier Descriptions: \n\n\u003ctable\u003e\n    \u003ctr\u003e\n        \u003cth\u003eName\u003c/th\u003e\n        \u003cth\u003eDescription\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd colspan=\"2\"\u003e\u003cb\u003eDataset Identifiers:\u003c/b\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003ecategories2blogs\u003c/td\u003e\n        \u003ctd\u003eDataset containing blogs with the text content, blogs item URI, and connected primary tags.\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eblogs-metadata\u003c/td\u003e\n        \u003ctd\u003eDataset containing all blogs and related metadata (properties).\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd colspan=\"2\"\u003e\u003cb\u003ePreprocessing 
Identifiers:\u003c/b\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n     \u003ctr\u003e\n        \u003ctd\u003ecl\u003c/td\u003e\n        \u003ctd\u003eDefault text cleaning (lowercasing, regex cleaning).\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003ers\u003c/td\u003e\n        \u003ctd\u003eRemove Stopwords.\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003elm\u003c/td\u003e\n        \u003ctd\u003eText lemmatization.\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd colspan=\"2\"\u003e\u003cb\u003eTraining Identifiers:\u003c/b\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003eft-vec\u003c/td\u003e\n        \u003ctd\u003eText vectorizer using Fasttext.\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003etfidf\u003c/td\u003e\n        \u003ctd\u003eText vectorizer using TFIDF.\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003elsvm\u003c/td\u003e\n        \u003ctd\u003eClassifier using linear SVM.\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd colspan=\"2\"\u003e\u003cb\u003eFiletype Identifiers:\u003c/b\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e.model\u003c/td\u003e\n        \u003ctd\u003eModel file.\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e.vectors\u003c/td\u003e\n        \u003ctd\u003eBinary vectors file.\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fml-tooling%2Fml-project-template","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fml-tooling%2Fml-project-template","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fml-tooling%2Fml-project-template/lists"}