{"id":16391335,"url":"https://github.com/queirozfcom/flask-sklearn-seed","last_synced_at":"2025-10-26T13:31:45.618Z","repository":{"id":37596614,"uuid":"138946265","full_name":"queirozfcom/flask-sklearn-seed","owner":"queirozfcom","description":"Template for a simple API server for serving a scikit-learn model using flask.","archived":false,"fork":false,"pushed_at":"2018-08-02T03:08:24.000Z","size":29,"stargazers_count":4,"open_issues_count":1,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-31T19:29:28.518Z","etag":null,"topics":["flask","machine-learning","machine-learning-production"],"latest_commit_sha":null,"homepage":"http://queirozf.com/entries/example-project-template-serve-a-scikit-learn-model-via-a-flask-api","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/queirozfcom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-06-28T00:19:34.000Z","updated_at":"2022-06-21T21:47:56.000Z","dependencies_parsed_at":"2022-08-25T19:10:45.154Z","dependency_job_id":null,"html_url":"https://github.com/queirozfcom/flask-sklearn-seed","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/queirozfcom%2Fflask-sklearn-seed","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/queirozfcom%2Fflask-sklearn-seed/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/queirozfcom%2Fflask-sklearn-seed/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/queirozfcom%2Fflask-sklearn-seed/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/queirozfcom","download_url":"https://codeload.github.com/queirozfcom/flask-sklearn-seed/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238337614,"owners_count":19455342,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flask","machine-learning","machine-learning-production"],"created_at":"2024-10-11T04:45:47.276Z","updated_at":"2025-10-26T13:31:45.295Z","avatar_url":"https://github.com/queirozfcom.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"flask-sklearn-seed\n==============================\n\u003e View full post here: http://queirozf.com/entries/example-project-template-serve-a-scikit-learn-model-via-a-flask-api\n\nThis is a **full** template for building a simple flask-based API and server that serves a trained Scikit-learn model.\n\n\u003e It is not meant for production, just for development purposes\n\n- Includes:\n\n - API and code-level tests\n\n - Logging\n\n - Error handling\n\n - CLI for training the model\n\n - Input validation using JSON Schema\n\n## Quickstart\n\n- Clone the project\n\n    ```\n    $ git clone git@github.com:queirozfcom/flask-sklearn-seed.git\n    Cloning into 'flask-sklearn-seed'...\n    ```\n\n- create Python 3 virtualenv, activate virtualenv\n\n    ```\n    $ cd flask-sklearn-seed\n    $ virtualenv -p python3 venv3\n    $ source venv3/bin/activate\n    ```\n\n- install requirements-dev\n\n    ```\n    $ pip install -r requirements-dev.txt\n    ```\n\n- train the 
model using the dummy data:\n\n    ```\n    $ python -m app.models.train_model data/raw/training.csv v0\n\n    Will train model v0 using the file at: /home/felipe/flask-sklearn-seed/data/raw/training.csv\n\n    training set has 7500 rows\n    validation set has 2500 rows\n    0.985957111012551\n    Successfully saved model at /home/felipe/flask-sklearn-seed/trained-models/trained-model-v0.p\n    ```\n\n- start the server\n\n    ```\n    $ python -m app.app\n    * Serving Flask app \"app\" (lazy loading)\n    * Environment: production\n        WARNING: Do not use the development server in a production environment.\n        Use a production WSGI server instead.\n    * Debug mode: off\n    * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)\n    ```\n\n## Using the app\n\n- Training via the CLI\n\n    - To train a model: `$ python -m app.models.train_model \u003cpath/to/training_set.csv\u003e \u003cversion-number\u003e`\n\n- Tests\n\n    - To run utils tests: `$ python -m tests.utils_tests`\n\n    - To run API tests: `$ python -m tests.web_tests`\n\n- Starting the server\n\n    ```\n    $ python -m app.app\n     * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)\n    ```\n\n## Code Organization\n\nThis is how this project's code is structured.\n\nLoosely based on [Queirozf.com: How to Structure Software Projects: Python Examples](http://queirozf.com/entries/how-to-structure-software-projects-python-example)\nand [Cookie Cutter Data Science](https://drivendata.github.io/cookiecutter-data-science/)\n\n```\n.\n│\n├── README.md                       \u003c----- this file\n│\n├── app\n│   ├── app.py                      \u003c----- main project file. 
contains routes and initialization code\n│   │\n│   ├── settings.py\n│   │\n│   ├── helpers                     \u003c----- helpers contain helper code that is SPECIFIC to this application\n│   │   ├── features.py                              they are placed here so as not to overly pollute the business logic\n│   │   ├── files.py                                 with scaffolding code.\n│   │   └── validation.py\n│   │\n│   ├── models                      \u003c----- code for training models\n│   │   └── train_model.py\n│   │\n│   └── utils                       \u003c----- utils contain helper code that is NOT SPECIFIC to this application,\n│       └── files.py                                i.e. it could be extracted and used elsewhere\n│\n├── data                            \u003c----- data files, intermediate representation, if needed.\n│   ├── interim\n│   ├── processed\n│   └── raw\n│       └── training_set.csv\n│\n├── logs                            \u003c----- logs folder\n│   └─ application.log\n│\n├── notebooks                       \u003c----- jupyter notebooks for data exploration and analyses\n│   └── view-data.ipynb\n│\n├── requirements-dev.txt            \u003c----- packages required to DEVELOP this project (train model, notebooks, tests, CLI commands)\n├── requirements-prod.txt           \u003c----- packages required to DEPLOY this project (only serves the API)\n│\n├── tests                           \u003c----- test code\n│   ├── utils_tests.py\n│   └── web_tests.py\n│\n├── trained-models                  \u003c------ trained models (serialized) are kept here\n│   ├── trained-model-v0.p\n│   ├── trained-model-v1.p\n│   └── ...\n│\n└── venv3                           \u003c------ python virtualenv\n```\n\n## API Docs\n\n### Healthcheck\n\nA simple healthcheck, to be used for monitoring (e.g. 
in AWS Elastic Beanstalk) a given model version.\n\n**Example: Correct Request, valid version**\n\n```\nREQUEST\nGET /v0/healthcheck\nRESPONSE 200\nOK\n```\n\n**Example: Correct Request, invalid version**\n\n```\nREQUEST\nGET /v31254/healthcheck\nRESPONSE 200\nNot OK\n```\n\n### Predict\n\nReturns a prediction, calculated by a previously trained model, whose version is `\u003cversion\u003e`.\n\n**Example: Correct Request**\n\n```\nREQUEST\nPOST /v0/predict\n{\n    \"id\": \"2\",\n    \"x_1\": -2.0,\n    \"x_2\": -0.414120,\n    \"x_3\": 0.2131,\n    \"x_4\": -1.2\n}\nRESPONSE 200\n{\n    \"id\": \"2\",\n    \"prediction\": 0.8077\n}\n```\n\n**Example: Model version not found**\n\n```\nREQUEST\nPOST /v43287/predict\n{\n    \"id\": \"19826478126\",\n    \"x_1\": 1.0,\n    \"x_2\": -0.414120,\n    \"x_3\": 0.2131,\n    \"x_4\": -1.2\n}\nRESPONSE 404\n{\n    \"message\": \"Trained model version 'v43287' was not found.\"\n}\n```\n\n**Example: Invalid request arguments**\n\n```\nREQUEST\nPOST /v0/predict\n{\n    \"id\": \"126\",\n    \"x_1\": 1.0,\n    \"x_2\": -2.2\n}\nRESPONSE 400\n{\n    \"message\": \"Missing keys: 'x_3', 'x_4'\"\n}\n```\n\n## Other info\n\n### Logging\n\nLogging is needed for keeping track of how people use your app (collect usage metrics) and to help diagnose errors in case something goes wrong.\n\nI've used an external package (`concurrent-log-handler`) because the default `RotatingFileHandler` does not support   compression of old log files. This is to make sure logging itself doesn't cause problems due to lack of disk space.\n\n### Caching\n\nThere are a couple of caching mechanisms for flask (e.g. https://github.com/sh4nks/flask-caching) but, since Logistic Regression is an eager learning method (i.e. 
inference is quite fast because most of the work is done at training time), it didn't seem to be worth the extra complexity.\n\nCaching would likely be more useful with lazy methods (such as k-NN), where most of the work happens at prediction time.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqueirozfcom%2Fflask-sklearn-seed","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqueirozfcom%2Fflask-sklearn-seed","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqueirozfcom%2Fflask-sklearn-seed/lists"}