{"id":20183041,"url":"https://github.com/papapana/iris_api","last_synced_at":"2025-06-15T11:39:18.633Z","repository":{"id":47446452,"uuid":"242575039","full_name":"papapana/iris_api","owner":"papapana","description":"Sample API for exploring the Iris dataset","archived":false,"fork":false,"pushed_at":"2025-06-06T08:15:41.000Z","size":67,"stargazers_count":1,"open_issues_count":39,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-06T09:26:28.252Z","etag":null,"topics":["docker","docker-compose","fastapi","ml-service","mongodb","pytest","python","rest-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/papapana.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-02-23T19:09:35.000Z","updated_at":"2023-05-20T19:37:25.000Z","dependencies_parsed_at":"2025-06-06T09:32:39.355Z","dependency_job_id":null,"html_url":"https://github.com/papapana/iris_api","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/papapana/iris_api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/papapana%2Firis_api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/papapana%2Firis_api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/papapana%2Firis_api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/papapana%2Firis_api/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/papapana","download_url":"https://codeload.github.com/papapana/iris_api/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/papapana%2Firis_api/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259967798,"owners_count":22939516,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-compose","fastapi","ml-service","mongodb","pytest","python","rest-api"],"created_at":"2024-11-14T02:43:48.291Z","updated_at":"2025-06-15T11:39:18.605Z","avatar_url":"https://github.com/papapana.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![deepcode](https://www.deepcode.ai/api/gh/badge?key=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJwbGF0Zm9ybTEiOiJnaCIsIm93bmVyMSI6InBhcGFwYW5hIiwicmVwbzEiOiJpcmlzX2FwaSIsImluY2x1ZGVMaW50IjpmYWxzZSwiYXV0aG9ySWQiOjIxOTg2LCJpYXQiOjE1OTk3NzI1MzR9.-sdFw4efz-C38Ypon5oduZaT2CQX9l_6k0M_BE6zJBQ)](https://www.deepcode.ai/app/gh/papapana/iris_api/_/dashboard?utm_content=gh%2Fpapapana%2Firis_api)\n# Iris dataset exploration API\n\n## Introduction\n\nThis is an exercise to create a production-worthy API for the [iris dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set).\nIt is meant to be used later as a template for more complex applications and datasets.\nSome features suitable for bigger datasets e.g. 
pagination are already supported.\nThe API can be used both as a REST API service and as a Python package imported by other applications.\n\nThe key highlights are an asynchronous API built with [FastAPI](https://fastapi.tiangolo.com/), input validation and type-checking\nwith [Pydantic](https://pydantic-docs.helpmanual.io/), and easy deployment and scaling with [Docker Compose](https://docs.docker.com/compose/).\n\nThe database is a NoSQL one, specifically [MongoDB](https://www.mongodb.com/).\nAlthough this dataset is small and structured, so a relational database would normally be preferable, the exercise is meant\nto be as general as possible, and I would like to reuse it for unstructured or multiple different datasets in the future.\n\nAdditionally, the setup is meant to scale as far as possible. For example, FastAPI can scale vertically\nvery well, and because the application is dockerized, it can also be scaled horizontally at the API level.\nFurthermore, at the database level [MongoDB can also scale in multiple ways](https://www.mongodb.com/mongodb-scale).\n\n## Query structure\n\nAll queries are REST API POST requests whose JSON body has the following structure:\n\n```\n\u003c\u003e below means optional\nThe general query model:\n{\n    \u003cspecies: one or more of 'setosa', 'versicolor' or 'virginica', e.g. \"setosa\" or [\"setosa\", \"virginica\"]\u003e\n    \u003clower: the lower bound by column, default: no bound, e.g. {\"sepal_length\": 5, \"petal_length\": 3}\u003e\n    \u003cupper: the upper bound by column, default: no bound, e.g. 
{\"sepal_length\": 5.2}\u003e\n    \u003cpage: the page number if pagination is used, int \u003e= 1 or not provided\u003e\n    \u003cper_page: the number of results per page if pagination is used, int \u003e= 1 or not provided\u003e\n}\n```\n\n## Installation\n\n### With Docker (recommended)\n\n1) Make sure [Docker](https://docs.docker.com/install/) and [Docker Compose](https://docs.docker.com/compose/install/)\nare installed.\n\n2) Clone the project and cd into it:\n```bash\ngit clone https://github.com/papapana/iris_api.git\ncd iris_api\n```\n\n3) Build and start the services (*the following command might require **sudo** rights*):\n```bash\ndocker-compose -f docker-compose.dev.yml up --build iris_api\n```\n\n4) Navigate to [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs), where you can test the API interactively\n    - Example `/range/` request body (application/json):\n        ```json\n        {\n          \"species\": \"setosa\",\n          \"lower\": {\n            \"sepal_length\": 5.0,\n            \"sepal_width\": 3.0\n          },\n          \"upper\": {\n            \"sepal_length\": 5.1\n          }\n        }\n        ```\n    - Example `/stats/mean/` request body (application/json):\n        ```json\n        {\n          \"species\": [\n            \"setosa\",\n            \"versicolor\"\n          ],\n          \"lower\": {\n            \"sepal_width\": 3.0\n          },\n          \"upper\": {\n            \"sepal_length\": 6.1\n          }\n        }\n        ```\n\n---\n\n### Without Docker\n\n1) First make sure Python 3.8 is available and MongoDB is running.\n\nTo install Python 3.8:\n\nAssuming [Anaconda](https://www.anaconda.com/distribution/) has been installed on the system, create an environment:\n```bash\nconda create -n py38 python=3.8 numpy pandas pymongo\nconda activate py38\n```\n\nTo install MongoDB, follow the [official installation guide](https://docs.mongodb.com/manual/administration/install-community/).\n\nQuick way for Ubuntu/Debian-based 
Linux:\n\n```bash\nsudo apt-get install mongodb\nsudo service mongodb start\n```\n\nQuick way for macOS:\n\n```bash\nbrew tap mongodb/brew\nbrew install mongodb-community@4.2\nbrew services start mongodb-community@4.2\n```\n\n2) Clone the project and enter its directory:\n\n```bash\ngit clone https://github.com/papapana/iris_api.git \u0026\u0026 cd iris_api\n```\n\n3) Install the requirements:\n\n- To use it as a library:\n```bash\npython setup.py install\n```\n\n- If you do not intend to use it as a library:\n```bash\npip install -r requirements.txt\n```\n\n4) Run the database provisioning script:\n```bash\npython scripts/provision_db.py\n```\n\n5) Start the API server:\n\n```bash\nuvicorn iris_api.app.main:app --reload --port 8000\n```\n\n## Usage\n\n### As a REST API\n\n- Navigate to [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs), where you can test the API interactively.\n\n- See the examples in the Jupyter notebook at `./notebooks/iris_demo.ipynb`\n    - to install Jupyter Notebook, follow the [installation instructions](https://jupyter.org/install)\n\n- Endpoints:\n    - `POST /range/`\n    - `POST /stats/mean/`\n\n### As a library\n\nFor example:\n```python\n# Example of using the API as an installed package\nfrom iris_api.core.queries import stats as st\n\n# Mean of each measurement, grouped by species\nresult = st.get_mean(st.IrisQuery(species=['setosa', 'versicolor']))\n\n# result:\n# [{'mean_sepal_length': 5.006,\n#   'mean_sepal_width': 3.428,\n#   'mean_petal_length': 1.462,\n#   'mean_petal_width': 0.24600000000000002,\n#   'label': 'setosa'},\n#  {'mean_sepal_length': 5.936,\n#   'mean_sepal_width': 2.77,\n#   'mean_petal_length': 4.26,\n#   'mean_petal_width': 1.3259999999999998,\n#   'label': 'versicolor'}]\n```\n\n## Structure\n```\n├── docker-compose.dev.yml     -- settings for development purposes\n├── docker-compose.yml         -- settings for production\n├── iris_api\n│   ├── app\n│   │   ├── api                -- the endpoints\n│   │   │   ├── __init__.py\n│   │   │   ├── models.py      -- the query models\n│   │ 
  │   ├── ranges_api.py\n│   │   │   └── stats_api.py\n│   │   ├── db.py              -- connection to database\n│   │   ├── __init__.py\n│   │   └── main.py            -- setting up the endpoint routers\n│   ├── core\n│   │   ├── __init__.py\n│   │   └── queries            -- business logic is here (can be used as a library)\n│   │       ├── __init__.py\n│   │       ├── ranges.py\n│   │       ├── stats.py\n│   │       └── utilities.py\n│   ├── __init__.py\n│   └── tests                  -- unit tests are here\n│       ├── __init__.py\n│       └── test_utilities.py\n├── notebooks\n│   └── iris_demo.ipynb        -- Jupyter notebook demo\n├── ops                        -- devops scripts are here\n│   ├── dev                    -- here for development\n│   │   ├── Dockerfile\n│   │   └── provisioning       -- here for provisioning the database\n│   │       └── Dockerfile\n│   └── release                -- here for production\n│       └── Dockerfile\n├── README.md                  -- this file\n├── requirements.txt           -- Python package version requirements\n├── scripts\n│   └── provision_db.py        -- script that populates the database with the iris dataset\n└── setup.py                   -- the installation script for usage as a Python library\n```\n\n## Questions\n\n### How would you deploy the application?\n\nThe application can be deployed wherever Docker is supported. This could be, for example, a managed container engine like\n[AWS Fargate](https://aws.amazon.com/fargate/) or virtual machines like [AWS EC2](https://aws.amazon.com/ec2/). Because the services are\nalready containerized and described in docker-compose files, they can also be ported relatively easily to platforms supporting [Kubernetes](https://kubernetes.io/).\n\nA word of caution: the current version has no authentication or authorization mechanism at the API level and\ntherefore should be used either internally only or behind a server/load balancer that provides\nauthentication/authorization.
 These features can be added easily because FastAPI supports them.\n\nIn any case, it is a good idea to place the service behind a production-grade web server such as [Nginx](https://www.nginx.com/)\nthat can also act as a load balancer, e.g. to handle increased traffic or to provide redundancy across the API instances.\n\n### How would you test your application?\n\nThe tests or quality control for this type of application can take several forms, such as `unit tests`, `integration tests`,\n`fuzz tests`, `load tests`, `availability tests` and more.\n\nIn the current exercise, the most important thing to test is the query creation mechanism, which implements the main logic.\nFor this purpose, unit tests have been written in the `iris_api/tests` folder.\nIn the current state, most of the remaining code consists of library calls, so unit-testing it would be largely\nredundant, amounting to testing the libraries themselves. A whole-system test would still be useful, though.\n\nIntegration tests can be run with Docker, e.g. by checking that the database container is up and healthy, inserting some dummy data,\nthen performing a query and verifying that it returns the expected data.\n\nFuzz tests are very useful if the service is exposed to the general public, to ensure that no malicious input can harm\nor crash the system. Although there is currently no explicit fuzz test, all input is strictly typed,\nbounded and validated on each request before it is passed to any backend logic. This happens in the `models.py` file\nusing the `Pydantic` library.\n\nStress testing can be done with either an online service or a command-line tool like `wrk2`. Addressing the bottlenecks\nit reveals is mainly a matter of scaling.\n\n### How would you scale your application?\n\nSince the application is deployed with Docker, managed services such as `Fargate` or tools like `Kubernetes` can be\nused to scale the application horizontally.
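\n\nTo make the idea concrete: scaling horizontally means running several identical API containers and spreading requests across them. The sketch below shows round-robin selection, which is the default balancing method of an nginx `upstream` block. It is only an illustration under stated assumptions: the replica URLs are hypothetical placeholders, and in a real deployment the load balancer, not application code, performs this step.\n\n```python\nfrom itertools import cycle\n\n# Hypothetical replica addresses; the real ones depend on the deployment.\nREPLICAS = ['http://api-1:8000', 'http://api-2:8000', 'http://api-3:8000']\n\n\n# Hands out backends in a fixed rotation, so each gets an equal share.\nclass RoundRobin:\n    def __init__(self, backends):\n        if not backends:\n            raise ValueError('at least one backend is required')\n        self._backends = cycle(backends)\n\n    def next_backend(self):\n        return next(self._backends)\n\n\nbalancer = RoundRobin(REPLICAS)\nfor _ in range(4):\n    # The 4th request wraps around to the first replica again.\n    print(balancer.next_backend() + '/range/')\n```\n\nWith three replicas, each server receives roughly a third of the requests, so adding containers can raise total throughput as long as the database keeps up.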
\n\nIf the bottleneck is the API, then a load balancer (in practice, e.g. an nginx service) can distribute the requests\nacross multiple API servers, sharing the load. Caching results can also be very useful to avoid\nexecuting the same query multiple times.\n\nIf the bottleneck is the database (in my experience, the more common case), it can also be scaled. The advantage of having\na NoSQL database here is that it can scale horizontally easily and in multiple ways. For example, if the app is read-heavy,\nreplication copies the same data to multiple database servers so that read requests can be shared between them.\nNoSQL databases such as MongoDB also support sharding quite easily, i.e. partitioning the data among servers according\nto a shard key to balance the load. This is particularly appropriate if the bottleneck is writing to the database.\n\nIf the bottleneck is the computation, vertical scaling might be considered, e.g. getting a server with more CPUs or more memory,\nor attaching and using GPUs.\n\n### How would you optimize the implementation of your application?\n\nIn terms of performance, I would determine the bottleneck by stress-testing and profiling the execution.\nIf the bottleneck were related to I/O other than the database, I would add more workers serving the asynchronous requests\nor load-balance the service as discussed above.\nIf it were related to the database, I would scale the database as described above.\nIf it were related to computation, I would first try to optimize the bottleneck computations where possible, by vectorizing\nand compiling to native code, and then scale vertically with more powerful machines. In certain cases, such as\ndeep learning, the usual solution is to use GPUs or TPUs.\n\nIn terms of quality control, I would automate the tests by setting up Continuous Integration and also track\ncode coverage and other quality metrics.
 For example, the JetBrains PyCharm linter is currently used throughout, and\nall of its warnings must be resolved before committing. In a production environment, I would add at least one more linter such as\n`flake8` and possibly also the `black` formatter. If the API were publicly exposed, I would also add fuzz testing, plus automated stress testing\nand benchmarking to ensure that a new deployment does not degrade the system.\n\nIn terms of deployment, I would set up continuous deployment and refactor the Dockerfiles to avoid duplication.\n\nIn terms of monitoring, a good logging system would be necessary for a bigger application, as would a\nmonitoring system such as Datadog to track metrics and uptime.\n\n### How much time did it take you to implement the task?\n\nThe main functionality (a fully working solution with correct results) took about 3-4 hours, including reading up on and using FastAPI\nfor the first time. However, the devops work for this particular application and for the laptop I was using took longer.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpapapana%2Firis_api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpapapana%2Firis_api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpapapana%2Firis_api/lists"}