{"id":19809956,"url":"https://github.com/lewagon/data-certification-api","last_synced_at":"2025-09-01T01:33:12.456Z","repository":{"id":49000224,"uuid":"347048447","full_name":"lewagon/data-certification-api","owner":"lewagon","description":null,"archived":false,"fork":false,"pushed_at":"2023-05-25T14:04:05.000Z","size":59,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-02-28T18:26:28.777Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lewagon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-12T11:50:26.000Z","updated_at":"2023-05-25T14:04:10.000Z","dependencies_parsed_at":"2025-01-11T06:44:51.608Z","dependency_job_id":"465e4de2-8961-4dc7-bfbc-528dcc8acb90","html_url":"https://github.com/lewagon/data-certification-api","commit_stats":null,"previous_names":[],"tags_count":0,"template":true,"template_full_name":null,"purl":"pkg:github/lewagon/data-certification-api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lewagon%2Fdata-certification-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lewagon%2Fdata-certification-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lewagon%2Fdata-certification-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lewagon%2Fdata-certification-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/lewagon","download_url":"https://codeload.github.com/lewagon/data-certification-api/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lewagon%2Fdata-certification-api/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273064369,"owners_count":25039259,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T09:18:46.756Z","updated_at":"2025-09-01T01:33:12.435Z","avatar_url":"https://github.com/lewagon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Data certification API\n\nLe Wagon Data Science certification exam starter pack for the predictive API test.\n\n**💡\u0026nbsp;\u0026nbsp;This challenge is completely independent of other challenges. 
It is not required to complete any other challenge in order to work on this challenge.**\n\n## Setup\n\n### Duplicate the repository for the API challenge\n\n**📝\u0026nbsp;\u0026nbsp;Let's duplicate the repository of the API challenge.**\n\nGo to https://github.com/lewagon/data-certification-api:\n- Click on `Use this template`\n- Enter the repository name `data-certification-api`\n- Set it as **Public**\n- Click on `Create repository from template`\n- Click on `Code`\n- Select `SSH`\n- Copy the SSH URL of the repository (the format is `git@github.com:YOUR_GITHUB_NICKNAME/data-certification-api.git`)\n\n### Clone the repository for the API challenge\n\n**📝\u0026nbsp;\u0026nbsp;Now we will clone your new repository.**\n\nOpen your terminal and run the following commands:\n\n👉\u0026nbsp;\u0026nbsp;replace `YOUR_GITHUB_NICKNAME` with your **github nickname** and `PASTE_REPOSITORY_URL_HERE` with the SSH URL you just copied:\n\n``` bash\ncd ~/code/YOUR_GITHUB_NICKNAME\ngit clone PASTE_REPOSITORY_URL_HERE\ncd data-certification-api\n```\n\n### Look around\n\n**💡\u0026nbsp;\u0026nbsp;The content of the challenge should look like this:**\n\n``` bash\ntree\n```\n\n```\n.\n├── Dockerfile\n├── MANIFEST.in\n├── Makefile\n├── README.md\n├── api\n│   ├── __init__.py\n│   └── app.py\n├── exampack\n│   ├── __init__.py\n│   ├── data\n│   ├── models\n│   ├── predictor.py\n│   ├── tests\n│   │   └── __init__.py\n│   └── utils.py\n├── notebooks\n├── requirements.txt\n├── scripts\n│   └── exampack-run\n└── setup.py\n```\n\nOpen your favourite text editor and proceed with the challenge.\n\n## API challenge\n\n**📝\u0026nbsp;\u0026nbsp;In this challenge, you are provided with a trained model saved as `model.joblib`. 
The goal is to create an API that will predict the popularity of a song based on its other features.**\n\n👉\u0026nbsp;\u0026nbsp;You will only need to edit the code of the API in `api/app.py` 🚨\n\n👉\u0026nbsp;\u0026nbsp;The package versions listed in `requirements.txt` should work out of the box with the pipelined model saved in `model.joblib`\n\n### Install the required packages\n\nThe `requirements.txt` file lists the exact package versions required to load the pipelined model that we provide.\n\n``` bash\npip install -r requirements.txt\n```\n\n\u003cdetails\u003e\n  \u003csummary\u003e👉\u0026nbsp;\u0026nbsp;If you encounter a version conflict while installing the packages 👈\u003c/summary\u003e\n\n  \u0026nbsp;\n\n\nIn this case, you will need to create a new virtual environment in order to load the pipeline.\n\n👉\u0026nbsp;\u0026nbsp;Only execute these commands if you encounter an issue while installing the packages 🚨\n\n``` bash\npyenv install 3.8.6\npyenv virtualenv 3.8.6 certif\npyenv local certif\npip install -r requirements.txt\n```\n\n\u003c/details\u003e\n\n### Run a uvicorn server\n\n**📝\u0026nbsp;\u0026nbsp;Start a `uvicorn` server to make sure that the setup works correctly.**\n\nRun the server:\n\n```bash\nuvicorn api.app:app --reload\n```\n\nOpen your browser at http://localhost:8000/\n\n👉\u0026nbsp;\u0026nbsp;You should see the response `{ \"ok\": true }`\n\nYou will now be able to work on the content of the API while `uvicorn` automatically reloads your code as it changes.\n\n### API specification\n\n**Predict the popularity of a Spotify song**\n\n`GET /predict`\n\n| Parameter | Type | Description |\n|---|---|---|\n| acousticness | float | whether the track is acoustic |\n| danceability | float | describes how suitable a track is for dancing |\n| duration_ms | int | duration of the track in milliseconds |\n| energy | float | represents a perceptual measure of intensity and activity |\n| explicit | int | 
whether the track has explicit lyrics |\n| id | string | id for the track |\n| instrumentalness | float | predicts whether a track contains no vocals |\n| key | int | the key the track is in |\n| liveness | float | detects the presence of an audience in the recording |\n| loudness | float | the overall loudness of a track in decibels |\n| mode | int | modality of a track |\n| name | string | name of the track |\n| release_date | string | release date |\n| speechiness | float | detects the presence of spoken words in a track |\n| tempo | float | overall estimated tempo of a track in beats per minute |\n| valence | float | describes the musical positiveness conveyed by a track |\n| artist | string | artist who performed the track |\n\nReturns a dictionary with the `artist`, the `name` of the song and predicted `popularity` as an integer.\n\nExample request:\n\n```\n/predict?acousticness=0.654\u0026danceability=0.499\u0026duration_ms=219827\u0026energy=0.19\u0026explicit=0\u0026id=0B6BeEUd6UwFlbsHMQKjob\u0026instrumentalness=0.00409\u0026key=7\u0026liveness=0.0898\u0026loudness=-16.435\u0026mode=1\u0026name=Back%20in%20the%20Goodle%20Days\u0026release_date=1971\u0026speechiness=0.0454\u0026tempo=149.46\u0026valence=0.43\u0026artist=John%20Hartford\n```\n\nExample response:\n\n``` json\n{\n  \"artist\": \"John Hartford\",\n  \"name\": \"Back in the Goodle Days\",\n  \"popularity\": 22\n}\n```\n\n👉 It is your turn, code the endpoint in `api/app.py`. 
If you want to verify what data types the pipeline expects, have a look at the docstring of the `create_pipeline` method in `exampack/trainer.py`.\n\n## API in production\n\n**📝\u0026nbsp;\u0026nbsp;Push your API to production on the hosting service of your choice.**\n\n\u003cdetails\u003e\n  \u003csummary\u003e👉\u0026nbsp;\u0026nbsp;If you opt for Google Cloud Platform 👈\u003c/summary\u003e\n\n  \u0026nbsp;\n\n\nOnce you have set your `GCP_PROJECT_ID` in the `Makefile`, run its targets to build your containerized API, push it to Container Registry, and deploy it to Cloud Run.\n\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flewagon%2Fdata-certification-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flewagon%2Fdata-certification-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flewagon%2Fdata-certification-api/lists"}