Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/pechouc/data-certification-api


https://github.com/pechouc/data-certification-api

Last synced: 7 days ago
JSON representation

Awesome Lists containing this project

README

        

# URL of Prediction API

https://data-certification-ubkzofvvoq-ew.a.run.app

# Data certification API

Le Wagon Data Science certification exam starter pack for the predictive API test.

**πŸ’‘Β Β This challenge is completely independent of other challenges. It is not required to complete any other challenge in order to work on this challenge.**

## Setup

### Duplicate the repository for the API challenge

**πŸ“Β Β Let's duplicate the repository of the API challenge.**

Go to https://github.com/lewagon/data-certification-api:
- Click on `Use this template`
- Enter the repository name `data-certification-api`
- Set it as **Public**
- Click on `Create repository from template`
- Click on `Code`
- Select `SSH`
- Copy the SSH URL of the repository (the format is `[email protected]:YOUR_GITHUB_NICKNAME/data-certification-api.git`)

### Clone the repository for the API challenge

**πŸ“Β Β Now we will clone your new repository.**

Open your terminal and run the following commands:

πŸ‘‰Β Β replace `YOUR_GITHUB_NICKNAME` with your **github nickname** and `PASTE_REPOSITORY_URL_HERE` with the SSH URL you just copied:

``` bash
cd ~/code/YOUR_GITHUB_NICKNAME
git clone PASTE_REPOSITORY_URL_HERE
cd data-certification-api
```

### Look around

**πŸ’‘Β Β The content of the challenge should look like this:**

``` bash
tree
```

```
.
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ MANIFEST.in
β”œβ”€β”€ Makefile
β”œβ”€β”€ README.md
β”œβ”€β”€ api
β”‚Β Β  β”œβ”€β”€ __init__.py
β”‚Β Β  └── app.py
β”œβ”€β”€ exampack
β”‚Β Β  β”œβ”€β”€ __init__.py
β”‚Β Β  β”œβ”€β”€ data
β”‚Β Β  β”œβ”€β”€ models
β”‚Β Β  β”œβ”€β”€ predictor.py
β”‚Β Β  β”œβ”€β”€ tests
β”‚Β Β  β”‚Β Β  └── __init__.py
β”‚Β Β  └── utils.py
β”œβ”€β”€ notebooks
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ scripts
β”‚Β Β  └── exampack-run
└── setup.py
```

Open your favourite text editor and proceed with the challenge.

## API challenge

**πŸ“Β Β In this challenge, you are provided with a trained model saved as `model.joblib`. The goal is to create an API that will predict the popularity of a song based on its other features.**

πŸ‘‰Β Β You will only need to edit the code of the API in `api/app.py` 🚨

πŸ‘‰Β Β The package versions listed in `requirements.txt` should work out of the box with the pipelined model saved in `model.joblib`

### Install the required packages

The `requirements.txt` file lists the exact version of the packages required in order to be able to load the pipelined model that we provide.

``` bash
pip install -r requirements.txt
```

πŸ‘‰Β Β If you encounter a version conflict while installing the packages πŸ‘ˆ

Β 

In this case you will need to create a new virtual environment in order to be able to load the pipeline.

πŸ‘‰Β Β Only execute this commands if you encounter an issue while installing the packages 🚨

``` bash
pyenv install 3.8.6
pyenv virtualenv 3.8.6 certif
pyenv local certif
pip install -r requirements.txt
```

### Run a uvicorn server

**πŸ“Β Β Start a `uvicorn` server in order to make sure that the setup works correctly.**

Run the server:

```bash
uvicorn api.app:app --reload
```

Open your browser at http://localhost:8000/

πŸ‘‰Β Β You should see the response `{ "ok": true }`

You will now be able to work on the content of the API while `uvicorn` automatically reloads your code as it changes.

### API specification

**Predict the popularity of a Spotify song**

`GET /predict`

| Parameter | Type | Description |
|---|---|---|
| acousticness | float | whether the track is acoustic |
| danceability | float | describes how suitable a track is for dancing |
| duration_ms | int | duration of the track in milliseconds |
| energy | float | represents a perceptual measure of intensity and activity |
| explicit | int | whether the track has explicit lyrics |
| id | string | id for the track |
| instrumentalness | float | predicts whether a track contains no vocals |
| key | int | the key the track is in |
| liveness | float | detects the presence of an audience in the recording |
| loudness | float | the overall loudness of a track in decibels |
| mode | int | modality of a track |
| name | string | name of the track |
| release_date | string | release date |
| speechiness | float | detects the presence of spoken words in a track |
| tempo | float | overall estimated tempo of a track in beats per minute |
| valence | float | describes the musical positiveness conveyed by a track |
| artist | string | artist who performed the track |

Returns a dictionary with the `artist`, the `name` of the song and predicted `popularity` as an integer.

Example request:

```
/predict?acousticness=0.654&danceability=0.499&duration_ms=219827&energy=0.19&explicit=0&id=0B6BeEUd6UwFlbsHMQKjob&instrumentalness=0.00409&key=7&liveness=0.0898&loudness=-16.435&mode=1&name=Back%20in%20the%20Goodle%20Days&release_date=1971&speechiness=0.0454&tempo=149.46&valence=0.43&artist=John%20Hartford
```

Example response:

``` json
{
"artist": "John Hartford",
"name": "Back in the Goodle Days",
"popularity": 28
}
```

πŸ‘‰ It is your turn, code the endpoint in `api/app.py`. If you want to verify what data types the pipeline expects, have a look at the docstring of the `create_pipeline` method in `exampack/trainer.py`.

## API in production

**πŸ“Β Β Push your API to production on the hosting service of your choice.**

πŸ‘‰Β Β If you opt for Google Cloud Platform πŸ‘ˆ

Β 

Once you have changed your `GCP_PROJECT_ID` in the `Makefile`, run the following commands to build and deploy your containerized API to Container Registry and finally Cloud Run.