https://github.com/lewagon/data-certification-api-movies
- Host: GitHub
- URL: https://github.com/lewagon/data-certification-api-movies
- Owner: lewagon
- Created: 2021-06-25T13:44:51.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-02-02T10:55:07.000Z (almost 2 years ago)
- Last Synced: 2025-01-11T06:45:51.656Z (about 1 year ago)
- Language: Python
- Size: 11.7 KB
- Stars: 0
- Watchers: 12
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Data certification API
Le Wagon Data Science certification exam starter pack for the predictive API test.
**This challenge is independent of the other challenges: you do not need to complete any of them before working on this one.**
## Setup
### Duplicate the repository for the API challenge
**Let's duplicate the repository of the API challenge.**
Go to https://github.com/lewagon/data-certification-api-movies:
- Click on `Use this template`
- Enter the repository name `data-certification-api-movies`
- Set it as **Public**
- Click on `Create repository from template`
- Click on `Code`
- Select `SSH`
- Copy the SSH URL of the repository (the format is `git@github.com:YOUR_GITHUB_NICKNAME/data-certification-api-movies.git`)
### Clone the repository for the API challenge
**Now we will clone your new repository.**
Open your terminal and run the following commands:
Replace `YOUR_GITHUB_NICKNAME` with your **GitHub nickname** and `PASTE_REPOSITORY_URL_HERE` with the SSH URL you just copied:
``` bash
cd ~/code/YOUR_GITHUB_NICKNAME
git clone PASTE_REPOSITORY_URL_HERE
cd data-certification-api-movies
```
### Look around
**The content of the challenge should look like this:**
``` bash
tree
```
```
.
├── Dockerfile
├── MANIFEST.in
├── Makefile
├── README.md
├── api
│   ├── __init__.py
│   └── app.py
├── exampack
│   ├── __init__.py
│   ├── data
│   ├── tests
│   │   └── __init__.py
│   ├── trainer.py
│   └── utils.py
├── model.joblib
├── notebooks
├── requirements.txt
├── scripts
│   └── exampack-run
└── setup.py
```
Open your favourite text editor and proceed with the challenge.
## API challenge
**In this challenge, you are provided with a trained model saved as `model.joblib`. The goal is to create an API that will predict the popularity of a movie based on its other features.**
You will only need to edit the code of the API in `api/app.py` 🚨
The package versions listed in `requirements.txt` should work out of the box with the pipelined model saved in `model.joblib`.
### Install the required packages
The `requirements.txt` file lists the exact versions of the packages required to load the pipelined model that we provide.
``` bash
pip install -r requirements.txt
```
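To confirm that the installed packages really match the pinned versions, a short standard-library check can help. This is a minimal sketch, assuming the relevant lines of `requirements.txt` have the form `name==version`; other specifiers are skipped, and `parse_pins`, `find_mismatches` and `report` are illustrative names, not part of the starter pack.

```python
from importlib import metadata  # standard library since Python 3.8


def parse_pins(text):
    """Return {package: version} for every `name==version` line of a requirements file."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        line = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" not in line:
            continue  # skip blank lines and requirements not pinned with ==
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip()  # drop extras such as uvicorn[standard]
        pins[name] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (expected, installed)} for every pin that is not satisfied.

    `installed` is None when the package is missing; versions are compared as plain strings.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


def report(path="requirements.txt"):
    """Print every mismatch between the pinned and the installed versions."""
    with open(path) as f:
        problems = find_mismatches(parse_pins(f.read()))
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed}")
    if not problems:
        print("All pinned packages match")
```

Calling `report()` from the repository root lists any package whose installed version differs from its pin.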
If you encounter a version conflict while installing the packages, you will need to create a new virtual environment in order to load the pipeline.
Only execute these commands if you encounter an issue while installing the packages 🚨
``` bash
pyenv install 3.8.6
pyenv virtualenv 3.8.6 certif
pyenv local certif
pip install -r requirements.txt
```
### Run a uvicorn server
**Start a `uvicorn` server to make sure that the setup works correctly.**
Run the server:
```bash
uvicorn api.app:app --reload
```
Open your browser at http://localhost:8000/
You should see the response `{ "ok": true }`
You will now be able to work on the content of the API while `uvicorn` automatically reloads your code as it changes.
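The same check can be scripted, which is handy again once the API is deployed. A minimal standard-library sketch; `is_healthy` and `check_root` are illustrative names, and the default URL assumes the local `uvicorn` server started above.

```python
import json
from urllib.request import urlopen


def is_healthy(body):
    """Return True if a response body decodes to the JSON object {"ok": true}."""
    try:
        payload = json.loads(body)  # accepts str or UTF-8 encoded bytes
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return False
    return payload == {"ok": True}


def check_root(base_url="http://localhost:8000"):
    """GET the root endpoint and check its payload (raises URLError if unreachable)."""
    with urlopen(base_url + "/", timeout=5) as response:
        return is_healthy(response.read())
```

With the server running, `check_root()` should return `True`; passing your production URL as `base_url` checks a deployed instance instead.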
### API specification
**Predict the popularity of a movie**
`GET /predict`
| Parameter | Type | Description |
|---|---|---|
| original_title | string | original title of the movie |
| title | string | title of the movie in English |
| release_date | string | release date (e.g. `2010-06-09`) |
| duration_min | float | duration of the movie in minutes |
| description | string | short summary of the movie |
| budget | float | budget spent to produce the movie, in USD |
| original_language | string | original language (e.g. `en`) |
| status | string | whether the movie has already been released or not (e.g. `Released`) |
| number_of_awards_won | int | number of awards won by the movie |
| number_of_nominations | int | number of nominations received by the movie |
| has_collection | int | whether the movie is part of a series of sequels (`1`) or not (`0`) |
| all_genres | string | movie genres, comma-separated (e.g. `Fantasy, Family, Adventure`) |
| top_countries | string | countries where the movie was produced, comma-separated (can be zero, one or many!) |
| number_of_top_productions | float | number of top production companies that produced the movie, if any |
| available_in_english | bool | whether the movie is available in English or not |
Returns a dictionary with the `title` of the movie and its predicted `popularity` as a float.
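Query-string values always reach the server as text, so each parameter has to be converted to the type listed in the table before it is handed to the pipeline. Frameworks such as FastAPI do this automatically from type-annotated function parameters; the standard-library sketch below only makes the mapping explicit. `PARAM_TYPES`, `parse_bool` and `coerce_params` are hypothetical names, and the accepted boolean spellings are an assumption.

```python
# Expected type of each query parameter, copied from the specification table
PARAM_TYPES = {
    "original_title": str,
    "title": str,
    "release_date": str,
    "duration_min": float,
    "description": str,
    "budget": float,
    "original_language": str,
    "status": str,
    "number_of_awards_won": int,
    "number_of_nominations": int,
    "has_collection": int,
    "all_genres": str,
    "top_countries": str,
    "number_of_top_productions": float,
    "available_in_english": bool,
}


def parse_bool(value):
    """Interpret common textual spellings of a boolean (assumed convention)."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def coerce_params(raw):
    """Convert a {name: str} mapping into the types of the specification table.

    Raises KeyError for a missing parameter and ValueError for a malformed value.
    """
    typed = {}
    for name, kind in PARAM_TYPES.items():
        value = raw[name]
        # bool("False") would be True, so booleans need their own parser
        typed[name] = parse_bool(value) if kind is bool else kind(value)
    return typed
```

For instance, `coerce_params({k: v[0] for k, v in parse_qs(query).items()})`, with `parse_qs` from `urllib.parse`, turns the query string of the example request into typed values.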
Example request:
```
/predict?title=Harry%20Potter&original_title=Harry%20Potter&release_date=2010-06-09&duration_min=150&description=Harry%20is%20a%20wizard%20that%20tries%20to%20save%20the%20world%20from%20crazy%20guys&budget=1000000&original_language=en&status=Released&number_of_awards_won=80&number_of_nominations=120&has_collection=1&all_genres=Fantasy,%20Family,%20Adventure&top_countries=United%20States%20of%20America,%20United%20Kingdom&number_of_top_productions=3&available_in_english=True
```
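A client does not have to percent-encode this query string by hand. The sketch below, using only `urllib.parse` from the standard library, builds a request like the example above: `quote_via=quote` encodes spaces as `%20` instead of the default `+`, and `safe=","` keeps commas literal. `build_predict_url` is an illustrative name, and `http://localhost:8000` assumes the local server.

```python
from urllib.parse import quote, urlencode


def build_predict_url(base_url, params):
    """Return base_url + '/predict?...' with every value percent-encoded."""
    # safe="," keeps commas literal; quote_via=quote encodes spaces as %20
    return f"{base_url}/predict?" + urlencode(params, safe=",", quote_via=quote)


example = {
    "title": "Harry Potter",
    "original_title": "Harry Potter",
    "release_date": "2010-06-09",
    "duration_min": 150,
    "description": "Harry is a wizard that tries to save the world from crazy guys",
    "budget": 1000000,
    "original_language": "en",
    "status": "Released",
    "number_of_awards_won": 80,
    "number_of_nominations": 120,
    "has_collection": 1,
    "all_genres": "Fantasy, Family, Adventure",
    "top_countries": "United States of America, United Kingdom",
    "number_of_top_productions": 3,
    "available_in_english": True,  # urlencode turns this into "True"
}

print(build_predict_url("http://localhost:8000", example))
```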
Example response:
``` json
{
"title": "Harry Potter",
"popularity": 15
}
```
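On the client side, a response can be checked against this contract: a JSON object with a string `title` and a numeric `popularity`. A minimal standard-library sketch; `parse_prediction` is a hypothetical helper name. Converting with `float()` also accepts an integer such as the `15` in the example.

```python
import json


def parse_prediction(body):
    """Decode a /predict response body and return (title, popularity)."""
    payload = json.loads(body)
    title = payload["title"]
    popularity = payload["popularity"]
    if not isinstance(title, str):
        raise TypeError("title should be a string")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(popularity, bool) or not isinstance(popularity, (int, float)):
        raise TypeError("popularity should be a number")
    return title, float(popularity)
```

Combined with `urlopen` from `urllib.request`, `parse_prediction(urlopen(url).read())` calls a running API and validates its answer in one step.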
It is your turn: code the endpoint in `api/app.py`. To check which data types the pipeline expects, have a look at the docstring of the `create_pipeline` method in `exampack/trainer.py`.
## API in production
**Push your API to production on the hosting service of your choice.**
If you opt for Google Cloud Platform:
Once you have set your `GCP_PROJECT_ID` in the `Makefile`, run its targets to build your containerized API, push it to Container Registry and finally deploy it to Cloud Run.