Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/raptorblingx/fastapi_ml_task
This project demonstrates a complete pipeline for building a machine learning model using XGBoost on sustainable energy data, integrated with a FastAPI-based REST API. The application allows users to upload data, train the model, and make predictions via API endpoints. Docker and PostgreSQL are used for containerization and data management.
- Host: GitHub
- URL: https://github.com/raptorblingx/fastapi_ml_task
- Owner: RaptorBlingx
- Created: 2024-09-15T17:42:26.000Z (2 months ago)
- Default Branch: master
- Last Pushed: 2024-09-15T22:08:50.000Z (2 months ago)
- Last Synced: 2024-10-10T16:21:53.951Z (about 1 month ago)
- Topics: api-development, data-pipeline, data-science, docker, docker-compose, fastapi, jupyter-notebook, machine-learning, model-training, postgresql, python, rest-api, sustainable-energy, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 865 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# FastAPI Machine Learning Task
This repository contains a project to develop a machine learning model based on sustainable energy data and integrate it with a FastAPI-based REST API.
## Project Structure
```
FastAPI_ML_Task/
│
├── data/                        # Folder to store the dataset
│   └── global-data-on-sustainable-energy.csv
├── models/                      # Folder to store model files and notebooks
│   ├── model_training.ipynb     # Jupyter notebook for the ML workflow
│   ├── trained_model.pkl        # Initial trained model
│   ├── tuned_model.pkl          # Tuned XGBoost model
│   └── model_utils.py           # Utility functions for model training
├── app/                         # Folder for FastAPI application
│   ├── main.py                  # Main FastAPI application with endpoints
│   └── database_setup.py        # Database setup and connection management
├── Dockerfile                   # Dockerfile for building the FastAPI container
├── docker-compose.yml           # Docker Compose file to orchestrate FastAPI and PostgreSQL services
├── requirements.txt             # List of Python dependencies
├── README.md                    # Project documentation
└── .gitignore                   # Git ignore file to avoid committing sensitive files (like .env)
```
## Setup Instructions
1. Clone the repository:
```
git clone https://github.com/raptorblingx/fastapi_ml_task.git
```
2. Install the dependencies:
```
pip install -r requirements.txt
```
3. Run the Jupyter notebook to train the model (a minimal sketch of the training step follows this list):
- Navigate to `models/model_training.ipynb`.
- Ensure the dataset is in the `data/` folder.
- Run the notebook step by step.
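The full workflow lives in `models/model_training.ipynb`. As a rough orientation, the core training step looks something like the sketch below; the feature and target columns are taken from the API description further down, while the train/test split and hyperparameters are illustrative assumptions, not the notebook's actual settings:
```
import joblib
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

FEATURE = "Primary energy consumption per capita (kWh/person)"
TARGET = "Renewable energy share in the total final energy consumption (%)"

# Load the dataset from the location shown in the project structure above.
df = pd.read_csv("data/global-data-on-sustainable-energy.csv")
df = df.dropna(subset=[FEATURE, TARGET])  # drop rows with missing values

X_train, X_test, y_train, y_test = train_test_split(
    df[[FEATURE]], df[TARGET], test_size=0.2, random_state=42  # illustrative split
)

model = XGBRegressor(n_estimators=100)  # illustrative hyperparameters
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))

# Save the model where the project structure keeps it.
joblib.dump(model, "models/trained_model.pkl")
```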
## Database Setup (PostgreSQL)
To set up the PostgreSQL database, follow these steps:
1. **Install PostgreSQL and pgAdmin4**:
Ensure that PostgreSQL and pgAdmin4 are installed and running on your machine.
2. **Configure the Database URL**:
In the `app/database_setup.py` file, update the `DATABASE_URL` with your PostgreSQL credentials. Example:
```
DATABASE_URL = "postgresql://your_username:your_password@localhost/your_dbname"
```
3. **Initialize the Database**:
Run the following command to initialize the database and create the tables (a sketch of the setup module appears after this list):
```
python app/main.py
```
This will create the following tables in your PostgreSQL database:
- **energy_data**: For storing energy data records.
- **predictions**: For storing prediction results.
4. **Verify the Database**:
- You can open **pgAdmin4** and verify that the tables have been successfully created.
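For orientation, `app/database_setup.py` might look roughly like the sketch below, assuming SQLAlchemy is used. Only the table names `energy_data` and `predictions` come from this README; the column definitions are illustrative guesses:
```
from sqlalchemy import Column, Float, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

DATABASE_URL = "postgresql://your_username:your_password@localhost/your_dbname"

engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)
Base = declarative_base()

class EnergyData(Base):
    __tablename__ = "energy_data"  # table name per this README
    id = Column(Integer, primary_key=True)
    entity = Column(String)                        # country name
    year = Column(Integer)
    energy_consumption_per_capita = Column(Float)  # illustrative column
    renewable_share = Column(Float)                # illustrative column

class Prediction(Base):
    __tablename__ = "predictions"  # table name per this README
    id = Column(Integer, primary_key=True)
    energy_production = Column(Float)
    predicted_renewable_share = Column(Float)

def init_db():
    """Create the tables above if they do not yet exist."""
    Base.metadata.create_all(bind=engine)
```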
---
## Setting Up the Environment Variables
You will need to create a `.env` file in the root directory of this project.
### Steps to Set Up `.env`:
1. **Create a `.env` file**:
- In the root of the project (where the `docker-compose.yml` file is located), create a new file named `.env`.
2. **Add the following content** to the `.env` file:
```
POSTGRES_USER=your_postgres_username
POSTGRES_PASSWORD=your_postgres_password
POSTGRES_DB=your_database_name
```
- Replace `your_postgres_username`, `your_postgres_password`, and `your_database_name` with your actual PostgreSQL credentials. For example:
```
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_secure_password
POSTGRES_DB=ml_api
```
3. **Database URL**:
- The environment variables in the `.env` file will be used to automatically configure the database connection inside the **`docker-compose.yml`** and **`app/database_setup.py`**.
- The database connection string will look like this:
```
postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}
```
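Inside the container, `app/database_setup.py` can assemble that string from the same variables. A minimal sketch (the default values are assumptions; the repository's actual code may differ):
```
import os

# Credentials injected by docker-compose from the .env file.
POSTGRES_USER = os.getenv("POSTGRES_USER", "postgres")  # assumed default
POSTGRES_PASSWORD = os.getenv("POSTGRES_PASSWORD", "")
POSTGRES_DB = os.getenv("POSTGRES_DB", "ml_api")

# "db" is the hostname of the PostgreSQL service in docker-compose.yml.
DATABASE_URL = f"postgresql://{POSTGRES_USER}:{POSTGRES_PASSWORD}@db:5432/{POSTGRES_DB}"
```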
---
### Run the FastAPI Application
Start the FastAPI server by running:
```
uvicorn app.main:app --reload
```
You can access the API documentation and test the endpoints at **`http://127.0.0.1:8000/docs`**.
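For reference, a bare-bones sketch of how endpoints like the ones below are declared in FastAPI. The endpoint paths match this README, but the function bodies are simplified placeholders, not the repository's actual implementation:
```
import joblib
import pandas as pd
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/upload-data")
async def upload_data(file: UploadFile):
    # Read the uploaded CSV (writing it to PostgreSQL is omitted in this sketch).
    df = pd.read_csv(file.file)
    return {"rows_received": len(df)}

@app.post("/predict")
def predict(energy_production: float):
    # Load the tuned model saved by the notebook (path per the project structure).
    model = joblib.load("models/tuned_model.pkl")
    features = pd.DataFrame(
        {"Primary energy consumption per capita (kWh/person)": [energy_production]}
    )
    return {"predicted_renewable_share": float(model.predict(features)[0])}
```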
## API Endpoints
### 1. `/upload-data` (POST)
Uploads a CSV file containing energy data and stores it in the PostgreSQL database.
- **Request**: Upload a CSV file with columns `Entity`, `Year`, `Primary energy consumption per capita (kWh/person)`, and `Renewable energy share in the total final energy consumption (%)`.
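The expected header row therefore looks like this (only the column layout is prescribed):
```
Entity,Year,Primary energy consumption per capita (kWh/person),Renewable energy share in the total final energy consumption (%)
```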
### 2. `/get-data/{country}` (GET)
Retrieves energy data for a specific country from the PostgreSQL database.
- **Parameters**: `country` - The name of the country (e.g., "Afghanistan").
### 3. `/train-model` (POST)
Trains an XGBoost model using the uploaded energy data and saves the trained model.
- **Response**: Confirms that the model was successfully trained and saved.
### 4. `/predict` (POST)
Makes predictions using the trained model based on the energy production value provided.
- **Parameters**:
  - `energy_production`: The amount of energy production (a numeric value, e.g. 5000).
- **Response**: Returns the predicted value for the renewable energy share.
## Testing
You can test the API using the Swagger UI at **`http://127.0.0.1:8000/docs`**, or with tools like **Postman**.
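Alternatively, a short `requests` script (not part of the repository; the endpoint paths and the `energy_production` parameter come from this README, while the printed response shapes depend on what the API actually returns):
```
import requests

BASE = "http://127.0.0.1:8000"

# 1. Upload the dataset.
with open("data/global-data-on-sustainable-energy.csv", "rb") as f:
    print(requests.post(f"{BASE}/upload-data", files={"file": f}).json())

# 2. Train the XGBoost model on the uploaded data.
print(requests.post(f"{BASE}/train-model").json())

# 3. Ask for a prediction at a given energy production value.
print(requests.post(f"{BASE}/predict", params={"energy_production": 5000}).json())

# 4. Retrieve the stored records for one country.
print(requests.get(f"{BASE}/get-data/Afghanistan").json())
```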
## Running the Project with Docker
To run the FastAPI application and PostgreSQL database using Docker, follow these steps:
### Prerequisites
- Ensure Docker and Docker Compose are installed on your machine.
### Steps:
1. **Build the Docker containers**:
```
docker-compose build
```
2. **Start the services** (FastAPI and PostgreSQL):
```
docker-compose up
```
This will start both the FastAPI application and the PostgreSQL database.
3. **Access the API**:
- The FastAPI application will be available at **`http://localhost:8000`**.
- The API documentation (Swagger UI) will be available at **`http://localhost:8000/docs`**.
4. **Interacting with the API**:
- **Upload Data**: Use the `/upload-data` endpoint to upload the dataset (CSV file).
- **Train the Model**: Call the `/train-model` endpoint to train the XGBoost model.
- **Make Predictions**: Use the `/predict` endpoint to make predictions based on the trained model.
- **Retrieve Data**: Fetch energy data for a specific country via the `/get-data/{country}` endpoint.
5. **Stopping the Docker services**:
```
docker-compose down
```
This will stop and remove the containers.
### Setting Up the Environment Variables
- The environment variables (like PostgreSQL credentials) are configured in the `.env` file. Each user should create a `.env` file in the root directory with the following content:
```
POSTGRES_USER=your_postgres_username
POSTGRES_PASSWORD=your_postgres_password
POSTGRES_DB=ml_api
```
- Replace `your_postgres_username` and `your_postgres_password` with your actual PostgreSQL credentials.
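For orientation, a `docker-compose.yml` consistent with the connection string above (service name `db`, port 5432) could look like the sketch below; this is an assumption, not the repository's actual file:
```
services:
  db:
    image: postgres:15   # any recent official PostgreSQL image
    env_file: .env       # supplies POSTGRES_USER / POSTGRES_PASSWORD / POSTGRES_DB
    ports:
      - "5432:5432"
  app:
    build: .             # built from the project's Dockerfile
    env_file: .env
    ports:
      - "8000:8000"
    depends_on:
      - db
```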