https://github.com/random-iceberg/model-backend



# Model Microservice

Machine learning service for training and inference using scikit-learn models.

## 🚀 Quick Start (Zero Configuration)

```bash
# From the project root directory
docker compose -f 'compose/compose.dev.yaml' up -d --build

# Access Swagger UI (development mode)
open http://localhost:8001/docs # macOS; on Linux use xdg-open
```

No setup needed! The service starts with pre-trained models ready for inference.

## 📋 Features

- **5 ML Algorithms**: Random Forest, SVM, Decision Tree, KNN, Logistic Regression
- **Model Training**: Train models with configurable feature selection
- **Model Persistence**: Automatic saving and loading of trained models
- **RESTful API**: Full Swagger/OpenAPI documentation

## 🏗️ API Documentation

### Interactive API Explorer
Access the Swagger UI at: **http://localhost:8001/docs** (development mode)

### Main Endpoints

- `GET /health` - Service health check
- `GET /models` - List all trained models with metadata
- `POST /models/train` - Train a new model
- `POST /models/{id}/predict` - Get prediction from specific model
- `DELETE /models/{id}` - Delete a trained model
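
As a rough illustration of how a client might address these endpoints outside Swagger, the sketch below builds (but does not send) HTTP requests with Python's standard library. The `build_request` helper and the `trained-abc123` ID are hypothetical; only the paths, methods, and port come from this README.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:8001"  # development address used in this README


def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) a request for one of the endpoints listed above."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        # JSON bodies are only attached when a payload is given (POST endpoints).
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


health = build_request("GET", "/health")
list_models = build_request("GET", "/models")
delete_model = build_request("DELETE", "/models/trained-abc123")  # placeholder ID

# To send a request while the service is running:
# from urllib.request import urlopen
# with urlopen(list_models) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes the helper easy to check without a running container.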

## 🤖 Available Algorithms

| Algorithm | ID | Configurable Parameters |
|-----------|-----|------------------------|
| Random Forest | `rf` | `n_estimators` |
| Support Vector Machine | `svm` | - |
| Decision Tree | `dt` | - |
| K-Nearest Neighbors | `knn` | `n_neighbors` |
| Logistic Regression | `lr` | - |
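
The table can be mirrored in a small client-side check before sending a training request. This is a minimal sketch, not the service's own validation: `ALLOWED_PARAMS` and `validate_algo` are hypothetical names, and the positive-integer rule is an assumption about what the service accepts.

```python
# Allowed parameters per algorithm ID, transcribed from the table above.
ALLOWED_PARAMS = {
    "rf": {"n_estimators"},
    "svm": set(),
    "dt": set(),
    "knn": {"n_neighbors"},
    "lr": set(),
}


def validate_algo(algo: dict) -> dict:
    """Check an `algo` object such as {"name": "rf", "n_estimators": 150}.

    Hypothetical helper; the service's own validation may differ.
    """
    name = algo.get("name")
    if name not in ALLOWED_PARAMS:
        raise ValueError(f"unknown algorithm ID: {name!r}")
    # Reject parameters the table does not list for this algorithm.
    extra = set(algo) - {"name"} - ALLOWED_PARAMS[name]
    if extra:
        raise ValueError(f"{name!r} does not accept: {sorted(extra)}")
    # Assumption: both configurable parameters must be positive integers.
    for key in ALLOWED_PARAMS[name] & set(algo):
        value = algo[key]
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            raise ValueError(f"{key} must be a positive integer")
    return algo


validate_algo({"name": "rf", "n_estimators": 150})  # passes
validate_algo({"name": "knn", "n_neighbors": 5})    # passes
```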

## 📊 Available Features

Features from the Titanic dataset (select which ones to use during training):
- `pclass` - Passenger class (1, 2, 3)
- `sex` - Gender (male/female)
- `age` - Age in years
- `fare` - Ticket fare
- `embarked` - Port of embarkation
- `title` - Extracted from name (Mr, Mrs, etc.)
- `is_alone` - Traveling alone flag
- `age_class` - Age × Class interaction
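
The last three features are derived rather than read directly from the dataset. The sketch below shows one plausible way to compute them from a raw row using the Kaggle Titanic column names (`Name`, `Pclass`, `Age`, `SibSp`, `Parch`). The helper names, the regular expression, and the definition of "alone" are assumptions; the actual preprocessing in `utils/data.py` may differ.

```python
import re


def extract_title(name: str) -> str:
    """Pull the honorific out of a name like 'Braund, Mr. Owen Harris'."""
    match = re.search(r",\s*([^.,]+)\.", name)
    return match.group(1).strip().lower() if match else "unknown"


def derive_features(row: dict) -> dict:
    """Derive the engineered features from one raw Titanic row."""
    return {
        "title": extract_title(row["Name"]),
        # Assumption: "alone" means no siblings/spouses and no parents/children aboard.
        "is_alone": row["SibSp"] + row["Parch"] == 0,
        # Age x Class interaction, as described in the list above.
        "age_class": row["Age"] * row["Pclass"],
    }


row = {"Name": "Braund, Mr. Owen Harris", "Pclass": 3, "Age": 22.0, "SibSp": 1, "Parch": 0}
print(derive_features(row))
# {'title': 'mr', 'is_alone': False, 'age_class': 66.0}
```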

## 🛠️ Development Workflow

### Testing the API with Swagger

1. Go to http://localhost:8001/docs
2. Try `/models` to see pre-loaded models
3. Test prediction with `/models/{model_id}/predict`:
   ```json
   {
     "pclass": 1,
     "sex": "female",
     "age": 30,
     "fare": 100,
     "travelled_alone": false,
     "embarked": "cherbourg",
     "title": "mrs"
   }
   ```
4. Train a custom model with `/models/train`

## 🧪 Testing

```bash
cd model

# Install dependencies (if not already done)
uv sync --extra dev

# Run tests
uv run pytest

# Linting and formatting check
uv run ruff check
uv run ruff format --check

# Auto-fix formatting
uv run ruff format
```

## 📁 Project Structure

```
model/
├── main.py              # FastAPI application
├── models_router.py     # Model management endpoints
├── schemas.py           # Pydantic data models
├── train.py             # Training script
├── utils/
│   ├── data.py          # Data preprocessing
│   ├── models.py        # Model loading/saving
│   └── model_factory.py # Algorithm factory
├── data/                # Included dataset
│   ├── train.csv
│   ├── test.csv
│   └── gender_submission.csv
└── tests/               # Test suite
```

## 🔧 Model Training

### Using Swagger UI

1. Go to http://localhost:8001/docs
2. Expand `POST /models/train`
3. Click "Try it out"
4. Use this example request:
   ```json
   {
     "algo": {
       "name": "rf",
       "n_estimators": 150
     },
     "features": ["pclass", "sex", "age", "fare"],
     "random_state": 42
   }
   ```
5. Click "Execute"

### Training Response
```json
{
  "id": "trained-abc123",
  "params": { ... },
  "info": {
    "accuracy": 0.85
  }
}
```

## 📈 Making Predictions

### Using Swagger UI

1. Get model ID from `/models` endpoint
2. Use `POST /models/{model_id}/predict`
3. Provide passenger data
4. Receive survival prediction with probability

### Prediction Request
```json
{
  "pclass": 3,
  "sex": "male",
  "age": 25,
  "fare": 15.5,
  "travelled_alone": true,
  "embarked": "southampton",
  "title": "mr"
}
```

### Prediction Response
```json
{
  "survived": false,
  "probability": 0.78
}
```
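
For scripted use, the request and response shapes above can be wrapped in a few lines of Python. This is a sketch under stated assumptions: `PassengerInput` and `describe` are hypothetical names, the field types are inferred from the example values, and the README does not say whether `probability` refers to survival or to the predicted class, so the helper only reports it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PassengerInput:
    """Fields copied from the prediction request example above."""

    pclass: int
    sex: str
    age: float
    fare: float
    travelled_alone: bool
    embarked: str
    title: str

    def to_json(self) -> str:
        # Passenger class is documented as 1, 2, or 3.
        if self.pclass not in (1, 2, 3):
            raise ValueError("pclass must be 1, 2, or 3")
        return json.dumps(asdict(self))


def describe(response: dict) -> str:
    """Turn a response like {"survived": false, "probability": 0.78} into text."""
    outcome = "survived" if response["survived"] else "did not survive"
    return f"Predicted: {outcome} (reported probability {response['probability']:.2f})"


passenger = PassengerInput(3, "male", 25, 15.5, True, "southampton", "mr")
body = passenger.to_json()  # JSON string to send as the POST body
print(describe({"survived": False, "probability": 0.78}))
# Predicted: did not survive (reported probability 0.78)
```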

## 💾 Data Management

### Model Storage
- Models are automatically saved to `/data/models/`
- Persisted across container restarts
- Each model includes:
  - `model.pkl` - Serialized scikit-learn model
  - `params.json` - Training parameters
  - `info.json` - Model metadata and accuracy
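
The three-file layout can be sketched with the standard library alone. The example below assumes one subdirectory per model ID, which this README does not state, and uses a plain dictionary as a stand-in for a fitted scikit-learn estimator; `save_model` and `load_model` are hypothetical names, not the functions in `utils/models.py`.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_model(root: Path, model_id: str, model, params: dict, info: dict) -> Path:
    """Write model.pkl, params.json, and info.json under root/model_id/."""
    model_dir = root / model_id
    model_dir.mkdir(parents=True, exist_ok=True)
    with open(model_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    (model_dir / "params.json").write_text(json.dumps(params))
    (model_dir / "info.json").write_text(json.dumps(info))
    return model_dir


def load_model(root: Path, model_id: str):
    """Read the three files back and return (model, params, info)."""
    model_dir = root / model_id
    with open(model_dir / "model.pkl", "rb") as f:
        model = pickle.load(f)  # only unpickle files you trust
    params = json.loads((model_dir / "params.json").read_text())
    info = json.loads((model_dir / "info.json").read_text())
    return model, params, info


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stands in for /data/models
    stand_in = {"kind": "stand-in for a fitted estimator"}
    save_model(root, "trained-abc123", stand_in, {"algo": {"name": "rf"}}, {"accuracy": 0.85})
    model, params, info = load_model(root, "trained-abc123")
    saved_files = sorted(p.name for p in (root / "trained-abc123").iterdir())
```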

### Pre-loaded Models
On startup, the service loads:
- `rf` - Random Forest
- `svm` - Support Vector Machine
- `knn` - K-Nearest Neighbors
- `lr` - Logistic Regression

## 🐳 Production Deployment

To run the service with the production configuration, use:
```bash
docker compose -f compose/compose.prod-local.yaml up
```

## 🔍 Troubleshooting

### Model Not Found
- Check model ID with `GET /models`
- Verify model files in container: `docker compose exec model ls /data/models`

### Slow Predictions
- Models are loaded at container startup
- Consider model complexity and input data size

### Training Failures
- Check feature names match schema
- Verify algorithm parameters are valid
- Review logs: `docker compose logs model`

### Note on random_state
- The `random_state` parameter only affects model training for reproducibility
- It does not affect predictions
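
To illustrate what a fixed seed buys during training, here is a standard-library analogue of a seeded train/test split. It is not the service's code and does not use scikit-learn; `split_indices` is a hypothetical helper that only shows the general idea that the same seed reproduces the same shuffle.

```python
import random


def split_indices(n_rows: int, test_fraction: float, random_state: int):
    """Shuffle row indices with a seeded generator and split them."""
    rng = random.Random(random_state)  # private generator, independent of global state
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


# The same seed yields the same split on every run.
train_a, test_a = split_indices(100, 0.2, random_state=42)
train_b, test_b = split_indices(100, 0.2, random_state=42)
print(train_a == train_b and test_a == test_b)  # True
```

A different `random_state` will generally produce a different split, which is why two models trained on the same features can report slightly different accuracy.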

## 📚 Additional Resources

- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Scikit-learn User Guide](https://scikit-learn.org/stable/user_guide.html)
- [Project Requirements](../../docs/Project-Requirements.md)