https://github.com/random-iceberg/model-backend
- Host: GitHub
- URL: https://github.com/random-iceberg/model-backend
- Owner: random-iceberg
- Created: 2025-07-23T03:44:59.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-23T23:46:38.000Z (11 months ago)
- Last Synced: 2025-07-24T00:03:49.177Z (11 months ago)
- Topics: fastapi, scikit-learn
- Language: Python
- Homepage:
- Size: 110 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Model Microservice
Machine learning service for training and inference using scikit-learn models.
## Quick Start (Zero Configuration)
```bash
# From the project root directory
docker compose -f 'compose/compose.dev.yaml' up -d --build
# Access Swagger UI
open http://localhost:8001/docs # Development mode
```
No setup needed! The service starts with pre-trained models ready for inference.
## Features
- **5 ML Algorithms**: Random Forest, SVM, Decision Tree, KNN, Logistic Regression
- **Model Training**: Train models with configurable feature selection
- **Model Persistence**: Automatic saving and loading of trained models
- **RESTful API**: Full Swagger/OpenAPI documentation
## API Documentation
### Interactive API Explorer
Access the Swagger UI at: **http://localhost:8001/docs** (development mode)
### Main Endpoints
- `GET /health` - Service health check
- `GET /models` - List all trained models with metadata
- `POST /models/train` - Train a new model
- `POST /models/{model_id}/predict` - Get a prediction from a specific model
- `DELETE /models/{model_id}` - Delete a trained model
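
For scripted access outside Swagger, the endpoints above can be wrapped in a small client. The following is a minimal sketch using only the Python standard library. The helper names (`build_request`, `send`, and the per-endpoint functions) are hypothetical and not part of this service, and the sketch assumes that every endpoint accepts and returns JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8001"  # development port from the Quick Start


def build_request(method, path, payload=None, base_url=BASE_URL):
    """Build (but do not send) an HTTP request for the model service."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        # Assumption: request bodies are JSON documents.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


def health():
    return build_request("GET", "/health")


def list_models():
    return build_request("GET", "/models")


def train(payload):
    return build_request("POST", "/models/train", payload)


def predict(model_id, passenger):
    # Quote the ID so unusual characters cannot alter the URL path.
    quoted = urllib.parse.quote(model_id, safe="")
    return build_request("POST", f"/models/{quoted}/predict", passenger)


def delete_model(model_id):
    quoted = urllib.parse.quote(model_id, safe="")
    return build_request("DELETE", f"/models/{quoted}")


def send(request, timeout=10):
    """Send a request and decode the JSON body; an empty body yields None.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    return json.loads(body) if body else None
```

With the development stack running, `send(list_models())` would fetch the model list. Building the request separately from sending it keeps the URL and payload logic testable without a live service.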
## Available Algorithms
| Algorithm | ID | Configurable Parameters |
|-----------|-----|------------------------|
| Random Forest | `rf` | `n_estimators` |
| Support Vector Machine | `svm` | - |
| Decision Tree | `dt` | - |
| K-Nearest Neighbors | `knn` | `n_neighbors` |
| Logistic Regression | `lr` | - |
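
The table can also be used to check an `algo` object on the client side before a training request is sent. This is a minimal sketch, not the service's own validation: the `validate_algo` helper is hypothetical, the object shape (`name` plus optional parameters) follows the training example in this README, and the positive-integer check is an assumption.

```python
# Algorithm IDs and their configurable parameters, copied from the table above.
ALGORITHMS = {
    "rf": {"n_estimators"},
    "svm": set(),
    "dt": set(),
    "knn": {"n_neighbors"},
    "lr": set(),
}


def validate_algo(algo):
    """Return a list of problems in an `algo` object; an empty list means none found."""
    name = algo.get("name")
    if name not in ALGORITHMS:
        return [f"unknown algorithm {name!r}; expected one of {sorted(ALGORITHMS)}"]

    problems = []
    for key, value in algo.items():
        if key == "name":
            continue
        if key not in ALGORITHMS[name]:
            problems.append(f"{key!r} is not configurable for {name!r}")
        # Assumption: both listed parameters must be positive integers.
        # bool is excluded explicitly because it is a subclass of int.
        elif isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"{key!r} must be a positive integer")
    return problems
```

For example, `validate_algo({"name": "svm", "n_estimators": 10})` reports that `n_estimators` is not configurable for `svm`.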
## Available Features
Features from the Titanic dataset (select which ones to use during training):
- `pclass` - Passenger class (1, 2, 3)
- `sex` - Gender (male/female)
- `age` - Age in years
- `fare` - Ticket fare
- `embarked` - Port of embarkation
- `title` - Extracted from name (Mr, Mrs, etc.)
- `is_alone` - Traveling alone flag
- `age_class` - Age × Class interaction
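
The last three features are derived rather than read directly from the dataset. The sketch below shows one plausible way to compute them from a raw row. It is not taken from `utils/data.py`: the `derive_features` helper is hypothetical, the column names (`Name`, `SibSp`, `Parch`, `Age`, `Pclass`) are those of the standard Titanic `train.csv`, and the definition of "alone" is an assumption. Note that the prediction examples in this README name the corresponding request field `travelled_alone`.

```python
import re

# Captures the honorific between the comma and the first period,
# e.g. "Mr" in "Braund, Mr. Owen Harris".
TITLE_PATTERN = re.compile(r",\s*([^.,]+)\.")


def derive_features(row):
    """Derive title, is_alone and age_class from one raw Titanic row (a dict)."""
    match = TITLE_PATTERN.search(row["Name"])
    title = match.group(1).strip().lower() if match else "unknown"

    # Assumption: "alone" means no siblings/spouses (SibSp)
    # and no parents/children (Parch) aboard.
    is_alone = (row["SibSp"] + row["Parch"]) == 0

    # Age x Class interaction; left as None when the age is missing.
    age = row.get("Age")
    age_class = age * row["Pclass"] if age is not None else None

    return {"title": title, "is_alone": is_alone, "age_class": age_class}
```

How the service handles missing ages or rare titles is not described here, so the sketch leaves those cases as `None` and the raw lowercased title.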
## Development Workflow
### Testing the API with Swagger
1. Go to http://localhost:8001/docs
2. Try `/models` to see pre-loaded models
3. Test prediction with `/models/{model_id}/predict`:
```json
{
"pclass": 1,
"sex": "female",
"age": 30,
"fare": 100,
"travelled_alone": false,
"embarked": "cherbourg",
"title": "mrs"
}
```
4. Train a custom model with `/models/train`
## Testing
```bash
cd model
# Install dependencies (if not already done)
uv sync --extra dev
# Run tests
uv run pytest
# Linting and formatting check
uv run ruff check
uv run ruff format --check
# Auto-fix formatting
uv run ruff format
```
## Project Structure
```
model/
├── main.py              # FastAPI application
├── models_router.py     # Model management endpoints
├── schemas.py           # Pydantic data models
├── train.py             # Training script
├── utils/
│   ├── data.py          # Data preprocessing
│   ├── models.py        # Model loading/saving
│   └── model_factory.py # Algorithm factory
├── data/                # Included dataset
│   ├── train.csv
│   ├── test.csv
│   └── gender_submission.csv
└── tests/               # Test suite
```
## Model Training
### Using Swagger UI
1. Go to http://localhost:8001/docs
2. Expand `POST /models/train`
3. Click "Try it out"
4. Use this example request:
```json
{
"algo": {
"name": "rf",
"n_estimators": 150
},
"features": ["pclass", "sex", "age", "fare"],
"random_state": 42
}
```
5. Click "Execute"
### Training Response
```json
{
"id": "trained-abc123",
"params": { ... },
"info": {
"accuracy": 0.85
}
}
```
## Making Predictions
### Using Swagger UI
1. Get model ID from `/models` endpoint
2. Use `POST /models/{model_id}/predict`
3. Provide passenger data
4. Receive survival prediction with probability
### Prediction Request
```json
{
"pclass": 3,
"sex": "male",
"age": 25,
"fare": 15.5,
"travelled_alone": true,
"embarked": "southampton",
"title": "mr"
}
```
### Prediction Response
```json
{
"survived": false,
"probability": 0.78
}
```
## Data Management
### Model Storage
- Models automatically saved to `/data/models/`
- Persisted across container restarts
- Each model includes:
- `model.pkl` - Serialized scikit-learn model
- `params.json` - Training parameters
- `info.json` - Model metadata and accuracy
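
The three-file layout can be illustrated with a short save/load round trip. This is a sketch under stated assumptions rather than the code in `utils/models.py`: the `save_model` and `load_model` helpers are hypothetical, and storing each model in its own subdirectory named after the model ID is an assumption. Any picklable object stands in for the scikit-learn model.

```python
import json
import pickle
from pathlib import Path


def save_model(root, model_id, model, params, info):
    """Write model.pkl, params.json and info.json under <root>/<model_id>/."""
    # Assumption: one subdirectory per model ID.
    model_dir = Path(root) / model_id
    model_dir.mkdir(parents=True, exist_ok=True)
    with open(model_dir / "model.pkl", "wb") as handle:
        pickle.dump(model, handle)
    (model_dir / "params.json").write_text(json.dumps(params, indent=2))
    (model_dir / "info.json").write_text(json.dumps(info, indent=2))
    return model_dir


def load_model(root, model_id):
    """Read back the model object, its training parameters and its metadata."""
    model_dir = Path(root) / model_id
    with open(model_dir / "model.pkl", "rb") as handle:
        # Unpickling can execute arbitrary code; only load files you trust.
        model = pickle.load(handle)
    params = json.loads((model_dir / "params.json").read_text())
    info = json.loads((model_dir / "info.json").read_text())
    return model, params, info
```

Keeping parameters and metadata in JSON next to the pickle means they can be inspected or listed without deserializing the model itself.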
### Pre-loaded Models
On startup, the service loads:
- `rf` - Random Forest
- `svm` - Support Vector Machine
- `knn` - K-Nearest Neighbors
- `lr` - Logistic Regression
## Production Deployment
For a production-style deployment, start the service with the production compose file:
```bash
docker compose -f compose/compose.prod-local.yaml up
```
## Troubleshooting
### Model Not Found
- Check model ID with `GET /models`
- Verify model files in container: `docker compose exec model ls /data/models`
### Slow Predictions
- Models are loaded at container startup
- Consider model complexity and input data size
### Training Failures
- Check feature names match schema
- Verify algorithm parameters are valid
- Review logs: `docker compose logs model`
### Note on random_state
- The `random_state` parameter only affects model training for reproducibility
- It does not affect predictions
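
The reproducibility point can be shown with a seeded pseudo-random split. This sketch uses Python's standard `random` module rather than scikit-learn, and the `split_indices` helper is hypothetical. It only illustrates the general idea that a fixed seed makes random steps during training repeatable; it does not show how this service uses `random_state` internally.

```python
import random


def split_indices(n_rows, test_fraction, seed):
    """Shuffle row indices with a seeded generator and split them into train/test lists."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


# The same seed always yields the same split, so a training run can be repeated exactly.
first = split_indices(10, 0.2, seed=42)
second = split_indices(10, 0.2, seed=42)
print(first == second)
```

Once a model is trained, prediction is a deterministic computation on the fitted model, which is why the seed plays no role at that stage.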
## Additional Resources
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Scikit-learn User Guide](https://scikit-learn.org/stable/user_guide.html)
- [Project Requirements](../../docs/Project-Requirements.md)