https://github.com/triiju/hybrid-recommendation-engine

Tri-Weight Hybrid Recommendation Engine API
collaborative-filtering content-based-filtering fastapi hybrid-recommendation machine-learning neural-cf personalisation python ranking-algorithm recommendation-engine recommender-system


Host: GitHub
URL: https://github.com/triiju/hybrid-recommendation-engine
Owner: triiJU
License: mit
Created: 2025-10-03T06:44:51.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-11-01T06:46:26.000Z (8 months ago)
Last Synced: 2025-11-01T08:30:23.552Z (8 months ago)
Topics: collaborative-filtering, content-based-filtering, fastapi, hybrid-recommendation, machine-learning, neural-cf, personalisation, python, ranking-algorithm, recommendation-engine, recommender-system
Language: Python
Homepage:
Size: 69.3 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Hybrid Recommendation Engine

## Overview

A modular hybrid recommender system supporting:

- Popularity baseline

- Collaborative Filtering (user–user & item–item cosine)

- Content-Based Ranking (TF-IDF over item metadata)

- Neural Collaborative Filtering (PyTorch embeddings + MLP)

- Hybrid Blending (tri-weight: CF, content, neural)

- Alpha sweep & Optuna hyperparameter tuning

- Cold-start evaluation (new / sparse users & items)

- Ranking Metrics: Precision@K, Recall@K, NDCG@K, Coverage, Item Diversity

- Reproducible pipeline (Makefile + scripts)
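As an illustration of the content-based component, here is a minimal pure-Python sketch of TF-IDF vectors over item metadata, ranked by cosine similarity. It is not the code in `src/content_based.py`: the lowercase whitespace tokenizer, the smoothed IDF formula (the one scikit-learn's `TfidfVectorizer` uses by default) and the helper names `tfidf_vectors`, `cosine` and `most_similar` are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build sparse, L2-normalized TF-IDF vectors keyed by item id (hypothetical helper)."""
    tokenized = {item: text.lower().split() for item, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: how many items mention each term at least once.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors: dict[str, dict[str, float]] = {}
    for item, tokens in tokenized.items():
        tf = Counter(tokens)
        # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1, scikit-learn's default.
        weights = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors[item] = {t: w / norm for t, w in weights.items()}
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def most_similar(vectors: dict[str, dict[str, float]], item: str, k: int = 5) -> list[tuple[str, float]]:
    """Return the k items whose metadata is most similar to `item`."""
    scores = [(other, cosine(vectors[item], vec)) for other, vec in vectors.items() if other != item]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


items = {
    "toy_story": "animation children comedy",
    "aladdin": "animation children musical",
    "heat": "action crime thriller",
}
print(most_similar(tfidf_vectors(items), "toy_story", k=2))
```

Here `aladdin` ranks first because it shares two genre terms with `toy_story`, while `heat` shares none and scores 0. For MovieLens 100K the metadata string might be assembled from the title and genre flags in `u.item`; that mapping is also an assumption.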

## Why Hybrid?

Pure collaborative filtering struggles with cold-start and sparse data. Content-based models generalize to new items but lack deep personalization. Neural CF captures non-linear user–item interactions. The hybrid blends all three signals, aiming to raise NDCG and coverage while limiting the accuracy drop for cold-start users and items.

## Features

| Component | File | Description |
|-----------|------|-------------|
| Data download | `src/data_loading.py` | MovieLens 100K fetch & extract |
| Preprocessing | `src/preprocess.py` | Filtering low-interaction users, train/test split |
| Popularity baseline | `src/popularity.py` | Global item ranking |
| Collaborative Filtering | `src/cf_baseline.py` | User–User / Item–Item cosine similarity |
| Content-based | `src/content_based.py` | TF-IDF item embeddings & similarity |
| Neural CF | `src/neural_cf.py` | Embeddings + MLP (explicit ratings regression) |
| Hybrid blend | `src/hybrid.py` | Tri-weight combination of CF + content + neural |
| Metrics | `src/metrics.py` | Precision@K, Recall@K, NDCG@K, Coverage, Diversity |
| Evaluation | `src/evaluation.py` | Unified baseline evaluation |
| Alpha sweep | `scripts/alpha_sweep.py` | Evaluate multiple α values |
| Optuna tuning | `src/optuna_tune.py` | Optimize CF neighbor count / mode |
| Cold-start eval | `src/cold_start.py` | Segment sparse users/items |
| Logging | `src/logging_config.py` | Structured logging |
| Pipeline script | `scripts/run_pipeline.sh` | End-to-end automation |
| Make targets | `Makefile` | Reproducible commands |
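The item–item cosine similarity listed above can be sketched with the standard library alone. This is an illustrative version under stated assumptions, not the repository's `src/cf_baseline.py`: ratings arrive as `(user, item, rating)` triples, each unseen item is scored by a similarity-weighted average of the user's own ratings over its `n_neighbors` most similar rated items, and all function names are hypothetical. User–user CF is the same computation with the roles of users and items swapped.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """Map each item to a sparse {user: rating} vector (hypothetical helper)."""
    vectors = defaultdict(dict)
    for user, item, rating in ratings:
        vectors[item][user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; users missing from one side count as 0."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend_item_cf(ratings, user, k=10, n_neighbors=20):
    """Score unseen items by a similarity-weighted average of the user's own ratings."""
    vectors = item_vectors(ratings)
    rated = {item: r for u, item, r in ratings if u == user}
    scores = {}
    for candidate, cand_vec in vectors.items():
        if candidate in rated:
            continue  # never recommend something the user already rated
        # Keep the n_neighbors rated items most similar to this candidate.
        neighbors = sorted(
            ((cosine(cand_vec, vectors[i]), r) for i, r in rated.items()), reverse=True
        )[:n_neighbors]
        weight = sum(abs(s) for s, _ in neighbors)
        if weight > 0:
            scores[candidate] = sum(s * r for s, r in neighbors) / weight
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:k]
```

The full pairwise scan is quadratic in the number of items, which is fine for MovieLens 100K but is exactly what the ANN suggestion under Scalability would replace.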

## Metrics (Example / Placeholder)

| Model                                    | P@10 | R@10 | NDCG@10 | Coverage | Diversity |
|------------------------------------------|------|------|---------|----------|-----------|
| Popularity                               | 0.18 | 0.09 | 0.11    | 0.04     | 0.21      |
| User-CF                                  | 0.27 | 0.14 | 0.21    | 0.33     | 0.37      |
| Item-CF                                  | 0.26 | 0.13 | 0.20    | 0.29     | 0.35      |
| Content (TF-IDF)                         | 0.19 | 0.10 | 0.15    | 0.41     | 0.49      |
| Neural CF                                | 0.29 | 0.16 | 0.23    | 0.36     | 0.39      |
| Hybrid (CF=0.6, Content=0.4)             | 0.31 | 0.17 | 0.24    | 0.44     | 0.46      |
| Hybrid (CF=0.5, Content=0.3, Neural=0.2) | 0.32 | 0.18 | 0.25    | 0.45     | 0.45      |

(Replace with real outputs after running.)
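The metrics in the table are commonly defined as below; this sketch uses binary relevance (an item is relevant if it is in the user's held-out set). The exact variants in `src/metrics.py` are not stated here (for example, whether Precision@K divides by K or by the list length, or how Diversity is computed), so treat these functions, including the intra-list diversity formula `1 - mean pairwise similarity` with a caller-supplied similarity, as assumptions.

```python
import math


def precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k slots filled by relevant items."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def recall_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Share of the user's relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)


def ndcg_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: a hit at 0-based rank r is discounted by log2(r + 2)."""
    dcg = sum(1.0 / math.log2(r + 2) for r, item in enumerate(recommended[:k]) if item in relevant)
    # Ideal DCG puts min(|relevant|, k) hits in the top positions.
    idcg = sum(1.0 / math.log2(r + 2) for r in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0


def catalog_coverage(all_recommendations: list[list[str]], catalog_size: int) -> float:
    """Fraction of the catalog that appears in at least one user's list."""
    unique = {item for recs in all_recommendations for item in recs}
    return len(unique) / catalog_size


def intra_list_diversity(recommended: list[str], similarity) -> float:
    """1 - mean pairwise similarity within one list (assumed definition)."""
    pairs = [(a, b) for i, a in enumerate(recommended) for b in recommended[i + 1:]]
    if not pairs:
        return 0.0
    return 1.0 - sum(similarity(a, b) for a, b in pairs) / len(pairs)
```

Averaging these per-user values over all test users gives one number per model, as in the table; coverage is computed once over every user's list.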

## Cold-Start (Example)

| Segment | P@10 (User-CF) | P@10 (Content) | P@10 (Hybrid) |
|---------|----------------|----------------|---------------|
| New Users (≤4 ratings) | 0.09 | 0.15 | 0.19 |
| New Items (low exposure) | 0.06 | 0.14 | 0.17 |
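The segments above can be formed by counting each user's and each item's interactions in the training split. A minimal sketch, assuming interactions are `(user, item)` pairs: the user threshold of 4 comes from the table, while `max_item_ratings`, standing in for "low exposure", is a hypothetical cutoff. A test row may fall into both cold segments.

```python
from collections import Counter


def segment_test_rows(train, test, max_user_ratings=4, max_item_ratings=5):
    """Group test (user, item) rows into cold-start segments using training-set counts.

    max_user_ratings=4 mirrors the "New Users (<=4 ratings)" row; max_item_ratings is a
    hypothetical cutoff for "low exposure" items.
    """
    user_counts = Counter(user for user, _ in train)
    item_counts = Counter(item for _, item in train)
    segments = {"new_users": [], "new_items": [], "warm": []}
    for user, item in test:
        # Counter returns 0 for users/items never seen in training.
        cold_user = user_counts[user] <= max_user_ratings
        cold_item = item_counts[item] <= max_item_ratings
        # A row can land in both cold segments; "warm" holds rows that are in neither.
        if cold_user:
            segments["new_users"].append((user, item))
        if cold_item:
            segments["new_items"].append((user, item))
        if not (cold_user or cold_item):
            segments["warm"].append((user, item))
    return segments
```

Each segment's rows can then be scored with the same Precision@K routine per model to fill in a table like the one above.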

## Directory Structure

```
.
├── README.md
├── LICENSE
├── Makefile
├── requirements.txt
├── models/
├── experiments/
├── scripts/
└── src/
```

## Quick Start

```bash
python -m venv .venv
source .venv/bin/activate         # Windows: .venv\Scripts\activate
pip install -r requirements.txt
make download
make preprocess
make baselines
make neural
make hybrid
make evaluate
make alpha
make coldstart
```

## Example Hybrid Run

### Tri-Weight Blending

```bash
# CF + Content blend (no neural)
python -m src.hybrid --w_cf 0.6 --w_content 0.4 --w_neural 0.0 \
  --train_path data/processed/train.csv \
  --test_path data/processed/test.csv \
  --items_path data/ml-100k/u.item

# CF + Content + Neural blend (neural weight ignored if model not found)
python -m src.hybrid --w_cf 0.5 --w_content 0.3 --w_neural 0.2 \
  --train_path data/processed/train.csv \
  --test_path data/processed/test.csv \
  --items_path data/ml-100k/u.item \
  --neural_model_path models/neural_cf.pt
```

**Note:** If the neural model has not been trained or cannot be found, the neural weight is automatically set to 0 and the remaining CF and content weights are renormalized.
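One way to implement the tri-weight blend together with the fallback described in the note is sketched below. Only the fallback (zero the neural weight, then renormalize) comes from this README; the min-max scaling of each model's scores to [0, 1] before blending, the treatment of an item missing from a model's output as scoring 0, and the function names are assumptions for illustration, not the contents of `src/hybrid.py`.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale one model's scores to [0, 1] (an assumption, not from the README)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Every item got the same score, so treat them all as equally strong.
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def effective_weights(w_cf, w_content, w_neural, neural_available):
    """Zero the neural weight when no model is available, then renormalize to sum to 1."""
    if not neural_available:
        w_neural = 0.0
    total = w_cf + w_content + w_neural
    if total <= 0:
        raise ValueError("at least one blend weight must be positive")
    return w_cf / total, w_content / total, w_neural / total


def blend(cf, content, neural, w_cf, w_content, w_neural, k=10):
    """Tri-weight blend of per-item scores; pass neural=None when the model is missing."""
    weights = effective_weights(w_cf, w_content, w_neural, neural is not None)
    components = [normalize(cf), normalize(content), normalize(neural or {})]
    # An item absent from a model's output contributes 0 from that model (assumption).
    all_items = set().union(*components)
    blended = {
        item: sum(w * comp.get(item, 0.0) for w, comp in zip(weights, components))
        for item in all_items
    }
    return sorted(blended.items(), key=lambda pair: pair[1], reverse=True)[:k]
```

Scaling each model's scores first matters because CF similarities, TF-IDF cosines and predicted ratings live on different ranges; without it, a weight of 0.6 would not mean "60% of the ranking signal".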

## API Usage

The system provides a FastAPI-based REST API for serving recommendations:

```bash
# Start the API server
uvicorn api.app:app --reload

# Get recommendations with custom weights
curl "http://localhost:8000/recommend?user_id=1&k=10&w_cf=0.6&w_content=0.4&w_neural=0.0"

# Check API health
curl "http://localhost:8000/health"

# Get metadata
curl "http://localhost:8000/meta"
```
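To call the API from Python instead of `curl`, a small standard-library client might look like this. It reuses only the endpoints and query parameters shown in the `curl` examples; it assumes the server returns JSON (FastAPI's default response type) and makes no assumption about the fields inside the response. `recommend_url` and `get_json` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # uvicorn's default host/port, as in the curl examples


def recommend_url(user_id, k=10, w_cf=0.6, w_content=0.4, w_neural=0.0, base_url=BASE_URL):
    """Build the /recommend URL with the same query parameters as the curl example."""
    query = urlencode(
        {"user_id": user_id, "k": k, "w_cf": w_cf, "w_content": w_content, "w_neural": w_neural}
    )
    return f"{base_url}/recommend?{query}"


def get_json(url, timeout=5.0):
    """GET a URL and decode the body as JSON (assumes the endpoint returns JSON)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `get_json(recommend_url(1))` fetches the same recommendations as the first `curl` call, and `get_json(f"{BASE_URL}/health")` mirrors the health check.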

## Alpha Sweep

```bash
python scripts/alpha_sweep.py --alphas 0.3 0.5 0.7 0.9 --k 10
```
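The README does not spell out what α controls. Assuming it is the CF weight in a two-way CF + content blend (content receiving 1 - α, consistent with the `Hybrid (CF=0.6, Content=0.4)` row), a sweep could be sketched as below, scoring each α by mean Precision@K. The per-user score dictionaries are assumed to be pre-scaled to [0, 1], and every name here is hypothetical rather than taken from `scripts/alpha_sweep.py`.

```python
def blend_two(cf: dict[str, float], content: dict[str, float], alpha: float) -> dict[str, float]:
    """alpha weights CF, (1 - alpha) weights content; missing scores count as 0."""
    return {
        i: alpha * cf.get(i, 0.0) + (1 - alpha) * content.get(i, 0.0)
        for i in cf.keys() | content.keys()
    }


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    return sum(1 for i in ranked[:k] if i in relevant) / k


def alpha_sweep(cf_scores, content_scores, relevant, alphas, k=10):
    """Mean Precision@K for each alpha.

    cf_scores / content_scores: user -> {item: score}, assumed already scaled to [0, 1].
    relevant: user -> set of held-out items from the test split.
    """
    results = {}
    for alpha in alphas:
        per_user = []
        for user, liked in relevant.items():
            blended = blend_two(cf_scores.get(user, {}), content_scores.get(user, {}), alpha)
            ranked = sorted(blended, key=blended.get, reverse=True)
            per_user.append(precision_at_k(ranked, liked, k))
        results[alpha] = sum(per_user) / len(per_user) if per_user else 0.0
    return results
```

The α with the highest mean metric would then be carried into the hybrid run; swapping Precision@K for NDCG@K changes only the inner scoring call.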

## Optuna Hyperparameter Tuning

```bash
python -m src.optuna_tune --trials 30
```
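According to the features table, the Optuna study tunes the CF neighbor count and mode. A sketch of such an objective follows, assuming a caller-supplied `evaluate(n_neighbors, mode)` that returns a validation metric such as NDCG@10; the search range 5 to 100 and the function names are hypothetical. `optuna` is imported lazily inside `run_study` so the objective itself can be tested with a stub trial.

```python
from typing import Callable


def make_objective(evaluate: Callable[[int, str], float]):
    """Wrap a caller-supplied evaluate(n_neighbors, mode) -> metric as an Optuna objective."""

    def objective(trial) -> float:
        # Hypothetical search space: neighbor count and user- vs item-based CF.
        n_neighbors = trial.suggest_int("n_neighbors", 5, 100)
        mode = trial.suggest_categorical("mode", ["user", "item"])
        return evaluate(n_neighbors, mode)

    return objective


def run_study(evaluate: Callable[[int, str], float], n_trials: int = 30):
    """Maximize the metric returned by evaluate over n_trials Optuna trials."""
    import optuna  # imported lazily so make_objective can be tested without Optuna installed

    study = optuna.create_study(direction="maximize")
    study.optimize(make_objective(evaluate), n_trials=n_trials)
    return study.best_params, study.best_value
```

`run_study(my_evaluate, n_trials=30)` corresponds to `--trials 30`, where `my_evaluate` is your own function that fits CF with the suggested settings and returns the validation metric.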

## Cold-Start Evaluation

```bash
python -m src.cold_start --train_path data/processed/train.csv \
  --test_path data/processed/test.csv --k 10
```

## Scalability (Interview Talking Points)

- Replace brute-force similarity with ANN (Faiss)

- Offline batch refresh + incremental updates

- Candidate generation → re-ranking pipeline

- Feature store for user/item embeddings
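The candidate generation → re-ranking point can be sketched as two scoring passes: a cheap scorer trims the catalog to a shortlist, and an expensive scorer (for example the full hybrid or neural model) orders only that shortlist. In a real deployment the first stage would more likely be an ANN lookup such as Faiss than the full scan used here; the scorer callables and `n_candidates` are hypothetical parameters.

```python
import heapq
from typing import Callable, Iterable


def two_stage_recommend(
    user: str,
    catalog: Iterable[str],
    cheap_score: Callable[[str, str], float],
    expensive_score: Callable[[str, str], float],
    n_candidates: int = 100,
    k: int = 10,
) -> list[str]:
    """Candidate generation with a cheap scorer, then re-ranking with an expensive one."""
    # Stage 1: scan the catalog once with the cheap scorer and keep the top n_candidates.
    candidates = heapq.nlargest(n_candidates, catalog, key=lambda item: cheap_score(user, item))
    # Stage 2: only the shortlist is scored by the expensive model.
    reranked = sorted(candidates, key=lambda item: expensive_score(user, item), reverse=True)
    return reranked[:k]
```

The expensive model runs `n_candidates` times per request instead of once per catalog item, at the cost of never seeing items the first stage drops.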


## Future Work

- Transformer text embeddings

- Implicit feedback (BPR / WARP)

- Meta-learning blend weights

- MLflow tracking

- A/B simulation harness

## License

MIT

## Citation

MovieLens datasets: © GroupLens Research Group.
