https://github.com/lukeprior/letterboxdreccomender
https://github.com/lukeprior/letterboxdreccomender
Last synced: 8 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/lukeprior/letterboxdreccomender
- Owner: LukePrior
- License: mit
- Created: 2025-06-17T08:15:53.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-06-17T14:33:38.000Z (12 months ago)
- Last Synced: 2025-06-17T14:39:24.063Z (12 months ago)
- Language: Python
- Size: 1000 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Letterboxd Recommender
A self-contained movie recommendation system that provides personalized suggestions based on your Letterboxd viewing history or manual movie input. The system uses a Sequential Recommendation model (SASRec) trained on the MovieLens 32M dataset to generate accurate recommendations.
## 🎬 Features
- **Letterboxd Integration**: Automatically fetch your recent films from any public Letterboxd profile
- **Manual Input**: Enter movie titles directly for recommendations
- **Rating-Based Filtering**: Get recommendations based on your highly-rated films (3★+, 4★+, 5★ only)
- **Real-time Processing**: Client-side inference using ONNX.js for fast, private recommendations
- **Self-Contained**: No external API dependencies - everything runs locally
## 🚀 Quick Start
1. **Clone the repository**
```bash
git clone https://github.com/yourusername/LetterboxdReccomender.git
cd LetterboxdReccomender
```
2. **Start the local server**
```bash
python serve.py
```
3. **Open your browser**
Navigate to `http://localhost:8000` and start getting recommendations!
## 🏗️ Architecture Overview
### Frontend (Web Interface)
- **Pure JavaScript**: No frameworks, lightweight and fast
- **ONNX.js Runtime**: Client-side model inference
- **CORS Proxy**: Fetches Letterboxd data without backend dependencies
- **Responsive Design**: Works on desktop and mobile
### Backend (Training & Data Processing)
- **PyTorch**: Model training and validation
- **SASRec Algorithm**: Sequential recommendation using self-attention
- **MovieLens 32M**: 32 million ratings across 87,000+ movies
- **ONNX Export**: Model converted for web deployment
## 🧠 Algorithm Design
### SASRec (Self-Attentive Sequential Recommendation)
The recommendation system is built on the SASRec architecture, which excels at understanding sequential patterns in user behavior.
**Key Components:**
- **Self-Attention Mechanism**: Captures relationships between movies in your viewing history
- **Positional Encoding**: Understands the order of movies you've watched
- **Multi-Head Attention**: Focuses on different aspects of movie preferences simultaneously
**Training Process:**
1. **Data Preprocessing** ([`code/main.py`](code/main.py))
- Filters users with ≥10 movie ratings
- Calculates 70th percentile ratings per user
- Creates binary positive feedback (above percentile = liked)
2. **Sequence Generation** ([`code/train.py`](code/train.py))
- Converts user ratings into sequential movie-watching patterns
- Pads sequences to consistent length (50 movies max)
- Splits into training/validation sets (80/20)
3. **Model Training**
- **Architecture**: 2 transformer layers, 8 attention heads, 128 embedding dimensions
- **Loss Function**: Binary cross-entropy with negative sampling
- **Optimization**: Adam optimizer with early stopping
- **Validation**: Tracks accuracy and prevents overfitting
4. **ONNX Conversion** ([`code/convert.py`](code/convert.py))
- Exports trained PyTorch model to ONNX format
- Optimizes for web deployment and inference speed
### Model Performance
- **Training Data**: 200K+ users, 32M+ ratings
- **Vocabulary**: 87K+ unique movies
- **Context Length**: Up to 50 previous movies
- **Inference Time**: <100ms client-side
## 📁 Project Structure
```
LetterboxdReccomender/
├── web/ # Frontend application
│ ├── index.html # Main web interface
│ ├── app.js # Core application logic
│ ├── letterboxd.js # Letterboxd integration
│ └── style.css # UI styling
├── code/ # Training and processing scripts
│ ├── main.py # Data preprocessing pipeline
│ ├── train.py # Model training with validation
│ ├── convert.py # ONNX model conversion
│ ├── inference.py # Python inference testing
│ ├── mapping.py # Movie ID/title mappings
│ └── depickle.py # Model inspection utilities
├── data/ # MovieLens dataset
│ ├── movies.csv # Movie metadata
│ ├── ratings.csv # User ratings (32M+ entries)
│ └── README.txt # Dataset documentation
├── models/ # Trained models and mappings
│ ├── sasrec_model.onnx # Web-optimized model
│ ├── metadata.json # Model configuration
│ ├── movie_map.pkl # Movie ID mappings
│ └── *.pth # PyTorch checkpoints
└── output/ # Processing outputs and logs
```
## 🔧 Technical Implementation
### Data Flow
1. **Input Processing**
- Letterboxd: Scrapes user profile via CORS proxy
- Manual: Parses movie titles with fuzzy matching
- Normalizes titles and maps to MovieLens IDs
2. **Sequence Preparation**
- Left-pads movie sequence to 50 items
- Converts to appropriate tensor format
- Handles various input lengths gracefully
3. **Model Inference**
- ONNX.js runs model in browser
- Processes attention weights across sequence
- Generates probability scores for all movies
4. **Recommendation Generation**
- Excludes already-watched movies
- Ranks by confidence score
- Returns top 10 suggestions with metadata
### Key Files Explained
**[`web/app.js`](web/app.js)**: Core frontend logic
- Model loading and initialization
- Movie title normalization and matching
- ONNX inference pipeline
- UI state management
**[`web/letterboxd.js`](web/letterboxd.js)**: Letterboxd integration
- Profile scraping with CORS proxy
- Rating extraction and processing
- Film matching against MovieLens database
**[`code/train.py`](code/train.py)**: Model training pipeline
- SASRec implementation in PyTorch
- Training loop with validation
- Model checkpointing and saving
**[`code/main.py`](code/main.py)**: Data preprocessing
- Rating threshold calculation (70th percentile)
- User/movie filtering
- Binary feedback generation
## 🎯 Self-Contained Design
This project is designed to run completely independently:
**No External APIs Required:**
- Letterboxd data fetched via public CORS proxy
- All inference happens client-side
- MovieLens data included in repository
**Offline Capability:**
- Once loaded, works without internet
- All models and data bundled locally
- No tracking or data collection
**Easy Deployment:**
- Single Python file serves everything
- No database or complex setup needed
- Works on any system with Python 3.6+
## 🚀 Getting Started (Detailed)
### Prerequisites
- Python 3.6+ with basic libraries (pandas, numpy)
- Modern web browser with JavaScript enabled
- ~500MB disk space for models and data
### Installation
1. **Download the project**
```bash
git clone https://github.com/yourusername/LetterboxdReccomender.git
cd LetterboxdReccomender
```
2. **Install Python dependencies** (if training models)
```bash
pip install torch pandas numpy scikit-learn onnx
```
3. **Start the web server**
```bash
python serve.py
```
4. **Access the application**
Open `http://localhost:8000` in your browser
### Usage
**Option 1: Letterboxd Integration**
1. Enter any public Letterboxd username
2. System fetches recent rated films automatically
3. Choose recommendation source:
- Recent films (latest 50)
- 5-star films only
- 4+ star films
- 3+ star films
**Option 2: Manual Input**
1. Enter comma-separated movie titles
2. System matches against MovieLens database
3. Get recommendations based on your list
## 🔬 Model Training (Advanced)
If you want to retrain the model or experiment with different parameters:
### 1. Data Preparation
```bash
cd code
python main.py # Processes MovieLens data
```
### 2. Model Training
```bash
python train.py # Trains SASRec model with validation
```
### 3. ONNX Conversion
```bash
python convert.py # Converts to web-compatible format
```
### Training Configuration
Edit [`code/train.py`](code/train.py) to modify:
- `EMBED_DIM`: Embedding dimensions (default: 128)
- `NUM_HEADS`: Attention heads (default: 8)
- `NUM_LAYERS`: Transformer layers (default: 2)
- `MAX_LEN`: Sequence length (default: 50)
- `BATCH_SIZE`: Training batch size (default: 1024)
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📊 Dataset Attribution
This project uses the MovieLens 32M dataset:
> F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- **GroupLens Research** for the MovieLens dataset
- **Microsoft** for ONNX.js runtime
- **Letterboxd** for the inspiration and public profiles
- **SASRec authors** for the sequential recommendation algorithm
## 🐛 Troubleshooting
**Model won't load:**
- Check browser console for errors
- Ensure all files in `models/` directory exist
- Try a different browser (Chrome/Firefox recommended)
**Letterboxd fetch fails:**
- Verify the username is correct and profile is public
- Check CORS proxy status
- Try manual input as alternative
**No recommendations generated:**
- Ensure matched movies > 0
- Check that movie titles match MovieLens database
- Try different/more popular movie titles
## 🔮 Future Improvements
- [ ] Support for more rating platforms (IMDb, TMDb)
- [ ] Collaborative filtering integration
- [ ] Genre-based filtering options
- [ ] Movie poster and metadata display
- [ ] Recommendation explanations
- [ ] Export/save recommendation lists