{"id":29263103,"url":"https://github.com/lukeprior/letterboxdreccomender","last_synced_at":"2025-10-16T17:28:34.687Z","repository":{"id":299634222,"uuid":"1003487491","full_name":"LukePrior/LetterboxdReccomender","owner":"LukePrior","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-17T14:33:38.000Z","size":1,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-17T14:39:24.063Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LukePrior.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-17T08:15:53.000Z","updated_at":"2025-06-17T14:33:42.000Z","dependencies_parsed_at":"2025-06-17T14:51:40.290Z","dependency_job_id":null,"html_url":"https://github.com/LukePrior/LetterboxdReccomender","commit_stats":null,"previous_names":["lukeprior/letterboxdreccomender"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LukePrior/LetterboxdReccomender","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LukePrior%2FLetterboxdReccomender","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LukePrior%2FLetterboxdReccomender/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LukePrior%2FLetterboxdReccomender/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LukePrior%2FLetterboxdReccomender/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LukePrior","download_url":"https://codeload.github.com/LukePrior/LetterboxdReccomender/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LukePrior%2FLetterboxdReccomender/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263499191,"owners_count":23476021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-04T11:08:03.742Z","updated_at":"2025-10-16T17:28:34.595Z","avatar_url":"https://github.com/LukePrior.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Letterboxd Recommender\n\nA self-contained movie recommendation system that provides personalized suggestions based on your Letterboxd viewing history or manual movie input. The system uses a Sequential Recommendation model (SASRec) trained on the MovieLens 32M dataset to generate accurate recommendations.\n\n## 🎬 Features\n\n- **Letterboxd Integration**: Automatically fetch your recent films from any public Letterboxd profile\n- **Manual Input**: Enter movie titles directly for recommendations\n- **Rating-Based Filtering**: Get recommendations based on your highly-rated films (3★+, 4★+, 5★ only)\n- **Real-time Processing**: Client-side inference using ONNX.js for fast, private recommendations\n- **Self-Contained**: No external API dependencies - everything runs locally\n\n## 🚀 Quick Start\n\n1. **Clone the repository**\n   ```bash\n   git clone https://github.com/yourusername/LetterboxdReccomender.git\n   cd LetterboxdReccomender\n   ```\n\n2. **Start the local server**\n   ```bash\n   python serve.py\n   ```\n\n3. **Open your browser**\n   Navigate to `http://localhost:8000` and start getting recommendations!\n\n## 🏗️ Architecture Overview\n\n### Frontend (Web Interface)\n- **Pure JavaScript**: No frameworks, lightweight and fast\n- **ONNX.js Runtime**: Client-side model inference\n- **CORS Proxy**: Fetches Letterboxd data without backend dependencies\n- **Responsive Design**: Works on desktop and mobile\n\n### Backend (Training \u0026 Data Processing)\n- **PyTorch**: Model training and validation\n- **SASRec Algorithm**: Sequential recommendation using self-attention\n- **MovieLens 32M**: 32 million ratings across 87,000+ movies\n- **ONNX Export**: Model converted for web deployment\n\n## 🧠 Algorithm Design\n\n### SASRec (Self-Attentive Sequential Recommendation)\n\nThe recommendation system is built on the SASRec architecture, which excels at understanding sequential patterns in user behavior.\n\n**Key Components:**\n- **Self-Attention Mechanism**: Captures relationships between movies in your viewing history\n- **Positional Encoding**: Understands the order of movies you've watched\n- **Multi-Head Attention**: Focuses on different aspects of movie preferences simultaneously\n\n**Training Process:**\n1. **Data Preprocessing** ([`code/main.py`](code/main.py))\n   - Filters users with ≥10 movie ratings\n   - Calculates 70th percentile ratings per user\n   - Creates binary positive feedback (above percentile = liked)\n\n2. **Sequence Generation** ([`code/train.py`](code/train.py))\n   - Converts user ratings into sequential movie-watching patterns\n   - Pads sequences to consistent length (50 movies max)\n   - Splits into training/validation sets (80/20)\n\n3. **Model Training**\n   - **Architecture**: 2 transformer layers, 8 attention heads, 128 embedding dimensions\n   - **Loss Function**: Binary cross-entropy with negative sampling\n   - **Optimization**: Adam optimizer with early stopping\n   - **Validation**: Tracks accuracy and prevents overfitting\n\n4. **ONNX Conversion** ([`code/convert.py`](code/convert.py))\n   - Exports trained PyTorch model to ONNX format\n   - Optimizes for web deployment and inference speed\n\n### Model Performance\n- **Training Data**: 200K+ users, 32M+ ratings\n- **Vocabulary**: 87K+ unique movies\n- **Context Length**: Up to 50 previous movies\n- **Inference Time**: \u003c100ms client-side\n\n## 📁 Project Structure\n\n```\nLetterboxdReccomender/\n├── web/                    # Frontend application\n│   ├── index.html         # Main web interface\n│   ├── app.js            # Core application logic\n│   ├── letterboxd.js     # Letterboxd integration\n│   └── style.css         # UI styling\n├── code/                  # Training and processing scripts\n│   ├── main.py           # Data preprocessing pipeline\n│   ├── train.py          # Model training with validation\n│   ├── convert.py        # ONNX model conversion\n│   ├── inference.py      # Python inference testing\n│   ├── mapping.py        # Movie ID/title mappings\n│   └── depickle.py       # Model inspection utilities\n├── data/                  # MovieLens dataset\n│   ├── movies.csv        # Movie metadata\n│   ├── ratings.csv       # User ratings (32M+ entries)\n│   └── README.txt        # Dataset documentation\n├── models/                # Trained models and mappings\n│   ├── sasrec_model.onnx # Web-optimized model\n│   ├── metadata.json     # Model configuration\n│   ├── movie_map.pkl     # Movie ID mappings\n│   └── *.pth            # PyTorch checkpoints\n└── output/               # Processing outputs and logs\n```\n\n## 🔧 Technical Implementation\n\n### Data Flow\n\n1. **Input Processing**\n   - Letterboxd: Scrapes user profile via CORS proxy\n   - Manual: Parses movie titles with fuzzy matching\n   - Normalizes titles and maps to MovieLens IDs\n\n2. **Sequence Preparation**\n   - Left-pads movie sequence to 50 items\n   - Converts to appropriate tensor format\n   - Handles various input lengths gracefully\n\n3. **Model Inference**\n   - ONNX.js runs model in browser\n   - Processes attention weights across sequence\n   - Generates probability scores for all movies\n\n4. **Recommendation Generation**\n   - Excludes already-watched movies\n   - Ranks by confidence score\n   - Returns top 10 suggestions with metadata\n\n### Key Files Explained\n\n**[`web/app.js`](web/app.js)**: Core frontend logic\n- Model loading and initialization\n- Movie title normalization and matching\n- ONNX inference pipeline\n- UI state management\n\n**[`web/letterboxd.js`](web/letterboxd.js)**: Letterboxd integration\n- Profile scraping with CORS proxy\n- Rating extraction and processing\n- Film matching against MovieLens database\n\n**[`code/train.py`](code/train.py)**: Model training pipeline\n- SASRec implementation in PyTorch\n- Training loop with validation\n- Model checkpointing and saving\n\n**[`code/main.py`](code/main.py)**: Data preprocessing\n- Rating threshold calculation (70th percentile)\n- User/movie filtering\n- Binary feedback generation\n\n## 🎯 Self-Contained Design\n\nThis project is designed to run completely independently:\n\n**No External APIs Required:**\n- Letterboxd data fetched via public CORS proxy\n- All inference happens client-side\n- MovieLens data included in repository\n\n**Offline Capability:**\n- Once loaded, works without internet\n- All models and data bundled locally\n- No tracking or data collection\n\n**Easy Deployment:**\n- Single Python file serves everything\n- No database or complex setup needed\n- Works on any system with Python 3.6+\n\n## 🚀 Getting Started (Detailed)\n\n### Prerequisites\n- Python 3.6+ with basic libraries (pandas, numpy)\n- Modern web browser with JavaScript enabled\n- ~500MB disk space for models and data\n\n### Installation\n\n1. **Download the project**\n   ```bash\n   git clone https://github.com/yourusername/LetterboxdReccomender.git\n   cd LetterboxdReccomender\n   ```\n\n2. **Install Python dependencies** (if training models)\n   ```bash\n   pip install torch pandas numpy scikit-learn onnx\n   ```\n\n3. **Start the web server**\n   ```bash\n   python serve.py\n   ```\n\n4. **Access the application**\n   Open `http://localhost:8000` in your browser\n\n### Usage\n\n**Option 1: Letterboxd Integration**\n1. Enter any public Letterboxd username\n2. System fetches recent rated films automatically\n3. Choose recommendation source:\n   - Recent films (latest 50)\n   - 5-star films only\n   - 4+ star films\n   - 3+ star films\n\n**Option 2: Manual Input**\n1. Enter comma-separated movie titles\n2. System matches against MovieLens database\n3. Get recommendations based on your list\n\n## 🔬 Model Training (Advanced)\n\nIf you want to retrain the model or experiment with different parameters:\n\n### 1. Data Preparation\n```bash\ncd code\npython main.py  # Processes MovieLens data\n```\n\n### 2. Model Training\n```bash\npython train.py  # Trains SASRec model with validation\n```\n\n### 3. ONNX Conversion\n```bash\npython convert.py  # Converts to web-compatible format\n```\n\n### Training Configuration\nEdit [`code/train.py`](code/train.py) to modify:\n- `EMBED_DIM`: Embedding dimensions (default: 128)\n- `NUM_HEADS`: Attention heads (default: 8)\n- `NUM_LAYERS`: Transformer layers (default: 2)\n- `MAX_LEN`: Sequence length (default: 50)\n- `BATCH_SIZE`: Training batch size (default: 1024)\n\n## 🤝 Contributing\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. Open a Pull Request\n\n## 📊 Dataset Attribution\n\nThis project uses the MovieLens 32M dataset:\n\n\u003e F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- **GroupLens Research** for the MovieLens dataset\n- **Microsoft** for ONNX.js runtime\n- **Letterboxd** for the inspiration and public profiles\n- **SASRec authors** for the sequential recommendation algorithm\n\n## 🐛 Troubleshooting\n\n**Model won't load:**\n- Check browser console for errors\n- Ensure all files in `models/` directory exist\n- Try a different browser (Chrome/Firefox recommended)\n\n**Letterboxd fetch fails:**\n- Verify the username is correct and profile is public\n- Check CORS proxy status\n- Try manual input as alternative\n\n**No recommendations generated:**\n- Ensure matched movies \u003e 0\n- Check that movie titles match MovieLens database\n- Try different/more popular movie titles\n\n## 🔮 Future Improvements\n\n- [ ] Support for more rating platforms (IMDb, TMDb)\n- [ ] Collaborative filtering integration\n- [ ] Genre-based filtering options\n- [ ] Movie poster and metadata display\n- [ ] Recommendation explanations\n- [ ] Export/save recommendation lists","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flukeprior%2Fletterboxdreccomender","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flukeprior%2Fletterboxdreccomender","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flukeprior%2Fletterboxdreccomender/lists"}