https://github.com/connergroth/timbrality
Hybrid music recommender combining NMF collaborative filtering, two-tower content embeddings, audio feature synthesis, and meta-learning fusion for adaptive personalization.
beautifulsoup cloudscraper docker fastapi lastfm-api machine-learning music postgresql python pytorch redis scikit-learn spotify-api supabase
- Host: GitHub
- URL: https://github.com/connergroth/timbrality
- Owner: connergroth
- Created: 2025-01-20T20:54:30.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-10-08T03:01:07.000Z (8 months ago)
- Last Synced: 2026-04-11T03:29:29.580Z (about 1 month ago)
- Topics: beautifulsoup, cloudscraper, docker, fastapi, lastfm-api, machine-learning, music, postgresql, python, pytorch, redis, scikit-learn, spotify-api, supabase
- Language: Python
- Homepage: https://timbrality.com
- Size: 65.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Timbrality - AI-Powered Music Discovery
> **Timbrality** — a machine learning-powered music recommendation engine that uses AI agents to create personalized music experiences.
Timbrality is a music recommendation platform that combines data from **Spotify**, **Last.fm**, and **Album of the Year (AOTY)** to provide personalized music suggestions through conversational AI agents. The platform features a hybrid recommendation system powered by the **Timbral** ML engine and a modern web interface built with React and Next.js.
---
## Features
- **🤖 AI-Powered Music Agent**
Conversational AI agent that understands music preferences and provides intelligent recommendations through natural language interactions.
- **🎧 Personalized Recommendations**
Hybrid recommendation system combining collaborative filtering and content-based approaches using listening behavior and audio features.
- **🔗 Multi-Platform Integration**
Seamlessly connects with **Spotify**, **Last.fm**, and **Album of the Year** to gather comprehensive music data and preferences.
- **📱 Modern Web Interface**
Clean, responsive UI built with React/Next.js, featuring a chat interface, playlist management, and real-time music discovery.
- **🎵 Smart Playlist Creation**
AI-generated playlists with Spotify integration for seamless music discovery and playlist management.
- **⚡ High-Performance Backend**
FastAPI-powered backend with multi-tier caching (Redis + in-memory), rate limiting, and async processing.
- **📊 Rich Music Metadata**
Enriched with AOTY ratings, reviews, tags, and similar-album data through a CloudScraper-based pipeline that also extracts rating counts.
---
## Architecture Overview
### Backend (FastAPI)
- **Multi-tier caching**: Redis primary + in-memory fallback
- **Rate limiting**: 30 requests/minute via SlowAPI
- **Database**: PostgreSQL with SQLAlchemy ORM and Alembic migrations
- **Web scraping**: CloudScraper with async processing for comprehensive AOTY data extraction
- **AI Agent**: NLP processor with tool registry for music recommendations
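
To make the "30 requests/minute" limit concrete, here is a minimal sliding-window limiter in plain Python. It is only an illustration of the idea, not the SlowAPI configuration used by the backend; the class name `SlidingWindowLimiter` and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each client key."""

    def __init__(self, limit: int = 30, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer with HTTP 429
        hits.append(now)
        return True
```

A request handler would call `allow(client_ip)` before doing any work and reject the request when it returns `False`.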
### ML Service (Timbral Engine)
- **Hybrid recommendation engine**: NMF collaborative + BERT content-based filtering
- **Dedicated FastAPI service**: Port 8001 with ML-specific endpoints
- **Model serving**: Redis-cached recommendations with explainability
- **HTTP integration**: Proxied through main backend at `/timbral/*` routes
### Frontend
- **Main site**: Vite + React + shadcn/ui components
- **Auth app**: Next.js application for OAuth flows
- **State management**: React Context + Supabase auth
### Key Components
- `/backend/agent/`: AI agent core, tools, and NLP processing
- `/backend/routes/`: API endpoints (agent, albums, playlists, users, timbral)
- `/backend/services/`: Business logic (Spotify, Last.fm, ML, AOTY)
- `/backend/ingestion/`: Data pipeline for music metadata
- `/ml/timbral/`: Timbral ML engine (models, training, inference)
- `/frontend/app/`: Next.js authentication and chat interface
---
## Tech Stack
### 💻 Backend Technologies
- **FastAPI** – Async Python web framework with automatic OpenAPI docs
- **PostgreSQL + SQLAlchemy** – Relational database with async ORM
- **Redis** – High-performance caching layer
- **CloudScraper** – Advanced web scraping with anti-bot protection bypass for AOTY data
- **Pydantic** – Data validation and serialization
### 📊 Data Sources & APIs
- **Spotify Web API** – User listening data, playlists, and audio features
- **Last.fm API** – Scrobbling data and music discovery
- **AOTY Custom Scraper** – Album ratings, reviews, rating counts, and comprehensive metadata
- **Supabase** – Authentication and user management
### 🤖 AI & Machine Learning
- **AI Agent Architecture** – Tool-based agent for music recommendations
- **NLP Processing** – Natural language understanding for music queries
- **Timbral Engine** – Dedicated ML microservice with hybrid recommendation engine
- **NMF Collaborative Filtering** – User-item matrix factorization for personalized suggestions
- **BERT Content-Based Filtering** – Semantic understanding of music metadata and genres
- **Model Explainability** – Built-in recommendation reasoning and explanations
---
## Model Design
### 🔸 Collaborative Filtering (CF)
- Built from play counts and listening behavior
- Uses Non-negative Matrix Factorization (NMF)
- Predicts latent user-track affinities
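
As a rough illustration of the factorization step, the sketch below factors a tiny play-count matrix `V` (users x tracks) into non-negative factors `W` and `H` using the classic multiplicative update rules. It is dependency-free pure Python written for readability; a real pipeline would more likely use `sklearn.decomposition.NMF`. The helper names and the toy matrix are assumptions, not code from the repository.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def frob_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v: Matrix, rank: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy play counts: rows are users, columns are tracks.
plays = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [0, 1, 5, 4]]
W, H = nmf(plays, rank=2)
predicted = matmul(W, H)  # predicted[u][t] approximates user u's affinity for track t
```

Entries of `predicted` for tracks a user has not played serve as the latent affinity estimates.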
### 🔹 Content-Based Filtering (CBF)
- Embeds mood, genre, and tags using Sentence-BERT
- Computes track similarity as the cosine similarity between embeddings
- Useful for cold-starts and fallback recs
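
The similarity step can be sketched as follows, assuming the embeddings have already been computed (for example by a Sentence-BERT model) and are available as plain lists of floats. The three-dimensional vectors and track IDs below are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(seed: str, embeddings: Dict[str, List[float]],
                 k: int = 2) -> List[Tuple[str, float]]:
    """Return the k tracks closest to `seed`, best first."""
    seed_vec = embeddings[seed]
    scored = [(track_id, cosine_similarity(seed_vec, vec))
              for track_id, vec in embeddings.items() if track_id != seed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
toy_embeddings = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.8, 0.2, 0.1],
    "track_c": [0.0, 0.1, 0.9],
}
neighbours = most_similar("track_a", toy_embeddings)
```

Because this needs only track metadata and no listening history, it can serve brand-new users or tracks.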
### 🔶 Hybrid Fusion
- Weighted blending of CF + CBF scores
- Tunable or learnable fusion logic
- Produces rich, explainable recs per user or seed
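
One simple way to realize the weighted blend is shown below: both score sets are min-max normalized, then mixed with a single weight `alpha`. The function names, the default `alpha=0.7`, and the choice to fall back to content scores alone when a user has no collaborative scores are illustrative assumptions, not the engine's actual fusion logic.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so CF and CBF values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {track: 1.0 for track in scores}
    return {track: (value - lo) / (hi - lo) for track, value in scores.items()}


def fuse(cf: Dict[str, float], cbf: Dict[str, float],
         alpha: float = 0.7) -> Dict[str, float]:
    """Blend scores as alpha * CF + (1 - alpha) * CBF.

    A track missing from one source contributes 0 for that source.
    With no CF scores at all (cold start), only CBF is used.
    """
    if not cf:
        alpha = 0.0
    cf_n, cbf_n = min_max(cf), min_max(cbf)
    return {
        track: alpha * cf_n.get(track, 0.0) + (1 - alpha) * cbf_n.get(track, 0.0)
        for track in set(cf_n) | set(cbf_n)
    }


blended = fuse({"a": 10.0, "b": 0.0}, {"a": 0.2, "b": 0.8, "c": 0.5})
cold_start = fuse({}, {"a": 0.2, "b": 0.8, "c": 0.5})
```

Keeping the two normalized components around also makes it easy to explain a recommendation ("mostly because listeners like you played it" versus "mostly because it sounds like your seed track").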
---
## AI Agent
Timbrality features an advanced AI agent with **dual-store memory architecture** that combines fast Redis working memory with durable PostgreSQL long-term storage for intelligent, context-aware music recommendations.
### **Memory Architecture:**
- **Redis Working Memory** – Sub-millisecond access to recent conversations (last 50-200 turns per chat)
- **PostgreSQL + pgvector** – Semantic search and long-term memory with embeddings
- **Context Assembly** – Intelligent retrieval combining recent turns, relevant memories, and user preferences
- **Background Processing** – Async summarization, fact extraction, and topic analysis
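
The interplay between bounded working memory and context assembly can be sketched in-process as below. A `deque` stands in for the Redis store and a plain list of strings stands in for facts retrieved from PostgreSQL; all class and function names here are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class WorkingMemory:
    """Keeps only the most recent `max_turns` chat turns, like a capped Redis list."""

    def __init__(self, max_turns: int = 50) -> None:
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))  # the oldest turn is evicted automatically


def assemble_context(memory: WorkingMemory, long_term_facts: List[str],
                     recent: int = 5) -> str:
    """Combine retrieved long-term facts with the last few turns into one prompt block."""
    lines = [f"fact: {fact}" for fact in long_term_facts]
    lines += [f"{role}: {text}" for role, text in list(memory.turns)[-recent:]]
    return "\n".join(lines)


memory = WorkingMemory(max_turns=50)
for i in range(60):
    memory.add("user", f"message {i}")
context = assemble_context(memory, ["likes shoegaze"], recent=2)
```

In this sketch, the older turns that fall out of the window are exactly what background summarization would need to capture before they are lost.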
### **Agent Capabilities:**
- **Conversational Memory** – Remembers user preferences, music tastes, and conversation context
- **Semantic Understanding** – Uses Sentence-BERT embeddings for natural language processing
- **Tool Integration** – Access to music databases, recommendation engines, and analysis tools
- **Streaming Responses** – Real-time interaction with memory context updates
- **Automatic Learning** – Extracts user facts, preferences, and music patterns over time
### **Memory Features:**
- **Working Memory** – Fast access to recent chat context with configurable TTL (24-72 hours)
- **Long-term Memory** – Durable storage of important facts, preferences, and conversation summaries
- **Semantic Search** – Vector similarity search for relevant context using pgvector
- **Importance Scoring** – Memory prioritization based on user interaction patterns
- **Topic Tracking** – Automatic extraction and trending of music-related topics
### **API Endpoints:**
```bash
POST /api/agent/chat # Enhanced chat with memory integration
POST /api/agent/chat/stream # Streaming responses with context
GET /api/agent/memory/stats # User memory statistics
POST /api/agent/memory/process # Trigger background memory processing
```
---
## AOTY Data Scraper
Timbrality includes a sophisticated web scraper that extracts rich music metadata from **Album of the Year (AOTY)**, one of the most comprehensive music databases available. This custom scraper enhances the platform's recommendation capabilities with detailed album ratings, reviews, and metadata.
### 🎯 What It Scrapes
**Albums:**
- User scores and rating counts (e.g., "Based on 37,040 ratings")
- Critic reviews from major publications
- Popular user reviews with like counts
- Genre tags and metadata
- Similar album recommendations
- "Must Hear" designations
**Artists:**
- Overall user ratings and rating counts
- Biography and formation details
- Geographic location data
- Complete discography listings
- Genre classifications
**Tracks:**
- Individual track ratings and rating counts
- Track-level metadata and features
- Featured artist information
- Track length and positioning data
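
Extracting a rating count from text such as "Based on 37,040 ratings" comes down to a small regular expression. The sketch below assumes the text has already been pulled out of the page; the function name `parse_rating_count` is hypothetical and the exact wording on AOTY pages may differ.

```python
import re
from typing import Optional

# Matches "Based on 37,040 ratings" and the singular "Based on 1 rating".
_RATING_COUNT = re.compile(r"Based on\s+(\d[\d,]*)\s+ratings?", re.IGNORECASE)


def parse_rating_count(text: str) -> Optional[int]:
    """Return the rating count as an int, or None when the text has no count."""
    match = _RATING_COUNT.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))
```

Returning `None` rather than `0` keeps "no count found" distinguishable from "zero ratings".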
### 🛠 Technical Implementation
**Web Scraping Engine:**
- **CloudScraper** for bypassing anti-bot protection
- **BeautifulSoup** for robust HTML parsing
- **Async/await** processing for high performance
- **Custom retry logic** with exponential backoff
- **Rate limiting** to respect AOTY's servers
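
Retry with exponential backoff around an async fetch might look like the following. This is a generic sketch, not the scraper's actual retry code; the function name, the default delays, and the 10% jitter are assumptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    fetch: Callable[[], Awaitable[T]],
    retries: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call `fetch`, waiting base_delay * 2**attempt (capped) between failures."""
    for attempt in range(retries + 1):
        try:
            return await fetch()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter avoids many workers retrying in lockstep.
            await sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Production code would usually catch only transient errors (timeouts, HTTP 429 or 5xx) rather than every `Exception`.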
**Data Models:**
```python
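from typing import List, Optional

from pydantic import BaseModel

# CriticReview and AlbumUserReview are further Pydantic models
# that are not shown in this excerpt.


class Track(BaseModel):  # defined before Album, which references it
    title: str
    rating: Optional[int]
    num_ratings: int
    featured_artists: List[str]


class Album(BaseModel):
    title: str
    artist: str
    user_score: Optional[float]
    num_ratings: int
    tracks: List[Track]
    critic_reviews: List[CriticReview]
    popular_reviews: List[AlbumUserReview]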
**API Endpoints:**
```bash
GET /scraper/album?artist=Radiohead&album=OK+Computer
GET /scraper/similar?artist=Radiohead&album=OK+Computer
GET /scraper/artist?name=Radiohead
```
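
Query strings like the ones above are easiest to build with `urllib.parse.urlencode`, which handles the space-to-`+` encoding. In this small sketch the `localhost` host is an assumption based on the local backend port, and `scraper_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumption: backend running locally on port 8000


def scraper_url(endpoint: str, **params: str) -> str:
    """Build a scraper URL such as /scraper/album?artist=...&album=..."""
    return f"{BASE_URL}/scraper/{endpoint}?{urlencode(params)}"


album_url = scraper_url("album", artist="Radiohead", album="OK Computer")
artist_url = scraper_url("artist", name="Radiohead")
```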
### 🔄 Data Pipeline Integration
**Automated Population:**
```bash
# Add rating count columns to existing tables
psql $DATABASE_URL -f backend/add_aoty_rating_counts.sql
# Populate rating counts for all entities
python backend/populate_aoty_rating_counts.py --type all --batch-size 10
```
**Database Enhancement:**
- Adds `aoty_num_ratings` columns to albums, artists, and tracks tables
- Batch processing with configurable limits
- Resume capability for interrupted runs
- Error handling and logging for production use
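
Batch processing with resume capability can be approximated with a checkpoint file that records which IDs are done. The sketch below is not the logic of `populate_aoty_rating_counts.py`; the JSON checkpoint format and all names are assumptions.

```python
import json
from pathlib import Path
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[int], size: int) -> Iterator[List[int]]:
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_with_resume(ids: Sequence[int], handle: Callable[[int], None],
                        checkpoint: Path, batch_size: int = 10) -> int:
    """Process ids in batches, skipping any already recorded in the checkpoint."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    pending = [item for item in ids if item not in done]
    processed = 0
    for batch in batches(pending, batch_size):
        for item in batch:
            try:
                handle(item)
            except Exception as exc:
                print(f"skipping {item}: {exc}")  # log and keep going
                continue
            done.add(item)
            processed += 1
        # Persist progress after every batch so an interrupted run can resume.
        checkpoint.write_text(json.dumps(sorted(done)))
    return processed
```

Items that fail are left out of the checkpoint, so a later run retries them automatically.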
**Caching Strategy:**
- **Redis caching** for scraped data with configurable TTL
- **In-memory fallback** when Redis is unavailable
- **Smart cache keys** based on artist/album combinations
- **Cache warming** for popular albums and artists
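
The fallback behaviour and the key scheme can be sketched as below. `primary` is any object exposing `get`/`set` that raises `ConnectionError` when unreachable (in practice, a thin wrapper around a Redis client); the `aoty:album:` key prefix and all class names are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


def cache_key(artist: str, album: str) -> str:
    """Normalize case and whitespace so equivalent lookups share one key."""
    def norm(value: str) -> str:
        return " ".join(value.lower().split())
    return f"aoty:album:{norm(artist)}:{norm(album)}"


class MemoryCache:
    """In-process cache with per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: evict lazily on read
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


class TwoTierCache:
    """Try the primary cache first; use the in-memory cache if it misses or is down."""

    def __init__(self, primary: Any, fallback: MemoryCache) -> None:
        self.primary = primary
        self.fallback = fallback

    def get(self, key: str) -> Optional[Any]:
        try:
            value = self.primary.get(key)
            if value is not None:
                return value
        except ConnectionError:
            pass  # primary unavailable: fall through to memory
        return self.fallback.get(key)

    def set(self, key: str, value: Any, ttl: float) -> None:
        self.fallback.set(key, value, ttl)  # always keep a local copy
        try:
            self.primary.set(key, value, ttl)
        except ConnectionError:
            pass
```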
### 🎵 Use Cases
**Recommendation Enhancement:**
- Weight recommendations by AOTY rating popularity
- Surface critically acclaimed but undiscovered albums
- Filter by minimum rating thresholds
- Include review-based reasoning in AI responses
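
One common way to weight a score by how many ratings back it is a shrinkage (Bayesian-average) formula: albums with few ratings are pulled toward a prior mean, while heavily rated albums keep roughly their own score. The sketch below is one possible approach, not the platform's actual weighting; the prior values, the threshold, and the assumption that scores are on a 0-100 scale are all illustrative.

```python
from typing import Dict, List


def weighted_score(user_score: float, num_ratings: int,
                   prior_mean: float = 70.0, prior_weight: int = 100) -> float:
    """Shrink the score toward prior_mean; the pull weakens as num_ratings grows."""
    return ((num_ratings * user_score + prior_weight * prior_mean)
            / (num_ratings + prior_weight))


def rank_albums(albums: List[Dict], min_score: float = 75.0) -> List[str]:
    """Return titles whose weighted score meets the threshold, best first."""
    scored = [
        (weighted_score(album["user_score"], album["num_ratings"]), album["title"])
        for album in albums
        if album["user_score"] is not None  # unrated albums are skipped
    ]
    return [title for score, title in sorted(scored, reverse=True)
            if score >= min_score]


# Hypothetical sample data.
sample = [
    {"title": "Niche", "user_score": 95.0, "num_ratings": 3},
    {"title": "Classic", "user_score": 85.0, "num_ratings": 37040},
    {"title": "Unrated", "user_score": None, "num_ratings": 0},
]
ranked = rank_albums(sample)
```

Here the album with a 95 from only three ratings ends up below the threshold, while the widely rated 85 passes.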
**Music Discovery:**
- "Similar Albums" recommendations from AOTY's algorithm
- Genre-based exploration using AOTY's tagging system
- Critical consensus analysis for new releases
- User review sentiment for recommendation explanations
**Data Quality:**
- Cross-reference Spotify/Last.fm data with AOTY metadata
- Resolve artist/album name discrepancies
- Enrich sparse metadata with comprehensive AOTY details
- Validate music catalog completeness
---
## Getting Started
### Prerequisites
- Python 3.8+
- Node.js 18+
- PostgreSQL
- Redis (optional, falls back to in-memory cache)
### Full Stack Setup (Docker)
```bash
docker-compose up
```
### Manual Setup
#### Backend
```bash
cd backend
pip install -r requirements.txt
uvicorn main:app --reload # Port 8000
```
#### ML Service
```bash
cd ml
pip install -r requirements.txt
python main.py # Port 8001
```
#### Frontend
```bash
# Main site
cd frontend && npm install && npm run dev # Port 3001
# Auth app
cd frontend/app && npm install && npm run dev # Port 3000
```
### Environment Variables
Create `.env` files in the `backend/`, `ml/`, and `frontend/app/` directories and add your API keys for Spotify, Last.fm, Supabase, and OpenAI.
---
## Current Status
✅ **Completed:**
- AI agent architecture with conversational interface
- Multi-platform data integration (Spotify, Last.fm, AOTY)
- Modern React/Next.js frontend with chat interface
- FastAPI backend with caching and rate limiting
🚧 **In Progress:**
- Enhanced playlist management features
- Performance optimizations and deployment preparation
- Advanced ML model training and fine-tuning