https://github.com/ejfox/scrap-enlightener
https://github.com/ejfox/scrap-enlightener
Last synced: 4 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/ejfox/scrap-enlightener
- Owner: ejfox
- Created: 2025-06-12T14:52:31.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-08-21T22:04:10.000Z (10 months ago)
- Last Synced: 2025-10-26T01:39:12.372Z (8 months ago)
- Language: JavaScript
- Size: 16.5 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Scrap Enlightener 🕵️
**Transform your Pinboard bookmarks into a knowledge graph with AI-powered enrichment, semantic search, and automatic tagging.**
## What This Does
Scrap Enlightener supercharges your Pinboard bookmarks by:
- 🤖 **AI Enrichment**: Automatically extracts key information from bookmarked pages (quotes, questions, links, headers)
- 🔗 **Connection Detection**: Discovers relationships between bookmarks through shared links and concepts
- 🔍 **Semantic Search**: Find bookmarks by meaning, not just keywords (powered by OpenAI embeddings)
- 🏷️ **Smart Tagging**: AI suggests relevant tags based on actual page content
- 📸 **Visual Memory**: Captures screenshots of bookmarked pages
- 💬 **Chat with Bookmarks**: Ask questions about your bookmark collection
- 📊 **Social Discovery**: Find Hacker News and Reddit discussions about your bookmarks
- 🕰️ **Time Machine**: Browse your bookmarks by date with timeline view
- 📦 **Data Export**: Export your enriched bookmarks as JSON, CSV, or standalone HTML
## How We Compare
| Feature | Scrap Enlightener | [Hoarder](https://github.com/hoarder-app/hoarder) | [Linkding](https://github.com/sissbruecker/linkding) | [Raindrop.io](https://raindrop.io) | [Readwise Reader](https://readwise.io/reader) |
|---------|------------------|---------|----------|------------|----------------|
| **Pinboard Integration** | ✅ Native | ❌ | ✅ Import only | ✅ Import only | ❌ |
| **AI Auto-Tagging** | ✅ | ✅ | ❌ | ❌ | ✅ |
| **Semantic Search** | ✅ OpenAI | ❌ | ❌ | ✅ Paid tier | ✅ |
| **Link Connections** | ✅ | ❌ | ❌ | ❌ | ✅ |
| **Screenshots** | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Full-Text Archive** | ✅ | ✅ | ✅ | ✅ Paid | ✅ |
| **HN/Reddit Discovery** | ✅ | ❌ | ❌ | ❌ | ❌ |
| **Chat with Bookmarks** | ✅ | ✅ | ❌ | ❌ | ✅ |
| **Self-Hosted** | ✅ | ✅ | ✅ | ❌ | ❌ |
| **Offline Export** | ✅ HTML/JSON/CSV | ❌ | ✅ HTML | ✅ Limited | ✅ Limited |
| **Pricing** | Free (BYO API keys) | Free | Free | $3-9/mo | $8-15/mo |
| **Database** | Supabase/PostgreSQL | SQLite | SQLite/PostgreSQL | Proprietary | Proprietary |
| **Primary Focus** | Pinboard enhancement | Everything saver | Simple bookmarks | Visual bookmarks | Reading & highlights |
### Why Scrap Enlightener?
- **Pinboard-first**: Built specifically for Pinboard users, not trying to replace it
- **Bring Your Own Keys**: Use your own OpenAI API keys, pay only for what you use
- **Knowledge Graph**: Unique focus on discovering connections between bookmarks
- **Fully Exportable**: Your data is always yours, export to standalone HTML that works offline
## Prerequisites
- Node.js 20+ and npm
- A [Pinboard](https://pinboard.in) account with API token
- A [Supabase](https://supabase.com) project (free tier works) *
- An [OpenAI API key](https://platform.openai.com/api-keys) for AI features
\* *Supabase dependency is being removed - see [decoupling docs](docs/supabase-decoupling.md) for SQLite/PostgreSQL/JSON alternatives*
## Quick Start
### 1. Clone and Install
```bash
git clone https://github.com/yourusername/scrap-enlightener.git
cd scrap-enlightener
npm install
```
### 2. Choose Your Database
#### Option A: SQLite (Easiest - No External Dependencies!)
```bash
# Just set this in your .env:
DB_ADAPTER=sqlite
SQLITE_PATH=./data/bookmarks.db
# That's it! Database will be created automatically.
```
#### Option B: Supabase (Cloud-hosted)
1. Create a new project at [supabase.com](https://supabase.com)
2. Run the migrations in order from `supabase/migrations/`:
```sql
-- Run each .sql file in your Supabase SQL editor
```
3. Copy your project URL and anon key from Project Settings > API
### 3. Configure Environment
```bash
cp .env.example .env
```
Edit `.env` with your keys:
```env
# From Supabase project settings
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=your-anon-key-here
# From OpenAI platform
OPENAI_API_KEY=sk-your-openai-key-here
# Optional: Direct Pinboard API access (for testing)
PINBOARD_API_TOKEN=username:token
```
### 4. Start Development Server
```bash
npm run dev
```
Open http://localhost:3000
### 5. Connect Your Pinboard
1. Sign up/login with Supabase Auth
2. Go to Settings page
3. Enter your Pinboard API token (format: `username:XXXXXXXXXX`)
4. Your token is securely stored in the database per-user
## Core Features
### ⚡ ENLIGHTEN.EXE (`/enlighten`)
**Radically simplified single-screen bookmark processing theater**
Transform your untagged Pinboard bookmarks into properly tagged knowledge:
- **Real-time AI streaming** - Watch AI analyze content as it happens
- **Single-word tag suggestions** - Only from your existing vocabulary
- **Live processing display** - See content analysis and tag generation
- **Automatic progression** - Accept tags and move to next bookmark seamlessly
- **Rate-limited & respectful** - Proper API throttling for Pinboard
**Simple workflow:**
1. Enter your Pinboard token
2. Click "Load Bookmarks"
3. Watch AI analyze each bookmark in real-time
4. Accept/reject suggested tags with one click
5. Automatically moves to next untagged bookmark
### 🔍 Semantic Search (`/search`)
Search bookmarks by meaning using AI embeddings:
- Finds conceptually related bookmarks
- Works even without exact keyword matches
- Powered by OpenAI's text-embedding-3-small model
### 🔗 Connection Discovery
Automatically finds relationships between bookmarks:
- Detects when bookmarks link to each other
- Creates a knowledge graph of your bookmarks
- Surfaces forgotten related content
### 📊 Analytics Dashboard (`/analytics`)
Track your bookmark habits:
- Tagging velocity and streaks
- Domain frequency analysis
- Achievement system for gamification
## API Endpoints
### Enrichment APIs
- `POST /api/enrichment/extract` - Extract content from a URL
- `POST /api/enrichment/embed` - Generate embeddings for semantic search
### Capture APIs
- `POST /api/capture/screenshot` - Capture page screenshot
- `POST /api/capture/archive` - Archive full page content
### Discovery APIs
- `POST /api/social/discover` - Find HN/Reddit discussions
- `POST /api/chat/bookmarks` - Chat with your bookmarks
- `POST /api/diff/check` - Check if a page has changed
### Export APIs
- `GET /api/export/json` - Export as JSON
- `GET /api/export/csv` - Export as CSV
- `GET /api/export/static` - Export as standalone HTML
### Time Machine
- `GET /api/timemachine/snapshots` - Browse bookmarks by date
### REST API
- `GET /api/v1/bookmarks` - List bookmarks with filters
- `POST /api/v1/bookmarks` - Add new bookmark
- `DELETE /api/v1/bookmarks` - Delete bookmark
## Architecture
```
Frontend (Nuxt 3 + Vue 3 + Tailwind)
↓
API Layer (Nitro server endpoints)
↓
Services
├── Pinboard API (bookmark sync)
├── OpenAI API (enrichment + embeddings)
├── Supabase (auth + database + vector search)
└── External APIs (HN, Reddit, web scraping)
```
## Database Schema
- `users` - User accounts and settings
- `bookmarks` - Synced Pinboard bookmarks
- `enrichments` - Extracted content and metadata
- `bookmark_embeddings` - Vector embeddings for semantic search
- `connections` - Relationships between bookmarks
- `achievements` - Gamification tracking
## Common Issues
### "Cannot find package '@xenova/transformers'"
```bash
npm install @xenova/transformers
```
### "Pinboard API token invalid"
- Format should be `username:TOKEN` (get from https://pinboard.in/settings/password)
- Token is stored per-user in Settings, not in .env
### "OpenAI API key missing"
- Get key from https://platform.openai.com/api-keys
- Add to .env as `OPENAI_API_KEY=sk-...`
### "Supabase connection failed"
- Check SUPABASE_URL and SUPABASE_ANON_KEY in .env
- Ensure migrations are run in order
## Development Commands
```bash
npm run dev # Start dev server
npm run build # Build for production
npm run preview # Preview production build
npm run lint # Run linter
npm run typecheck # Check TypeScript types
```
## Tech Stack
- **Frontend**: Nuxt 3, Vue 3, Tailwind CSS, Headless UI
- **Backend**: Nitro, Node.js
- **Database**: Supabase (PostgreSQL + pgvector)
- **AI/ML**: OpenAI API (GPT-3.5, embeddings)
- **APIs**: Pinboard, Hacker News Algolia, Reddit
- **Scraping**: Cheerio, Playwright (optional)
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
MIT - See [LICENSE](LICENSE) file
## Acknowledgments
- Inspired by tools like Hoarder, Karakeep, and Readwise
- Built for the Pinboard community
- Powered by OpenAI and Supabase