{"id":30622459,"url":"https://github.com/robleto/shakesfind","last_synced_at":"2025-08-30T15:40:49.643Z","repository":{"id":311712073,"uuid":"1044023769","full_name":"robleto/ShakesFind","owner":"robleto","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-26T04:21:26.000Z","size":146,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-26T06:19:43.034Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/robleto.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-25T04:34:13.000Z","updated_at":"2025-08-26T04:21:30.000Z","dependencies_parsed_at":"2025-08-26T06:19:51.371Z","dependency_job_id":"6401ad04-c4d7-4639-a4ee-ef3c77772d43","html_url":"https://github.com/robleto/ShakesFind","commit_stats":null,"previous_names":["robleto/shakesfind"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/robleto/ShakesFind","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robleto%2FShakesFind","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robleto%2FShakesFind/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robleto%2FShakesFind/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robleto%2FShakesFind/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/robleto","download_url":"https://codel
oad.github.com/robleto/ShakesFind/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robleto%2FShakesFind/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272871257,"owners_count":25007134,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-30T02:00:09.474Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-30T15:40:45.645Z","updated_at":"2025-08-30T15:40:49.635Z","avatar_url":"https://github.com/robleto.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ShakesFind 🎭\n\nA production-ready MVP that continuously discovers and normalizes upcoming Shakespeare productions from multiple theater websites into a Neon Postgres database, exposes a simple API, and publishes a Next.js site on Netlify.\n\n## 🚀 Features\n\n- **Automated Scraping**: Discovers productions from 5 major Shakespeare theaters\n- **Smart Normalization**: Maps raw titles to canonical Shakespeare plays using configurable aliases\n- **Multiple Data Sources**: Supports ICS calendars, JSON-LD, and HTML parsing\n- **Admin Interface**: Review queue for low-confidence items with approval/edit workflow\n- **Public API**: RESTful endpoints for productions and companies\n- **Search \u0026 Filters**: Find productions by play, location, and date range\n- **Rate 
Limited \u0026 Polite**: Respects robots.txt and implements proper crawling etiquette\n\n## 🏗️ Tech Stack\n\n- **Frontend**: Next.js 14+ (App Router), TypeScript, Tailwind CSS, shadcn/ui\n- **Backend**: Next.js Route Handlers + background workers\n- **Database**: Neon Postgres with Prisma ORM\n- **Auth**: NextAuth.js (Email magic link + GitHub OAuth)\n- **Hosting**: Netlify\n- **CI/CD**: GitHub Actions for scheduled scraping\n- **Monitoring**: Console logging + health endpoints\n\n## 🎯 Supported Theaters\n\n1. **Alabama Shakespeare Festival** (asf.net)\n2. **American Shakespeare Center** (americanshakespearecenter.com)\n3. **Oregon Shakespeare Festival** (osfashland.org)\n4. **Shakespeare Theatre Company** (shakespearetheatre.org)\n5. **Utah Shakespeare Festival** (bard.org)\n\n## 📊 Data Pipeline\n\n### Ingestion Priority\n1. **ICS Calendars** (95% confidence) - If discoverable\n2. **JSON-LD** (90% confidence) - schema.org/Event structured data\n3. **HTML Heuristics** (60% confidence) - Conservative pattern matching\n\n### Processing Flow\n1. **Fetch**: Polite crawling with robots.txt respect and rate limiting\n2. **Parse**: Extract events using appropriate parser\n3. **Normalize**: Map to canonical plays and standardize dates\n4. **Deduplicate**: Avoid duplicate productions per company/play/date\n5. **Review**: Queue low-confidence items for admin approval\n\n## 🚧 Quick Start\n\n### Prerequisites\n- Node.js 18+\n- pnpm 8+\n- Neon Postgres database\n- (Optional) GitHub OAuth app\n- (Optional) Email provider for magic links\n\n### 1. Clone and Install\n```bash\ngit clone \u003crepository-url\u003e\ncd ShakesFind\npnpm install\n```\n\n### 2. 
Environment Setup\n```bash\ncp .env.example .env.local\n```\n\nEdit `.env.local` with your configuration:\n```env\n# Required\nDATABASE_URL=\"postgresql://user:password@your-neon-url/database\"\nNEXTAUTH_SECRET=\"your-secure-random-string\"\nNEXTAUTH_URL=\"http://localhost:3000\"\nADMIN_EMAIL=\"your-email@example.com\"\n\n# Optional\nGITHUB_ID=\"your-github-oauth-id\"\nGITHUB_SECRET=\"your-github-oauth-secret\"\n```\n\n### 3. Database Setup\n```bash\n# Generate Prisma client\npnpm prisma generate\n\n# Run migrations\npnpm prisma migrate dev\n\n# Seed initial data\npnpm prisma db seed\n```\n\n### 4. Development\n```bash\n# Start dev server\npnpm dev\n\n# Run scraper manually\npnpm scrape\n\n# Run tests\npnpm test\n```\n\nVisit `http://localhost:3000` to see the site.\n\n## 🌐 Deployment\n\n### Neon Database\n1. Create a new Neon project\n2. Copy the connection string to `DATABASE_URL`\n3. Run migrations: `pnpm prisma migrate deploy`\n\n### Netlify Deployment\n1. Connect your GitHub repository to Netlify\n2. Set build command: `pnpm build`\n3. Set publish directory: `.next`\n4. Add environment variables in Netlify dashboard\n5. 
Deploy!\n\n### GitHub Actions Setup\nAdd these secrets to your GitHub repository:\n- `DATABASE_URL`: Your Neon connection string\n- `NEXTAUTH_SECRET`: Same as in your .env\n\nThe scraper will run automatically at 3:15 AM UTC daily.\n\n## 📡 API Documentation\n\n### Productions Endpoint\n```\nGET /api/productions\n```\n\n**Query Parameters:**\n- `play`: Filter by canonical play (e.g., \"HAMLET\")\n- `companyId`: Filter by specific company\n- `q`: Search query (company, city, or play name)\n- `start`: Start date filter (ISO format)\n- `end`: End date filter (ISO format)\n- `limit`: Results per page (default: 20, max: 100)\n- `cursor`: Pagination cursor\n\n**Example:**\n```bash\ncurl \"https://yoursite.netlify.app/api/productions?play=HAMLET\u0026limit=10\"\n```\n\n### Companies Endpoint\n```\nGET /api/companies\n```\n\n**Query Parameters:**\n- `q`: Search query (name or city)\n- `region`: Filter by state/region\n- `country`: Filter by country (default: US)\n- `limit`: Results per page (default: 20)\n- `cursor`: Pagination cursor\n\n## 🛠️ Development Guide\n\n### Adding New Theaters\n\n1. **Create adapter**: Add file in `lib/scraping/adapters/`\n```typescript\nexport async function scrapeTheaterName(): Promise\u003cNormalizedEvent[]\u003e {\n  // Implementation\n}\n```\n\n2. **Register scraper**: Add to `scripts/scrape.ts` SCRAPERS object\n\n3. 
**Add to seed**: Include company in `prisma/seed.ts`\n\n### Custom Play Aliases\n\nEdit `lib/normalization/plays.ts` to add new regex patterns:\n```typescript\n{\n  pattern: '(?i)custom\\\\s+pattern',\n  play: CanonicalPlay.YOUR_PLAY,\n  confidence: 0.9,\n}\n```\n\n### Parser Extensions\n\n- **ICS**: Extend `lib/scraping/parse-ics.ts`\n- **JSON-LD**: Extend `lib/scraping/parse-jsonld.ts`\n- **HTML**: Extend `lib/scraping/parse-html.ts`\n\n## 🔐 Admin Interface\n\nAccess admin features at `/admin` (requires authentication):\n\n### Review Queue\n- View productions awaiting approval\n- Edit canonicalPlay, dates, and venue\n- Approve or archive items\n- Bulk actions for efficiency\n\n### Source Management\n- Monitor scraping health\n- Enable/disable sources\n- Manual re-crawl triggers\n- View last run status\n\n## 📊 Data Model\n\n### Core Entities\n- **Company**: Theater organizations\n- **Venue**: Performance locations\n- **Production**: Individual shows\n- **Source**: Scraping endpoints\n- **RawPage**: Cached content for debugging\n\n### Relationships\n```\nCompany 1:N Productions\nCompany 1:N Sources\nCompany 1:N Venues\nVenue 1:N Productions\nSource 1:N RawPages\n```\n\n## ⚙️ Configuration\n\n### Rate Limiting\nDefault: 1 request/second per domain. Adjust in `lib/scraping/fetch.ts`:\n```typescript\nsetRateLimit(2000) // 2 seconds\n```\n\n### Confidence Thresholds\n- Auto-publish: \u003e80% confidence\n- Review queue: 20-80% confidence\n- Auto-reject: \u003c20% confidence\n\n## 🧪 Testing\n\n```bash\n# Run all tests\npnpm test\n\n# Test specific parser\npnpm test parse-jsonld\n\n# Test with coverage\npnpm test --coverage\n```\n\n### Test Data\nSample fixtures are included in `__tests__/fixtures/` for offline testing.\n\n## 📈 Monitoring\n\n### Health Endpoints\n- `/api/health`: Basic service status\n- `/api/health/db`: Database connectivity\n- `/api/health/sources`: Last scraping status\n\n### Logging\nStructured console logging with error tracking. 
Production logs available in Netlify functions dashboard.\n\n## 🛡️ Security \u0026 Compliance\n\n- **Robots.txt**: Automatically checked before crawling\n- **Rate Limiting**: Configurable delays between requests\n- **User-Agent**: Identifies as \"ShakesFindBot/0.1\"\n- **Attribution**: Links back to original box office pages\n- **Opt-out**: Easy disable mechanism for theater operators\n\n## 🚨 Troubleshooting\n\n### Common Issues\n\n**Database Connection**\n```bash\n# Test connection\npnpm prisma db push\n```\n\n**Scraping Failures**\n```bash\n# Check individual adapter (the scraper is async, so wait for the result before printing)\nnode -e \"require('./lib/scraping/adapters/asf.net.js').scrapeASF().then(console.log).catch(console.error)\"\n```\n\n**Build Errors**\n```bash\n# Clear cache\nrm -rf .next node_modules\npnpm install\n```\n\n### Debug Mode\nSet `NODE_ENV=development` for verbose logging.\n\n## 📄 License\n\nMIT License - see LICENSE file for details.\n\n## 🤝 Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Make changes with tests\n4. Submit a pull request\n\nFor major changes, please open an issue first to discuss.\n\n## 📞 Support\n\n- **Issues**: GitHub Issues\n- **Email**: admin@shakesfind.com\n- **Docs**: This README + inline code comments\n\n---\n\n**Built with ❤️ for the Shakespeare theater community**\n\n## 🗂 Route Structure \u0026 Layout Strategy\n\nThis project uses Next.js App Router route groups (folder names in parentheses) to separate concerns without affecting URLs:\n\n```\napp/\n  (site)/               # Public site pages (these map directly to /companies, /plays, /productions, etc.)\n    layout.tsx          # Public shell: Header/Footer, metadata, skip link\n    companies/\n    plays/\n    productions/\n  (admin)/              # Admin area grouping (auth‑protected)\n    layout.tsx          # Auth gate + minimal wrapper\n    admin/              # Actual URL segment /admin/*\n      layout.tsx        # Admin shell (nav, spacing)\n      page.tsx          # /admin (review queue)\n      sources/page.tsx  # 
/admin/sources (source status)\n  admin/ (legacy)       # Older admin implementation (can be consolidated later)\n```\n\nWhy route groups:\n- Provide separate layout shells (public vs admin) without nesting extra path segments.\n- Allow future groups (e.g., `(auth)` for login flows or `(marketing)` for landing pages) while keeping clean URLs.\n- Enable incremental refactors (you can stage a new group alongside an old one, then retire the old code).\n\nPublic Layout Enhancements:\n- Exports `metadata` for SEO \u0026 social cards.\n- Adds an accessible skip link and proper `\u003cmain id=\"site-main\"\u003e` landmark.\n\nAdmin Group Scaffold:\n- `(admin)/layout.tsx` performs role check (`ADMIN`) and redirects unauthenticated users.\n- Inner `admin/layout.tsx` provides navigation and page chrome.\n- Pages under `admin/` fetch data server-side (e.g., review queue, sources).\n\nRefactor Notes:\n- Legacy duplicate pages at `app/companies`, `app/plays`, `app/productions` were removed in favor of `(site)` versions.\n- When confident, you can migrate any remaining older admin pages into the grouped structure and delete the legacy `app/admin` folder.\n\nTo add a new public page:\n1. Create `app/(site)/\u003csegment\u003e/page.tsx`.\n2. The URL will be `/\u003csegment\u003e` automatically.\n3. Shared Header/Footer applied via `(site)/layout.tsx`.\n\nTo add a new admin tool page:\n1. Add file under `app/(admin)/admin/\u003ctool\u003e/page.tsx`.\n2. It becomes available at `/admin/\u003ctool\u003e` with auth + admin shell.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobleto%2Fshakesfind","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobleto%2Fshakesfind","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobleto%2Fshakesfind/lists"}