{"id":31745390,"url":"https://github.com/shreshthmohan/av-cf-scraper","last_synced_at":"2025-10-09T12:55:07.416Z","repository":{"id":313823506,"uuid":"1052540458","full_name":"shreshthmohan/av-cf-scraper","owner":"shreshthmohan","description":"A scraper based on two cloudflare workers for getting the lastest Aranya Vihara Permit status data ","archived":false,"fork":false,"pushed_at":"2025-09-08T09:32:20.000Z","size":48,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-22T12:03:55.267Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shreshthmohan.png","metadata":{"files":{"readme":"README/local-testing-guide.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-08T07:43:13.000Z","updated_at":"2025-09-08T09:32:23.000Z","dependencies_parsed_at":"2025-09-08T21:04:11.699Z","dependency_job_id":"dc25cb33-19ee-45f5-99fa-76c504640998","html_url":"https://github.com/shreshthmohan/av-cf-scraper","commit_stats":null,"previous_names":["shreshthmohan/av-cf-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/shreshthmohan/av-cf-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreshthmohan%2Fav-cf-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreshthmohan%2Fav-cf-scraper/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/shreshthmohan%2Fav-cf-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreshthmohan%2Fav-cf-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shreshthmohan","download_url":"https://codeload.github.com/shreshthmohan/av-cf-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shreshthmohan%2Fav-cf-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001416,"owners_count":26083078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-09T12:55:06.467Z","updated_at":"2025-10-09T12:55:07.411Z","avatar_url":"https://github.com/shreshthmohan.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Local Testing Guide for Two-Worker Architecture\n\nThis document provides instructions for testing the two-worker architecture locally using Wrangler's development mode before deploying to production.\n\n## Architecture Overview\n\nThe two-worker system uses **RPC (Remote Procedure Call)** via Cloudflare service bindings for inter-worker communication:\n\n- **Discovery Worker**: Calls `availabilityBinding.processAvailability(requestData)`\n- 
**Availability Worker**: Exports anonymous class extending `WorkerEntrypoint` as default\n- **Communication**: Direct method calls (no HTTP requests in production)\n\nLocal testing uses RPC service bindings, which work seamlessly in wrangler dev mode.\n\n## Prerequisites\n\n- [Wrangler CLI](https://developers.cloudflare.com/workers/wrangler/install-and-update/) installed\n- Node.js and npm/pnpm installed\n- Local environment variables configured\n- Supabase database accessible from local environment\n\n## Local Environment Setup\n\n### 1. Install Dependencies\n\n```bash\n# Install project dependencies\nnpm install\n# or if using pnpm\npnpm install\n```\n\n### 2. Create Local Environment Files\n\n#### Create `.dev.vars` for Discovery Worker\n\n```bash\n# Create environment file for discovery worker\ncat \u003e .dev.vars.discovery \u003c\u003c EOF\nSUPABASE_URL=your_supabase_url_here\nSUPABASE_SERVICE_ROLE_KEY=your_supabase_service_key_here\nAVAILABILITY_DAYS=5\nDEBUG=true\nVERBOSE=true\nMAX_CONCURRENT_REQUESTS=3\nRETRY_ATTEMPTS=2\nEOF\n```\n\n#### Create `.dev.vars` for Availability Worker\n\n```bash\n# Create environment file for availability worker\ncat \u003e .dev.vars.availability \u003c\u003c EOF\nSUPABASE_URL=your_supabase_url_here\nSUPABASE_SERVICE_ROLE_KEY=your_supabase_service_key_here\nAVAILABILITY_DAYS=5\nDEBUG=true\nVERBOSE=true\nEOF\n```\n\n**Note**: Replace the placeholder values with your actual Supabase credentials.\n\n### 3. Verify Environment Files\n\n```bash\n# Check that environment files exist and have correct content\nls -la .dev.vars.*\nhead -n 3 .dev.vars.discovery\nhead -n 3 .dev.vars.availability\n```\n\n## Local Development Testing\n\n### RPC Service Bindings Testing\n\nService bindings work seamlessly in local development with `wrangler dev`. 
Both workers run locally and communicate via RPC.\n\n#### Step 1: Start Availability Worker First\n\n```bash\n# Terminal 1 - Start availability worker (must start first for binding to work)\nwrangler dev --config wrangler-availability.toml --port 8787 --env-file .dev.vars.availability\n```\n\n#### Step 2: Start Discovery Worker\n\n```bash\n# Terminal 2 - Start discovery worker (automatically binds to availability worker)\nwrangler dev --config wrangler-discovery.toml --port 8788 --env-file .dev.vars.discovery\n```\n\n**Note**: You should see binding confirmation in the terminal output when the discovery worker starts.\n\n#### Step 3: Test Individual Workers\n\n```bash\n# Terminal 3 - Test availability worker directly\ncurl -X GET \"http://localhost:8787/health\" | jq\n\n# Test availability worker with sample data\ncurl -X POST \"http://localhost:8787/fetch-availability\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"trek_id\": \"85\",\n    \"trek_name\": \"Test Trek Local\",\n    \"district_id\": \"17\",\n    \"availability_days\": 5,\n    \"start_date\": \"'$(date -I)'\"\n  }' | jq\n```\n\n#### Step 4: Test RPC Communication Between Workers\n\n```bash\n# Test discovery worker health\ncurl -X GET \"http://localhost:8788/health\" | jq\ncurl -X GET \"http://localhost:8788/status\" | jq\n\n# Test discovery with RPC calls to availability worker\ncurl -X POST \"http://localhost:8788/discover-only\" | jq\n```\n\n**What to Look For:**\n\n- Discovery worker logs: `\"📤 RPC: Sending availability request\"`\n- Availability worker logs: `\"🔗 RPC: Processing trek via service binding\"`\n- No HTTP requests between workers - everything goes through RPC\n\n## Local Testing Scenarios\n\n### Scenario 1: Basic Health Checks\n\n```bash\n# Test both workers are running\ncurl -X GET \"http://localhost:8787/health\"  # Availability worker\ncurl -X GET \"http://localhost:8788/health\"  # Discovery worker\ncurl -X GET \"http://localhost:8788/status\"  # Discovery worker 
status\n```\n\n### Scenario 2: Single Trek Processing\n\n```bash\n# Process a single trek for 3 days\ncurl -X POST \"http://localhost:8787/fetch-availability\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"trek_id\": \"17-85\",\n    \"trek_name\": \"Local Test Trek\",\n    \"district_id\": \"17\",\n    \"availability_days\": 3\n  }' | jq\n```\n\n### Scenario 3: Discovery Only (Safe Testing)\n\n```bash\n# Run discovery without triggering availability collection\ncurl -X POST \"http://localhost:8788/discover-only\" | jq\n```\n\n### Scenario 4: Limited Discovery with Orchestration\n\n```bash\n# This will process ALL treks in your database - use carefully!\ncurl -X POST \"http://localhost:8788/discover\"\n```\n\n### Scenario 5: Test Full RPC Workflow\n\n```bash\n# Test that RPC calls are working properly\ncurl -X POST \"http://localhost:8788/discover-only\"\n# Check logs for \"🔗 RPC: Processing trek\" messages indicating RPC calls\n\n# Monitor both worker logs to see RPC communication:\n# Discovery worker: \"📤 RPC: Sending availability request\"\n# Availability worker: \"🔗 RPC: Processing trek via service binding\"\n```\n\n## Local Testing Best Practices\n\n### 1. Use Small Data Sets\n\n- Test with a small number of treks initially\n- Use `AVAILABILITY_DAYS=2` or `3` for faster testing\n- Set `MAX_CONCURRENT_REQUESTS=2` to avoid overwhelming local setup\n\n### 2. Monitor Logs\n\n```bash\n# Watch logs in real-time\n# Terminal 1: Availability worker logs\n# Terminal 2: Discovery worker logs\n# Terminal 3: Your test commands\n```\n\n### 3. 
Database Verification\n\n```sql\n-- Check local test data in Supabase\nSELECT sr.id, sr.status, sr.started_at, sr.completed_at\nFROM av_scrape_runs sr\nWHERE sr.started_by LIKE '%local%'\nORDER BY sr.started_at DESC\nLIMIT 5;\n\n-- Check availability data from local tests\nSELECT COUNT(*) as records, DATE(scraped_at) as test_date\nFROM av_trek_availability\nWHERE scrape_run_id LIKE '%local%'\nGROUP BY DATE(scraped_at);\n```\n\n### 4. Error Testing\n\n```bash\n# Test invalid request\ncurl -X POST \"http://localhost:8787/fetch-availability\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"invalid\": \"data\"}'\n\n# Test missing worker (stop availability worker and test discovery)\n```\n\n## Cleanup After Local Testing\n\n### 1. Stop All Workers\n\n```bash\n# Stop wrangler dev processes (Ctrl+C in each terminal)\n```\n\n### 2. Clean Up Test Data (Optional)\n\n```sql\n-- Remove local test data from database\nDELETE FROM av_trek_availability\nWHERE scrape_run_id LIKE '%local%' OR scrape_run_id LIKE '%test%';\n\nDELETE FROM av_scrape_runs\nWHERE id LIKE '%local%' OR id LIKE '%test%';\n```\n\n### 3. Remove Local Files\n\n```bash\n# Remove local testing files\nrm -f .dev.vars.*\n```\n\n## Troubleshooting Local Development\n\n### Common Issues:\n\n1. **\"Port already in use\"**\n\n   ```bash\n   # Kill processes on specific ports\n   lsof -ti:8787 | xargs kill -9\n   lsof -ti:8788 | xargs kill -9\n   ```\n\n2. **\"Database connection failed\"**\n\n   - Check `.dev.vars` files have correct Supabase credentials\n   - Verify your network can reach Supabase\n   - Test database connection separately\n\n3. 
**\"Service binding not found\"**\n\n   - Ensure the availability worker is started first and is still running\n   - Verify the service binding configuration in `wrangler-discovery.toml` (no entrypoint needed for a default export)\n   - Verify the anonymous `WorkerEntrypoint` class is exported as the default export\n   - Look for binding confirmation messages in the discovery worker startup logs\n\n4. **\"Module not found\" errors**\n\n   ```bash\n   # Reinstall dependencies\n   rm -rf node_modules package-lock.json\n   npm install\n   ```\n\n5. **High resource usage**\n\n   - Reduce `AVAILABILITY_DAYS` to 1-2\n   - Set `MAX_CONCURRENT_REQUESTS=1`\n   - Test with fewer treks\n\n6. **RPC method not found**\n\n   - Verify the anonymous class is exported with `export default class extends WorkerEntrypoint`\n   - Check that method names match: `processAvailability()` and `getHealth()`\n   - Ensure `import { WorkerEntrypoint } from \"cloudflare:workers\"` is present\n\n7. **\"Database save failed: invalid input syntax for type uuid\"**\n\n   - Use proper UUID format for `scrape_run_id`: `$(uuidgen | tr '[:upper:]' '[:lower:]')`\n   - Avoid simple strings like `\"local-test-123\"` - the database expects UUID format\n   - Check that the database schema expects UUID type for the `scrape_run_id` field\n\n8. **\"Foreign key constraint violation\" (trek_id_fkey)**\n\n   - The `trek_id` you're testing with doesn't exist in the database\n   - Either create test trek data or use a real `trek_id` from your database\n   - The worker will automatically create the `scrape_run_id` in the database\n\n## Local vs. 
Production Differences\n\n| Aspect         | Local Development                     | Production                    |\n| -------------- | ------------------------------------- | ----------------------------- |\n| Communication  | RPC via service bindings              | RPC via service bindings      |\n| Concurrency    | Limited (2-3)                         | Higher (10+)                  |\n| Data Volume    | Small test sets                       | Full dataset                  |\n| Error Handling | More verbose logging                  | Production logging            |\n| Performance    | Slower (development)                  | Optimized                     |\n| RPC Classes    | WorkerEntrypoint classes work locally | Same WorkerEntrypoint classes |\n\n## Next Steps\n\nAfter successful local testing:\n\n1. Commit your changes (excluding local test files)\n2. Follow the main [testing-guide.md](./testing-guide.md) for deployment\n3. Deploy to staging environment first\n4. Monitor production deployment\n\nRemember: Local testing is great for development and debugging, but always test in a staging environment before production deployment!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshreshthmohan%2Fav-cf-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshreshthmohan%2Fav-cf-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshreshthmohan%2Fav-cf-scraper/lists"}