{"id":24131541,"url":"https://github.com/hrbrmstr/skygrep","last_synced_at":"2026-04-13T00:13:55.189Z","repository":{"id":271955863,"uuid":"911648336","full_name":"hrbrmstr/skygrep","owner":"hrbrmstr","description":"A real-time Bluesky Jetstream firehose consumer that filters and forwards posts to Kafka topics based on configurable rules.","archived":false,"fork":false,"pushed_at":"2025-01-03T14:13:41.000Z","size":19,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-22T21:32:25.700Z","etag":null,"topics":["bluesky","bluesky-firehose","bluesky-jetstream","deno","docker","docker-compose","kafka","redpanda"],"latest_commit_sha":null,"homepage":"https://codeberg.org/hrbrmstr/skygrep","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hrbrmstr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-03T14:12:44.000Z","updated_at":"2025-01-04T22:16:46.000Z","dependencies_parsed_at":"2025-01-17T19:03:30.718Z","dependency_job_id":null,"html_url":"https://github.com/hrbrmstr/skygrep","commit_stats":null,"previous_names":["hrbrmstr/skygrep"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fskygrep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fskygrep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fskygrep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/hrbrmstr%2Fskygrep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hrbrmstr","download_url":"https://codeload.github.com/hrbrmstr/skygrep/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241329420,"owners_count":19944985,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bluesky","bluesky-firehose","bluesky-jetstream","deno","docker","docker-compose","kafka","redpanda"],"created_at":"2025-01-11T21:17:57.895Z","updated_at":"2026-04-13T00:13:55.181Z","avatar_url":"https://github.com/hrbrmstr.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Skygrep\n\nA real-time Bluesky Jetstream firehose consumer that filters and forwards posts to Kafka topics based on configurable rules.\n\n## 🌟 Features\n\n- Connect to Bluesky's firehose via Jetstream\n- Filter posts using configurable regex patterns\n- Forward matched posts to Kafka topics\n- Prometheus-compatible metrics endpoint\n- Health monitoring endpoint\n- Configurable historical backfill\n- Docker-based development environment\n\n## 🛠️ Prerequisites\n\n- [Deno](https://deno.land/) runtime\n- [Redpanda](https://github.com/redpanda-data/redpanda/) (recommended over Kafka proper)\n- [Docker](https://www.docker.com/) and Docker Compose\n- [just](https://github.com/casey/just) command runner\n\n## 🏃🏼‍♀️ Quickstart\n\nEdit `docker-config.json` to set up `rules` for what you want to monitor, then run:\n\n```bash\ndocker compose up --build -d\n```\n\nor\n\n```bash\njust 
start\n```\n\nand Docker Compose will do everything for you.\n\nIt's pretty efficient, resource-wise:\n\n```bash\n$ docker stats\nCONTAINER ID   NAME               CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O        PIDS\n24aa1f3f6169   skygrep            33.67%    59.73MiB / 15.66GiB   0.37%     1.71GB / 23.5MB   169MB / 3.46MB   5\n962906967b45   redpanda-console   0.01%     24.26MiB / 15.66GiB   0.15%     153kB / 269kB     167MB / 8.19kB   10\n3d5bbc67bf81   redpanda-0         1.03%     1.824GiB / 15.66GiB   11.65%    2.85MB / 397kB    255MB / 7.21MB   3\n```\n\n## 🚀 Getting Started\n\n1. Clone the repository:\n```bash\ngit clone https://codeberg.org/hrbrmstr/skygrep.git\ncd skygrep\n```\n\n2. Create a `config.json` file:\n```json\n{\n  \"jetstream\": {\n    \"endpoint\": \"wss://jetstream2.us-east.bsky.network/subscribe\"\n  },\n  \"kafka\": {\n    \"brokers\": [\"localhost:19092\"]\n  },\n  \"rules\": [\n    {\n      \"type\": \"collection\",\n      \"collections\": [\n        \"sh.tangled.repo\",\n        \"sh.tangled.feed.star\",\n        \"sh.tangled.graph.follow\",\n        \"sh.tangled.publicKey\",\n        \"sh.tangled.repo.issue.comment\"\n      ],\n      \"kafkaTopic\": \"tangled\"\n    },\n    {\n      \"field\": \"text\",\n      \"pattern\": \"(?i)(bitcoin|crypto|eth|nft)\",\n      \"kafkaTopic\": \"crypto_posts\"\n    },\n    {\n      \"field\": \"text\",\n      \"pattern\": \"(?i)CVE-\\\\d{4}-\\\\d{4,}\",\n      \"kafkaTopic\": \"cve_mentions\"\n    }\n  ]\n}\n```\n\n3. 
Start the development environment:\n```bash\njust dev\n```\n\n## 🔧 Available Commands\n\n- `just build` — build cli\n- `just clean` — clean up docker resources — this also deletes the volume\n- `just default` — show tasks\n- `just dev` — dev mode\n- `just health-check` — monitor the health of Skygrep\n- `just reset` — rebuild and run fresh instance — this also deletes the volume\n- `just start` — start services\n- `just stop` — stop docker w/o deleting the volume\n- `just watch-metrics` — watch metrics with live updates every 5 seconds\n\n## 📊 Monitoring\n\n### Metrics Endpoint\nAccess metrics at `http://localhost:3030/metrics`\n\nExample response:\n```json\n{\n  \"crypto_posts\": 42,\n  \"cve_mentions\": 7\n}\n```\n\n### Health Endpoint\nAccess health status at `http://localhost:3030/health`\n\nExample response:\n```json\n{\n  \"status\": \"healthy\",\n  \"uptime_ms\": 20918,\n  \"last_event_ms_ago\": 0\n}\n```\n\n## 🏗️ Architecture\n\nThe application consists of several key components:\n\n1. **Jetstream Client**: Connects to Bluesky's firehose and receives real-time posts\n2. **Kafka Producer**: Forwards matched posts to configured Kafka topics\n3. **Rule Engine**: Applies regex patterns to filter relevant posts or captures events for one or more collections\n4. 
**Metrics Server**: Exposes operational metrics and health status\n\n## 🐳 Docker Services\n\n- **Redpanda**: Kafka-compatible event streaming platform\n  - Kafka API: localhost:19092\n  - Schema Registry: localhost:18081\n  - Admin API: localhost:19644\n- **Redpanda Console**: Web UI for managing Kafka\n  - Interface: http://localhost:9080\n- **Skygrep**:\n  - Health: http://localhost:3030/health\n  - Metrics: http://localhost:3030/metrics\n\n## 📝 Configuration Options\n\nCommand line flags:\n- `--hours`: Number of hours to look back in history (default: 24)\n- `--port`: HTTP server port (default: 3030)\n- `--help`: Show help message\n\n## 🚨 Monitoring and Maintenance\n\nThe application provides:\n- Real-time metrics for rule matches\n- Health status monitoring\n- Graceful shutdown on SIGINT/SIGTERM\n- Connection status logging\n\n## 📄 License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fskygrep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhrbrmstr%2Fskygrep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fskygrep/lists"}