{"id":50071872,"url":"https://github.com/elkysofficial/sonnar_scraping","last_synced_at":"2026-06-10T03:01:05.535Z","repository":{"id":190909928,"uuid":"683572709","full_name":"ElkysOfficial/Sonnar_Scraping","owner":"ElkysOfficial","description":"Bot que busca vagas de emprego em Python com web scraping.","archived":false,"fork":false,"pushed_at":"2026-06-08T15:15:17.000Z","size":24400,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-06-08T15:25:22.328Z","etag":null,"topics":["automacao","automation","bot","busca-de-emprego","empregos","job-search","job-sites","jobs","python","sites-de-emprego","vagas-de-emprego","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ElkysOfficial.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"Roadmap.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-08-27T02:24:11.000Z","updated_at":"2026-06-08T15:15:46.000Z","dependencies_parsed_at":"2023-08-27T03:48:13.545Z","dependency_job_id":"3f0f1cd6-bded-43d7-951b-96ea1877b29d","html_url":"https://github.com/ElkysOfficial/Sonnar_Scraping","commit_stats":null,"previous_names":["lucelhosilva/bot_discord","lucelhosilva/bot-search-job","elkysofficial/sonnar_scraping"],"tags_count":131,"template":false,"template_full_name":null,"purl":"pkg:github/ElkysOfficial/Sonnar_Scraping","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElkysOfficial%2FSonnar_Scraping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElkysOfficial%2FSonnar_Scraping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElkysOfficial%2FSonnar_Scraping/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElkysOfficial%2FSonnar_Scraping/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ElkysOfficial","download_url":"https://codeload.github.com/ElkysOfficial/Sonnar_Scraping/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElkysOfficial%2FSonnar_Scraping/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34134633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automacao","automation","bot","busca-de-emprego","empregos","job-search","job-sites","jobs","python","sites-de-emprego","vagas-de-emprego","web-scraping"],"created_at":"2026-05-22T03:14:40.578Z","updated_at":"2026-06-10T03:01:05.489Z","avatar_url":"https://github.com/ElkysOfficial.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sonnar Scraping\n\nMonorepo do Sonnar — agregador de vagas de tecnologia que coleta, normaliza, persiste e distribui ofertas para **Discord**, **WhatsApp** e o **frontend web público + dashboard**.\n\n## Arquitetura em uma figura\n\n```\n                      ┌───────────────────────────────────┐\n                      │ apps/scraper (Python)             │\n                      │ engines → JobsRepository → 3 sinks│\n                      └──────────┬─────────┬──────────────┘\n                                 │         │\n                  jobs.json (local)        Supabase (public.jobs)\n                                 │         │\n              ┌──────────────────┴──┐      └───────────┐\n              ▼                                        ▼\n   packages/message-formatting-core            apps/web (Vue 3 + Vite)\n   (HTTP API, porta 3100)                      landing + dashboard + admin\n        │\n        ├─ apps/discord/sender ◀── apps/discord/formatter\n        └─ apps/whatsapp/sender ◀── apps/whatsapp/formatter\n```\n\n- **Scraper** escreve em três sinks independentes: `jobs.json` (local, dict por URL), `job.csv` (append-only) e `public.jobs` no Supabase.\n- **Bots** consomem **só** o `jobs.json` via API HTTP do `message-formatting-core` (porta 3100). Não tocam Supabase para vagas.\n- **Frontend web** lê agregados do Supabase via RPCs com `SECURITY DEFINER`.\n\nDecisões arquiteturais relevantes:\n- [ADR-004 — Reestruturação para monorepo](docs/vault/12-decisions/ADR-004-monorepo-restructure.md)\n- [ADR-005 — Core via jobs.json](docs/vault/12-decisions/ADR-005-message-formatting-core-jobs-json.md)\n\n## Layout do repositório\n\n```\nsonnar-scraping/\n├── apps/                            Aplicações executáveis\n│   ├── scraper/                     Pipeline Python de coleta + persistência\n│   ├── discord/\n│   │   ├── sender/                  Bot Discord (envio)\n│   │   └── formatter/               API Express de formatação Discord\n│   ├── whatsapp/\n│   │   ├── sender/                  Bot WhatsApp (envio, Baileys)\n│   │   └── formatter/               Gerador de cards (Canvas) + API\n│   └── web/                         Frontend Vue 3 + Vite (Sonnar Jobs)\n│\n├── packages/\n│   └── message-formatting-core/     API HTTP central — porta 3100\n│                                    (intermedia bots ↔ jobs.json)\n│\n├── supabase/                        Source-of-truth do schema\n│   ├── config.toml\n│   ├── functions/                   Edge functions (Stripe, OTP, admin)\n│   ├── migrations/                  Migrations canônicas (timestamp)\n│   └── _legacy_migrations/          Histórico — não aplicadas\n│\n├── docs/\n│   ├── vault/                       Vault Obsidian canônico (second brain)\n│   └── _archive/                    Vaults antigos a consolidar\n│\n├── scripts/\n│   └── db_legacy/                   Helpers antigos de DB (referência)\n│\n├── .github/workflows/               CI/CD\n│   ├── branch-name.yml              Valida nome de branch (git-flow)\n│   ├── web-ci.yml                   Lint + build em PRs do web\n│   ├── web-deploy.yml               Deploy FTP → Hostinger em push main\n│   ├── web-bundle-analysis.yml      Métrica de bundle\n│   └── web-security.yml             npm audit semanal\n│\n├── .githooks/\n├── README.md\n├── Roadmap.md\n└── LICENSE\n```\n\n## Aplicações\n\n| Caminho                      | Stack                | Porta | Função                                                      |\n| ---------------------------- | -------------------- | ----- | ----------------------------------------------------------- |\n| `apps/scraper`               | Python 3.13          | —     | Coleta vagas de N engines, normaliza, escreve 3 sinks       |\n| `apps/web`                   | Vue 3 + Vite + Antd  | 5173  | Frontend público (sonnarjobs.com.br) + dashboard + admin    |\n| `apps/discord/sender`        | Node + TypeScript    | —     | Bot do Discord — `client.login` + envia embeds              |\n| `apps/discord/formatter`     | Node + TypeScript    | —     | API Express de formatação Discord (chama o core)            |\n| `apps/whatsapp/sender`       | Node (Baileys)       | —     | Bot WhatsApp — envia cards, gerencia VIP/grupos             |\n| `apps/whatsapp/formatter`    | Node + Canvas        | 3001  | Gera cards 1080×1080 e prepara payload do WhatsApp          |\n| `packages/message-formatting-core` | Node + Express | 3100  | API HTTP de vagas (fonte: `apps/scraper/src/data/jobs.json`) |\n\n## Como rodar localmente\n\n### Pré-requisitos\n- Node 20+\n- Python 3.13+ (apenas para o scraper)\n- Acesso ao Supabase para o `apps/web` e features VIP do `whatsapp/sender`\n\n### Pipeline mínimo para testar bots (sem Supabase)\n\n```powershell\n# 1) Scraper gera apps/scraper/src/data/jobs.json\ncd apps/scraper\npython -m pip install -r requirements.txt\npython scrapy.py   # ou rode 1 engine específica\n\n# 2) Core serve jobs.json em HTTP (porta 3100)\ncd ../../packages/message-formatting-core\nnpm install\nnpm start\n\n# 3) WhatsApp formatter gera cards (porta 3001)\ncd ../../apps/whatsapp/formatter\nnpm install\nnpm start\n\n# 4) WhatsApp sender ou Discord sender\ncd ../sender   # ou apps/discord/sender\nnpm install\nnpm start\n```\n\nSem rodar o scraper, você pode escrever um `apps/scraper/src/data/jobs.json` manualmente (dict por URL com `sent_to: []`) e o core servirá normalmente.\n\n### Frontend web\n\n```powershell\ncd apps/web\nnpm install\nnpm run dev\n```\n\nVariáveis de ambiente em `.env` na raiz de `apps/web/` (ver `.env.example`).\n\n## Banco de dados\n\nSource-of-truth do schema vive em `supabase/`. Migrations canônicas em `supabase/migrations/` (formato `YYYYMMDDHHMMSS_descricao.sql`).\n\nAplicar localmente:\n```powershell\ncd supabase\nsupabase db reset  # roda todas as migrations\n```\n\nMigrations em formatos antigos ficam em `supabase/_legacy_migrations/`, separadas por origem (`from_bot_database_root`, `from_bot_database_supabase`). **Não são aplicadas** — só referência histórica.\n\n## CI/CD\n\n| Workflow              | Trigger                                  | O que faz                                         |\n| --------------------- | ---------------------------------------- | ------------------------------------------------- |\n| `branch-name`         | PR (qualquer)                            | Valida padrão git-flow do nome da branch          |\n| `web-ci`              | PR ou push main com mudanças em `apps/web/**` | Lint + build do frontend                     |\n| `web-deploy`          | push main com mudanças em `apps/web/**`  | Build → FTP Hostinger → smoke check → Discord     |\n| `web-bundle-analysis` | PR ou push main com mudanças em `apps/web/**` | Métricas de bundle (raw + gzip) + artifact   |\n| `web-security`        | PR/push/cron semanal (segunda 9h UTC)    | `npm audit --audit-level=high`                    |\n\n**Secrets exigidos pelo `web-deploy`** (em Settings \u003e Secrets and variables \u003e Actions):\n- `FTP_SERVER`, `FTP_USERNAME`, `FTP_PASSWORD` (Hostinger)\n- `VITE_INVERTEXTO_TOKEN` (API de telefone)\n- `DISCORD_WEBHOOK` (notificação de deploy)\n\n## Documentação\n\nVault Obsidian em [`docs/vault/`](docs/vault/) — second brain operacional. Pontos de entrada:\n\n- [`00-index/brain.md`](docs/vault/00-index/brain.md) — MOC central\n- [`01-architecture/`](docs/vault/01-architecture/) — visão de sistema\n- [`12-decisions/`](docs/vault/12-decisions/) — ADRs (5 hoje)\n- [`13-issues/`](docs/vault/13-issues/) — débito técnico catalogado\n\n## Roadmap e histórico\n\n- [Roadmap.md](Roadmap.md) — releases e visão de longo prazo.\n- [CHANGELOG.md](CHANGELOG.md) — histórico detalhado de mudanças por versão.\n- [`docs/vault/14-roadmap/`](docs/vault/14-roadmap/) — roadmap operacional no vault.\n\n## Licença\n\nVer [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felkysofficial%2Fsonnar_scraping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felkysofficial%2Fsonnar_scraping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felkysofficial%2Fsonnar_scraping/lists"}