{"id":16296254,"url":"https://github.com/jannchie/gazer-system","last_synced_at":"2025-03-20T04:31:19.477Z","repository":{"id":43670839,"uuid":"415544149","full_name":"Jannchie/gazer-system","owner":"Jannchie","description":"Gazer system is used to track data.","archived":false,"fork":false,"pushed_at":"2022-12-27T09:49:44.000Z","size":110,"stargazers_count":12,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-17T14:49:05.479Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Jannchie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-10T09:47:31.000Z","updated_at":"2023-10-23T22:01:59.000Z","dependencies_parsed_at":"2023-01-31T03:00:38.128Z","dependency_job_id":null,"html_url":"https://github.com/Jannchie/gazer-system","commit_stats":null,"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jannchie%2Fgazer-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jannchie%2Fgazer-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jannchie%2Fgazer-system/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jannchie%2Fgazer-system/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Jannchie","download_url":"https://codeload.github.com/Jannchie/gazer-system/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count"
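:244552386,"owners_count":20471067,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-10T20:21:51.484Z","updated_at":"2025-03-20T04:31:19.047Z","avatar_url":"https://github.com/Jannchie.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Gazer System - 凝视系统\n\nThis is a tool for tracking how data changes over the long term.\n\nWe often need to crawl certain URLs on a schedule to collect historical data, and this system was built for that purpose. It is a set of stateless services, which means it can be scaled horizontally with little effort. It provides the following features:\n\n1. Automatic proxying: requests are proxied through onion routing (Tor), so there is no need to maintain your own IP pool, and most IP-based anti-crawling measures can be bypassed.\n2. Scheduled crawling: you can define crawl plans that access an endpoint at a fixed interval.\n3. Concurrent crawling: written in Golang with support for high-speed concurrent crawling; IO is the only bottleneck.\n4. Data storage: supports SQLite storage, so no database configuration is needed and it works out of the box.\n\n## Architecture\n\nThe system uses a client-server architecture. The server consumes crawl tasks concurrently and serves the raw data. Clients can pull raw data in batches for further parsing.\nThe whole process provides acknowledgement and retry mechanisms similar to those of a message queue, preventing crawl tasks from being duplicated or lost.\n\n## Server Deployment\n\nDeploying with Docker Compose is recommended. Otherwise you need to configure a proxy server yourself and build the project (build steps are not provided at the moment).\n\nTo deploy with Docker Compose, provide a yaml file like the following example:\n\n```yaml\nversion: \"3\"\nvolumes:\n  db:\nservices:\n  server:\n    image: jannchie/gazer-system-server\n    ports:\n      - \"2000:2000\"\n    volumes:\n      - db:/data\n    environment:\n      TOR: \"proxy:9050\"\n      TOR_CTL: \"proxy:9051\"\n      PORT: 2000\n      DSN: \"/data/gazer-system.db\"\n      TOR_PASSWORD: ${TOR_PASSWORD}\n  proxy:\n    image: dperson/torproxy\n    expose:\n      - 9050\n      - 9051\n    environment:\n      PASSWORD: ${TOR_PASSWORD}\n```\n\nStart the server with the following commands:\n\n```bash\ndocker compose pull\ndocker compose up\n```\n\n## Client\n\nCurrently only a Golang client has been developed.\n\nAn example program:\n\n``` golang\nfunc main() {\n\tctx := context.Background()\n\n\t// Define a Worker Group and connect to the server\n\twg := gs.NewWorkerGroup([]string{\"localhost:2000\"})\n\n\t// Define a Worker that can both parse results and issue crawl tasks\n\tbwu := 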
gs.NewBothWorker(wg.Client, \"test\", func(tasks chan\u003c- *api.Task) {\n\t\ttasks \u003c- \u0026api.Task{\n\t\t\tUrl:        \"https://pv.sohu.com/cityjson\",\n\t\t\tTag:        \"test\",\n\t\t\tIntervalMS: 0, // 0 means crawl only once\n\t\t}\n\t}, func(r *api.Raw, c *gs.Client) error {\n\t\tlog.Printf(\"%+v\\n\", r)\n\t\treturn nil\n\t})\n\n\t// Add this Worker\n\twg.AddByWorkUnit(bwu)\n\t// Run\n\twg.Run(ctx)\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjannchie%2Fgazer-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjannchie%2Fgazer-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjannchie%2Fgazer-system/lists"}