{"id":26600762,"url":"https://github.com/suconghou/sitemap","last_synced_at":"2026-05-15T17:33:12.500Z","repository":{"id":258770311,"uuid":"873687478","full_name":"suconghou/sitemap","owner":"suconghou","description":"a simple sitemap generator and page crawler","archived":false,"fork":false,"pushed_at":"2025-11-18T09:39:24.000Z","size":29,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-18T11:24:22.893Z","etag":null,"topics":["crawler","html-parser","nim-lang","scraper","sitemap","spiders"],"latest_commit_sha":null,"homepage":"","language":"Nim","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/suconghou.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-10-16T14:51:59.000Z","updated_at":"2025-11-18T09:38:54.000Z","dependencies_parsed_at":"2024-10-28T03:28:59.624Z","dependency_job_id":"75622415-e136-44e2-8def-ca95a2c9ec78","html_url":"https://github.com/suconghou/sitemap","commit_stats":null,"previous_names":["suconghou/sitemap"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/suconghou/sitemap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suconghou%2Fsitemap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suconghou%2Fsitemap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suconghou%2Fsitemap/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/suconghou%2Fsitemap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/suconghou","download_url":"https://codeload.github.com/suconghou/sitemap/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suconghou%2Fsitemap/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33073348,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-15T11:35:32.926Z","status":"ssl_error","status_checked_at":"2026-05-15T11:35:31.362Z","response_time":103,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","html-parser","nim-lang","scraper","sitemap","spiders"],"created_at":"2025-03-23T18:34:54.250Z","updated_at":"2026-05-15T17:33:12.494Z","avatar_url":"https://github.com/suconghou.png","language":"Nim","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\na simple sitemap generator and page crawler\n\n\n## Parameters\n\n`./main options args`\n\n**options**\n\n\u003e ua/u : sets the User-Agent sent with requests\n\u003e\n\u003e refer/r : sets the Referer sent with requests; defaults to the site root when not specified\n\u003e\n\u003e timeout/t : request timeout in ms; must be a number, default 8000\n\u003e\n\u003e file/f : output sitemap file, default `sitemap.xml`\n\u003e\n\u003e host/h : primary host; when not specified it is extracted automatically from the entry page (scheme, domain, and port)\n\u003e\n\u003e match/m : keyword filter; a URL's content is fetched and analyzed further only if the URL contains the keyword\n\u003e\n\u003e cache/c : 
sets the cache directory; caching is enabled when a value is given\n\u003e\n\u003e sleep/s : delay between requests in ms, default 0\n\u003e\n**args**\n\n\u003e\n\u003e Arguments: entry page URLs; more than one may be given\n\n```\n./main url1 url2\n```\n\n\n**Extra features**\n\nCan extract the URLs of images and other resources\n\n\u003e attrs/a : selector and attribute to extract, e.g. `img[src]`; may be given multiple times: `-a=\"img[src^=https]\" -a=\"img[src]\" -a=\"script[src]\"`\n\nDead link / 404 detection\n\n\u003e The saved JSON file lists URLs that returned 404 or could not be reached\n\nDownload mode\n\nWhen neither `args` nor the `-h` option is given, the program enters download mode\n\n`file/f` is the JSON file to read\n\nIn this mode `attrs/a` is the key in the JSON file from which URLs are extracted\n\n\u003e -a=\"img[src^=http]\" , may be given multiple times\n\u003e\n\u003e -f=\"sitemap.json\"\n\u003e\n\nIf neither `-a` nor `-f` is given either, URLs are read from standard input, one per line\n\nIn download mode, setting `-c` to `stdout` writes the downloaded files to standard output\n\nEnvironment variables prefixed with `HEADER_` set additional HTTP request headers\n\n## Build\n\n**Dependencies**\n\n```\nnimble install css3selectors\n```\n\n`nim --threads:off --mm:arc -d:ssl -d:release --opt:speed c main.nim`\n\n**static build**\n\n`apk add openssl-libs-static`\n\n```\nnim --mm:arc --threads:off -d:release -d:nimDisableCertificateValidation -d:useOpenSsl3 --passL:\"-ffunction-sections -fdata-sections\" --passL:\"-Wl,--gc-sections\" --dynlibOverrideAll --passL:-s --passL:-static --passL:-lssl --passL:-lcrypto -d:ssl --opt:speed c main\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsuconghou%2Fsitemap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsuconghou%2Fsitemap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsuconghou%2Fsitemap/lists"}