{"id":13574643,"url":"https://github.com/twiny/spidy","last_synced_at":"2025-08-17T06:32:01.663Z","repository":{"id":44368623,"uuid":"281510312","full_name":"twiny/spidy","owner":"twiny","description":"Domain names collector - Crawl websites and collect domain names along with their availability status.","archived":false,"fork":false,"pushed_at":"2023-07-23T23:48:50.000Z","size":58,"stargazers_count":142,"open_issues_count":6,"forks_count":26,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-05-22T18:32:29.423Z","etag":null,"topics":["backlinks","crawler","domain","expired-domain","golang","scraper","seotools","spider"],"latest_commit_sha":null,"homepage":"https://github.com/twiny/spidy/wiki","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/twiny.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-07-21T21:42:21.000Z","updated_at":"2024-05-03T17:34:32.000Z","dependencies_parsed_at":"2024-01-13T01:39:39.186Z","dependency_job_id":"c0d47831-3f31-4b40-9e49-1ba69fe3b517","html_url":"https://github.com/twiny/spidy","commit_stats":{"total_commits":11,"total_committers":4,"mean_commits":2.75,"dds":0.2727272727272727,"last_synced_commit":"fc5a8447c142eaec31dd823ca3c008db99b70c37"},"previous_names":["superiss/spidy"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twiny%2Fspidy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twiny%2Fspidy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twiny%2Fspidy/releases",
"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twiny%2Fspidy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/twiny","download_url":"https://codeload.github.com/twiny/spidy/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230098768,"owners_count":18172740,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backlinks","crawler","domain","expired-domain","golang","scraper","seotools","spider"],"created_at":"2024-08-01T15:00:53.266Z","updated_at":"2024-12-17T10:08:08.542Z","avatar_url":"https://github.com/twiny.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"## Spidy\nA tool that crawls websites to find domain names and checks their availability.\n\n## Install\n\n```sh\ngit clone https://github.com/twiny/spidy.git\ncd ./spidy\n\n# build\ngo build -o bin/spidy -v cmd/spidy/main.go\n\n# run\n./bin/spidy -c config/config.yaml -u https://github.com\n```\n\n## Usage\n\n```sh\nNAME:\n   Spidy - Domain name scraper\n\nUSAGE:\n   spidy [global options] command [command options] [arguments...]\n\nVERSION:\n   2.0.0\n\nCOMMANDS:\n   help, h  Shows a list of commands or help for one command\n\nGLOBAL OPTIONS:\n   --config path, -c path  path to config file\n   --help, -h              show help (default: false)\n   --urls urls, -u urls    urls of page to scrape  (accepts multiple inputs)\n   --version, -v           print the version (default: false)\n```\n\n## Configuration\n\n```yaml\n# main crawler config\ncrawler:\n    max_depth: 
10 # max depth of pages to visit per website.\n    # filter: [] # regexp filter\n    rate_limit: \"1/5s\" # 1 request per 5 seconds\n    max_body_size: \"20MB\" # max page body size\n    user_agents: # array of user-agents\n      - \"Spidy/2.1; +https://github.com/twiny/spidy\"\n    # proxies: [] # array of proxies: HTTP(S) or SOCKS5\n# Logs\nlog:\n    rotate: 7 # log rotation\n    path: \"./log\" # log directory\n# Store\nstore:\n    ttl: \"24h\" # keep cache for 24h\n    path: \"./store\" # store directory\n# Results\nresult:\n    path: ./result # result directory\nparralle: 3 # number of concurrent workers\ntimeout: \"5m\" # request timeout\ntlds: [\"biz\", \"cc\", \"com\", \"edu\", \"info\", \"net\", \"org\", \"tv\"] # array of domain extensions (TLDs) to check\n```\n\n## TODO\n\n- [ ] Add support for more `writers`.\n- [ ] Add terminal logging.\n- [ ] Add test cases.\n\n## Issues\n\nNOTE: This package is provided \"as is\" with no guarantee. Use it at your own risk and always test it yourself before using it in a production environment. If you find any issues, please create a new issue.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwiny%2Fspidy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftwiny%2Fspidy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwiny%2Fspidy/lists"}