{"id":20560361,"url":"https://github.com/andreaskoch/gargantua","last_synced_at":"2025-04-14T14:03:25.240Z","repository":{"id":56472394,"uuid":"80768239","full_name":"andreaskoch/gargantua","owner":"andreaskoch","description":"The fast website crawler","archived":false,"fork":false,"pushed_at":"2021-01-14T15:22:34.000Z","size":8891,"stargazers_count":34,"open_issues_count":2,"forks_count":3,"subscribers_count":8,"default_branch":"develop","last_synced_at":"2024-06-19T00:39:34.525Z","etag":null,"topics":["command-line","crawler","golang","xml-sitemap"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andreaskoch.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-02-02T21:00:20.000Z","updated_at":"2023-10-19T10:51:28.000Z","dependencies_parsed_at":"2022-08-15T19:20:51.339Z","dependency_job_id":null,"html_url":"https://github.com/andreaskoch/gargantua","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreaskoch%2Fgargantua","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreaskoch%2Fgargantua/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreaskoch%2Fgargantua/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreaskoch%2Fgargantua/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andreaskoch","download_url":"https://codeload.github.com/andreaskoch/gargantua/tar.gz/refs/heads/develop","host":{"name":"GitH
ub","url":"https://github.com","kind":"github","repositories_count":224873086,"owners_count":17384078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line","crawler","golang","xml-sitemap"],"created_at":"2024-11-16T03:54:20.835Z","updated_at":"2024-11-16T03:54:21.544Z","avatar_url":"https://github.com/andreaskoch.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 「 gargantua 」\n\nThe fast website crawler\n\nYou can use「 gargantua 」to quickly and easily\n\n- **warm-up** your frontend caches\n- perform small **load-tests** against your publicly available pages\n- **measure** response times\n- **detect** broken links\n\nfrom your command line on Linux, macOS and Windows.\n\n![Animation: gargantua v0.1.0 crawling a website](files/gargantua-in-action-crawling-a-website.gif)\n\n\u003e Note: Press `Q` to stop the current crawling process.\n\n## Usage\n\nCrawl **www.sitemaps.org** with 5 concurrent workers:\n\n```bash\ngargantua crawl --url https://www.sitemaps.org/sitemap.xml --workers 5\n```\n\nsee also: [A short introduction video of gargantua on YouTube](https://www.youtube.com/watch?v=TSCMvUvc0qo)\n\n### Customize the user-agent\n\nYou can specify a customized user agent using the `--user-agent` argument:\n\n```bash\ngargantua crawl --url https://www.sitemaps.org/sitemap.xml --workers 5 --user-agent \"gargantua bot / iPhone\"\n```\n\n### Log all requests\n\nYou can specify a log file with the `--log` argument:\n\n```bash\ngargantua crawl --url https://www.sitemaps.org/sitemap.xml --workers 5 --log 
\"gargantua.log\"\n```\n\n```\nDate and time       #worker   Status Code     Bytes   Response Time   URL                                                          Parent URL\n2020/11/05 09:23:14 #001:     200             4403    148.759000ms    https://www.sitemaps.org                                     https://www.sitemaps.org/ko/faq.html\n2020/11/05 09:23:14 #002:     200             4403    290.536000ms    http://www.sitemaps.org/                                     https://www.sitemaps.org/ko/faq.html\n2020/11/05 09:23:14 #003:     200            45077    283.243000ms    https://www.sitemaps.org/protocol.html                       https://www.sitemaps.org/ko/faq.html\n2020/11/05 09:23:14 #004:     404             1245    155.376000ms    https://www.sitemaps.org/protocol.htm                        https://www.sitemaps.org/ko/faq.html\n2020/11/05 09:23:14 #005:     200             4403    155.577000ms    https://www.sitemaps.org/index.html                          https://www.sitemaps.org/ko/faq.html\n2020/11/05 09:23:14 #001:     200             2591    286.451000ms    http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd    https://www.sitemaps.org/ko/faq.html\n2020/11/05 09:23:14 #003:     200            10839    143.738000ms    https://www.sitemaps.org/terms.html                          https://www.sitemaps.org/ko/faq.html\n2020/11/05 09:23:14 #005:     200            15681    141.580000ms    https://www.sitemaps.org/faq.html                            https://www.sitemaps.org/ko/protocol.html\n2020/11/05 09:23:14 #002:     404             1245    286.175000ms    http://www.sitemaps.org/protocol.htm                         https://www.sitemaps.org/ko/faq.html\n```\n\n[gargantua.log](files/gargantua.log)\n\n\n## Download\n\nYou can download binaries for Linux, macOS and Windows from [github.com »andreaskoch » gargantua » releases](https://github.com/andreaskoch/gargantua/releases):\n\nLinux:\n\n```bash\ncurl -L 
https://github.com/andreaskoch/gargantua/releases/download/v0.5.0-alpha/gargantua_linux_amd64 -o gargantua\nchmod +x gargantua\n```\n\nmacOS:\n\n```bash\ncurl -L https://github.com/andreaskoch/gargantua/releases/download/v0.5.0-alpha/gargantua_darwin_amd64 -o gargantua\nchmod +x gargantua\n```\n\nWindows:\n\n```bash\ncurl -L https://github.com/andreaskoch/gargantua/releases/download/v0.5.0-alpha/gargantua_windows_amd64 -o gargantua.exe\n```\n\n## Docker Image\n\nThere is also a Docker image that you can use to download or run the latest version of gargantua:\n\n[andreaskoch/gargantua](https://hub.docker.com/r/andreaskoch/gargantua/)\n\n```bash\ndocker run --rm andreaskoch/gargantua:latest \\\n       crawl \\\n       --verbose \\\n       --url https://www.sitemaps.org/sitemap.xml \\\n       --workers 5\n```\n\n**Note**: You need the `--verbose` flag to prevent the command-line UI from loading; otherwise gargantua will fail.\n\n## Roadmap\n\n- Increase the number of workers at runtime\n- Silent mode (only show statistics at the end)\n- CSV mode (print CSV output to stdout)\n- Web-UI\n- Save downloaded data to disk\n\n## License\n\n「 gargantua 」is licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for the full license text.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreaskoch%2Fgargantua","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandreaskoch%2Fgargantua","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreaskoch%2Fgargantua/lists"}