{"id":37110841,"url":"https://github.com/greytabby/grawl","last_synced_at":"2026-01-14T13:09:47.118Z","repository":{"id":57707843,"uuid":"256750731","full_name":"greytabby/grawl","owner":"greytabby","description":"Simple web crawler for learning.","archived":false,"fork":false,"pushed_at":"2022-06-17T15:02:35.000Z","size":5832,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-06-21T11:48:18.971Z","etag":null,"topics":["crawler"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/greytabby.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-18T12:41:13.000Z","updated_at":"2021-12-04T07:40:59.000Z","dependencies_parsed_at":"2022-08-31T21:34:07.580Z","dependency_job_id":null,"html_url":"https://github.com/greytabby/grawl","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/greytabby/grawl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greytabby%2Fgrawl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greytabby%2Fgrawl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greytabby%2Fgrawl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greytabby%2Fgrawl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/greytabby","download_url":"https://codeload.github.com/greytabby/grawl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greytabby%2Fgrawl/
sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28420828,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T10:47:48.104Z","status":"ssl_error","status_checked_at":"2026-01-14T10:46:19.031Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler"],"created_at":"2026-01-14T13:09:46.432Z","updated_at":"2026-01-14T13:09:47.109Z","avatar_url":"https://github.com/greytabby.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# grawl\n\nSimple web crawler for learning.\n\n[![Build Status](https://travis-ci.com/greytabby/grawl.svg?branch=master)](https://travis-ci.com/greytabby/grawl)\n![Go](https://github.com/greytabby/grawl/workflows/Go/badge.svg)\n\n## Usage\n\n```Text\nUsage of Grawl:\n  -allowed_hosts string\n        Accessibel hosts. 
Use comma to specify multiple hosts\n  -depth int\n        Limit number of follow links on crawling (default 1)\n  -headless_chrome\n        Use headless chrome on crawling\n  -output_dir string\n        Directory name for saving crawl result\n  -parallelism int\n        Number of parallel execution of crawler (default 5)\n  -site string\n        Site to crawl\n  -v    show version\n```\n\n## Example\n\n### Command\n\n```sh\n./Grawl -site \"https://hub.docker.com\" \\\n-allowed_hosts hub.docker.com \\\n-depth 2 \\\n-headless_chrome \\\n-parallelism 10 \\\n-output_dir /tmp/dockerhub\n```\n\n### Output\n\nconsole log\n\n```text\nGrawl 2020/05/01 15:44:53 Output base directory: /tmp/dockerhub\nGrawl 2020/05/01 15:44:53 Crawling site: https://hub.docker.com\nGrawl 2020/05/01 15:44:53 Crawling max depth: 2\nGrawl 2020/05/01 15:44:53 Start Crawling...\nGrawl 2020/05/01 15:44:57 Visited: https://hub.docker.com\nGrawl 2020/05/01 15:44:57 Forbidden host: blog.docker.com\nGrawl 2020/05/01 15:44:57 Forbidden host: www.docker.com\nGrawl 2020/05/01 15:44:57 Forbidden host: www.docker.com\nGrawl 2020/05/01 15:44:57 Already visited: https://hub.docker.com/\nGrawl 2020/05/01 15:44:57 Forbidden host: www.docker.com\nGrawl 2020/05/01 15:44:57 Forbidden host: www.docker.com\nGrawl 2020/05/01 15:44:57 Forbidden host: www.linkedin.com\nGrawl 2020/05/01 15:44:57 Forbidden host: www.youtube.com\nGrawl 2020/05/01 15:44:57 Forbidden host: www.docker.com\nGrawl 2020/05/01 15:44:57 Already visited: https://hub.docker.com/search\nGrawl 2020/05/01 15:45:06 Visited: https://hub.docker.com/signup\nGrawl 2020/05/01 15:45:06 Visited: https://hub.docker.com/_/redis\nGrawl 2020/05/01 15:45:07 Visited: https://hub.docker.com/_/ubuntu\nGrawl 2020/05/01 15:45:07 Visited: https://hub.docker.com/_/nginx\nGrawl 2020/05/01 15:45:07 Visited: https://hub.docker.com/_/postgres\nGrawl 2020/05/01 15:45:07 Visited: https://hub.docker.com/_/node\nGrawl 2020/05/01 15:45:07 Visited: 
https://hub.docker.com/_/alpine\nGrawl 2020/05/01 15:45:08 Visited: https://hub.docker.com/_/couchbase\nGrawl 2020/05/01 15:45:08 Already visited: https://hub.docker.com/signup\nGrawl 2020/05/01 15:45:08 Already visited: https://hub.docker.com/search\nGrawl 2020/05/01 15:45:08 Invalid URL: \nGrawl 2020/05/01 15:45:08 Invalid URL: \n.\n.\n.\n\n```\n\noutput directory tree\n\n```text\n/tmp/dockerhub\n└── hub_docker_com\n    ├── _\n    │   ├── aerospike\n    │   │   └── index.html\n    │   ├── alpine\n    │   │   └── index.html\n    │   ├── busybox\n    │   │   └── index.html\n    │   ├── couchbase\n    │   │   └── index.html\n    │   ├── golang\n    │   │   └── index.html\n    │   ├── hello-world\n    │   │   └── index.html\n    │   ├── mongo\n    │   │   └── index.html\n    │   ├── mysql\n    │   │   └── index.html\n    │   ├── nginx\n    │   │   └── index.html\n    │   ├── node\n    │   │   └── index.html\n    │   ├── postgres\n    │   │   └── index.html\n    │   ├── redis\n    │   │   └── index.html\n    │   ├── registry\n    │   │   └── index.html\n    │   ├── traefik\n    │   │   └── index.html\n    │   └── ubuntu\n    │       └── index.html\n    ├── index.html\n    ├── search\n    │   └── index.html\n    └── signup\n        └── index.html\n```\n\n## docker-compose\n\n```yml\nversion: '2'\nservices: \n  grawl:\n    image: greytabby/grawl:latest\n    volumes:\n      - .:/result/dockerhub\n    environment: \n      # SITE is the URL that grawl visits first\n      - SITE=https://hub.docker.com/\n      # ALLOWED_HOSTS are the accessible hosts. 
Use commas to separate multiple hosts\n      - ALLOWED_HOSTS=hub.docker.com\n      # DEPTH is the maximum number of links to follow when crawling\n      - DEPTH=2\n      # PARALLELISM is the number of crawlers run in parallel\n      - PARALLELISM=5\n      # HEADLESS_CHROME enables headless Chrome for crawling\n      # If you do not want to use headless Chrome, delete this variable\n      - HEADLESS_CHROME=\n      # OUTPUT_DIR is the directory where crawl results are saved\n      - OUTPUT_DIR=/result/dockerhub\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreytabby%2Fgrawl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreytabby%2Fgrawl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreytabby%2Fgrawl/lists"}