{"id":43055841,"url":"https://github.com/gustavooferreira/wcrawler","last_synced_at":"2026-01-31T11:07:11.891Z","repository":{"id":57569983,"uuid":"340811297","full_name":"gustavooferreira/wcrawler","owner":"gustavooferreira","description":"Simple Web Crawler CLI tool with \"minimal\" dependencies","archived":false,"fork":false,"pushed_at":"2023-10-23T23:59:41.000Z","size":131,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-06-20T03:41:31.277Z","etag":null,"topics":["cli","crawler","golang","graph","html","links","web"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gustavooferreira.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-02-21T03:48:07.000Z","updated_at":"2023-10-23T23:59:45.000Z","dependencies_parsed_at":"2024-06-20T02:58:11.696Z","dependency_job_id":null,"html_url":"https://github.com/gustavooferreira/wcrawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gustavooferreira/wcrawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gustavooferreira%2Fwcrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gustavooferreira%2Fwcrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gustavooferreira%2Fwcrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gustavooferreira%2Fwcrawler/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gustavooferreira","download_url":"https://codeload.github.com/gustavooferreira/wcrawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gustavooferreira%2Fwcrawler/sbom","scorecard":{"id":450129,"data":{"date":"2025-08-11","repo":{"name":"github.com/gustavooferreira/wcrawler","commit":"f9fa47ea7d95ca48a97cf666395a729548e9e50c"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.7,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":0,"reason":"no SAST 
tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"13 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GO-2022-0236 / GHSA-h86h-8ppg-mxmh","Warn: Project is vulnerable to: GO-2021-0238 / GHSA-83g2-8m93-v3w7","Warn: Project is vulnerable to: GO-2022-0288","Warn: Project is vulnerable to: GO-2022-0969 / GHSA-69cg-p879-7622","Warn: Project is vulnerable to: GO-2022-1144 / GHSA-xrjj-mj9h-534m","Warn: Project is vulnerable to: GO-2023-1571 / 
GHSA-vvpx-j8f3-3w6h","Warn: Project is vulnerable to: GO-2023-1988 / GHSA-2wrh-6pvc-2jm9","Warn: Project is vulnerable to: GO-2023-2102 / GHSA-4374-p667-p6c8","Warn: Project is vulnerable to: GHSA-qppj-fm5r-hxr3","Warn: Project is vulnerable to: GO-2024-2687 / GHSA-4v7x-pqxf-cx7m","Warn: Project is vulnerable to: GO-2024-3333","Warn: Project is vulnerable to: GO-2025-3503 / GHSA-qxp5-gwg8-xv66","Warn: Project is vulnerable to: GO-2025-3595 / GHSA-vvgc-356p-c3xw"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-19T07:59:38.335Z","repository_id":57569983,"created_at":"2025-08-19T07:59:38.336Z","updated_at":"2025-08-19T07:59:38.336Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28939586,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-31T10:18:23.202Z","status":"ssl_error","status_checked_at":"2026-01-31T10:18:22.693Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","crawler","golang","graph","html","links","web"],"created_at":"2026-01-31T11:07:11.797Z","updated_at":"2026-01-31T11:07:11.871Z","avatar_url":"https://github.com/gustavooferreira.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WCrawler\n\n[![Build Status](https://travis-ci.com/gustavooferreira/wcrawler.svg?branch=master)](https://travis-ci.com/gustavooferreira/wcrawler)\n[![codecov](https://codecov.io/gh/gustavooferreira/wcrawler/branch/master/graph/badge.svg)](https://codecov.io/gh/gustavooferreira/wcrawler)\n[![Go Report Card](https://goreportcard.com/badge/github.com/gustavooferreira/wcrawler)](https://goreportcard.com/report/github.com/gustavooferreira/wcrawler)\n[![PkgGoDev](https://pkg.go.dev/badge/github.com/gustavooferreira/wcrawler)](https://pkg.go.dev/github.com/gustavooferreira/wcrawler)\n\nWCrawler is a simple web crawler CLI tool.\n\n**NOTE:** This tool was created mainly for practice purposes and therefore doesn't rely on any library that facilitates crawling.\n\nhttps://user-images.githubusercontent.com/17534422/109546768-85aec680-7ac2-11eb-8c72-2dbf7c7223a8.mp4\n\n\n# Usage\n\nExploring the Web:\n\n```\n❯ wcrawler explore --help\nExplore the web by following links up to a pre-determined depth.\nA depth of zero means no limit.\n\nUsage:\n  wcrawler explore URL [flags]\n\n\nFlags:\n  -d, --depth uint        depth of recursion (default 5)\n  -h, --help              help for explore\n  -s, --nostats           don't show live stats\n  
-o, --output string     file to save results (default \"./web_graph.json\")\n  -r, --retry uint        retry requests when they timeout (default 2)\n  -z, --stayinsubdomain   follow links only in the same subdomain\n  -t, --timeout uint      HTTP requests timeout in seconds (default 10)\n  -m, --treemode          doesn't add links which would point back to known nodes\n  -w, --workers uint      number of workers making concurrent requests (default 100)\n```\n\nVisualizing the graph in the browser:\n\n```\n❯ wcrawler view --help\nView web links relationships in the browser\n\nUsage:\n  wcrawler view [flags]\n\nFlags:\n  -h, --help            help for view\n  -i, --input string    file containing the data (default \"./web_graph.json\")\n  -n, --noautoopen      don't open browser automatically\n  -o, --output string   HTML output file (default \"./web_graph.html\")\n```\n\nThis will generate a webpage and load it on your default browser.\n\nSpheres are coloured based on the URL subdomain, you can pan, tilt and rotate the scene, drag the spheres and move them around, hover to check the URL they represent and click on them to go straight to that URL.\n\n**NOTE:** If you want to see a nice graph, make sure to run `wcrawler explore` with the `-m` flag.\nTree mode doesn't create links back to the original URLs making for much nicer visualizations.\nIts utility? None, but the graphs are undeniably more beautiful.\n\nNaturally, if you want a proper graph of the links visited and where they point to, just disregard the `-m` option. Don't try to visualize that, however, cos it's going to look ugly, if not freeze your browser entirely. 
Consider yourself warned :)\n\n# Example\n\nThe following command will crawl the web starting at the `example.com` website up to a max of 8 depth levels, using 5 workers with a 6-second timeout per request and saving the collected data to `/tmp/result.json`.\n\n```\nwcrawler explore https://example.com -d 8 -w 5 -t 6 -o /tmp/result.json\n```\n\nThe following command will then generate an HTML file with a graph view of the data collected and load it onto the default web browser. Only try to visualize the graph if you have specified the `-m` option! It's going to be the wrong graph, but it's going to look nice!\n\n```\nwcrawler view -i /tmp/result.json\n```\n\n---\n\n# Considerations\n\nHere I'm going to discuss the design decisions and a few caveats, but only when I'm actually done with the project.\n\nStill have a few more things to do like:\n\n- Add logic to fetch the website's robots.txt file and adhere to whatever is in there. At the moment we are just crawling everything (feeling like an outlaw here at the minute)\n- Show last 10 errors in the CLI while crawling\n- Make output more colorful\n- Docs, docs and more docs\n- Write more unit tests\n- Increase coverage and run some benchmarks (I'm pretty sure I can speed up some parts and reduce allocations, even though this program is I/O bound more than anything else so won't benefit much from these optimizations, but practice is practice)\n- Add golangci-lint to travis-ci (cos it's quite nice)\n- Organize code in a way that makes for a useful library (mostly done)\n\n---\n\n# Third party libraries being used (directly):\n\nCould have written the whole thing without using any library, but reusability is not a bad idea at all!\n\nThe only rule I had was to not use any library that facilitates crawling.\n\n```\n- github.com/gosuri/uilive     [updating terminal output in realtime]\n- github.com/spf13/cobra       [CLI args and flags parsing]\n- github.com/stretchr/testify  [writing unit tests]\n- golang.org/x/net       
      [HTML parsing]\n- github.com/oleiade/lane      [Provides a Queue data structure implementation]\n```\n\n---\n\n# Staying up to date\n\nTo update wcrawler to the latest version, use `go get -u github.com/gustavooferreira/wcrawler`.\n\n---\n\n# Build\n\nTo build this project run:\n\n```\nmake build\n```\n\nThe `wcrawler` binary will be placed inside the `bin/` folder.\n\n---\n\n# Tests\n\nTo run tests:\n\n```\nmake test\n```\n\nTo get coverage:\n\n```\nmake coverage\n```\n\n## Free tip\n\n\u003e If you run `make` without any targets, it will display all options available on the makefile followed by a short description.\n\n---\n\n# Contributing\n\nI'd normally be more than happy to accept pull requests, but given that I've created this project with the sole intent of practicing, it doesn't make sense for me to accept other people's work.\n\nHowever, feel free to fork the project and add whatever new features you feel like.\n\nI'd still be glad if you notice a bug and report it by opening an issue.\n\n---\n\n# License\n\nThis project is licensed under the terms of the MIT license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgustavooferreira%2Fwcrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgustavooferreira%2Fwcrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgustavooferreira%2Fwcrawler/lists"}