# web-tree-crawler

A naive web crawler that builds a tree of URLs under a domain using [web-tree](https://www.npmjs.com/package/web-tree).

**Note:** This software is intended for personal learning and testing purposes.

## How it works

You pass `web-tree-crawler` a URL, and it tries to discover and visit as many URLs under that domain name as it can within a time limit. When time's up or it runs out of URLs, `web-tree-crawler` spits out a tree of the URLs it visited. There are several configuration options; see the usage sections below.

## Install

`npm i web-tree-crawler`

## CLI

### Usage

```
Usage: [option=] web-tree-crawler <url>

Options:
  format     , f  The output format of the tree (default="string")
  headers    , h  File containing headers to send with each request
  numRequests, n  The number of requests to send at a time (default=200)
  outFile    , o  Write the tree to file instead of stdout
  pathList   , p  File containing paths to initially crawl
  timeLimit  , t  The max number of seconds to run (default=120)
  verbose    , v  Log info and progress to stdout
```

Options are given as `name=value` prefixes before the command, as in the examples below.

### Examples

#### Crawl and print tree to stdout

```
$ h=/path/to/file web-tree-crawler <url>

.com
  .domain
    .subdomain1
      /foo
        /bar
      .subdomain-of-subdomain1
        /baz
          ?q=1
    .subdomain2
...
```

To print an HTML tree instead:

```
$ f=html web-tree-crawler <url>

...
```

#### Crawl and write tree to file

```
$ o=/path/to/file web-tree-crawler <url>

Wrote tree to file!
```

#### Crawl with verbose logging

```
$ v=true web-tree-crawler <url>

Visited "<url>"
Visited "<another-url>"
...
```

## JS

### Usage

```js
/**
 * This is the main exported function that crawls and resolves the URL tree.
 *
 * @param  {String}   url                      - the URL to start crawling from
 * @param  {Object}   [opts = {}]
 * @param  {Object}   [opts.headers]           - headers to send with each request
 * @param  {Number}   [opts.numRequests = 200] - the number of requests to send at a time
 * @param  {String[]} [opts.startPaths]        - paths to initially crawl
 * @param  {Number}   [opts.timeLimit = 120]   - the max number of seconds to run for
 * @param  {Boolean}  [opts.verbose]           - if true, logs info and progress to stdout
 * @param  {*}        [opts....]               - additional options for #lib.request()
 *
 * @return {Promise}                           - resolves with the URL tree
 */
```

### Example

```js
'use strict'

const crawl = require('web-tree-crawler')

// Placeholder target and options (see the documented options above)
const url = 'https://example.com'
const opts = { timeLimit: 60, verbose: true }

crawl(url, opts)
  .then(tree => {
    // Work with the resolved URL tree here
  })
  .catch(err => console.error(err))
```

## Test

`npm test`

## Lint

`npm run lint`

## Documentation

`npm run doc`

Generate the docs and open them in your browser.

## Contributing

Please do!

If you find a bug, want a feature added, or just have a question, feel free to [open an issue](https://github.com/zbo14/web-tree-crawler/issues/new). You're also welcome to [create a pull request](https://github.com/zbo14/web-tree-crawler/compare/develop...) that addresses an issue. Push your changes to a feature branch and request a merge into `develop`.

Make sure linting and tests pass and coverage is 💯 before creating a pull request!
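## Tree layout sketch

The stdout tree is laid out with the domain labels reversed (`.com`, then `.domain`, then subdomains), followed by path segments and then the query string. The following is a hypothetical sketch of that layout using plain nested `Map`s. It is not how `web-tree-crawler` or `web-tree` actually build the tree. `urlToKeys`, `buildTree` and `renderTree` are illustrative names and are not exports of either package.

```javascript
'use strict'

// Hypothetical sketch, not the implementation used by web-tree-crawler or web-tree.
// Each tree node is a Map from a label (".com", "/foo", "?q=1") to its child node.

// Split a URL into the ordered labels used as tree keys.
function urlToKeys (url) {
  const { hostname, pathname, search } = new URL(url)
  // Reverse the hostname labels so the TLD comes first: a.b.com -> .com, .b, .a
  const hostKeys = hostname.split('.').reverse().map(label => '.' + label)
  // Keep non-empty path segments, each prefixed with "/"
  const pathKeys = pathname.split('/').filter(Boolean).map(seg => '/' + seg)
  // Treat the whole query string (e.g. "?q=1") as a single leaf
  return search ? [...hostKeys, ...pathKeys, search] : [...hostKeys, ...pathKeys]
}

// Insert every URL into one nested Map, sharing common prefixes.
function buildTree (urls) {
  const root = new Map()
  for (const url of urls) {
    let node = root
    for (const key of urlToKeys(url)) {
      if (!node.has(key)) node.set(key, new Map())
      node = node.get(key)
    }
  }
  return root
}

// Render the tree as indented text, two spaces per level, in insertion order.
function renderTree (node, depth = 0) {
  let out = ''
  for (const [key, child] of node) {
    out += '  '.repeat(depth) + key + '\n' + renderTree(child, depth + 1)
  }
  return out
}

// The three URLs implied by the stdout example
const tree = buildTree([
  'https://subdomain1.domain.com/foo/bar',
  'https://subdomain-of-subdomain1.subdomain1.domain.com/baz?q=1',
  'https://subdomain2.domain.com/'
])
process.stdout.write(renderTree(tree))
```

Running this sketch prints the same nested layout as the stdout example. A `Map` keeps children in the order they were first added. This README does not specify how the real tool orders siblings or how it handles ports, fragments and query strings.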