{"id":19238560,"url":"https://github.com/tmpfs/wget-parser","last_synced_at":"2025-02-23T13:52:55.484Z","repository":{"id":66127967,"uuid":"51428759","full_name":"tmpfs/wget-parser","owner":"tmpfs","description":"Parses the wget spider output","archived":false,"fork":false,"pushed_at":"2016-02-10T09:43:35.000Z","size":29,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-05T11:06:50.396Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tmpfs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-02-10T08:22:03.000Z","updated_at":"2020-04-29T23:40:00.000Z","dependencies_parsed_at":"2023-04-30T04:36:04.430Z","dependency_job_id":null,"html_url":"https://github.com/tmpfs/wget-parser","commit_stats":{"total_commits":23,"total_committers":1,"mean_commits":23.0,"dds":0.0,"last_synced_commit":"389949add583007ef3a7e8e8502667be4c58baa3"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmpfs%2Fwget-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmpfs%2Fwget-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmpfs%2Fwget-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tmpfs%2Fwget-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tmpfs","download_url":"https://codeload.github.com/tmpfs/wget-parser/tar.gz/refs/heads/master","
host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240324059,"owners_count":19783453,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T16:33:29.783Z","updated_at":"2025-02-23T13:52:55.464Z","avatar_url":"https://github.com/tmpfs.png","language":"JavaScript","readme":"Table of Contents\n=================\n\n* [Spider parser](#spider-parser)\n  * [Usage](#usage)\n    * [wget-parser](#wget-parser)\n    * [wget-spider](#wget-spider)\n  * [Output](#output)\n  * [Developer](#developer)\n    * [Test](#test)\n    * [Cover](#cover)\n    * [Lint](#lint)\n    * [Clean](#clean)\n    * [Readme](#readme)\n\nSpider parser\n=============\n\n[\u003cimg src=\"https://travis-ci.org/tmpfs/wget-parser.svg?v=1\" alt=\"Build Status\"\u003e](https://travis-ci.org/tmpfs/wget-parser)\n[\u003cimg src=\"http://img.shields.io/npm/v/wget-parser.svg?v=1\" alt=\"npm version\"\u003e](https://npmjs.org/package/wget-parser)\n[\u003cimg src=\"https://coveralls.io/repos/tmpfs/wget-parser/badge.svg?branch=master\u0026service=github\u0026v=2\" alt=\"Coverage Status\"\u003e](https://coveralls.io/github/tmpfs/wget-parser?branch=master)\n\nParses the spider output from [wget](https://www.gnu.org/software/wget) into an object structure of links.\n\nThe resulting object can be processed further to build a tree of a website's hierarchy, for example to generate a sitemap.\n\nTested using `wget v1.15` on Linux.\n\n## Usage\n\n```javascript\nvar parser = require('wget-parser')\n  , buf = Buffer.alloc(0);    // buffer should contain 
the spider output\nconsole.dir(parser(buf));\n```\n\n* `parser.Parser`: The parser class.\n* `parser.Link`: The class that represents a link.\n* `parser.ParseStream`: The parse stream class.\n\nStream support is available; see the [test spec](https://github.com/tmpfs/wget-parser/blob/master/test/spec/parser.js) for example usage.\n\n### wget-parser\n\nA program that reads from `stdin` and prints the parse result as JSON. It exits with code 1 if any broken links are found.\n\n```\ncat test/fixtures/mock.txt | wget-parser\ncat test/fixtures/broken.txt | wget-parser; echo $?;\n```\n\n### wget-spider\n\nA program that runs [wget](https://www.gnu.org/software/wget) in spider mode and pipes the output to `wget-parser`:\n\n```\nwget-spider http://google.com\n```\n\n## Output\n\nExample output from the parser:\n\n```json\n{\n  \"links\": [\n    {\n      \"url\": {\n        \"protocol\": \"http:\",\n        \"slashes\": true,\n        \"auth\": null,\n        \"host\": \"google.com\",\n        \"port\": null,\n        \"hostname\": \"google.com\",\n        \"hash\": null,\n        \"search\": null,\n        \"query\": null,\n        \"pathname\": \"/\",\n        \"path\": \"/\",\n        \"href\": \"http://google.com/\"\n      },\n      \"link\": \"http://google.com/\",\n      \"line\": \"--2016-02-10 16:11:57--  http://google.com/\"\n    },\n    {\n      \"url\": {\n        \"protocol\": \"http:\",\n        \"slashes\": true,\n        \"auth\": null,\n        \"host\": \"www.google.co.id\",\n        \"port\": null,\n        \"hostname\": \"www.google.co.id\",\n        \"hash\": null,\n        \"search\": \"?gws_rd=cr\u0026ei=zfC6Vv6KKYexuATc3pu4DQ\",\n        \"query\": \"gws_rd=cr\u0026ei=zfC6Vv6KKYexuATc3pu4DQ\",\n        \"pathname\": \"/\",\n        \"path\": \"/?gws_rd=cr\u0026ei=zfC6Vv6KKYexuATc3pu4DQ\",\n        \"href\": \"http://www.google.co.id/?gws_rd=cr\u0026ei=zfC6Vv6KKYexuATc3pu4DQ\"\n      },\n      \"link\": 
\"http://www.google.co.id/?gws_rd=cr\u0026ei=zfC6Vv6KKYexuATc3pu4DQ\",\n      \"line\": \"--2016-02-10 16:11:57--  http://www.google.co.id/?gws_rd=cr\u0026ei=zfC6Vv6KKYexuATc3pu4DQ\"\n    }\n  ],\n  \"broken\": []\n}\n```\n\n## Developer\n\n### Test\n\nTo run the test suite:\n\n```\nnpm test\n```\n\n### Cover\n\nTo generate code coverage run:\n\n```\nnpm run cover\n```\n\n### Lint\n\nRun the source tree through [jshint](http://jshint.com) and [jscs](http://jscs.info):\n\n```\nnpm run lint\n```\n\n### Clean\n\nRemove generated files:\n\n```\nnpm run clean\n```\n\n### Readme\n\nTo build the readme file from the partial definitions:\n\n```\nnpm run readme\n```\n\nGenerated by [mdp(1)](https://github.com/tmpfs/mdp).\n\n[wget]: https://www.gnu.org/software/wget\n[jshint]: http://jshint.com\n[jscs]: http://jscs.info\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftmpfs%2Fwget-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftmpfs%2Fwget-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftmpfs%2Fwget-parser/lists"}