{"id":19094430,"url":"https://github.com/risto-stevcev/robots-parser-combinator","last_synced_at":"2026-05-26T01:30:18.558Z","repository":{"id":65491000,"uuid":"52652991","full_name":"Risto-Stevcev/robots-parser-combinator","owner":"Risto-Stevcev","description":":beetle: A proper robots.txt parser and combinator that works with eulalie","archived":false,"fork":false,"pushed_at":"2016-02-27T16:47:49.000Z","size":13,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-03T13:17:43.388Z","etag":null,"topics":["combinator","parser","robots"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/robots-parser-combinator","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Risto-Stevcev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-02-27T06:08:16.000Z","updated_at":"2017-08-07T09:55:47.000Z","dependencies_parsed_at":"2023-01-25T18:35:10.577Z","dependency_job_id":null,"html_url":"https://github.com/Risto-Stevcev/robots-parser-combinator","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Risto-Stevcev%2Frobots-parser-combinator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Risto-Stevcev%2Frobots-parser-combinator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Risto-Stevcev%2Frobots-parser-combinator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Risto-Stevcev%2Frobots-parser-combinator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Risto-Stevcev","download_url":"https://codeload.github.com/Risto-Stevcev/robots-parser-combinator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240144104,"owners_count":19754851,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["combinator","parser","robots"],"created_at":"2024-11-09T03:29:07.888Z","updated_at":"2026-05-26T01:30:18.391Z","avatar_url":"https://github.com/Risto-Stevcev.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# robots-parser-combinator\n\n[![Build Status](https://travis-ci.org/Risto-Stevcev/robots-parser-combinator.svg)](https://travis-ci.org/Risto-Stevcev/robots-parser-combinator)\n\nA proper robots.txt parser and combinator that works with [eulalie](https://github.com/bodil/eulalie).\n\n\n## Usage\n\nGiven a `robots.txt` file like this:\n\n```\nUser-agent: *\nAllow: /blog/index.html  # site blog\nDisallow: /cgi-bin/\nDisallow: /tmp/\nSitemap: http://www.mysite.com/sitemaps/profiles-sitemap.xml  # extra profile urls\n# save the robots\n```\n\nParsing it in a Node.js REPL returns one entry per line, with trailing comments stored in a `comment` field and comment-only lines returned as plain strings:\n\n```javascript\n\u003e const fs = require('fs')\n\u003e const parser = require('robots-parser-combinator')\n\u003e const robotstxt = fs.readFileSync('./robots.txt', 'utf8')\n\u003e\n\u003e parser.parse(robotstxt)\n[ { useragent: { value: '*' } },\n  { allow: { value: '/blog/index.html', comment: 'site blog' } },\n  { disallow: { value: '/cgi-bin/' } },\n  { disallow: { value: '/tmp/' } },\n  { sitemap:\n     { value: 'http://www.mysite.com/sitemaps/profiles-sitemap.xml',\n       comment: 
'extra profile urls' } },\n  'save the robots' ]\n\n\u003e parser.parse('')\n[]\n```\n\nAlternatively, you can feed the `parser.robotstxt` combinator into eulalie yourself to parse a robots.txt file.\n\nTo parse a `robots.txt` that contains nonstandard extensions such as `Crawl-delay` or `Host`, use the `parser.parseNS` function instead. Combinators for the nonstandard extensions are provided as well.\n\n\n## Implementation\n\nThe parser implements a BNF grammar for robots.txt based on the [Google spec](https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt), referring to [RFC 1945](http://www.ietf.org/rfc/rfc1945.txt) and [RFC 1808](http://www.ietf.org/rfc/rfc1808.txt) where appropriate.\n\nLWS (linear white space) is defined using the rule from [RFC 5234](http://www.ietf.org/rfc/rfc5234.txt) rather than the one from RFC 1945. The two rules differ in a small but significant way:\n\nRFC 5234 linear white space:\n```\nWSP  = SP / HTAB\nLWSP = *(WSP / CRLF WSP)\n```\n\nRFC 1945 linear white space:\n```\nLWS = [CRLF] 1*( SP | HT )\n```\n\nThe RFC 1945 rule requires at least one space or tab character, whereas the RFC 5234 rule also matches an empty string. The parser uses the more permissive RFC 5234 rule by default. To use the stricter RFC 1945 rule instead, call `parser.setStrictLWS(true)` before parsing.\n\nEvery BNF rule in the robots.txt spec is exposed as a combinator. Because the combinators are compatible with eulalie, you can use them to parse only parts of a robots.txt file, or as building blocks of a larger combinator.\n\n\n## License\nLicensed under the MIT license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fristo-stevcev%2Frobots-parser-combinator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fristo-stevcev%2Frobots-parser-combinator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fristo-stevcev%2Frobots-parser-combinator/lists"}