Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/muratgozel/robotstxt-util
RFC 9309 spec compliant robots.txt builder and parser. 🦾 No dependencies, fully typed.
- Host: GitHub
- URL: https://github.com/muratgozel/robotstxt-util
- Owner: muratgozel
- License: mit
- Created: 2020-05-19T12:49:53.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-09-14T08:46:08.000Z (4 months ago)
- Last Synced: 2024-11-10T09:14:29.278Z (2 months ago)
- Topics: rfc-5234, robots-builder, robots-exclusion-protocol, robots-generator, robots-parser, robots-txt
- Language: TypeScript
- Homepage:
- Size: 136 KB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# robotstxt-util
RFC 9309 spec compliant robots.txt builder and parser. 🦾 No dependencies, fully typed.

![NPM](https://img.shields.io/npm/l/robotstxt-util)
[![Build status](https://badge.buildkite.com/59019ef6df1cc44bcb5b790bd21f198d1e488c842624c62cd8.svg)](https://buildkite.com/gozel/robotstxt-util)

Before using this library, I recommend reading the following guide by Google:
[https://developers.google.com/search/docs/crawling-indexing/robots/intro](https://developers.google.com/search/docs/crawling-indexing/robots/intro)

Note to myself (and contributors):
[https://www.rfc-editor.org/rfc/rfc9309.html](https://www.rfc-editor.org/rfc/rfc9309.html)

## Install
```sh
npm i robotstxt-util
```

## Use
Exports a parser `parseRobotsTxt` and an object `RobotsTxt` to create and manage robots.txt data.

### Create robots.txt
```js
import { RobotsTxt } from 'robotstxt-util'

const robotstxt = new RobotsTxt()
const allBots = robotstxt.newGroup('*')
allBots.disallow('/')

const googleBot = robotstxt.newGroup('googlebot')
googleBot.allow('/abc')
googleBot.disallow('/def').disallow('/jkl')

// specify multiple bots at once
const otherBots = robotstxt.newGroup(['abot', 'bbot', 'cbot'])
otherBots.allow('/qwe')

// specify custom rules
googleBot.addCustomRule('crawl-delay', 10)

// add sitemaps
robotstxt.add('sitemap', 'https://yoursite/sitemap.en.xml')
robotstxt.add('sitemap', 'https://yoursite/sitemap.tr.xml')

// and export
const json = robotstxt.json()
const txt = robotstxt.txt()
```
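For orientation, here is roughly what the text export of the groups above should produce, assuming a standard RFC 9309 serialization; the library's exact spacing, casing, and ordering may differ:

```txt
User-agent: *
Disallow: /

User-agent: googlebot
Allow: /abc
Disallow: /def
Disallow: /jkl
crawl-delay: 10

User-agent: abot
User-agent: bbot
User-agent: cbot
Allow: /qwe

Sitemap: https://yoursite/sitemap.en.xml
Sitemap: https://yoursite/sitemap.tr.xml
```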
### Parse robots.txt data

Parses the data and returns an instance of `RobotsTxt`:
```js
import { parseRobotsTxt } from 'robotstxt-util'

const data = `
# hello robots

User-Agent: *
Disallow: *.gif$
Disallow: /example/
Allow: /publications/

User-Agent: foobot
Disallow:/
crawl-delay: 10
Allow:/example/page.html
Allow:/example/allowed.gif

# comments will be stripped out
User-Agent: barbot
User-Agent: bazbot
Disallow: /example/page.html

Sitemap: https://yoursite/sitemap.en.xml
Sitemap: https://yoursite/sitemap.tr.xml
`
const robotstxt = parseRobotsTxt(data)

// update something in some group
robotstxt.findGroup('barbot').allow('/aaa').allow('/bbb')

// store as json or do whatever you want
const json = robotstxt.json()
```
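As a usage sketch (not from the library's docs), the exported text pairs naturally with an HTTP endpoint. The example below serves `robotstxt.txt()` with Node's built-in `node:http` module; only `RobotsTxt`, `newGroup`, `disallow`, `add`, and `txt` come from the README above, and the paths and port are illustrative:

```js
import http from 'node:http'
import { RobotsTxt } from 'robotstxt-util'

// build the rules once at startup (illustrative paths)
const robotstxt = new RobotsTxt()
robotstxt.newGroup('*').disallow('/admin')
robotstxt.add('sitemap', 'https://yoursite/sitemap.en.xml')

// serve the serialized rules at the conventional /robots.txt path
http
  .createServer((req, res) => {
    if (req.url === '/robots.txt') {
      res.writeHead(200, { 'content-type': 'text/plain; charset=utf-8' })
      res.end(robotstxt.txt())
    } else {
      res.writeHead(404)
      res.end()
    }
  })
  .listen(3000)
```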
## Contributing

If you're interested in contributing, please read [CONTRIBUTING.md](https://github.com/muratgozel/muratgozel/blob/main/CONTRIBUTING.md) first.

---
Thanks for watching 🐬