https://github.com/antoinegagne/robots
A parser for robots.txt with support for wildcards. See also RFC 9309.
- Host: GitHub
- URL: https://github.com/antoinegagne/robots
- Owner: AntoineGagne
- License: other
- Created: 2019-11-28T02:14:09.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-11-21T21:07:56.000Z (over 1 year ago)
- Last Synced: 2025-04-01T04:07:24.986Z (3 months ago)
- Topics: crawling, erlang, erlang-library, parser, parsing, parsing-library, rfc-9309, robots-exclusion-standard, robots-parser, robots-txt
- Language: Erlang
- Homepage:
- Size: 30.3 KB
- Stars: 3
- Watchers: 1
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# robots
[](https://github.com/AntoineGagne/robots/actions)
[](https://hex.pm/packages/robots)
[](https://hexdocs.pm/robots)
[](https://github.com/AntoineGagne/robots/releases)
[](https://coveralls.io/github/AntoineGagne/robots?branch=master)

A library that parses and validates rules from `robots.txt`.
## Installation
This library is available on [hex.pm](https://hex.pm/packages/robots).
This library is installed with `rebar3`. Add it to the `deps` section of your
`rebar.config`:

```erlang
{deps, [
    {robots, "1.1.1"}
]}.
```

## Usage
```erlang
Content = <<"User-Agent: bot\nAllow: /fish">>,
%% Parse the rules; the second argument is the HTTP status code of the
%% response that served the `robots.txt` file. This returns an opaque type
%% containing all the rules indexed by their agents.
{ok, RulesIndex} = robots:parse(Content, 200),
%% `/fish/salmon.html` matches the `Allow: /fish` rule.
true = robots:is_allowed(<<"bot/1.0.0">>, <<"/fish/salmon.html">>, RulesIndex),
%% `/Fish.asp` matches no rule, so it is allowed by default.
true = robots:is_allowed(<<"bot/1.0.0">>, <<"/Fish.asp">>, RulesIndex),
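%% A hedged extension of the example above, using only the same
%% robots:parse/2 and robots:is_allowed/3 calls: a Disallow rule should
%% make matching paths disallowed for that agent (sketch; the rule content
%% and expected result here are an assumption, not taken from the README).
DisallowedContent = <<"User-Agent: bot\nDisallow: /admin">>,
{ok, DisallowedIndex} = robots:parse(DisallowedContent, 200),
false = robots:is_allowed(<<"bot/1.0.0">>, <<"/admin/index.html">>, DisallowedIndex),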
```

## Development
### Running all the tests and linters
You can run all the tests and linters with the `rebar3` alias:
```sh
rebar3 check
```