https://github.com/ravern/gollum
Robots.txt parser and fetcher for Elixir
- Host: GitHub
- URL: https://github.com/ravern/gollum
- Owner: ravern
- Created: 2017-10-06T15:03:21.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-03-30T22:08:47.000Z (almost 3 years ago)
- Last Synced: 2025-10-07T00:47:50.689Z (3 months ago)
- Topics: crawler, elixir, robots-parser, robots-txt
- Language: Elixir
- Homepage: https://hexdocs.pm/gollum
- Size: 29.3 KB
- Stars: 14
- Watchers: 1
- Forks: 12
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
[Build status](https://semaphoreci.com/ravernkoh/gollum)
# Gollum
Robots.txt parser with caching. Modelled after Kryten. Docs can be found [here](https://hexdocs.pm/gollum/api-reference.html).
# Usage
Call `Gollum.crawlable?/3` to check whether a given URL may be crawled by the specified user agent.
```elixir
iex> Gollum.crawlable?("hello", "https://google.com/")
:crawlable
iex> Gollum.crawlable?("hello", "https://google.com/m/")
:uncrawlable
```
Gollum is an OTP app (for the cache), so remember to list it under the `extra_applications` key in your `mix.exs` to ensure it is started, as in the sketch below.
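For instance, here is a minimal sketch of a `mix.exs` that does this (the module name, app name, and version requirement are placeholder assumptions, not taken from this project):
```elixir
# mix.exs — minimal sketch; module, app name, and version below are placeholders.
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [app: :my_app, version: "0.1.0", deps: deps()]
  end

  # Listing :gollum under extra_applications ensures its cache is started with your app.
  def application do
    [extra_applications: [:gollum]]
  end

  defp deps do
    # Version requirement is an assumption; check hex.pm for the current release.
    [{:gollum, "~> 0.3"}]
  end
end
```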
Gollum allows for some configuration in your `config.exs` file. The following shows the default values; all of them are optional.
```elixir
config :gollum,
  name: Gollum.Cache,     # Name of the cache GenServer
  refresh_secs: 86_400,   # Seconds before the robots.txt is refetched
  lazy_refresh: false,    # Whether to set up a timer that auto-refetches, or to only refetch on request
  user_agent: "Gollum"    # User agent sent with the GET request for robots.txt
```
# Author
Ravern Koh