# Legitbot ![](https://github.com/alaz/legitbot/workflows/build/badge.svg) ![](https://badge.fury.io/rb/legitbot.svg)

Ruby gem to make sure that an IP really belongs to a bot, typically a search
engine.
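
Under the hood, this kind of verification is typically a double DNS check: reverse-resolve the IP to a hostname, check that the hostname belongs to the crawler's domain, then forward-resolve the hostname and confirm it maps back to the same IP — the procedure Google and Bing themselves publish for verifying their crawlers. A minimal sketch of that check using Ruby's standard `resolv` library (the method and parameter names here are illustrative, not legitbot's internals):

```ruby
require "resolv"

# Double-DNS validation sketch: an IP is accepted only if its PTR record
# ends with an expected domain suffix AND the forward lookup of that
# hostname resolves back to the same IP. The resolver is injectable so
# the logic can be exercised without network access.
def genuine_crawler_ip?(ip, allowed_suffixes, resolver: Resolv::DNS.new)
  host = resolver.getname(ip).to_s
  return false unless allowed_suffixes.any? { |suffix| host.end_with?(suffix) }

  resolver.getaddresses(host).map(&:to_s).include?(ip)
rescue Resolv::ResolvError
  false
end
```

Spoofing a `User-Agent` is trivial, but an impostor cannot make the PTR record of its own IP point into `googlebot.com`, which is why the reverse step alone is not enough and the forward confirmation is needed.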

## Usage

Suppose you have a Web request and you would like to check that it is not disguised:

```ruby
bot = Legitbot.bot(user_agent, ip)
```

`bot` will be `nil` if no bot signature was found in the `User-Agent`.
Otherwise, it will be an object with methods

```ruby
bot.detected_as # => :google
bot.valid? # => true
bot.fake? # => false
```
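
The `nil` and validity checks can be folded into one classifier; a small sketch (the method name and return symbols are illustrative, and it assumes `require "legitbot"` has succeeded):

```ruby
# Classify a request as human/unknown, a genuine bot, or an impostor,
# based on Legitbot.bot returning nil for unrecognized User-Agents.
def classify_request(user_agent, ip)
  bot = Legitbot.bot(user_agent, ip)
  return :human_or_unknown if bot.nil? # no bot signature in User-Agent

  bot.valid? ? :genuine_bot : :impostor
end
```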

Sometimes you already know which search engine to expect. For example, you might
be using [rack-attack](https://github.com/kickstarter/rack-attack):

```ruby
Rack::Attack.blocklist("fake Googlebot") do |req|
  req.user_agent =~ %r(Googlebot) && Legitbot::Google.fake?(req.ip)
end
```

Or if you do not like all those ghoulish crawlers stealing your content,
evaluating it and getting ready to invade your site with spammers, then block
them all:

```ruby
Rack::Attack.blocklist 'fake search engines' do |request|
  Legitbot.bot(request.user_agent, request.ip)&.fake?
end
```
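
The same check also fits in a tiny standalone Rack middleware if you are not using rack-attack; a sketch (the class name is illustrative, and it assumes the legitbot gem is loaded):

```ruby
# Rack middleware that returns 403 when the User-Agent claims to be a
# known bot but the request IP does not belong to that bot.
class BlockFakeBots
  def initialize(app)
    @app = app
  end

  def call(env)
    bot = Legitbot.bot(env["HTTP_USER_AGENT"].to_s, env["REMOTE_ADDR"].to_s)
    if bot&.fake?
      [403, { "content-type" => "text/plain" }, ["Forbidden"]]
    else
      @app.call(env) # humans and genuine bots pass through
    end
  end
end
```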

## Versioning

[Semantic versioning](https://semver.org/) with the following clarifications:

- MINOR version is incremented when support for new bots is added.
- PATCH version is incremented when validation logic for a bot changes (IP list
updated, for example).

## Supported

- [Ahrefs](https://ahrefs.com/robot)
- [AmazonAdBot](https://adbot.amazon.com/)
- [AmazonBot](https://developer.amazon.com/amazonbot)
- [Applebot](https://support.apple.com/en-us/119829)
- [Baidu spider](http://help.baidu.com/question?prod_en=master&class=498&id=1000973)
- [Bingbot](https://blogs.bing.com/webmaster/2012/08/31/how-to-verify-that-bingbot-is-bingbot/)
- [BLEXBot (WebMeUp)](http://webmeup-crawler.com/)
- [DataForSEO](https://dataforseo.com/dataforseo-bot)
- [DuckAssistBot](https://duckduckgo.com/duckduckgo-help-pages/results/duckassistbot)
- [DuckDuckBot](https://duckduckgo.com/duckduckgo-help-pages/results/duckduckbot)
- [Google crawlers](https://support.google.com/webmasters/answer/1061943)
- [IAS](https://integralads.com/ias-privacy-data-management/policies/site-indexing-policy/)
- [OpenAI GPTBot](https://platform.openai.com/docs/gptbot)
- [Oracle Data Cloud Crawler](https://www.oracle.com/corporate/acquisitions/grapeshot/crawler.html)
- [Marginalia](https://www.marginalia.nu/marginalia-search/for-webmasters/)
- [Meta / Facebook Web crawlers](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/)
- [Petal search engine](http://aspiegel.com/petalbot)
- [Pinterest](https://help.pinterest.com/en/articles/about-pinterest-crawler-0)
- [Twitterbot](https://developer.twitter.com/en/docs/tweets/optimize-with-cards/guides/getting-started),
the list of IPs is in the
[Troubleshooting page](https://developer.twitter.com/en/docs/tweets/optimize-with-cards/guides/troubleshooting-cards)
- [Yandex robots](https://yandex.com/support/webmaster/robot-workings/check-yandex-robots.xml)

## License

Apache 2.0

## Other projects

- Play Framework variant in Scala:
[play-legitbot](https://github.com/osinka/play-legitbot)
- Article
[When (Fake) Googlebots Attack Your Rails App](http://jessewolgamott.com/blog/2015/11/17/when-fake-googlebots-attack-your-rails-app/)
- [Voight-Kampff](https://github.com/biola/Voight-Kampff) is a Ruby gem that
detects bots by `User-Agent`
- [crawler_detect](https://github.com/loadkpi/crawler_detect) is a Ruby gem and
Rack middleware to detect crawlers by a few different request headers, including
`User-Agent`
- Project Honeypot's [http:BL](https://www.projecthoneypot.org/httpbl_api.php)
can not only classify an IP as a search engine, but also label it as suspicious
and report the number of days since its last activity. My implementation of
the protocol in Scala is [here](https://github.com/osinka/httpbl).
- [CIDRAM](https://github.com/CIDRAM/CIDRAM) is a PHP routing manager with
built-in support to validate bots.