https://github.com/monperrus/crawler-user-agents

Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome :star:

Host: GitHub
URL: https://github.com/monperrus/crawler-user-agents
Owner: monperrus
License: mit
Created: 2014-03-07T20:44:56.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2025-04-23T16:25:40.000Z (3 months ago)
Last Synced: 2025-04-28T00:44:37.207Z (3 months ago)
Language: Go
Homepage:
Size: 679 KB
Stars: 1,269
Watchers: 43
Forks: 267
Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

favorite-link - Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders.
awesome-hacking-lists - monperrus/crawler-user-agents - Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome :star: (Go)

README

# crawler-user-agents

This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders, as a single JSON file.

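* NPM package: <https://www.npmjs.com/package/crawler-user-agents>

* Go package: <https://pkg.go.dev/github.com/monperrus/crawler-user-agents>

* PyPI package: <https://pypi.org/project/crawler-user-agents/>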

Each `pattern` is a regular expression. It should work out of the box with your favorite regex library.
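
As a minimal sketch of using the patterns directly with a regex library, the snippet below applies Python's standard `re` module to two inline sample entries. The entry shape is taken from the example in the Contributing section; the `totobot` entry, its URL, and the helper name `find_matches` are hypothetical and only serve as illustration. In real use you would load `crawler-user-agents.json` instead of the inline sample.

```python
import json
import re

# Inline stand-in for crawler-user-agents.json (assumption: same entry shape
# as the Contributing example). The second entry is hypothetical.
SAMPLE_JSON = """
[
  {"pattern": "rogerbot", "addition_date": "2014/02/28",
   "url": "http://moz.com/help/pro/what-is-rogerbot-",
   "instances": ["rogerbot/2.3 example UA"]},
  {"pattern": "totobot", "addition_date": "2014/01/01",
   "url": "http://example.com/totobot",
   "instances": ["Mozilla/5 totobot v20131212.alpha1"]}
]
"""

entries = json.loads(SAMPLE_JSON)


def find_matches(user_agent, entries):
    """Return the entries whose pattern occurs anywhere in user_agent."""
    return [e for e in entries if re.search(e["pattern"], user_agent)]


matches = find_matches("Mozilla/5.0 (compatible; rogerbot/2.3)", entries)
print([m["pattern"] for m in matches])
```

`re.search` is used rather than `re.match` because a pattern is a fragment that may appear anywhere in the User-Agent string.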

If you use this project in a commercial product, [please sponsor it](https://github.com/sponsors/monperrus).

## Install

### Direct download

Download the [`crawler-user-agents.json` file](https://raw.githubusercontent.com/monperrus/crawler-user-agents/master/crawler-user-agents.json) from this repository directly.

### JavaScript

crawler-user-agents is published on npmjs.com.

To install it with npm or yarn:

```sh
npm install --save crawler-user-agents
# OR
yarn add crawler-user-agents
```

In Node.js, you can `require` the package to get an array of crawler user agents.

```js
const crawlers = require('crawler-user-agents');
console.log(crawlers);
```

### Python

Install with `pip install crawler-user-agents`

Then:

```python
import crawleruseragents

if crawleruseragents.is_crawler("Googlebot/"):
    # do something
    pass
```

or:

```python
import crawleruseragents

indices = crawleruseragents.matching_crawlers("bingbot/2.0")
print("crawlers' indices:", indices)
print(
    "crawler's URL:",
    crawleruseragents.CRAWLER_USER_AGENTS_DATA[indices[0]]["url"]
)
```

Note that `matching_crawlers` is much slower than `is_crawler` when the given User-Agent does match a crawler.
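
The README does not say why the two calls differ in speed. One plausible explanation, offered here as an assumption and not as a description of the package's internals, is that a yes/no check can be answered by a single combined regex, while listing every matching entry requires testing each pattern separately. The sketch below illustrates that difference with the standard `re` module; the pattern list and the names `any_match` and `all_matches` are hypothetical.

```python
import re

# Hypothetical stand-in patterns; the real package ships its own list.
PATTERNS = ["Googlebot", "bingbot", "Discordbot"]

# One alternation of all patterns, each wrapped in a non-capturing group.
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))
# The same patterns compiled one by one.
COMPILED = [re.compile(p) for p in PATTERNS]


def any_match(user_agent):
    """Yes/no answer using a single search over the combined regex."""
    return COMBINED.search(user_agent) is not None


def all_matches(user_agent):
    """Indices of every matching pattern: one search per pattern."""
    return [i for i, rx in enumerate(COMPILED) if rx.search(user_agent)]


ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
print(any_match(ua), all_matches(ua))
```

If you only need to know whether a request comes from a crawler, the cheaper yes/no check is the natural choice; reserve the per-pattern scan for cases where you need the matching entry's metadata.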

### Go

Use [this package](https://pkg.go.dev/github.com/monperrus/crawler-user-agents). It provides the global variable `Crawlers` (kept in sync with `crawler-user-agents.json`) and the functions `IsCrawler` and `MatchingCrawlers`.

Example Go program:

```go
package main

import (
	"fmt"

	// The package is referred to as "agents" in the code below.
	agents "github.com/monperrus/crawler-user-agents"
)

func main() {
	userAgent := "Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)"

	// Yes/no check.
	isCrawler := agents.IsCrawler(userAgent)
	fmt.Println("isCrawler:", isCrawler)

	// Indices of all matching entries in agents.Crawlers.
	indices := agents.MatchingCrawlers(userAgent)
	fmt.Println("crawlers' indices:", indices)
	fmt.Println("crawler's URL:", agents.Crawlers[indices[0]].URL)
}
```

Output:

```
isCrawler: true
crawlers' indices: [237]
crawler's URL: https://discordapp.com
```

## Contributing

I do welcome additions contributed as pull requests.

The pull requests should:

* contain a single addition

* specify a discriminating, relevant syntactic fragment (for example "totobot" and not "Mozilla/5 totobot v20131212.alpha1")

* contain the pattern (generic regular expression), the discovery date (year/month/day) and the official URL of the robot

* result in a valid JSON file (don't forget the comma between items)

Example:

    {
      "pattern": "rogerbot",
      "addition_date": "2014/02/28",
      "url": "http://moz.com/help/pro/what-is-rogerbot-",
      "instances": ["rogerbot/2.3 example UA"]
    }
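
Before opening a pull request, the requirements above can be checked locally. The following is a minimal sketch using only the Python standard library, not the repository's own test suite. The helper name `check_entry`, the set of required keys, and the four-digit-year date format are assumptions derived from the list and the example above.

```python
import json
import re

# Assumption: these are the fields the contribution rules ask for.
REQUIRED_KEYS = {"pattern", "addition_date", "url"}


def check_entry(entry):
    """Return a list of problems found in one entry (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    try:
        regex = re.compile(entry.get("pattern", ""))
    except re.error as exc:
        return problems + [f"invalid regex: {exc}"]
    # Assumption: dates look like 2014/02/28, as in the example entry.
    if not re.fullmatch(r"\d{4}/\d{2}/\d{2}", entry.get("addition_date", "")):
        problems.append("addition_date is not year/month/day")
    # Every listed instance should be matched by the pattern.
    for ua in entry.get("instances", []):
        if not regex.search(ua):
            problems.append(f"instance not matched: {ua!r}")
    return problems


# json.loads raises an error on invalid JSON, such as a missing comma.
entry = json.loads("""
{
  "pattern": "rogerbot",
  "addition_date": "2014/02/28",
  "url": "http://moz.com/help/pro/what-is-rogerbot-",
  "instances": ["rogerbot/2.3 example UA"]
}
""")
print(check_entry(entry))
```

An empty list means the entry passed every check in this sketch.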

## License

The list is under an [MIT License](https://opensource.org/licenses/MIT). Versions prior to Nov 7, 2016 were under a [CC-SA](http://creativecommons.org/licenses/by-sa/3.0/) license.

## Related work

There are a few wrapper libraries that use this data to detect bots:

 * [Voight-Kampff](https://github.com/biola/Voight-Kampff) (Ruby)

 * [isbot](https://github.com/Hentioe/isbot) (Ruby)

 * [crawlers](https://github.com/Olical/crawlers) (Clojure)

 * [isBot](https://github.com/omrilotan/isbot) (Node.js)

Other systems for spotting robots, crawlers, and spiders that you may want to consider are:

 * [Crawler-Detect](https://github.com/JayBizzle/Crawler-Detect) (PHP)

 * [BrowserDetector](https://github.com/mimmi20/BrowserDetector) (PHP)

 * [browscap](https://github.com/browscap/browscap) (JSON files)
