{"id":37660443,"url":"https://github.com/moskrc/crawlerdetect","last_synced_at":"2026-01-16T11:48:20.953Z","repository":{"id":35064949,"uuid":"202547825","full_name":"moskrc/crawlerdetect","owner":"moskrc","description":"🕷CrawlerDetect is a Python library designed to identify bots, crawlers, and spiders by analyzing their user agents.","archived":false,"fork":false,"pushed_at":"2025-07-09T16:52:34.000Z","size":2708,"stargazers_count":42,"open_issues_count":0,"forks_count":11,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-01-07T03:31:32.675Z","etag":null,"topics":["bot","crawler","detect","python","spider","user-agent"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/moskrc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-15T13:39:10.000Z","updated_at":"2025-11-09T20:41:05.000Z","dependencies_parsed_at":"2024-06-19T05:17:59.254Z","dependency_job_id":"96863bd7-e7f8-4f8c-98b0-d7d432241ec1","html_url":"https://github.com/moskrc/crawlerdetect","commit_stats":{"total_commits":21,"total_committers":3,"mean_commits":7.0,"dds":"0.38095238095238093","last_synced_commit":"1bf0864a2d54d15263f73cff86a7e5303b33f567"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/moskrc/crawlerdetect","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moskrc%2Fcrawlerdetect","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moskrc%2Fcrawlerdetect/tags","releases_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moskrc%2Fcrawlerdetect/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moskrc%2Fcrawlerdetect/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/moskrc","download_url":"https://codeload.github.com/moskrc/crawlerdetect/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moskrc%2Fcrawlerdetect/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478377,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T06:30:42.265Z","status":"ssl_error","status_checked_at":"2026-01-16T06:30:16.248Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bot","crawler","detect","python","spider","user-agent"],"created_at":"2026-01-16T11:48:20.427Z","updated_at":"2026-01-16T11:48:20.945Z","avatar_url":"https://github.com/moskrc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![test](https://github.com/moskrc/crawlerdetect/actions/workflows/python-package.yml/badge.svg)](https://github.com/moskrc/crawlerdetect/actions/workflows/python-package.yml)\n\n# About CrawlerDetect\n\nThis is a Python wrapper for [CrawlerDetect](https://github.com/JayBizzle/Crawler-Detect), a web crawler detection library. 
It helps identify\nbots, crawlers, and spiders using the user agent and other HTTP headers. Currently, it can detect\nover 3,678 bots, spiders, and crawlers.\n\n# How to install\n```bash\n$ pip install crawlerdetect\n```\n\n# How to use\n\n## Method Reference\n| camelCase | snake_case | Description                       |\n|-----------|------------|-----------------------------------|\n| `isCrawler()` | `is_crawler()` | Check if user agent is a crawler  |\n| `getMatches()` | `get_matches()` | Get the name of detected crawlers |\n\n## Variant 1\n```Python\nfrom crawlerdetect import CrawlerDetect\ncrawler_detect = CrawlerDetect()\ncrawler_detect.isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')\n# true if crawler user agent detected\n```\n\n## Variant 2\n```Python\nfrom crawlerdetect import CrawlerDetect\ncrawler_detect = CrawlerDetect(user_agent='Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html)')\ncrawler_detect.isCrawler()\n# true if crawler user agent detected\n```\n\n## Variant 3\n```Python\nfrom crawlerdetect import CrawlerDetect\ncrawler_detect = CrawlerDetect(headers={'DOCUMENT_ROOT': '/home/test/public_html', 'GATEWAY_INTERFACE': 'CGI/1.1', 'HTTP_ACCEPT': '*/*', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_CACHE_CONTROL': 'no-cache', 'HTTP_CONNECTION': 'Keep-Alive', 'HTTP_FROM': 'googlebot(at)googlebot.com', 'HTTP_HOST': 'www.test.com', 'HTTP_PRAGMA': 'no-cache', 'HTTP_USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36', 'PATH': '/bin:/usr/bin', 'QUERY_STRING': 'order=closingDate', 'REDIRECT_STATUS': '200', 'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '3360', 'REQUEST_METHOD': 'GET', 'REQUEST_URI': '/?test=testing', 'SCRIPT_FILENAME': '/home/test/public_html/index.php', 'SCRIPT_NAME': '/index.php', 
'SERVER_ADDR': '127.0.0.1', 'SERVER_ADMIN': 'webmaster@test.com', 'SERVER_NAME': 'www.test.com', 'SERVER_PORT': '80', 'SERVER_PROTOCOL': 'HTTP/1.1', 'SERVER_SIGNATURE': '', 'SERVER_SOFTWARE': 'Apache', 'UNIQUE_ID': 'Vx6MENRxerBUSDEQgFLAAAAAS', 'PHP_SELF': '/index.php', 'REQUEST_TIME_FLOAT': 1461619728.0705, 'REQUEST_TIME': 1461619728})\ncrawler_detect.isCrawler()\n# true if crawler user agent detected\n```\n\n## Output the name of the bot that matched (if any)\n```Python\nfrom crawlerdetect import CrawlerDetect\ncrawler_detect = CrawlerDetect()\ncrawler_detect.isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')\n# true if crawler user agent detected\ncrawler_detect.getMatches()\n# Sosospider\n```\n\n## Get the version of the library\n```Python\nimport crawlerdetect\ncrawlerdetect.__version__\n```\n\n# Contributing\n\nThe patterns and test cases are synced from the PHP repo. If you find a bot/spider/crawler user agent that crawlerdetect fails to detect, please submit a pull request with the regex pattern and a test case to the [upstream PHP repo](https://github.com/JayBizzle/Crawler-Detect).\n\nFailing that, just create an issue with the user agent you have found, and we'll take it from there :)\n\n# Development\n\n## Setup\n```bash\n$ poetry install\n```\n\n## Running tests\n```bash\n$ poetry run pytest\n```\n\n## Update crawlers from upstream PHP repo\n```bash\n$ ./update_data.sh\n```\n\n## Bump version\n```bash\n$ poetry run bump-my-version bump [patch|minor|major]\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoskrc%2Fcrawlerdetect","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmoskrc%2Fcrawlerdetect","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoskrc%2Fcrawlerdetect/lists"}