https://github.com/ecnepsnai/robots.txt-block-ai
A robots.txt to ask AI not to steal your content
against-ai robots-txt
- Host: GitHub
- URL: https://github.com/ecnepsnai/robots.txt-block-ai
- Owner: ecnepsnai
- License: unlicense
- Created: 2023-09-29T18:13:31.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-29T18:27:25.000Z (over 1 year ago)
- Last Synced: 2025-01-18T09:53:14.562Z (5 months ago)
- Topics: against-ai, robots-txt
- Homepage:
- Size: 1000 Bytes
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Block AI with `robots.txt`
This is a community-built collection of robots.txt entries that ask AI services not to scrape your site's content.
```
User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Amazonbot
Disallow: /
```

## About robots.txt
`robots.txt` is a file that web crawlers _may choose_ to use to determine which parts of your website, if any, they can scrape. Many people use this to control which parts of their site a search engine will index.
**Importantly, however, robots.txt is merely a polite suggestion to the crawler.** Many web scraping services and tools do not honor robots.txt, including some AI services.
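To see how a crawler that does honor robots.txt would read these entries, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `is_allowed`, the shortened rule set, and the `https://example.com/` URL are illustrative assumptions, not part of this repository. It only shows what a compliant crawler would conclude; it does nothing to stop one that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules above, used here only for illustration.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: would a compliant crawler fetch this URL?"""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Agents listed with `Disallow: /` are refused every path, while an agent with no matching entry (and no `User-agent: *` fallback) is allowed by default.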
## Credits
- [Patrick Samphire on Mastodon](https://mastodon.social/@[email protected]/111147479365809824)