https://github.com/emanuelefavero/robots-txt-templates-
This is a collection of robots.txt templates
- Host: GitHub
- URL: https://github.com/emanuelefavero/robots-txt-templates-
- Owner: emanuelefavero
- Created: 2023-01-22T03:50:43.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-22T03:51:17.000Z (over 2 years ago)
- Last Synced: 2025-02-04T16:50:28.268Z (8 months ago)
- Topics: crawlers, robots-txt, template, user-agent, web
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# robots.txt templates
This is a collection of robots.txt templates
### What is robots.txt?
- robots.txt is a file that tells search engines which pages or files a crawler can or cannot request from your site (see the sketch below)
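Crawlers look for the file at the root of the site. A minimal sketch (the domain and blocked path are illustrative, not from this repo):

```text
# Served from https://example.com/robots.txt
User-agent: *
Disallow: /admin/
```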
### What is a crawler?
- A crawler is a program that browses the web automatically. Search engines use crawlers to update their web index (see the example request below).
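To connect the two: a crawler typically fetches robots.txt before crawling a site and identifies itself with a `User-Agent` request header, which is the name the `User-agent` directive below matches against. A sketch of such a request (the exact header value varies by crawler):

```text
GET /robots.txt HTTP/1.1
Host: example.com
User-Agent: Googlebot/2.1 (+http://www.google.com/bot.html)
```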
## **Add a comment in robots.txt**
```text
# This is a comment
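
# A comment can also follow a directive on the same line
# (most major crawlers honor trailing comments):
User-agent: *  # applies to every crawler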
```

## **Allow all**
```text
User-agent: *
Allow: /
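# Note: an empty 'Disallow:' line (or an empty robots.txt file) also allows everything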
```

## **Disallow all**
```text
User-agent: *
Disallow: /
```

## **Block a folder**
```text
User-agent: *
Disallow: /folder/
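# Rules are prefix matches, so this also blocks /folder/page.html and /folder/sub/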
```

## **Block a file**
```text
User-agent: *
Disallow: /file.html
```

## **Block a file type**
```text
User-agent: *
Disallow: /*.pdf$
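# '*' matches any characters and '$' anchors the end of the URL (widely supported
# pattern extensions), so /docs/file.pdf is blocked but /file.pdf?download=1 is not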
```

## **Allow only Google**
```text
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
```

> A crawler follows the most specific `User-agent` group that matches it, so Googlebot obeys its own group here and ignores the `*` rules.

## **Disallow only Google**
```text
User-agent: *
Allow: /

User-agent: Googlebot
Disallow: /
```

## **Link to your sitemap**
```text
User-agent: *
Sitemap: https://example.com/sitemap.xml
```

> The Sitemap directive tells the crawler where to find your sitemap.
>
> A sitemap is a file that lists the pages of your site. It is used by search engines to index your site.

## **Slow down the crawler**
```text
User-agent: *
Crawl-delay: 10
```

> The Crawl-delay directive asks the crawler to wait at least 10 seconds between requests to your site. Not all crawlers honor it; Googlebot, for example, ignores Crawl-delay.
---
## **User Agents**
- **Googlebot** - Used for Google Search
- **Bingbot** - Used for Bing Search
- **Slurp** - Yahoo's web crawler
- **DuckDuckBot** - Used by the DuckDuckGo search engine
- **Baiduspider** - Baidu's web crawler (Chinese search engine)
- **YandexBot** - Yandex's web crawler (Russian search engine)
- **Facebot** - Used by Facebook
- **Pinterestbot** - Used by Pinterest
- **TwitterBot** - Used by Twitter
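The user agents above can be combined into per-crawler rule groups. A minimal sketch (the paths are illustrative, not from this repo):

```text
# Google and Bing may crawl everything except internal search results
User-agent: Googlebot
User-agent: Bingbot
Disallow: /search/

# All other crawlers are blocked entirely
User-agent: *
Disallow: /
```

A group can list several `User-agent` lines, and each crawler follows the most specific group that matches it.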