Automatically check whether a website has robots.txt
- Host: GitHub
- URL: https://github.com/werneror/robotstxtdetection
- Owner: Werneror
- License: mit
- Created: 2019-11-12T13:47:11.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-11-16T04:20:35.000Z (over 5 years ago)
- Last Synced: 2025-01-21T01:42:00.687Z (4 months ago)
- Topics: chrome-extension, firefox-addon, firefox-extension, firefox-webextension
- Language: JavaScript
- Homepage:
- Size: 43.9 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# RobotsTxtDetection
Automatically check whether a website has robots.txt.
## Installation
This browser extension is compatible with Firefox and Google Chrome. However, we recommend using it in Firefox.
### Firefox
Open the extension's page on Firefox Add-ons and click `Add to Firefox`.
### Google Chrome
Download the source code:
```
git clone https://github.com/Werneror/RobotsTxtDetection.git
```
Open Google Chrome, enter `chrome://extensions/` in the address bar, enable `Developer mode`, click `Load unpacked`, and choose the `RobotsTxtDetection` directory.
## Usage
Browse the web as usual. If a website has a robots.txt, a small icon appears automatically (or the gray icon lights up).
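For illustration only, here is a minimal sketch (not the extension's actual code) of how a WebExtension background script might light up a toolbar icon on the tab where robots.txt was found; the icon path is a hypothetical file name.

```
// Illustrative sketch, not the extension's actual implementation.
// Swaps the gray toolbar icon for a colored one on the given tab.
function markRobotsTxtFound(tabId) {
  chrome.browserAction.setIcon({
    tabId: tabId,
    path: { "32": "icons/found-32.png" }  // hypothetical icon file
  });
  chrome.browserAction.setTitle({ tabId: tabId, title: "robots.txt found" });
}
```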
## Configuration
By default, only `/robots.txt` is detected. You can configure additional paths to detect. If a path starts with `/`, it is resolved as an absolute path from the site root; otherwise it is resolved relative to the directory of the current page.
A response with HTTP status code 404, 301, or 302 is treated as meaning the path does not exist. You can configure additional status codes if necessary.
For example, if your configuration is to detect:
```
/robots.txt
security.txt
```
and you are visiting `https://www.example.com/test/index.html`, we actually detect:
```
https://www.example.com/robots.txt
https://www.example.com/test/security.txt
```
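The path-resolution rule could look roughly like this in JavaScript; this is an illustrative sketch, not the extension's actual code, and `resolveDetectionUrls` is a hypothetical helper name.

```
// Illustrative sketch of the resolution rule: paths starting with "/" are resolved
// against the site root, all other paths against the current page's directory.
function resolveDetectionUrls(pageUrl, configuredPaths) {
  const page = new URL(pageUrl);
  return configuredPaths.map((path) =>
    path.startsWith("/")
      ? page.origin + path                 // absolute: from the site root
      : new URL(path, pageUrl).toString()  // relative: from the page's directory
  );
}

// resolveDetectionUrls("https://www.example.com/test/index.html",
//                      ["/robots.txt", "security.txt"])
// => ["https://www.example.com/robots.txt",
//     "https://www.example.com/test/security.txt"]
```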
## Details
### Cache
The set of paths on a website is relatively stable; generally speaking, it does not change frequently. So we cache as much as we can: a request for a given URL is sent only once, unless the browser is restarted.
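A per-session cache of this kind might be sketched as follows (illustrative only; `checkUrlExists` is a hypothetical helper, not part of the extension's code):

```
// Illustrative sketch: each URL is requested at most once per browser session.
// The in-memory map is lost when the browser (and its background page) restarts.
const cache = new Map(); // url -> Promise<boolean>

function hasRobotsTxt(url) {
  if (!cache.has(url)) {
    cache.set(url, checkUrlExists(url)); // checkUrlExists is a hypothetical helper
  }
  return cache.get(url);
}
```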
### Redirect
In fact, we cannot see the 301 or 302 status code, because `XMLHttpRequest` follows redirects automatically. Instead, we compare the URL of the request with the URL of the response to determine whether a redirect occurred, which also means we cannot reliably distinguish 301 from 302.
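The comparison could be sketched like this (illustrative, not the extension's exact code), using the standard `responseURL` property of `XMLHttpRequest`:

```
// Illustrative sketch: XMLHttpRequest follows redirects automatically, so compare
// the requested URL with xhr.responseURL to decide whether a redirect occurred.
function checkPath(url, callback) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.onload = function () {
    const redirected = xhr.responseURL !== url;       // a 301/302 (or other redirect) happened
    const exists = xhr.status !== 404 && !redirected; // 404 and redirects mean "not present"
    callback(exists);
  };
  xhr.onerror = function () { callback(false); };
  xhr.send();
}
```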
## License
MIT