Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Robots Exclusion Standard/Protocol Parser for Web Crawling/Scraping
https://github.com/crwlrsoft/robots-txt
hacktoberfest robots-exclusion-protocol robots-exclusion-standard robots-txt robots-txt-parser web-crawling web-scraping
- Host: GitHub
- URL: https://github.com/crwlrsoft/robots-txt
- Owner: crwlrsoft
- License: MIT
- Created: 2021-10-24T20:48:11.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-11-06T13:55:00.000Z (2 months ago)
- Last Synced: 2024-11-06T14:43:50.923Z (2 months ago)
- Topics: hacktoberfest, robots-exclusion-protocol, robots-exclusion-standard, robots-txt, robots-txt-parser, web-crawling, web-scraping
- Language: PHP
- Homepage:
- Size: 28.3 KB
- Stars: 9
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# Robots Exclusion Standard/Protocol Parser for Web Crawling/Scraping

Use this library within crawler/scraper programs to parse robots.txt files and check if your crawler user-agent is allowed to load certain paths.

## Documentation

You can find the documentation at [crwlr.software](https://www.crwlr.software/packages/robots-txt/getting-started).

## Contributing

If you consider contributing something to this package, read the [contribution guide (CONTRIBUTING.md)](CONTRIBUTING.md).
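The usage described above (parse a robots.txt file, then ask whether a given user-agent may load a path) can be sketched as follows. This is a minimal illustration, assuming the package is installed via Composer; the `Crwlr\RobotsTxt\RobotsTxt::parse()` and `isAllowed()` names follow the package's documentation at crwlr.software, but treat the exact signatures as assumptions, not verified API.

```php
<?php

require 'vendor/autoload.php';

use Crwlr\RobotsTxt\RobotsTxt;

// Example robots.txt content. In a real crawler you would fetch
// https://example.com/robots.txt yourself before parsing it.
$robotsTxtContent = <<<TXT
User-agent: *
Disallow: /admin

User-agent: FooBot
Disallow: /
TXT;

// Parse the file content into a RobotsTxt object.
$robotsTxt = RobotsTxt::parse($robotsTxtContent);

// Check whether a given user-agent is allowed to load certain paths:
// a bot matched only by the "*" group is restricted from /admin,
// while FooBot is disallowed everywhere.
var_dump($robotsTxt->isAllowed('/blog/some-article', 'MyBot'));
var_dump($robotsTxt->isAllowed('/admin/login', 'MyBot'));
var_dump($robotsTxt->isAllowed('/blog/some-article', 'FooBot'));
```

Per the Robots Exclusion Protocol, the most specific matching `User-agent` group applies, so a compliant parser should allow the first request and deny the other two.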