https://github.com/benwebber/texting-robots-py
Python binding for the Texting Robots robots.txt parser
https://github.com/benwebber/texting-robots-py
robots-txt scraping
Last synced: 6 months ago
JSON representation
Python binding for the Texting Robots robots.txt parser
- Host: GitHub
- URL: https://github.com/benwebber/texting-robots-py
- Owner: benwebber
- License: apache-2.0
- Created: 2025-07-25T04:34:48.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-25T06:26:25.000Z (11 months ago)
- Last Synced: 2025-08-25T23:53:14.676Z (10 months ago)
- Topics: robots-txt, scraping
- Language: Python
- Homepage: https://pypi.org/project/texting-robots-py/
- Size: 22.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE.Apache-2.0
Awesome Lists containing this project
README
# texting-robots-py
Python binding for the [Texting Robots](https://github.com/Smerity/texting_robots) `robots.txt` parser.
Refer to that project for details.
```
pip install texting-robots-py
```
## Usage
```python
from texting_robots import Robot
txt = b'''
User-Agent: FerrisCrawler
Allow: /ocean
Disallow: /rust
Disallow: /forest*.py
Crawl-Delay: 10
User-Agent: *
Disallow: /
Sitemap: https://www.example.com/site.xml
'''
robot = Robot('FerrisCrawler', txt)
assert robot.delay == 10
assert robot.sitemaps == ['https://www.example.com/site.xml']
assert robot.allowed('https://www.rust-lang.org/ocean')
assert robot.allowed('/ocean')
assert robot.allowed('/ocean/reef.html')
assert not robot.allowed('/rust')
assert not robot.allowed('/forest/tree/snake.py')
```
## Version numbering
The version of this project will equal the version of Texting Robots it depends on. For example, version 0.2.2 of `texting-robots-py` depends on version 0.2.2 of the `texting_robots` crate.
If necessary, this project will use [post releases](https://packaging.python.org/en/latest/specifications/version-specifiers/#post-releases) for changes specific to the Python package.
## Licence
This project is licensed under both the Apache 2.0 and MIT licences.