https://github.com/lainx86/latae
A simple Python library to parse and read robots.txt files
library package python python-3 python-package scraper
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/lainx86/latae
- Owner: Lainx86
- License: MIT
- Created: 2022-03-07T16:03:49.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2022-03-12T09:09:43.000Z (about 4 years ago)
- Last Synced: 2025-07-23T07:32:10.608Z (10 months ago)
- Topics: library, package, python, python-3, python-package, scraper
- Language: Python
- Homepage: https://pypi.org/project/latae/
- Size: 231 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# Latae
> A pure Python library for parsing and reading robots.txt files
## 🛠 Note
Latae is currently in heavy development, so expect bugs. More features are planned.
## 💻 Usage
Via a file on your local system...
```python
import latae as lt

# Read the local robots.txt into a list of lines
with open("robots.txt", "r") as f:
    rb_file = f.readlines()

# Get disallowed paths as a dict
lt.get_disallowed(rb_file)

# Get the XML sitemap
lt.get_sitemap(rb_file)
```
...or via the `requests` module
```python
import requests
import latae as lt

# Fetch robots.txt over HTTP and split it into lines once
rb_file = requests.get("https://duckduckgo.com/robots.txt").text
rb_lines = rb_file.splitlines()

# Get disallowed paths as a dict
lt.get_disallowed(rb_lines)

# Get the XML sitemap
lt.get_sitemap(rb_lines)
```
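
The examples above do not show what the returned values look like. As a rough illustration only, and not Latae's actual implementation, the sketch below parses a list of robots.txt lines with the standard library alone. The helper names `sketch_get_disallowed` and `sketch_get_sitemaps`, the sample file contents, and the returned shape (a dict mapping each user agent to a list of paths) are all assumptions made for this example.

```python
from collections import defaultdict


def sketch_get_disallowed(lines):
    """Hypothetical helper (not part of Latae): map user agents to Disallow paths."""
    rules = defaultdict(list)
    agents = []        # user agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field == "disallow":
            in_rules = True
            if value:  # an empty Disallow value blocks nothing
                for agent in agents:
                    rules[agent].append(value)
        elif field == "allow":
            in_rules = True
    return dict(rules)


def sketch_get_sitemaps(lines):
    """Hypothetical helper (not part of Latae): collect every Sitemap URL."""
    sitemaps = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps


# Made-up robots.txt content, used only for this illustration
sample = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/  # scratch space

User-agent: ExampleBot
Disallow:

Sitemap: https://example.com/sitemap.xml
""".splitlines()

print(sketch_get_disallowed(sample))
print(sketch_get_sitemaps(sample))
```

Both helpers take a list of lines, matching the way `rb_file` is passed to Latae in the usage examples, so the same input works whether it comes from `readlines()` or `splitlines()`.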