Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mazzasaverio/doccrawl
Simple document crawler that harvests PDFs and documents from configured web sources.
- Host: GitHub
- URL: https://github.com/mazzasaverio/doccrawl
- Owner: mazzasaverio
- License: mit
- Created: 2024-10-27T11:23:24.000Z
- Default Branch: master
- Last Pushed: 2024-10-27T12:00:29.000Z
- Last Synced: 2024-10-27T13:40:55.679Z
- Topics: asyncpg, data-engineering, docker, logfire, playwright, postgresql, pydantic-v2, python3, scrapegraphai
- Language: Python
- Homepage:
- Size: 118 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# doccrawl
Simple document crawler that harvests PDFs and documents from configured web sources.