https://github.com/mazzasaverio/scrapy-playwright-scrapegraphai
Web crawler using Scrapy + Playwright for dynamic content, featuring YAML-based configuration, PostgreSQL storage via aiosql, structured logging with logfire, and complete Docker/Terraform infrastructure. Built with uv package manager and Python 3.11+.
aiosql crawler docker playwright scrapy scrapy-playwright terraform uv
- Host: GitHub
- URL: https://github.com/mazzasaverio/scrapy-playwright-scrapegraphai
- Owner: mazzasaverio
- License: mit
- Created: 2024-11-10T08:21:09.000Z (2 months ago)
- Default Branch: master
- Last Pushed: 2024-11-21T08:31:18.000Z (about 2 months ago)
- Last Synced: 2024-11-21T09:28:33.862Z (about 2 months ago)
- Topics: aiosql, crawler, docker, playwright, scrapy, scrapy-playwright, terraform, uv
- Language: Python
- Homepage:
- Size: 142 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Scrapy Frontier Crawler
A configurable web crawler built with Scrapy and Playwright for handling both static and dynamic content. The crawler can process different types of URLs and store results in a PostgreSQL database.
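As a rough illustration of how the three URL types listed under Features could be told apart at request time, here is a minimal sketch. It is not taken from the project's code: `UrlType`, `SeedUrl`, `request_meta`, and the `url_type` / `depth_left` meta keys are hypothetical names chosen for this example. The only library behaviour assumed is that scrapy-playwright renders a request in a browser when its `meta` contains `"playwright": True`.

```python
from dataclasses import dataclass
from enum import IntEnum


class UrlType(IntEnum):
    DIRECT = 0   # Type 0: the URL itself is the target
    STATIC = 1   # Type 1: scan a static page for target URLs
    DYNAMIC = 2  # Type 2: scan a JavaScript-rendered page, navigating to a depth


@dataclass
class SeedUrl:
    # Hypothetical config entry; field names are assumptions, not the project's schema.
    url: str
    url_type: UrlType
    max_depth: int = 0


def request_meta(seed: SeedUrl) -> dict:
    """Build a dict suitable for passing as scrapy.Request(meta=...)."""
    meta = {"url_type": int(seed.url_type), "depth_left": 0}
    if seed.url_type == UrlType.DYNAMIC:
        # Only dynamic pages pay the cost of a browser render.
        meta["playwright"] = True
        meta["depth_left"] = seed.max_depth
    return meta


if __name__ == "__main__":
    seeds = [
        SeedUrl("https://example.com/report.pdf", UrlType.DIRECT),
        SeedUrl("https://example.com/index.html", UrlType.STATIC),
        SeedUrl("https://example.com/app", UrlType.DYNAMIC, max_depth=2),
    ]
    for seed in seeds:
        print(seed.url, request_meta(seed))
```

In a spider, the returned dict would be passed as `scrapy.Request(seed.url, meta=request_meta(seed))`. Keeping the decision in one small function means static pages skip Playwright entirely.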
## Features
- 🔍 Three types of URL processing:
  - Type 0: Direct target URL processing
  - Type 1: Static page scanning for target URLs
  - Type 2: Dynamic page scanning with depth navigation
- 🎭 Playwright integration for JavaScript-rendered content
- 📊 PostgreSQL storage for crawled URLs and stats
- 🔧 YAML-based configuration
- 📝 Structured logging with Logfire
- 🐳 Docker support
- ☁️ Azure deployment ready with Terraform
## Prerequisites
- Python 3.11+
- PostgreSQL database
- [uv](https://github.com/astral-sh/uv) for package management
- Docker (optional)
## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request