Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
- Host: GitHub
- URL: https://github.com/apify/crawlee-python
- Owner: apify
- License: apache-2.0
- Created: 2024-01-10T08:44:47.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2024-07-31T11:45:10.000Z (about 2 months ago)
- Last Synced: 2024-07-31T16:36:23.940Z (about 2 months ago)
- Topics: apify, automation, beautifulsoup, crawler, crawling, headless, headless-chrome, pip, playwright, python, scraper, scraping, web-crawler, web-crawling, web-scraping
- Language: Python
- Homepage: https://crawlee.dev/python/
- Size: 20.8 MB
- Stars: 3,412
- Watchers: 25
- Forks: 229
- Open Issues: 59
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- AiTreasureBox - apify/crawlee-python - Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation. (Repos)
- awesome-ChatGPT-repositories - crawlee-python - Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation. (Browser-extensions)