Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lorien/grab
Web Scraping Framework
asynchronous crawler crawling framework http-client network pycurl python python-library python3 scraping spider urllib3 web-scraping
Last synced: 1 day ago
- Host: GitHub
- URL: https://github.com/lorien/grab
- Owner: lorien
- License: mit
- Created: 2013-05-01T08:10:22.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2024-03-12T04:38:09.000Z (9 months ago)
- Last Synced: 2024-12-04T20:10:40.363Z (9 days ago)
- Topics: asynchronous, crawler, crawling, framework, http-client, network, pycurl, python, python-library, python3, scraping, spider, urllib3, web-scraping
- Language: Python
- Homepage: https://grab.readthedocs.io
- Size: 5.83 MB
- Stars: 2,395
- Watchers: 89
- Forks: 274
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- my-awesome-starred - grab - Web Scraping Framework (Python)
- awesome - lorien/grab - Web Scraping Framework (Python)
- awesome-python-resources - GitHub - 3% open · ⏱️ 01.03.2022): (HTML processing)
- awesome-repositories - lorien/grab - Web Scraping Framework (Python)
- starred-awesome - grab - Web Scraping Framework (Python)
- best-of-web-python - GitHub - 9% open · ⏱️ 01.07.2023): (Web Scraping & Crawling)
README
# Grab Framework Project
[![Grab Test Status](https://github.com/lorien/grab/actions/workflows/test.yml/badge.svg)](https://github.com/lorien/grab/actions/workflows/test.yml)
[![Code Quality](https://github.com/lorien/grab/actions/workflows/check.yml/badge.svg)](https://github.com/lorien/grab/actions/workflows/check.yml)
[![Type Check](https://github.com/lorien/grab/actions/workflows/mypy.yml/badge.svg)](https://github.com/lorien/grab/actions/workflows/mypy.yml)
[![Grab Test Coverage Status](https://coveralls.io/repos/github/lorien/grab/badge.svg)](https://coveralls.io/github/lorien/grab)
[![Pypi Downloads](https://img.shields.io/pypi/dw/grab?label=Downloads)](https://pypistats.org/packages/grab)
[![Grab Documentation](https://readthedocs.org/projects/grab/badge/?version=latest)](https://grab.readthedocs.io/en/latest/)

## Status of Project
I have not used Grab myself for many years, and I am not sure anybody is using it at present.
Nonetheless, I decided to refactor the project, just for fun. I have annotated
the whole code base with mypy type hints (in strict mode). The whole code base also complies with
pylint and flake8 requirements. There are a few exceptions: very large methods and classes with too many local
attributes and variables. I will refactor them eventually.

The current and only network backend is [urllib3](https://github.com/urllib3/urllib3).

I have refactored a few components into external packages: [proxylist](https://github.com/lorien/proxylist),
[procstat](https://github.com/lorien/procstat), [selection](https://github.com/lorien/selection),
[unicodec](https://github.com/lorien/unicodec), [user\_agent](https://github.com/lorien/user_agent).

Feel free to give feedback in Telegram groups: [@grablab](https://t.me/grablab) and [@grablab\_ru](https://t.me/grablab_ru).
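
To illustrate the kind of pylint exception mentioned in this section, here is a minimal sketch of a function that needs a `too-many-arguments` disable comment, next to one possible refactoring that removes the need for it. This is not code from Grab: the names `describe_request_before`, `describe_request_after` and `RequestOptions` are hypothetical, and grouping parameters into a dataclass is only one of several ways such a refactoring could go.

```python
from dataclasses import dataclass


# Before: the signature has more arguments than pylint's limit allows,
# so the check is silenced with an inline disable comment.
def describe_request_before(  # pylint: disable=too-many-arguments
    url: str,
    *,
    method: str,
    timeout: float,
    retries: int,
    proxy: str,
    user_agent: str,
) -> str:
    """Return a one-line summary of the request settings."""
    return (
        f"{method} {url} timeout={timeout} retries={retries} "
        f"proxy={proxy!r} ua={user_agent!r}"
    )


# After: related settings are grouped into one typed object
# (hypothetical, not a Grab class), so no disable comment is needed.
@dataclass(frozen=True)
class RequestOptions:
    method: str = "GET"
    timeout: float = 10.0
    retries: int = 3
    proxy: str = ""
    user_agent: str = "example-agent"


def describe_request_after(url: str, options: RequestOptions) -> str:
    """Return the same summary, reading settings from the options object."""
    return (
        f"{options.method} {url} timeout={options.timeout} "
        f"retries={options.retries} proxy={options.proxy!r} "
        f"ua={options.user_agent!r}"
    )
```

Both functions build the same summary string. The second one takes two arguments instead of six, and the defaults live in one place.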
## Things to be done next
* Refactor source code to remove all pylint disable comments like:
    * too-many-instance-attributes
    * too-many-arguments
    * too-many-locals
    * too-many-public-methods
* Reach 100% test coverage (it is about 95% now)
* Release a new version to PyPI
* Refactor more components into external packages
* More abstract interfaces
* More data structures and types
* Decouple connections between internal components

## Installation
This command installs the old Grab release from 2018: `pip install -U grab`
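
To confirm which release ended up in your environment after running the command above, a small check with the standard library's `importlib.metadata` can help. This is a sketch, not part of Grab. The helper names `installed_version` and `describe_install` are hypothetical, and comparing against `0.6.41` assumes that the 2018 release on PyPI is the one matching the 0.6.41 documentation.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: the 2018 release published on PyPI is version 0.6.41.
LEGACY_VERSION = "0.6.41"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def describe_install(package: str = "grab") -> str:
    """Summarize whether the package is installed and which release it is."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed"
    if found == LEGACY_VERSION:
        return f"{package} {found} is installed (the 2018 release)"
    return f"{package} {found} is installed"


if __name__ == "__main__":
    print(describe_install())
```

The check reads package metadata only. It does not import Grab, so it works even when the installed release cannot be imported in your Python version.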
The updated Grab available in the GitHub repository is completely incompatible with spiders and crawlers
written for the Grab release from 2018.

## Documentation
The updated documentation is here: https://grab.readthedocs.io/en/latest/
Most of the updates remove content related to features I have removed from Grab since 2018.

Documentation for the old Grab version 0.6.41 (released in 2018) is here: https://grab.readthedocs.io/en/v0.6.41-doc/