{"id":13398279,"url":"https://github.com/lorien/grab","last_synced_at":"2025-05-14T04:07:31.518Z","repository":{"id":37774121,"uuid":"9787912","full_name":"lorien/grab","owner":"lorien","description":"Web Scraping Framework","archived":false,"fork":false,"pushed_at":"2024-03-12T04:38:09.000Z","size":6113,"stargazers_count":2403,"open_issues_count":1,"forks_count":275,"subscribers_count":87,"default_branch":"master","last_synced_at":"2025-05-11T16:41:40.757Z","etag":null,"topics":["asynchronous","crawler","crawling","framework","http-client","network","pycurl","python","python-library","python3","scraping","spider","urllib3","web-scraping"],"latest_commit_sha":null,"homepage":"https://grab.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lorien.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-05-01T08:10:22.000Z","updated_at":"2025-05-03T22:09:08.000Z","dependencies_parsed_at":"2023-02-07T15:02:05.770Z","dependency_job_id":"deaf83e8-7e32-4b96-a7c1-95ac689a3632","html_url":"https://github.com/lorien/grab","commit_stats":{"total_commits":2290,"total_committers":67,"mean_commits":34.17910447761194,"dds":"0.23013100436681222","last_synced_commit":"35e44c2405b7c944a47df67ba3f024113acce74f"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lorien%2Fgrab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lorien%2Fgrab/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/lorien%2Fgrab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lorien%2Fgrab/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lorien","download_url":"https://codeload.github.com/lorien/grab/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254069132,"owners_count":22009494,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asynchronous","crawler","crawling","framework","http-client","network","pycurl","python","python-library","python3","scraping","spider","urllib3","web-scraping"],"created_at":"2024-07-30T19:00:21.686Z","updated_at":"2025-05-14T04:07:31.407Z","avatar_url":"https://github.com/lorien.png","language":"Python","readme":"# Grab Framework Project\n\n[![Grab Test Status](https://github.com/lorien/grab/actions/workflows/test.yml/badge.svg)](https://github.com/lorien/grab/actions/workflows/test.yml)\n[![Code Quality](https://github.com/lorien/grab/actions/workflows/check.yml/badge.svg)](https://github.com/lorien/grab/actions/workflows/test.yml)\n[![Type Check](https://github.com/lorien/grab/actions/workflows/mypy.yml/badge.svg)](https://github.com/lorien/grab/actions/workflows/mypy.yml)\n[![Grab Test Coverage Status](https://coveralls.io/repos/github/lorien/grab/badge.svg)](https://coveralls.io/github/lorien/grab)\n[![Pypi Downloads](https://img.shields.io/pypi/dw/grab?label=Downloads)](https://pypistats.org/packages/grab)\n[![Grab 
Documentation](https://readthedocs.org/projects/grab/badge/?version=latest)](https://grab.readthedocs.io/en/latest/)\n\n## Status of Project\n\nI myself have not used Grab for many years. I am not sure anybody is using it at present.\nNonetheless I decided to refactor the project, just for fun. I have annotated\nthe whole code base with mypy type hints (in strict mode). The whole code base also complies with\npylint and flake8 requirements. There are a few exceptions: very large methods and classes with too many local\nattributes and variables. I will refactor them eventually.\n\nThe current and only network backend is [urllib3](https://github.com/urllib3/urllib3).\n\nI have refactored a few components into external packages: [proxylist](https://github.com/lorien/proxylist),\n[procstat](https://github.com/lorien/procstat), [selection](https://github.com/lorien/selection),\n[unicodec](https://github.com/lorien/unicodec), [user\\_agent](https://github.com/lorien/user_agent)\n\nFeel free to give feedback in the Telegram groups: [@grablab](https://t.me/grablab) and [@grablab\\_ru](https://t.me/grablab_ru)\n\n## Things to be done next\n\n* Refactor source code to remove all pylint disable comments like:\n    * too-many-instance-attributes\n    * too-many-arguments\n    * too-many-locals\n    * too-many-public-methods\n* Reach 100% test coverage; it is about 95% now\n* Release a new version to PyPI\n* Refactor more components into external packages\n* More abstract interfaces\n* More data structures and types\n* Decouple connections between internal components\n\n## Installation\n\nThis command installs the old Grab released in 2018: `pip install -U grab`\n\nThe updated Grab available in the GitHub repository is not compatible at all with spiders and crawlers\nwritten for the Grab released in 2018.\n\n## Documentation\n\nUpdated documentation is here: https://grab.readthedocs.io/en/latest/ Most updates are removals of\ncontent related to features I have removed from Grab since 
2018.\n\nDocumentation for the old Grab version 0.6.41 (released in 2018) is here: https://grab.readthedocs.io/en/v0.6.41-doc/\n","funding_links":[],"categories":["Python","Web Crawling","Web Scraping Frameworks","Web爬行","Web Scraping \u0026 Crawling","🕸️ Web Scraping \u0026 Crawling","HTML 处理","Web Crawling [🔝](#readme)","📚 فهرست"],"sub_categories":["Tools","وب اسکرپینگ"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Florien%2Fgrab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Florien%2Fgrab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Florien%2Fgrab/lists"}