{"id":24362433,"url":"https://github.com/machu-gwu/crawlib-project","last_synced_at":"2025-09-20T07:17:07.480Z","repository":{"id":82575137,"uuid":"66882484","full_name":"MacHu-GWU/crawlib-project","owner":"MacHu-GWU","description":"tool set for crawler project.","archived":false,"fork":false,"pushed_at":"2019-12-31T03:34:14.000Z","size":653,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-19T15:07:42.727Z","etag":null,"topics":["crawler","framework","mongodb","python","scrapy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MacHu-GWU.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-08-29T21:35:35.000Z","updated_at":"2020-06-19T09:45:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"59d30868-a8a5-4640-92c4-f26e2a6e5e9d","html_url":"https://github.com/MacHu-GWU/crawlib-project","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MacHu-GWU%2Fcrawlib-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MacHu-GWU%2Fcrawlib-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MacHu-GWU%2Fcrawlib-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MacHu-GWU%2Fcrawlib-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners/MacHu-GWU","download_url":"https://codeload.github.com/MacHu-GWU/crawlib-project/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243188241,"owners_count":20250457,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","framework","mongodb","python","scrapy"],"created_at":"2025-01-18T22:52:07.376Z","updated_at":"2025-09-20T07:17:02.419Z","avatar_url":"https://github.com/MacHu-GWU.png","language":"Python","readme":"\n.. image:: https://readthedocs.org/projects/crawlib/badge/?version=latest\n    :target: https://crawlib.readthedocs.io/index.html\n    :alt: Documentation Status\n\n.. image:: https://circleci.com/gh/MacHu-GWU/crawlib-project.svg?style=svg\n    :target: https://circleci.com/gh/MacHu-GWU/crawlib-project\n\n.. image:: https://img.shields.io/pypi/v/crawlib.svg\n    :target: https://pypi.python.org/pypi/crawlib\n\n.. image:: https://img.shields.io/pypi/l/crawlib.svg\n    :target: https://pypi.python.org/pypi/crawlib\n\n.. image:: https://img.shields.io/pypi/pyversions/crawlib.svg\n    :target: https://pypi.python.org/pypi/crawlib\n\n.. image:: https://img.shields.io/badge/STAR_Me_on_GitHub!--None.svg?style=social\n    :target: https://github.com/MacHu-GWU/crawlib-project\n\n------\n\n\n.. image:: https://img.shields.io/badge/Link-Document-blue.svg\n      :target: https://crawlib.readthedocs.io/index.html\n\n.. image:: https://img.shields.io/badge/Link-API-blue.svg\n      :target: https://crawlib.readthedocs.io/py-modindex.html\n\n.. 
image:: https://img.shields.io/badge/Link-Source_Code-blue.svg\n      :target: https://crawlib.readthedocs.io/py-modindex.html\n\n.. image:: https://img.shields.io/badge/Link-Install-blue.svg\n      :target: `install`_\n\n.. image:: https://img.shields.io/badge/Link-GitHub-blue.svg\n      :target: https://github.com/MacHu-GWU/crawlib-project\n\n.. image:: https://img.shields.io/badge/Link-Submit_Issue-blue.svg\n      :target: https://github.com/MacHu-GWU/crawlib-project/issues\n\n.. image:: https://img.shields.io/badge/Link-Request_Feature-blue.svg\n      :target: https://github.com/MacHu-GWU/crawlib-project/issues\n\n.. image:: https://img.shields.io/badge/Link-Download-blue.svg\n      :target: https://pypi.org/pypi/crawlib#files\n\n\nWelcome to ``crawlib`` Documentation\n==============================================================================\n\n``crawlib`` is a breadth-first-search crawler framework for targeted crawling, that is, for cases where you already know where your data is located and how it is organized. You only need to focus on the data model and the HTML extraction logic, and let the framework handle the rest, such as:\n\n- duplicate filtering\n- recursive crawling\n- status tracking\n- periodic updates\n\nCurrently, only MongoDB is supported as the backend storage.\n\n\n.. _install:\n\nInstall\n------------------------------------------------------------------------------\n\n``crawlib`` is released on PyPI, so all you need is:\n\n.. code-block:: console\n\n    $ pip install crawlib\n\nTo upgrade to the latest version:\n\n.. code-block:: console\n\n    $ pip install --upgrade crawlib","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachu-gwu%2Fcrawlib-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmachu-gwu%2Fcrawlib-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmachu-gwu%2Fcrawlib-project/lists"}