https://github.com/machu-gwu/crawlib-project
tool set for crawler project.
- Host: GitHub
- URL: https://github.com/machu-gwu/crawlib-project
- Owner: MacHu-GWU
- License: mit
- Created: 2016-08-29T21:35:35.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2019-12-31T03:34:14.000Z (over 6 years ago)
- Last Synced: 2025-02-19T15:07:42.727Z (about 1 year ago)
- Topics: crawler, framework, mongodb, python, scrapy
- Language: Python
- Homepage:
- Size: 638 KB
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE.txt
- Authors: AUTHORS.rst
README
.. image:: https://readthedocs.org/projects/crawlib/badge/?version=latest
:target: https://crawlib.readthedocs.io/index.html
:alt: Documentation Status
.. image:: https://circleci.com/gh/MacHu-GWU/crawlib-project.svg?style=svg
:target: https://circleci.com/gh/MacHu-GWU/crawlib-project
.. image:: https://img.shields.io/pypi/v/crawlib.svg
:target: https://pypi.python.org/pypi/crawlib
.. image:: https://img.shields.io/pypi/l/crawlib.svg
:target: https://pypi.python.org/pypi/crawlib
.. image:: https://img.shields.io/pypi/pyversions/crawlib.svg
:target: https://pypi.python.org/pypi/crawlib
.. image:: https://img.shields.io/badge/STAR_Me_on_GitHub!--None.svg?style=social
:target: https://github.com/MacHu-GWU/crawlib-project
------
.. image:: https://img.shields.io/badge/Link-Document-blue.svg
:target: https://crawlib.readthedocs.io/index.html
.. image:: https://img.shields.io/badge/Link-API-blue.svg
:target: https://crawlib.readthedocs.io/py-modindex.html
.. image:: https://img.shields.io/badge/Link-Source_Code-blue.svg
:target: https://crawlib.readthedocs.io/py-modindex.html
.. image:: https://img.shields.io/badge/Link-Install-blue.svg
:target: `install`_
.. image:: https://img.shields.io/badge/Link-GitHub-blue.svg
:target: https://github.com/MacHu-GWU/crawlib-project
.. image:: https://img.shields.io/badge/Link-Submit_Issue-blue.svg
:target: https://github.com/MacHu-GWU/crawlib-project/issues
.. image:: https://img.shields.io/badge/Link-Request_Feature-blue.svg
:target: https://github.com/MacHu-GWU/crawlib-project/issues
.. image:: https://img.shields.io/badge/Link-Download-blue.svg
:target: https://pypi.org/pypi/crawlib#files
Welcome to ``crawlib`` Documentation
==============================================================================
``crawlib`` is a breadth-first-search crawler framework for targeted crawling, that is, for cases where you already know where your data is located and how it is organized. You only need to focus on the data model and the HTML extraction logic; the framework handles the rest, such as:
- duplicate filter
- recursive crawling
- status tracking
- periodic updates

Currently, MongoDB is the only supported storage backend.
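
To illustrate the idea behind breadth-first crawling with a duplicate filter, here is a minimal, framework-independent sketch. It does not use the ``crawlib`` API: the names ``FAKE_SITE``, ``extract_links`` and ``crawl_breadth_first`` are hypothetical, and an in-memory dictionary stands in for real HTTP requests and for the MongoDB backend.

.. code-block:: python

    from collections import deque

    # Hypothetical in-memory "site": each URL maps to the links found on
    # that page. A real crawler would download and parse HTML instead.
    FAKE_SITE = {
        "/": ["/state/a", "/state/b"],
        "/state/a": ["/city/1", "/city/2"],
        "/state/b": ["/city/2", "/city/3"],
        "/city/1": [],
        "/city/2": ["/"],
        "/city/3": [],
    }


    def extract_links(url):
        """Stand-in for the HTML extraction logic (hypothetical helper)."""
        return FAKE_SITE.get(url, [])


    def crawl_breadth_first(start_url):
        """Visit every reachable URL exactly once, level by level."""
        seen = {start_url}          # duplicate filter
        queue = deque([start_url])  # FIFO queue yields breadth-first order
        visited = []                # simple status tracking: finished pages
        while queue:
            url = queue.popleft()
            visited.append(url)
            for link in extract_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
        return visited


    print(crawl_breadth_first("/"))

In this sketch the ``seen`` set plays the role of the duplicate filter and the queue drives the recursive crawl; a persistent backend would presumably replace both so that progress survives restarts.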
.. _install:
Install
------------------------------------------------------------------------------
``crawlib`` is released on PyPI, so all you need is:

.. code-block:: console

    $ pip install crawlib

To upgrade to the latest version:

.. code-block:: console

    $ pip install --upgrade crawlib