https://github.com/inspirehep/inspire-crawler
Crawler integration with INSPIRE-HEP.
- Host: GitHub
- URL: https://github.com/inspirehep/inspire-crawler
- Owner: inspirehep
- License: gpl-2.0
- Created: 2016-02-11T15:36:15.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2024-06-20T11:45:40.000Z (5 months ago)
- Last Synced: 2024-09-22T06:07:04.355Z (about 2 months ago)
- Language: Python
- Size: 172 KB
- Stars: 4
- Watchers: 16
- Forks: 14
- Open Issues: 6
Metadata Files:
- Readme: README.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE
README
..
    This file is part of Invenio.
    Copyright (C) 2016 CERN.

    Invenio is free software; you can redistribute it
    and/or modify it under the terms of the GNU General Public License as
    published by the Free Software Foundation; either version 2 of the
    License, or (at your option) any later version.

    Invenio is distributed in the hope that it will be
    useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with Invenio; if not, write to the
    Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
    MA 02111-1307, USA.

    In applying this license, CERN does not
    waive the privileges and immunities granted to it by virtue of its status
    as an Intergovernmental Organization or submit itself to any jurisdiction.

=================
 inspire-crawler
=================

.. image:: https://img.shields.io/travis/inspirehep/inspire-crawler.svg
        :target: https://travis-ci.org/inspirehep/inspire-crawler

.. image:: https://img.shields.io/coveralls/inspirehep/inspire-crawler.svg
        :target: https://coveralls.io/r/inspirehep/inspire-crawler

.. image:: https://img.shields.io/github/tag/inspirehep/inspire-crawler.svg
        :target: https://github.com/inspirehep/inspire-crawler/releases

.. image:: https://img.shields.io/pypi/dm/inspire-crawler.svg
        :target: https://pypi.python.org/pypi/inspire-crawler

.. image:: https://img.shields.io/github/license/inspirehep/inspire-crawler.svg
        :target: https://github.com/inspirehep/inspire-crawler/blob/master/LICENSE

Crawler integration with INSPIRE-HEP using scrapy project `HEPCrawl`_.

This module allows scheduling of crawler jobs to a `Scrapyd`_ instance serving
a `Scrapy`_ project. In this case, the default Scrapy project is `HEPCrawl`_.

It integrates directly with the `invenio-workflows`_ module to create workflows for every
record harvested by the crawler.

This module is meant to be used only with the `INSPIRE-HEP`_ overlay. **Use at your own risk.**

Full documentation is hosted here: http://pythonhosted.org/inspire-crawler/

See also the documentation of HEPCrawl: http://pythonhosted.org/hepcrawl/

.. _HEPCrawl: http://pythonhosted.org/hepcrawl/
.. _Scrapyd: http://scrapyd.readthedocs.io/
.. _Scrapy: http://doc.scrapy.org/
.. _invenio-workflows: http://pythonhosted.org/invenio-workflows/
.. _INSPIRE-HEP: http://inspirehep.readthedocs.io
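
To give a rough idea of what scheduling a crawler job on a Scrapyd instance
involves, here is a minimal sketch that talks to Scrapyd's ``schedule.json``
HTTP endpoint using only the Python standard library. It is not code from
inspire-crawler and may differ from how this module schedules jobs internally.
The URL, the project name ``hepcrawl``, the spider name ``example_spider`` and
both helper functions are assumptions made for illustration.

.. code-block:: python

    import json
    from urllib.parse import urlencode
    from urllib.request import Request, urlopen

    # Assumed values for illustration only; not taken from inspire-crawler.
    SCRAPYD_URL = "http://localhost:6800"  # Scrapyd listens on port 6800 by default
    PROJECT = "hepcrawl"                   # assumed name of the deployed project
    SPIDER = "example_spider"              # hypothetical spider name


    def build_schedule_request(base_url, project, spider, **spider_args):
        """Build a POST request for Scrapyd's schedule.json endpoint."""
        params = {"project": project, "spider": spider}
        # Any extra form fields are handed to the spider as arguments.
        params.update(spider_args)
        data = urlencode(params).encode("utf-8")
        url = base_url.rstrip("/") + "/schedule.json"
        return Request(url, data=data, method="POST")


    def schedule(request, timeout=10):
        """Send the request and return Scrapyd's decoded JSON reply."""
        with urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))


    if __name__ == "__main__":
        req = build_schedule_request(SCRAPYD_URL, PROJECT, SPIDER)
        print(req.full_url)
        # Sending the request needs a running Scrapyd instance:
        # print(schedule(req))

Building the request separately from sending it keeps the sketch usable
without a live Scrapyd instance. On success, Scrapyd replies with a JSON
object that includes a ``status`` field and the ``jobid`` of the scheduled
job.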