https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

Host: GitHub
URL: https://github.com/idlesign/gallerycrawler
Owner: idlesign
License: bsd-3-clause
Created: 2020-07-31T13:02:58.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2020-09-14T13:35:42.000Z (over 4 years ago)
Last Synced: 2024-11-30T03:17:30.657Z (2 months ago)
Topics: crawler, gallery, images, python3
Language: Python
Homepage:
Size: 19.5 KB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG
- Contributing: CONTRIBUTING
- License: LICENSE

gallerycrawler
==============

https://github.com/idlesign/gallerycrawler

|release| |lic|

.. |release| image:: https://img.shields.io/pypi/v/gallerycrawler.svg
    :target: https://pypi.python.org/pypi/gallerycrawler

.. |lic| image:: https://img.shields.io/pypi/l/gallerycrawler.svg
    :target: https://pypi.python.org/pypi/gallerycrawler

Description
-----------

*Generic crawling for galleries*

1. The crawler starts from a gallery listing URL;
2. It visits every details page linked from the current listing page;
3. It gathers information from each details page;
4. It moves to the next listing URL;
5. It repeats steps 2-4 for each subsequent listing page.

.. code-block:: python

    from gallerycrawler.toolbox import Crawler, dump


    # Define crawler.
    class MyCrawler(Crawler):

        selector_listing_next: str = '.page-next a'
        selector_listing_thumbnails: str = '.thumbnail img'

        selector_details: str = '.page-details a'
        selector_details_title: str = '.page-title'
        selector_details_img: str = '.image img'
        selector_details_author: str = '.image-author'


    # Run dumping.
    dump(
        crawler=MyCrawler,
        url='https://mysite.some/gallery/',
        fpath='dumped.html',
        probe=True,  # Use this to quickly test your crawler.
    )
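
The traversal order listed under Description can be illustrated with a small, library-independent sketch. This is not how ``gallerycrawler`` is implemented. The names ``crawl``, ``read_listing``, ``read_details`` and ``max_listings`` are hypothetical and exist only for this illustration, and an in-memory dictionary stands in for real HTTP requests and HTML parsing.

.. code-block:: python

    from typing import Callable, Dict, Iterator, List, Optional, Tuple

    # Hypothetical type for this sketch: (details URLs, next listing URL or None).
    Listing = Tuple[List[str], Optional[str]]


    def crawl(
        start_url: str,
        read_listing: Callable[[str], Listing],
        read_details: Callable[[str], Dict[str, str]],
        max_listings: Optional[int] = None,
    ) -> Iterator[Dict[str, str]]:
        """Yield one record per details page, following listing pagination."""
        url: Optional[str] = start_url
        seen = set()  # Listing URLs already visited; guards against loops.
        while url is not None and url not in seen:
            if max_listings is not None and len(seen) >= max_listings:
                break
            seen.add(url)
            details_urls, next_url = read_listing(url)  # Steps 1 and 2.
            for details_url in details_urls:
                yield read_details(details_url)  # Step 3.
            url = next_url  # Step 4, then repeat.


    # A tiny in-memory "site" (assumed data, not a real gallery).
    LISTINGS = {
        '/gallery/': (['/img/1', '/img/2'], '/gallery/?page=2'),
        '/gallery/?page=2': (['/img/3'], None),
    }
    DETAILS = {
        '/img/1': {'title': 'One', 'author': 'ann'},
        '/img/2': {'title': 'Two', 'author': 'bob'},
        '/img/3': {'title': 'Three', 'author': 'ann'},
    }

    records = list(crawl('/gallery/', LISTINGS.__getitem__, DETAILS.__getitem__))
    titles = [record['title'] for record in records]

Passing the page readers in as callables keeps the loop free of network and parsing concerns, so the traversal logic can be checked against fixture data before pointing a real crawler at a site.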