https://github.com/idlesign/gallerycrawler
Generic crawling for galleries
- Host: GitHub
- URL: https://github.com/idlesign/gallerycrawler
- Owner: idlesign
- License: bsd-3-clause
- Created: 2020-07-31T13:02:58.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-09-14T13:35:42.000Z (about 4 years ago)
- Last Synced: 2024-10-07T18:21:15.833Z (30 days ago)
- Topics: crawler, gallery, images, python3
- Language: Python
- Homepage:
- Size: 19.5 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG
- Contributing: CONTRIBUTING
- License: LICENSE
README
gallerycrawler
==============
https://github.com/idlesign/gallerycrawler

|release| |lic|

.. |release| image:: https://img.shields.io/pypi/v/gallerycrawler.svg
    :target: https://pypi.python.org/pypi/gallerycrawler

.. |lic| image:: https://img.shields.io/pypi/l/gallerycrawler.svg
    :target: https://pypi.python.org/pypi/gallerycrawler

Description
-----------

*Generic crawling for galleries*

1. The crawler starts from a gallery listing URL;
2. It visits every details page mentioned on the current listing page;
3. It gathers information from each details page;
4. It moves to the next listing URL.
5. Etc.

.. code-block:: python

    from gallerycrawler.toolbox import Crawler, dump

    # Define crawler.
    class MyCrawler(Crawler):

        # CSS selectors applied to listing pages.
        selector_listing_next: str = '.page-next a'
        selector_listing_thumbnails: str = '.thumbnail img'

        # CSS selectors applied to details pages.
        selector_details: str = '.page-details a'
        selector_details_title: str = '.page-title'
        selector_details_img: str = '.image img'
        selector_details_author: str = '.image-author'

    # Run dumping.
    dump(
        crawler=MyCrawler,
        url='https://mysite.some/gallery/',
        fpath='dumped.html',
        probe=True,  # Use this to quickly test your crawler
    )
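
The numbered steps above amount to a simple loop over listing pages. The sketch below only illustrates that flow and is not gallerycrawler's actual implementation: the ``crawl`` function and the in-memory ``LISTINGS`` and ``DETAILS`` tables are hypothetical stand-ins for fetching pages over HTTP and applying the CSS selectors.

```python
from typing import Dict, Iterator, List, NamedTuple, Optional


class Listing(NamedTuple):
    details_urls: List[str]   # details pages linked from this listing page
    next_url: Optional[str]   # next listing page, or None on the last page


# Hypothetical in-memory "site" used instead of real HTTP requests.
LISTINGS: Dict[str, Listing] = {
    '/gallery/': Listing(['/item/1', '/item/2'], '/gallery/?page=2'),
    '/gallery/?page=2': Listing(['/item/3'], None),
}
DETAILS: Dict[str, Dict[str, str]] = {
    '/item/1': {'title': 'First', 'img': '/img/1.jpg', 'author': 'ann'},
    '/item/2': {'title': 'Second', 'img': '/img/2.jpg', 'author': 'bob'},
    '/item/3': {'title': 'Third', 'img': '/img/3.jpg', 'author': 'ann'},
}


def crawl(start_url: str) -> Iterator[Dict[str, str]]:
    """Yield the info gathered from every details page, listing by listing."""
    url: Optional[str] = start_url
    seen = set()  # guards against listing pages that link back to each other
    while url is not None and url not in seen:
        seen.add(url)
        listing = LISTINGS[url]                    # 1. open the listing page
        for details_url in listing.details_urls:   # 2. visit each details page
            # 3. gather information from the details page
            yield {'url': details_url, **DETAILS[details_url]}
        url = listing.next_url                     # 4. move to the next listing


items = list(crawl('/gallery/'))
```

In this sketch ``next_url`` plays the role that ``selector_listing_next`` suggests in the example above, and the ``title``, ``img`` and ``author`` keys mirror the ``selector_details_*`` attributes.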