Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dannyben/snapcrawl
Crawl a website and take screenshots
https://github.com/dannyben/snapcrawl
capture crawler gem ruby screenshot
Last synced: 8 days ago
JSON representation
Crawl a website and take screenshots
- Host: GitHub
- URL: https://github.com/dannyben/snapcrawl
- Owner: DannyBen
- License: mit
- Created: 2015-11-21T15:01:49.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2023-07-27T04:35:49.000Z (over 1 year ago)
- Last Synced: 2024-12-09T18:13:07.016Z (17 days ago)
- Topics: capture, crawler, gem, ruby, screenshot
- Language: Ruby
- Size: 135 KB
- Stars: 59
- Watchers: 3
- Forks: 12
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# Snapcrawl - crawl a website and take screenshots
[![Gem Version](https://badge.fury.io/rb/snapcrawl.svg)](http://badge.fury.io/rb/snapcrawl)
[![Build Status](https://github.com/DannyBen/snapcrawl/workflows/Test/badge.svg)](https://github.com/DannyBen/snapcrawl/actions?query=workflow%3ATest)
[![Code Climate](https://codeclimate.com/github/DannyBen/snapcrawl/badges/gpa.svg)](https://codeclimate.com/github/DannyBen/snapcrawl)---
Snapcrawl is a command line utility for crawling a website and saving
screenshots.## Features
- Crawls a website to any given depth and saves screenshots
- Can capture the full length of the page
- Can use a specific resolution for screenshots
- Skips capturing if the screenshot was already saved recently
- Uses local caching to avoid expensive crawl operations if not needed
- Reports broken links## Install
**Using Docker**
You can run Snapcrawl by using this docker image (which contains all the
necessary prerequisites):```shell
$ alias snapcrawl='docker run --rm -it --network host --volume "$PWD:/app" dannyben/snapcrawl'
```For more information on the Docker image, refer to the [docker-snapcrawl][3] repository.
**Using Ruby**
```shell
$ gem install snapcrawl
```Note that Snapcrawl requires [PhantomJS][1] and [ImageMagick][2].
## Usage
Snapcrawl can be configured either through a configuration file (YAML), or by specifying options in the command line.
```shell
$ snapcrawl
Usage:
snapcrawl URL [--config FILE] [SETTINGS...]
snapcrawl -h | --help
snapcrawl -v | --version
```The default configuration filename is `snapcrawl.yml`.
Using the `--config` flag will create a template configuration file if it is not present:
```shell
$ snapcrawl example.com --config snapcrawl
```### Specifying options in the command line
All configuration options can be specified in the command line as `key=value` pairs:
```shell
$ snapcrawl example.com log_level=0 depth=2 width=1024
```### Sample configuration file
```yaml
# All values below are the default values# log level (0-4) 0=DEBUG 1=INFO 2=WARN 3=ERROR 4=FATAL
log_level: 1# log_color (yes, no, auto)
# yes = always show log color
# no = never use colors
# auto = only use colors when running in an interactive terminal
log_color: auto# number of levels to crawl, 0 means capture only the root URL
depth: 1# screenshot width in pixels
width: 1280# screenshot height in pixels, 0 means the entire height
height: 0# number of seconds to consider the page cache and its screenshot fresh
cache_life: 86400# where to store the HTML page cache
cache_dir: cache# where to store screenshots
snaps_dir: snaps# screenshot filename template, where '%{url}' will be replaced with a
# slug version of the URL (no need to include the .png extension)
name_template: '%{url}'# urls not matching this regular expression will be ignored
url_whitelist:# urls matching this regular expression will be ignored
url_blacklist:# take a screenshot of this CSS selector only
css_selector:# when true, ignore SSL related errors
skip_ssl_verification: false# set to any number of seconds to wait for the page to load before taking
# a screenshot, leave empty to not wait at all (only needed for pages with
# animations or other post-load events).
screenshot_delay:
```## Contributing / Support
If you experience any issue, have a question or a suggestion, or if you wish
to contribute, feel free to [open an issue][issues].---
[1]: http://phantomjs.org/download.html
[2]: https://imagemagick.org/script/download.php
[3]: https://github.com/DannyBen/docker-snapcrawl
[issues]: https://github.com/DannyBen/snapcrawl/issues