{"id":15406780,"url":"https://github.com/dannyben/snapcrawl","last_synced_at":"2025-04-04T19:09:48.068Z","repository":{"id":49571564,"uuid":"46619860","full_name":"DannyBen/snapcrawl","owner":"DannyBen","description":"Crawl a website and take screenshots","archived":false,"fork":false,"pushed_at":"2025-01-16T16:54:44.000Z","size":141,"stargazers_count":61,"open_issues_count":0,"forks_count":11,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-28T18:11:56.064Z","etag":null,"topics":["capture","crawler","gem","ruby","screenshot"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DannyBen.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-11-21T15:01:49.000Z","updated_at":"2025-01-16T16:54:45.000Z","dependencies_parsed_at":"2024-10-19T12:33:29.056Z","dependency_job_id":"64743f0a-d7bc-4908-9a8c-b38b4fb1ff12","html_url":"https://github.com/DannyBen/snapcrawl","commit_stats":{"total_commits":130,"total_committers":2,"mean_commits":65.0,"dds":"0.30000000000000004","last_synced_commit":"197dc1787101c4c5f251096224030c6ebd776448"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DannyBen%2Fsnapcrawl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DannyBen%2Fsnapcrawl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DannyBen%2Fsnapcrawl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DannyBen%2Fsnapcrawl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DannyBen","download_url":"https://codeload.github.com/DannyBen/snapcrawl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247234921,"owners_count":20905854,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["capture","crawler","gem","ruby","screenshot"],"created_at":"2024-10-01T16:25:21.630Z","updated_at":"2025-04-04T19:09:48.048Z","avatar_url":"https://github.com/DannyBen.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Snapcrawl - crawl a website and take screenshots\n\n[![Gem Version](https://badge.fury.io/rb/snapcrawl.svg)](http://badge.fury.io/rb/snapcrawl)\n[![Build Status](https://github.com/DannyBen/snapcrawl/workflows/Test/badge.svg)](https://github.com/DannyBen/snapcrawl/actions?query=workflow%3ATest)\n[![Code Climate](https://codeclimate.com/github/DannyBen/snapcrawl/badges/gpa.svg)](https://codeclimate.com/github/DannyBen/snapcrawl)\n\n---\n\nSnapcrawl is a command line utility for crawling a website and saving\nscreenshots. \n\n---\n\n## :warning: Project Status: On Hold\n\nSnapcrawl relies on two deprecated libraries:  \n\n- [Webshot](https://github.com/vitalie/webshot) (last updated in August 2020)  \n- [PhantomJS](https://github.com/ariya/phantomjs) (last updated around 2020)  \n\nAs such, the project is **no longer actively maintained** and is unlikely to\nreceive updates or bug fixes.  \n\nIf you are interested in contributing and have ideas for replacing these\nlibraries with modern alternatives, you are welcome to propose changes via\npull requests or issues.\n\n---\n\n## Features\n\n- Crawls a website to any given depth and saves screenshots\n- Can capture the full length of the page\n- Can use a specific resolution for screenshots\n- Skips capturing if the screenshot was already saved recently\n- Uses local caching to avoid expensive crawl operations if not needed\n- Reports broken links\n\n## Install\n\n**Using Docker**\n\nYou can run Snapcrawl by using this docker image (which contains all the\nnecessary prerequisites):\n\n```shell\n$ alias snapcrawl='docker run --rm -it --network host --volume \"$PWD:/app\" dannyben/snapcrawl'\n```\n\nFor more information on the Docker image, refer to the [docker-snapcrawl][3] repository.\n\n**Using Ruby**\n\n```shell\n$ gem install snapcrawl\n```\n\nNote that Snapcrawl requires [PhantomJS][1] and [ImageMagick][2].\n\n## Usage\n\nSnapcrawl can be configured either through a configuration file (YAML), or by specifying options in the command line.\n\n```shell\n$ snapcrawl\nUsage:\n  snapcrawl URL [--config FILE] [SETTINGS...]\n  snapcrawl -h | --help\n  snapcrawl -v | --version\n```\n\nThe default configuration filename is `snapcrawl.yml`.\n\nUsing the `--config` flag will create a template configuration file if it is not present:\n\n```shell\n$ snapcrawl example.com --config snapcrawl\n```\n\n### Specifying options in the command line\n\nAll configuration options can be specified in the command line as `key=value` pairs:\n\n```shell\n$ snapcrawl example.com log_level=0 depth=2 width=1024\n```\n\n### Sample configuration file\n\n```yaml\n# All values below are the default values\n\n# log level (0-4) 0=DEBUG 1=INFO 2=WARN 3=ERROR 4=FATAL\nlog_level: 1\n\n# log_color (yes, no, auto)\n# yes  = always show log color\n# no   = never use colors\n# auto = only use colors when running in an interactive terminal\nlog_color: auto\n\n# number of levels to crawl, 0 means capture only the root URL\ndepth: 1\n\n# screenshot width in pixels\nwidth: 1280\n\n# screenshot height in pixels, 0 means the entire height\nheight: 0\n\n# number of seconds to consider the page cache and its screenshot fresh\ncache_life: 86400\n\n# where to store the HTML page cache\ncache_dir: cache\n\n# where to store screenshots\nsnaps_dir: snaps\n\n# screenshot filename template, where '%{url}' will be replaced with a \n# slug version of the URL (no need to include the .png extension)\nname_template: '%{url}'\n\n# urls not matching this regular expression will be ignored\nurl_whitelist: \n\n# urls matching this regular expression will be ignored\nurl_blacklist: \n\n# take a screenshot of this CSS selector only\ncss_selector: \n\n# when true, ignore SSL related errors\nskip_ssl_verification: false\n\n# set to any number of seconds to wait for the page to load before taking\n# a screenshot, leave empty to not wait at all (only needed for pages with\n# animations or other post-load events).\nscreenshot_delay: \n```\n\n## Contributing / Support\nIf you experience any issue, have a question or a suggestion, or if you wish\nto contribute, feel free to [open an issue][issues].\n\n---\n\n[1]: http://phantomjs.org/download.html\n[2]: https://imagemagick.org/script/download.php\n[3]: https://github.com/DannyBen/docker-snapcrawl\n[issues]: https://github.com/DannyBen/snapcrawl/issues\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdannyben%2Fsnapcrawl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdannyben%2Fsnapcrawl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdannyben%2Fsnapcrawl/lists"}