https://github.com/michaelcurrin/html-screenshot-py
Take fullpage screenshots for a batch of URLs with this easy CLI tool
- Host: GitHub
- URL: https://github.com/michaelcurrin/html-screenshot-py
- Owner: MichaelCurrin
- License: mit
- Created: 2021-10-26T09:50:19.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-08-08T08:42:10.000Z (6 months ago)
- Last Synced: 2024-10-12T19:36:38.784Z (4 months ago)
- Topics: html, image, python, screenshot, selenium, webscraper
- Language: Python
- Homepage: https://michaelcurrin.github.io/html-screenshot-py/
- Size: 104 KB
- Stars: 4
- Watchers: 3
- Forks: 2
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# HTML Screenshot PY 🌐 🖼 🐍
> Take fullpage screenshots for a batch of URLs with this easy CLI tool

[![GitHub tag](https://img.shields.io/github/tag/MichaelCurrin/html-screenshot-py?include_prereleases=&sort=semver&color=blue)](https://github.com/MichaelCurrin/html-screenshot-py/releases/)
[![License](https://img.shields.io/badge/License-MIT-blue)](#license)
[![Made with Python](https://img.shields.io/badge/Python->=3.6-blue?logo=python&logoColor=white)](https://python.org)
[![dependency - selenium](https://img.shields.io/badge/selenium-3-blue)](https://pypi.org/project/selenium)
[![dependency - requests](https://img.shields.io/badge/requests-2-blue)](https://pypi.org/project/requests)

## About
An easy Python CLI tool. Provide it with a batch of one or more webpage URLs to scrape, whether from your own sites or someone else's.
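A batch is passed as a plain-text file (such as `~/path/to/urls.txt` in the sample usage). Here is a minimal sketch of loading such a file, assuming one URL per line with blank lines and `#` comment lines ignored. That file format and the `parse_urls` / `read_urls` names are assumptions for illustration, not the project's documented behavior.

```python
from pathlib import Path
from typing import List


def parse_urls(text: str) -> List[str]:
    """Extract URLs from text containing one URL per line.

    Assumed convention (hypothetical): surrounding whitespace is stripped,
    and blank lines or lines starting with '#' are skipped.
    """
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def read_urls(path: str) -> List[str]:
    """Read a URL list file (e.g. ~/path/to/urls.txt) and parse it."""
    # expanduser() resolves a leading '~' to the user's home directory.
    return parse_urls(Path(path).expanduser().read_text(encoding="utf-8"))
```

Separating parsing from file reading keeps the parsing logic easy to test without touching the filesystem.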
### Formats
It uses two approaches, depending on the format:
- **HTML pages** - The tool loads each page, takes a screenshot of the _entire_ page, and saves it as a PNG file. This uses _selenium_.
- **Binary data** - For files with a PDF or image extension, the file is downloaded directly (for speed and reliability) instead of being captured as a screenshot, which could be massive for a PDF with many pages. This uses _requests_.

### Use-cases
When you might want to use this tool:
- **Archive** - Save a once-off copy of an article or a page design that inspires you, before it moves or disappears from the internet. Add as many URLs as you like and download all of them.
- **Software development** - Create visual snapshots of a page on your website to track improvements and fixes over time, or watch how a competitor's website changes.

## Sample usage
For one webpage:
```sh
$ python -m htmlscreenshot.scrape 'https://example.com'
```

For multiple pages:

```sh
$ python -m htmlscreenshot ~/path/to/urls.txt
```

Then find your screenshots as PNGs in the project's output directory.
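The Formats section describes two approaches: screenshot HTML pages, download PDFs and images. The sketch below shows one way to pick the approach per URL and to derive an output file name. It assumes routing is decided by the extension of the URL path (ignoring any query string). The `choose_method` and `output_name` functions, the extension set, and the `<host>_<path>` naming scheme are all hypothetical and may not match the project's real logic or file names. The actual selenium screenshot and requests download steps are left out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical set of "binary" extensions; the README only says
# "a PDF or image extension".
BINARY_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg"}


def choose_method(url: str) -> str:
    """Return 'download' for PDF/image URLs, otherwise 'screenshot'."""
    # urlparse() separates the path from the query string, so
    # "photo.jpg?size=large" is still recognized as a .jpg file.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return "download" if suffix in BINARY_EXTENSIONS else "screenshot"


def output_name(url: str) -> str:
    """Build a filesystem-safe file name for a URL (hypothetical scheme)."""
    parsed = urlparse(url)
    path = PurePosixPath(parsed.path)
    if choose_method(url) == "download":
        # Downloaded files keep their original (lowercased) extension.
        stem, suffix = path.stem, path.suffix.lower()
    else:
        # Screenshots are always PNGs; an empty path becomes "index".
        stem, suffix = parsed.path.strip("/") or "index", ".png"
    raw = f"{parsed.netloc}_{stem}"
    # Replace anything that is not alphanumeric, '-', '_' or '.' with '_'.
    safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in raw)
    return safe + suffix
```

For example, `https://example.com/blog/post/` would be screenshotted as `example.com_blog_post.png`, while `https://example.com/files/report.PDF` would be downloaded as `example.com_report.pdf`. Under this scheme, URLs that differ only in their query string map to the same name, so a real tool might append a short hash to avoid overwriting files.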
## Documentation
[![view - Documentation](https://img.shields.io/badge/view-Online_Documentation-blue?style=for-the-badge)](https://michaelcurrin.github.io/html-screenshot-py/ "Go to docs site")
## License
Released under [MIT](/LICENSE) by [@MichaelCurrin](https://github.com/MichaelCurrin).