Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mrseanryan/style-scraper
Scrape styles from a website using reliable computed styling via a headless browser [fonts, colors, border styles ...]
- Host: GitHub
- URL: https://github.com/mrseanryan/style-scraper
- Owner: mrseanryan
- License: mit
- Created: 2024-02-09T09:29:00.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2024-02-13T15:26:17.000Z (10 months ago)
- Last Synced: 2024-04-18T01:53:38.673Z (8 months ago)
- Topics: puppeteer, scraping, scraping-styling, scraping-websites, styling-css
- Language: JavaScript
- Homepage:
- Size: 432 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# style-scraper
Scrape styles from a website using reliable computed styling via a headless browser [fonts, colors, border styles ...]

- programmatically inspect elements of the website, without a visible browser
- extract the *actual* computed styles for elements on the webpage
- elements are found by co-ordinates (offset from the top-left of the page)
- screenshot of the full vertical length of the webpage (automatic scrolling)

## Example execution

COMMAND:
```
./go.sh https://everydayphotos.net 180 300
```

OUTPUT:
```
> [email protected] start
> node src/scrape-via-puppeteer/index.js https://everydayphotos.net 180 300

Saving screenshot to: ./temp/screenshot.jpg
Get element by co-ordinates (180, 300)
A
HTML > IMG > BODY > DIV > DIV > DIV > A
{
backgroundColors: [ 'rgba(0, 0, 0, 0.6)', 'rgb(0, 0, 0)' ],
borderColors: [ 'rgb(255, 255, 255)', 'rgb(0, 0, 238)' ],
colors: [ 'rgb(255, 255, 255)', 'rgb(0, 0, 238)' ],
fontFamilies: [
'Frutiger, "Frutiger Linotype", Univers, Calibri, "Gill Sans", "Gill Sans MT", "Myriad Pro", Myriad, "DejaVu Sans Condensed", "Liberation Sans", "Nimbus Sans L", Tahoma, Geneva, "Helvetica Neue", Helvetica, Arial, sans-serif'
],
fontSizes: [ '20px' ]
}
```

SCREENSHOT (cropped):

![everydayphotos.net screenshot](./images/screenshot-edp-truncated.jpg)
- the screenshot includes the full vertical length of the website, by automatically scrolling
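
The output above lists several values per property, which suggests that styles are gathered from more than one element. Below is a minimal sketch of how such a summary could be built, assuming the values come from the inspected element plus its ancestors (an assumption, not confirmed by this README). `summarizeStyles` and its `getStyle` parameter are hypothetical names. In a real page, `getStyle` would be `window.getComputedStyle` and the starting element could come from `document.elementFromPoint(x, y)`, both run inside Puppeteer's `page.evaluate`.

```javascript
// Hypothetical helper: walk from an element up to the root and collect the
// unique computed-style values found along the way.
// `getStyle` is injected so the function also runs outside a browser;
// in a page it would be `(el) => window.getComputedStyle(el)`.
function summarizeStyles(element, getStyle) {
  const summary = {
    backgroundColors: new Set(),
    borderColors: new Set(),
    colors: new Set(),
    fontFamilies: new Set(),
    fontSizes: new Set(),
  };
  const path = [];

  for (let el = element; el; el = el.parentElement) {
    const style = getStyle(el);
    path.unshift(el.tagName);
    summary.backgroundColors.add(style.backgroundColor);
    // Assumption: the top border colour stands in for all four sides.
    summary.borderColors.add(style.borderTopColor);
    summary.colors.add(style.color);
    summary.fontFamilies.add(style.fontFamily);
    summary.fontSizes.add(style.fontSize);
  }

  // Convert the sets to plain arrays so the result can be printed or serialized.
  const result = { path: path.join(' > ') };
  for (const [key, values] of Object.entries(summary)) {
    result[key] = [...values];
  }
  return result;
}

// Usage with plain objects standing in for DOM elements.
const body = { tagName: 'BODY', parentElement: null };
const link = { tagName: 'A', parentElement: body };
const fakeStyles = new Map([
  [body, { backgroundColor: 'rgb(0, 0, 0)', borderTopColor: 'rgb(0, 0, 238)',
           color: 'rgb(0, 0, 238)', fontFamily: 'sans-serif', fontSize: '20px' }],
  [link, { backgroundColor: 'rgba(0, 0, 0, 0.6)', borderTopColor: 'rgb(255, 255, 255)',
           color: 'rgb(255, 255, 255)', fontFamily: 'sans-serif', fontSize: '20px' }],
]);
console.log(summarizeStyles(link, (el) => fakeStyles.get(el)));
```

Note that `document.elementFromPoint` takes viewport-relative co-ordinates, so a page-relative offset may need the page to be scrolled first.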
## Setup
### Pre-requisites
- OS: Ubuntu
- nodejs
- node version 18

```
curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash
source ~/.bashrc
source ~/.profile

nvm install 18
nvm use 18

node --version
```

- C++ libraries
```
sudo apt-get update
sudo apt-get install -yq --no-install-recommends libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 libnss3
```

### Install

```
chmod +x ./install.sh
./install.sh
```

```
npm test
```

## Usage

```
./go.sh <url> <x> <y>
```

- where x and y are the co-ordinates at which to inspect the webpage for style details.
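
A minimal sketch of how these arguments could be validated before a browser is launched. `parseArgs` is a hypothetical name and is not taken from the project's source; it assumes the argument order shown in the example execution (URL, then x, then y) and that co-ordinates are non-negative whole pixels.

```javascript
// Hypothetical helper: validate the URL and the two co-ordinates
// as they would arrive on the command line (all strings).
function parseArgs(argv) {
  if (argv.length !== 3) {
    throw new Error('Expected 3 arguments: url, x, y');
  }
  const [rawUrl, rawX, rawY] = argv;

  // The URL constructor throws a TypeError on malformed input.
  const url = new URL(rawUrl);

  const toCoordinate = (raw, name) => {
    // Accept digits only, so '', '12px' and '-5' are all rejected.
    if (!/^\d+$/.test(raw)) {
      throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
    }
    return Number.parseInt(raw, 10);
  };

  return { url: url.href, x: toCoordinate(rawX, 'x'), y: toCoordinate(rawY, 'y') };
}

// In a script this would typically be called as parseArgs(process.argv.slice(2)).
console.log(parseArgs(['https://everydayphotos.net', '180', '300']));
```

Failing early on bad input avoids starting the headless browser only to fail later.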
## Trouble-shooting
1. Error on Ubuntu: "error while loading shared libraries: libgbm.so.1: cannot open shared object file: Puppeteer in Nodejs on AWS EC2 instance"
   - solution: `sudo apt-get install -y libgbm-dev`

## References

- https://pptr.dev/
- https://www.toptal.com/puppeteer/headless-browser-puppeteer-tutorial