https://github.com/stopsopa/html-scraper-browserless
- Host: GitHub
- URL: https://github.com/stopsopa/html-scraper-browserless
- Owner: stopsopa
- Created: 2018-07-11T13:07:26.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2023-09-01T23:29:55.000Z (almost 3 years ago)
- Last Synced: 2025-02-06T08:13:16.734Z (over 1 year ago)
- Language: JavaScript
- Size: 558 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
# Deprecated
(Deprecated -> use the better https://github.com/stopsopa/html-scraper-browserless) A microservice tool for scraping HTML from "any" page.
I wouldn't recommend it now; it's just old. But I'll leave it here.
# Installation:
Clone this repository, go to its main directory, and run:

    make install
    cp config.js.dist config.js
    # manually change the password in config.js for basic auth
    make start
# Usage:
Just visit:
http://localhost:8811/generate
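Since config.js holds a password for basic auth, a scripted client would presumably need to send an `Authorization` header when calling `/generate`. The sketch below only shows how such a request could be put together in Node.js. The helper name `buildGenerateRequest`, the username `admin` and the password `secret` are placeholders I made up for illustration; the real credentials and any query parameters the endpoint expects are not documented here.

```javascript
// Hypothetical helper (not part of this repository): builds the URL and the
// HTTP Basic auth header for a request to the /generate endpoint.
function buildGenerateRequest(host, port, user, password) {
  // Basic auth is "user:password" encoded as base64.
  const token = Buffer.from(`${user}:${password}`).toString('base64');
  return {
    url: `http://${host}:${port}/generate`,
    headers: { Authorization: `Basic ${token}` },
  };
}

// 'admin' and 'secret' are placeholder credentials, assumed for this example.
const generateRequest = buildGenerateRequest('localhost', 8811, 'admin', 'secret');
console.log(generateRequest.url);
console.log(generateRequest.headers.Authorization);
```

The returned object can be handed to whatever HTTP client you prefer; only the header format (RFC 7617 Basic auth) is certain here, the rest is an assumption about how the service is configured.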
# Current execution environment:
- node v8.9.4
- yarn
- Docker version 18.03.1-ce, build 9ee9f40
# Ping:
http://xx.xx.xx.xx:8811/html-scraper-ping
http://slowwly.robertomurray.co.uk/delay/32000/url/https://github.com/stopsopa/docker-puppeteer-pdf-generator
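The ping URL can also be checked from a script rather than a browser. The following is a minimal sketch using only Node's built-in `http` module; the helper name `ping` and the timeout value are my own choices, and I am assuming (without confirmation from this README) that a healthy service answers the ping path with a successful status code.

```javascript
const http = require('http');

// Hypothetical helper (not part of this repository): resolves with the HTTP
// status code of a GET request, or rejects if nothing answers within timeoutMs.
function ping(url, timeoutMs) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, (res) => {
      res.resume(); // discard the body; only the status code matters here
      resolve(res.statusCode);
    });
    // Abort the request if the socket stays idle for too long.
    req.setTimeout(timeoutMs, () => {
      req.destroy(new Error(`no response within ${timeoutMs} ms`));
    });
    req.on('error', reject);
  });
}
```

A call would look like `ping('http://xx.xx.xx.xx:8811/html-scraper-ping', 5000).then(console.log, console.error)`, with `xx.xx.xx.xx` replaced by the actual host.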
# Useful things (irrelevant now):
docker run -it --rm puppeteer-alpine-generate-pdf /usr/bin/chromium-browser --version
$ Chromium 64.0.3282.168
or, if you follow the node:8-slim approach (https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md#running-puppeteer-in-docker):
docker run -it --rm --cap-add=SYS_ADMIN puppeteer-chrome-linux /usr/bin/google-chrome-unstable --version
$ Google Chrome 68.0.3438.3 dev
on mac:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --version
$ Google Chrome 66.0.3359.181
/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --version
$ Google Chrome 69.0.3445.0 canary