https://github.com/markoczy/crawler

A Web Crawler based on Go and Chromedp
cli crawler golang

Host: GitHub
URL: https://github.com/markoczy/crawler
Owner: markoczy
Created: 2020-11-06T16:11:04.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2022-04-19T13:13:15.000Z (about 4 years ago)
Last Synced: 2024-06-19T16:47:06.292Z (about 2 years ago)
Topics: cli, crawler, golang
Language: Go
Homepage:
Size: 96.7 KB
Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Crawler

A powerful Web Crawler based on Go and [Rod](https://github.com/go-rod/rod) for experienced users.

## Features

- **Chromium based:** Renders and analyzes websites in headless Chromium (via Rod), so pages are rendered just as they would be in a web browser. This lets the crawler analyze JavaScript-only pages just like plain HTML pages. Links are retrieved by running JS scripts on the rendered page after the browser sends the "Dom Tree Loaded" event.
- **Recursive link scanning:** Visits a page and retrieves all links from it, then recursively visits those links up to the specified depth.
- **Recursive download:** Downloads files from all retrieved links.
- **Regex-powered customizability:** Configure regular expressions to decide which links to follow or download. Capture tokens from URL naming patterns and bake them into your desired output file names.
- **HTTP headers:** Add any HTTP header from a file or on the command line with the `-header` switch. Basic auth is available through the `-auth` switch, and the user agent can be set with the `-user-agent` switch.
- **URL permutations:** URLs to scan can be configured with permutation schemes, e.g. `myfile-[1-99]` creates a URL for each of `myfile-1`, `myfile-2` ... `myfile-99`. Multiple permutation schemes in one URL (such as `mypage-[a,b,c,d]/myfile-[1-99]`) are also supported.
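
To illustrate the URL permutation feature, here is a minimal Go sketch of how such patterns could be expanded. It is not taken from the crawler's source: the names `ExpandURL` and `expandGroup` are hypothetical, and the rules (a `[lo-hi]` group is an inclusive integer range, anything else is a comma-separated list) are assumptions inferred from the two examples above. The real implementation may handle more cases, such as zero padding.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// groupPattern matches one bracketed group such as [1-99] or [a,b,c,d].
var groupPattern = regexp.MustCompile(`\[([^\[\]]+)\]`)

// expandGroup turns the inside of one bracket group into its values.
// Assumption: "lo-hi" with two integers is an inclusive range;
// anything else is treated as a comma-separated list.
func expandGroup(body string) ([]string, error) {
	if lo, hi, ok := strings.Cut(body, "-"); ok && !strings.Contains(body, ",") {
		start, errLo := strconv.Atoi(lo)
		end, errHi := strconv.Atoi(hi)
		if errLo == nil && errHi == nil {
			if start > end {
				return nil, fmt.Errorf("invalid range %q", body)
			}
			values := make([]string, 0, end-start+1)
			for i := start; i <= end; i++ {
				values = append(values, strconv.Itoa(i))
			}
			return values, nil
		}
	}
	return strings.Split(body, ","), nil
}

// ExpandURL (hypothetical helper) expands every bracket group in pattern,
// left to right, and returns all resulting combinations.
func ExpandURL(pattern string) ([]string, error) {
	loc := groupPattern.FindStringSubmatchIndex(pattern)
	if loc == nil {
		// No group left: the pattern is already a concrete URL.
		return []string{pattern}, nil
	}
	values, err := expandGroup(pattern[loc[2]:loc[3]])
	if err != nil {
		return nil, err
	}
	// Expand the remainder once, then combine it with each value of this group.
	rest, err := ExpandURL(pattern[loc[1]:])
	if err != nil {
		return nil, err
	}
	prefix := pattern[:loc[0]]
	urls := make([]string, 0, len(values)*len(rest))
	for _, v := range values {
		for _, r := range rest {
			urls = append(urls, prefix+v+r)
		}
	}
	return urls, nil
}

func main() {
	urls, err := ExpandURL("mypage-[a,b]/myfile-[1-3]")
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Under these assumptions, a pattern with several groups yields the Cartesian product of their values, so `mypage-[a,b,c,d]/myfile-[1-99]` would expand to 4 × 99 = 396 URLs.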
