Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/datek/grawler
Simple web crawler in Go
- Host: GitHub
- URL: https://github.com/datek/grawler
- Owner: DAtek
- License: mit
- Created: 2023-04-10T11:02:47.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-23T13:30:35.000Z (3 months ago)
- Last Synced: 2024-10-20T12:26:49.110Z (2 months ago)
- Language: Go
- Size: 18.6 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: changelog.md
- License: LICENSE
README
[![codecov](https://codecov.io/gh/DAtek/grawler/graph/badge.svg?token=3WMKQDRJ95)](https://codecov.io/gh/DAtek/grawler) [![Go Report Card](https://goreportcard.com/badge/github.com/DAtek/grawler)](https://goreportcard.com/report/github.com/DAtek/grawler)
# Grawler
## Simple and performant web crawler in Go
### How it works
The crawler uses two types of workers:
- **Page loaders**
- **Page analyzers**

**Page loaders** consume the **remaining URL channel**: they download pages from the internet, store them in a **cache**, and put each downloaded page's URL into the **downloaded URL channel**.
**Page analyzers** consume the **downloaded URL channel**: they read each page's content from the **cache**, analyze it, and extract additional URLs and, if possible, the wanted model. Newly extracted URLs are put into the **remaining URL channel**; a found model goes into the **result channel**.
The whole process starts by putting the starting URL into the **remaining URL channel**.
The number of **Page loaders** and **Page analyzers** is configurable.
Your possibilities are endless: you can implement your own **cache**, **page loader**, and **analyzer**; the mocks and interfaces in the source will help you.
For guidance, please have a look at `crawler_test.go`.
The gopher was made with the [Gopher Konstructor](https://quasilyte.dev/gopherkon).