Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/foomo/walker
Crawls websites and collects SEO-relevant data
apache-benchmark benchmarking foomo foomo-walker siege spider website-crawler
Last synced: about 15 hours ago
- Host: GitHub
- URL: https://github.com/foomo/walker
- Owner: foomo
- Created: 2019-02-21T10:41:27.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2022-09-27T17:14:50.000Z (about 2 years ago)
- Last Synced: 2024-11-04T17:46:57.817Z (9 days ago)
- Topics: apache-benchmark, benchmarking, foomo, foomo-walker, siege, spider, website-crawler
- Language: Go
- Homepage: https://www.foomo.org
- Size: 188 KB
- Stars: 2
- Watchers: 17
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Walker
Walker walks (that is, crawls) through websites and collects performance and SEO-relevant data. The results can be browsed through a very simple web interface. They are also meant to be exposed as Prometheus metrics (not implemented yet).
**Be careful when crawling your website with aggressive settings: Walker might take your site down.**
## Configuration
```yaml
---
# target of your scrape
target: http://www.bestbytes.de
# number of concurrent goroutines
concurrency: 2
# address on which to run the web interface
addr: ":3001"
# if you want to ignore robots.txt
ignorerobots: true
# in some cases using cookies is friendlier to the server
usecookies: true
# ignoring urls
## based on query parameters: in this example, all links that contain a query parameter foo
ignorequerieswith:
- foo
## skip everything that has a query
ignoreallqueries: true
# which paths to ignore (these are matched as prefixes)
ignore:
- /foomo
...
```
## error detection
- every response with a status code greater than 400 is tracked as an error
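The rule above can be sketched in a few lines of Go. This is only an illustration, not walker's actual code: the function name `isError` is hypothetical, and the wording "greater than 400" is taken literally, so a plain 400 response is not counted. The real implementation may treat 400 differently.

```go
package main

import (
	"fmt"
	"net/http"
)

// isError reports whether a response status should be tracked as an error.
// Assumption: "greater than 400" is read literally, so 400 itself is not
// counted. The name and the exact threshold are illustrative only.
func isError(status int) bool {
	return status > http.StatusBadRequest
}

func main() {
	// Classify a few sample status codes.
	for _, status := range []int{200, 301, 400, 404, 500} {
		fmt.Printf("%d error=%t\n", status, isError(status))
	}
}
```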
## external link validation (not implemented yet)
- check external links
- forbidden sites like a stage system
## seo validation
- missing title, description, h1
- duplicate title, description, h1
### seo validation schemata
WIP
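Until the schemata are documented, the missing and duplicate checks listed under seo validation could look roughly like the following Go sketch. The `page` type and the helpers `missingFields` and `duplicates` are hypothetical names chosen for illustration and are not taken from walker's source.

```go
package main

import "fmt"

// page is a hypothetical record of the SEO fields collected for one URL.
type page struct {
	URL, Title, Description, H1 string
}

// missingFields returns the names of the SEO fields that are empty on p.
func missingFields(p page) []string {
	var missing []string
	if p.Title == "" {
		missing = append(missing, "title")
	}
	if p.Description == "" {
		missing = append(missing, "description")
	}
	if p.H1 == "" {
		missing = append(missing, "h1")
	}
	return missing
}

// duplicates groups URLs by the value of one field and keeps only the
// values that occur on more than one page. Empty values are skipped,
// because they are already reported by missingFields.
func duplicates(pages []page, field func(page) string) map[string][]string {
	byValue := map[string][]string{}
	for _, p := range pages {
		if v := field(p); v != "" {
			byValue[v] = append(byValue[v], p.URL)
		}
	}
	for v, urls := range byValue {
		if len(urls) < 2 {
			delete(byValue, v)
		}
	}
	return byValue
}

func main() {
	pages := []page{
		{URL: "/a", Title: "Home", Description: "Welcome", H1: "Home"},
		{URL: "/b", Title: "Home"},
	}
	fmt.Println(missingFields(pages[1]))
	fmt.Println(duplicates(pages, func(p page) string { return p.Title }))
}
```

Passing the field accessor as a function keeps one helper usable for title, description and h1 alike.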
## metrics
Work in progress, exposed on /metrics:
- vector of status codes
- performance buckets