https://github.com/sohunn/status-crawler
A tool to detect dead links on a website and summarize their HTTP statuses in a clear table, written using Golang.
- Host: GitHub
- URL: https://github.com/sohunn/status-crawler
- Owner: sohunn
- License: MIT
- Created: 2024-12-01T08:37:24.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-12-10T14:52:24.000Z (about 1 year ago)
- Last Synced: 2025-03-25T10:42:19.666Z (9 months ago)
- Topics: concurrent-programming, deadlink-finder, golang, playwright, webscraper, webscraping
- Language: Go
- Homepage: https://sohunn.me
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# StatusCrawler
StatusCrawler is a simple tool that detects dead links on a website and summarizes their HTTP statuses in a clear table. It is written in Go.
## Features✨
- Supports and validates links using `http` and `https` schemes.
- Uses [Playwright](https://pkg.go.dev/github.com/playwright-community/playwright-go) to perform efficient web scraping.
- Uses goroutines with mutexes, wait groups and distributed locking mechanisms to increase performance and concurrency 🚀
- Clean summary in a tabular format.
## How to use❓
- Make sure you have the latest version of [Go](https://go.dev/dl/) installed.
- Clone the repository using the following command:
```
git clone https://github.com/sohunn/status-crawler.git
```
- Install dependencies:
```
go mod tidy
```
- Make sure to install the browsers and OS dependencies:
```
go run github.com/playwright-community/playwright-go/cmd/playwright@latest install --with-deps
```
- Run the tool from the root of the project, appending the URL of the website to check (see the example below):
```
go run ./
```
## Example
```
go run ./ "https://sohunn.me"
```
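
The URL argument in the example is what the scheme validation mentioned under Features would apply to. A minimal sketch of how such a check could look with the standard `net/url` package follows; `validateTarget` is a hypothetical name and the project's own validation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
)

// validateTarget parses raw and accepts it only if it is an absolute
// URL with an http or https scheme and a non-empty host.
// Hypothetical helper, not taken from the project.
func validateTarget(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported scheme %q (want http or https)", u.Scheme)
	}
	if u.Host == "" {
		return nil, errors.New("missing host")
	}
	return u, nil
}

func main() {
	// Expect exactly one argument: the website URL.
	if len(os.Args) != 2 {
		fmt.Println("usage: go run ./ <url>")
		return
	}
	u, err := validateTarget(os.Args[1])
	if err != nil {
		fmt.Println("invalid URL:", err)
		return
	}
	fmt.Println("would crawl:", u.String())
}
```

With this check, an argument such as `sohunn.me` (no scheme) or `ftp://example.com` is rejected before any crawling starts.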
## Building 🛠️
Check your Go environment variables (`GOOS` and `GOARCH`, shown by `go env GOOS GOARCH`) to make sure you are building the executable for the right platform. Once verified, run:
```
go build -o crawler.exe ./
```
**Note:** You can name the executable whatever you want; `crawler.exe` is used in this example. The `.exe` extension is only needed on Windows.
Once built, run the executable with the same arguments you would pass to `go run`:
```
crawler.exe "https://sohunn.me"
```