https://github.com/beenotung/scan-link
Scan given website recursively and report 404 links
- Host: GitHub
- URL: https://github.com/beenotung/scan-link
- Owner: beenotung
- License: bsd-2-clause
- Created: 2024-05-18T12:27:53.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-05-18T12:54:21.000Z (about 1 year ago)
- Last Synced: 2025-02-01T14:47:51.136Z (4 months ago)
- Topics: 404-errors, broken-links, cli, csv-report, http-status, link-analyzer, link-checker, link-scanner, link-validator, npx, seo-tools, url-scanner, web-crawler, web-scrapping, web-tools, website-scanner
- Language: TypeScript
- Homepage: https://www.npmjs.com/package/scan-link
- Size: 22.5 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# scan-link
Scan given website recursively and report 404 links
[scan-link on npm](https://www.npmjs.com/package/scan-link)
## Features
- Start scanning from a specified entry URL
- Follow links within specified origins
- Report links that lead to 404 pages
- Export 404 error report as a CSV file

## Remark
The links and page status codes are stored in a `db.sqlite3` file in the current working directory. To avoid creating it in your home directory, `mkdir` a dedicated directory and `cd` into it before running the scan.
## Installation (optional)
You can install `scan-link` to pin its version, or run it via `npx` without installing it.
To install `scan-link`, use npm:
```bash
npm install scan-link
```
You may install it as a dev dependency or a global dependency, based on your preference.
## Usage
You can use `scan-link` from the command line via `npx`. The configuration can be provided via environment variables or interactively during execution.
Usage with dev/global installation:
```bash
npx scan-link [entryUrl]
```
Usage without installation:
```bash
npx -y scan-link [entryUrl]
```
The `entryUrl` can be passed as an argument, loaded from an environment variable, or entered at the interactive prompt.
### Environment Variables
- SITE_URL: The entry URL for the scan
- ORIGINS: A comma-separated list of origins to limit the scan
- REPORT_404_CSV_FILE: Path of the CSV file where the 404 error report will be saved

Example content of `.env` file:
```bash
SITE_URL=https://example.com
ORIGINS=https://example.com,https://sub.example.com
REPORT_404_CSV_FILE=report.csv
```
### Interactive Usage
If environment variables are not set, `scan-link` will prompt you for the necessary information.
```bash
npx scan-link
```
You will be prompted to set up the variables above.
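The origins value, whether it comes from `ORIGINS` or from the prompt, is a comma-separated list that limits which links are followed. The sketch below shows one way such a list could be parsed and applied. The helper names `parseOrigins` and `isWithinOrigins` are hypothetical and not part of scan-link, and the tool's actual matching logic may differ.

```typescript
// Hypothetical helpers for illustration; not part of the scan-link API.

/** Split a comma-separated origins string, drop blanks, and normalize each entry. */
function parseOrigins(value: string): string[] {
  return value
    .split(',')
    .map(part => part.trim())
    .filter(part => part.length > 0)
    // new URL() throws on entries that are not absolute URLs
    .map(part => new URL(part).origin)
}

/** Decide whether a link should be followed, based on its origin. */
function isWithinOrigins(link: string, origins: string[]): boolean {
  try {
    return origins.includes(new URL(link).origin)
  } catch {
    // Not an absolute URL, so it has no origin to compare
    return false
  }
}

const origins = parseOrigins('https://example.com, https://sub.example.com')
console.log(origins)
console.log(isWithinOrigins('https://sub.example.com/about', origins))
console.log(isWithinOrigins('https://other.example.org/', origins))
```

An origin is the scheme, host, and port together, so under this sketch `http://example.com` and `https://example.com` count as different origins.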
### Example Interactive Session
```bash
$ npx -y scan-link
entryUrl: http://localhost:8200/
Please specified the origins of links to follow.
Multiple origins can be delimited by comma (",").
origins (default: "http://localhost:8200"):
origins: [ 'http://localhost:8200' ]
path of CSV file to be saved (default "404.csv"): report.csv
scanned: 12 | pending: 85 | scanning: http://localhost:8200/about
...
scanned: 119 pages
{
  '404 link count': 1447,
  'total link count': 5036,
  'page count with 404 link': 11,
  'total page count': 119
}
exported 404 pages to file: report.csv
```
## API
For advanced usage, you can import `scanAndFollow()` and its helper functions and use them programmatically.
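As a rough sketch of how the options documented in this section might be assembled, the snippet below maps the environment variables from the Usage section onto an options object. The `ScanOptions` interface is a local mirror of the documented option names and `buildScanOptions` is a hypothetical helper. Neither is exported by scan-link, and whether the library reads these variables itself when called programmatically is not stated here.

```typescript
// With the package installed, the real function would be imported like this:
// import { scanAndFollow } from 'scan-link'

/** Local mirror of the documented options, so this sketch type-checks on its own. */
interface ScanOptions {
  entryUrl: string
  origins?: string[]
  report_404_stats?: boolean
  export_404_csv_file?: string
  close_browser?: boolean
}

/** Hypothetical helper: map the documented environment variables to options. */
function buildScanOptions(env: Record<string, string | undefined>): ScanOptions {
  const entryUrl = env.SITE_URL
  if (!entryUrl) throw new Error('SITE_URL is required')
  const options: ScanOptions = { entryUrl, report_404_stats: true }
  if (env.ORIGINS) {
    // ORIGINS is a comma-separated list
    options.origins = env.ORIGINS.split(',')
      .map(s => s.trim())
      .filter(Boolean)
  }
  if (env.REPORT_404_CSV_FILE) {
    options.export_404_csv_file = env.REPORT_404_CSV_FILE
  }
  return options
}

const options = buildScanOptions({
  SITE_URL: 'https://example.com',
  ORIGINS: 'https://example.com,https://sub.example.com',
  REPORT_404_CSV_FILE: 'report.csv',
})
console.log(options)

// The result could then be passed on:
// await scanAndFollow(options)
```

Leaving `export_404_csv_file` unset skips the CSV export, which matches the documented behavior of that option.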
```typescript
export function scanAndFollow(options: {
  /** @example 'http://localhost:8200/' */
  entryUrl: string
  /** @default same as entryUrl */
  origins?: string[]
  /** @description report stats on 404 pages and links */
  report_404_stats?: boolean
  /** @description specified filename to report 404 links. Skip reporting if not specified. */
  export_404_csv_file?: string
  /**
   * @description auto close browser after all scanning
   * @default true
   */
  close_browser?: boolean
}): Promise<void>

/** @description called by `scanAndFollow()` if `options.report_404_stats` is true */
export function get404Report(options: { origin: string }): {
  '404 link count': number
  'total link count': number
  'page count with 404 link': number
  'total page count': number
}

/** @description called by `scanAndFollow()` if `options.export_404_csv_file` is specified */
export function export404Pages(options: {
  csv_file: string
  origin: string
}): void

/** @description close the lazy loaded browser instance if it's launched */
export function closeBrowser(): Promise<void>
```
## License
This project is licensed under [BSD-2-Clause](./LICENSE).
This is free, libre, and open-source software. It comes down to four essential freedoms [[ref]](https://seirdy.one/2021/01/27/whatsapp-and-the-domestication-of-users.html#fnref:2):
- The freedom to run the program as you wish, for any purpose
- The freedom to study how the program works, and change it so it does your computing as you wish
- The freedom to redistribute copies so you can help others
- The freedom to distribute copies of your modified versions to others