https://github.com/tvanreenen/batch-url-validator

A simple utility script to validate multiple URLs in parallel.

batch-processing url-validator

Last synced: 11 months ago

Host: GitHub
URL: https://github.com/tvanreenen/batch-url-validator
Owner: tvanreenen
Created: 2025-05-01T23:19:23.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-05-05T18:07:07.000Z (about 1 year ago)
Last Synced: 2025-06-02T07:15:22.847Z (12 months ago)
Topics: batch-processing, url-validator
Language: Python
Homepage:
Size: 10.7 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Batch URL Validator

A simple utility script to validate multiple URLs in parallel. I created it when I needed to check the status of a large number of links in documentation.

## Features

- Validates URLs in parallel using an adjustable number of workers
- Writes the HTTP status code and a timestamp for each URL back to the input CSV file
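The README does not show how the parallel checks are scheduled. A minimal sketch, assuming a thread pool from the standard library's `concurrent.futures`, might look like this. `check_all` and its `check` callback are hypothetical names, and `max_workers` plays the role of the `--max-workers` option.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def check_all(
    urls: Iterable[str],
    check: Callable[[str], Optional[int]],
    max_workers: int = 10,
) -> dict[str, Optional[int]]:
    """Run `check` on every URL using a pool of `max_workers` threads.

    Hypothetical helper (not the script's confirmed code): `check` should
    return an HTTP status code, or None when the request fails.
    """
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so zip pairs them correctly.
        return dict(zip(url_list, pool.map(check, url_list)))


if __name__ == "__main__":
    def fake_check(url: str) -> Optional[int]:
        # Stand-in for a real HTTP check so the sketch runs offline.
        return 200 if url == "https://example.com" else None

    print(check_all(["https://example.com", "https://nonexistent.example"], fake_check, max_workers=2))
    # {'https://example.com': 200, 'https://nonexistent.example': None}
```

Threads are a reasonable fit here because each check spends most of its time waiting on the network rather than using the CPU.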

## Usage

1. Prepare a CSV file with a `url` column containing the links to check
2. Run the script:
   ```bash
   uv run src/batch_url_validator.py your_list_of_urls.csv
   ```

Optional arguments:
- `--max-workers`: Number of concurrent requests (default: 10)
```bash
uv run src/batch_url_validator.py your_list_of_urls.csv --max-workers 20
```
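A sketch of how this command-line interface could be declared, assuming the script uses the standard library's `argparse` (the README does not say which parser it uses). `build_parser` and the `csv_path` argument name are hypothetical; only the CSV path and `--max-workers` with its default of 10 come from the README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented arguments.
    parser = argparse.ArgumentParser(
        description="Validate the URLs listed in a CSV file in parallel."
    )
    parser.add_argument("csv_path", help="CSV file with a `url` column")
    parser.add_argument(
        "--max-workers",
        type=int,
        default=10,
        help="number of concurrent requests (default: 10)",
    )
    return parser


if __name__ == "__main__":
    # argparse exposes --max-workers as the attribute max_workers.
    args = build_parser().parse_args(["your_list_of_urls.csv", "--max-workers", "20"])
    print(args.csv_path, args.max_workers)  # your_list_of_urls.csv 20
```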

## Output

The script will:
1. Update the input CSV file with:
   - `code`: HTTP status code (or `None` if the request failed)
   - `datetime`: Timestamp of when the check was performed
2. Print a summary of the results including:
   - Total number of links checked
   - Distribution of status codes
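One way to build such a summary is to count the status codes with `collections.Counter`. This is a sketch only: `summarize` is a hypothetical helper, and the exact layout of the script's real output is not documented in the README.

```python
from collections import Counter
from typing import Iterable, Optional


def summarize(codes: Iterable[Optional[int]]) -> str:
    """Hypothetical helper: total links checked plus a count per status code."""
    counts = Counter(codes)
    total = sum(counts.values())
    lines = [f"Total links checked: {total}"]
    # Sort on the string form so None (failed requests) can sit alongside integer codes.
    for code, n in sorted(counts.items(), key=lambda item: str(item[0])):
        lines.append(f"  {code}: {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize([200, 200, 404, None]))
    # Total links checked: 4
    #   200: 2
    #   404: 1
    #   None: 1
```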

## Example

Input: `your_list_of_urls.csv`
```csv
url
https://example.com
https://nonexistent.example
```

After running:
```csv
url,code,datetime
https://example.com,200,2024-03-21 14:30:45
https://nonexistent.example,None,2024-03-21 14:30:46
```
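The before/after example can be approximated with the standard `csv` module. This sketch assumes a hypothetical `update_csv` helper that takes a `check` callback returning a status code or `None`. For brevity it checks URLs sequentially. It checks each distinct URL once, copies that result to every row containing the URL, and formats timestamps as `YYYY-MM-DD HH:MM:SS` to match the example.

```python
import csv
from datetime import datetime
from typing import Callable, Optional


def update_csv(path: str, check: Callable[[str], Optional[int]]) -> None:
    """Hypothetical helper: add or overwrite `code` and `datetime` columns in place."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    # dict.fromkeys keeps first-seen order while dropping duplicate URLs.
    results: dict[str, tuple[Optional[int], str]] = {}
    for url in dict.fromkeys(row["url"] for row in rows):
        code = check(url)
        results[url] = (code, datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

    # Every row with the same URL gets the same code and timestamp.
    for row in rows:
        code, stamp = results[row["url"]]
        row["code"] = str(code)  # str(None) == "None", as in the example output
        row["datetime"] = stamp

    for column in ("code", "datetime"):
        if column not in fieldnames:
            fieldnames.append(column)

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Converting the code with `str()` matters because `csv.DictWriter` would otherwise write `None` as an empty cell.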

## Notes

- The script checks every unique URL in the file on each run
- It sends a HEAD request first and falls back to GET if the HEAD request fails
- Each request times out after 2 seconds
- If a URL appears multiple times in the CSV, it is checked only once, and every row containing it is updated with the same status code and timestamp
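The HEAD-then-GET fallback and the 2-second timeout could be implemented with the standard library's `urllib.request`, as sketched below. The actual script may use a different HTTP client. `check_url` and its `opener` parameter are hypothetical, and `opener` exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

TIMEOUT_SECONDS = 2  # per-request timeout stated in the README


def check_url(url: str, opener: Callable = urllib.request.urlopen) -> Optional[int]:
    """Hypothetical helper: try HEAD, then GET; return a status code or None."""
    status: Optional[int] = None
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(url, method=method)
        try:
            with opener(request, timeout=TIMEOUT_SECONDS) as response:
                return response.status
        except urllib.error.HTTPError as err:
            # 4xx/5xx: remember the code, but still retry with GET after a failed HEAD
            # (some servers reject HEAD, e.g. with 405).
            status = err.code
        except (urllib.error.URLError, OSError):
            # No HTTP response at all (DNS failure, refused connection, timeout).
            pass
    return status
```

A `None` result corresponds to the `None` value written to the `code` column when a request fails.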
