https://github.com/tvanreenen/batch-url-validator
A simple utility script to validate multiple URLs in parallel.
batch-processing url-validator
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/tvanreenen/batch-url-validator
- Owner: tvanreenen
- Created: 2025-05-01T23:19:23.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-05T18:07:07.000Z (about 1 year ago)
- Last Synced: 2025-06-02T07:15:22.847Z (12 months ago)
- Topics: batch-processing, url-validator
- Language: Python
- Homepage:
- Size: 10.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Batch URL Validator
A simple utility script to validate multiple URLs in parallel. I created this when I needed to check the status of a large number of links from documentation.
## Features
- Validates URLs in parallel using an adjustable number of workers
- Updates a CSV file with status codes and timestamps
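The parallel checking described above can be sketched with a thread pool. This is a minimal illustration, not the script's actual code: `check_all` and the injected `check` callable are hypothetical names, and the use of `ThreadPoolExecutor` is an assumption about how the workers might be implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def check_all(
    urls: Iterable[str],
    check: Callable[[str], Optional[int]],
    max_workers: int = 10,
) -> dict[str, Optional[int]]:
    """Run `check` on each unique URL using a pool of worker threads."""
    # De-duplicate while preserving the original order of first appearance.
    unique_urls = list(dict.fromkeys(urls))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as its inputs.
        codes = list(pool.map(check, unique_urls))
    return dict(zip(unique_urls, codes))
```

Passing the checker in as a callable keeps the pool logic independent of the HTTP client, which also makes it easy to try out with a stub instead of real requests.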
## Usage
1. Prepare a CSV file with a `url` column containing the links to check
2. Run the script:
```bash
uv run src/batch_url_validator.py your_list_of_urls.csv
```
Optional arguments:
- `--max-workers`: Number of concurrent requests (default: 10)
```bash
uv run src/batch_url_validator.py your_list_of_urls.csv --max-workers 20
```
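For readers curious how a command line like the one above might be parsed, here is a small sketch using `argparse`. The function name `parse_args` and the attribute name `csv_path` are assumptions for illustration; only the positional CSV argument, the `--max-workers` flag, and its default of 10 come from this README.

```python
import argparse


def parse_args(argv=None):
    """Parse the CSV path and the optional --max-workers flag."""
    parser = argparse.ArgumentParser(
        description="Validate the URLs listed in a CSV file."
    )
    parser.add_argument("csv_path", help="CSV file with a `url` column")
    parser.add_argument(
        "--max-workers",
        type=int,
        default=10,
        help="Number of concurrent requests (default: 10)",
    )
    return parser.parse_args(argv)
```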
## Output
The script will:
1. Update the input CSV file with:
   - `code`: HTTP status code (or `None` if the request failed)
   - `datetime`: Timestamp of when the check was performed
2. Print a summary of the results including:
   - Total number of links checked
   - Distribution of status codes
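A summary like the one described above could be built with `collections.Counter`. This is a sketch only: the function names and the exact wording of the printed lines are assumptions, since the README does not show the real summary format.

```python
from collections import Counter


def summarize(codes):
    """Return the total number of checks and a tally per status code."""
    return len(codes), Counter(codes)


def format_summary(codes):
    """Render the summary as text, most frequent status code first."""
    total, counts = summarize(codes)
    lines = [f"Total links checked: {total}"]
    for code, count in counts.most_common():
        lines.append(f"  {code}: {count}")
    return "\n".join(lines)
```

Calling `print(format_summary(codes))` with a list of status codes (including `None` for failed requests) would then print the total followed by one line per distinct code.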
## Example
Input: `your_list_of_urls.csv`
```csv
url
https://example.com
https://nonexistent.example
```
After running:
```csv
url,code,datetime
https://example.com,200,2024-03-21 14:30:45
https://nonexistent.example,None,2024-03-21 14:30:46
```
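The transformation in this example can be approximated with the standard `csv` module. The sketch below works on CSV text in memory rather than updating a file in place, and `annotate_csv` and the shape of `results` (a mapping from URL to a `(code, datetime)` pair) are assumptions made for illustration.

```python
import csv
import io


def annotate_csv(csv_text, results):
    """Return CSV text with `code` and `datetime` filled in for every row.

    `results` maps each unique URL to a (code, datetime) pair, so rows that
    repeat a URL all receive the same values.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["url", "code", "datetime"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        code, checked_at = results[row["url"]]
        # str() writes a failed request as "None", matching the example;
        # the csv module would otherwise emit None as an empty field.
        writer.writerow(
            {"url": row["url"], "code": str(code), "datetime": checked_at}
        )
    return out.getvalue()
```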
## Notes
- The script checks all unique URLs in the file each time it runs
- Sends a HEAD request first and falls back to GET if HEAD fails
- Timeout is set to 2 seconds per request
- If a URL appears multiple times in the CSV, it will only be checked once, but all instances will be updated with the same status code and timestamp
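The HEAD-then-GET fallback and the 2-second timeout noted above might look roughly like the following. This sketch uses only the standard library (`urllib`), which may differ from the HTTP client the script really uses. It also assumes that "fails" means no response was obtained at all (for example a timeout or connection error), so an HTTP error status returned for HEAD is recorded as-is. The names `request_status`, `check_url`, and `send` are hypothetical.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional

TIMEOUT_SECONDS = 2  # per-request timeout stated in the notes above


def request_status(url: str, method: str) -> int:
    """Send one request and return the HTTP status code."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses still carry a status code worth recording.
        return error.code


def check_url(url: str, send=request_status) -> Optional[int]:
    """Try HEAD first, fall back to GET, and return None if both fail."""
    for method in ("HEAD", "GET"):
        try:
            return send(url, method)
        except (OSError, ValueError, http.client.HTTPException):
            # Timeouts, connection errors, or malformed URLs: try the next method.
            continue
    return None
```

The `send` parameter exists only so the fallback logic can be exercised with a stub in place of real network calls.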