https://github.com/tokenmill/fast-url-access-checker
Easily run HTTP GET requests against a list of URLs to check their HTTP status.
https://github.com/tokenmill/fast-url-access-checker
clojure http-redirect http-status java url-checker url-cleaning
Last synced: about 1 month ago
JSON representation
Easily run HTTP GET requests against a list of URLs to check their HTTP status.
- Host: GitHub
- URL: https://github.com/tokenmill/fast-url-access-checker
- Owner: tokenmill
- License: other
- Created: 2019-05-27T13:52:43.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-09-04T09:36:26.000Z (over 5 years ago)
- Last Synced: 2025-04-02T02:02:04.326Z (about 2 months ago)
- Topics: clojure, http-redirect, http-status, java, url-checker, url-cleaning
- Language: Clojure
- Homepage:
- Size: 42 KB
- Stars: 12
- Watchers: 2
- Forks: 4
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# URL Access Checker
This tool will take a list URLs of the sites, identify a correct form of the URL and run HTTP GET request against the URL to check its HTTP status. In cases where the address is not completely specified - a protocol is missing, 'www' part is not included when it is needed - a correct form of the URL will be identified. The library will also validate the correctness of the URL and in cases of redirection will return a target URL.
It is a Clojure library. Additionaly an interface to call it from Java is provided. As well as native binary distribution to be used as a command line tool.
# Features
* Provides the interface for a single URL check.
* Provides the interface for bulk URL checks.
* In the case of bulk URL check library parallelizes checking to ensure maximum speed of the entire process.
* In cases of incompletely formed URLs correct protocol (http or https) will be detected. Access with 'www' part if it is missing will also be tested.
* Redirection will be detected and target URL returned.
* URL check returns the following data: HTTP status, target URL, response time.# How to Use
## Command Line
Use URL checker can be started from the command line with the following instruction
```
./url-checker test/resources/bulk-test.txt
```See bellow for the output sample.
URL checker can be executed via command line using intalled [Clojure](https://clojure.org/) tools. Execution example with project's test url set:
```
clojure -m fast-url-check.core test/resources/bulk-test.txt
```Or via project's Makefile
```
make check-urls file-name=test/resources/bulk-test.txt
```This will result in CSV formated output of URL checking results
```
timestamp,seed,url,status,status-type,response-time,exception
2019-05-30T10:43:47.674Z,cameron.slb.com,https://www.products.slb.com,302,redirect,431,
2019-05-30T10:43:47.691Z,co.williams.com,https://co.williams.com/,200,accessible,622,
2019-05-30T10:43:47.691Z,company.ingersollrand.com,https://www.company.ingersollrand.com/,200,accessible,645,
...
2019-05-30T10:43:51.950Z,http://aes.com,https://aes.com/,200,accessible,3632,
```## Clojure
Singe URL check example.
```
(require '[fast-url-check.core :refer :all])(check-access "tokenmill.lt")
=>
{:url "http://www.tokenmill.lt/",
:seed "tokenmill.lt",
:status 200,
:response-time 7,
:status-type :accessible}
```Bulk URL check example
```
(check-access-bulk ["tokenmill.lt" "15min.lt" "https://news.ycombinator.com"])
=>
({:url "http://www.tokenmill.lt/",
:seed "tokenmill.lt",
:status 200,
:response-time 10,
:status-type :accessible}
{:url "https://www.15min.lt/",
:seed "15min.lt",
:status 200,
:response-time 46,
:status-type :accessible}
{:url "https://news.ycombinator.com/",
:seed "https://news.ycombinator.com",
:status 301,
:response-time 379,
:status-type :redirect})```
## Java
Java code example:
```
import crawl.tools.URLCheck;import java.util.Map;
import java.util.Arrays;
import java.util.Collection;public class MyClass {
public static void main(String[] args) {
System.out.println(URLCheck.checkAccess("tokenmill.lt"));String[] urls = {"15min.lt", "https://news.ycombinator.com"};
Collection validatedUrls = URLCheck.checkAccessBulk(Arrays.asList(urls));
for(Map validatedUrl : validatedUrls) {
System.out.println(validatedUrl);
}
}
}```
# Benchmark
This tool aims to provide top performance in bulk URL checking. This repository includes a [reference set](https://github.com/tokenmill/fast-url-access-checker/blob/master/test/resources/bulk-test.txt) of 1000 URLs for consistent performance checking.
Benchmark test executed against the reference URL set performs with average _0.3 seconds per URL_. Execution times are subject to the network conditions and hardware the tests are executed on.
Benchmark test can be launched with `make benchmark`
## License
Copyright © 2019 [TokenMill UAB](http://www.tokenmill.lt).
Distributed under the The Apache License, Version 2.0.