
`chklinks` - A Non-Threaded Perl Link Checker
=============================================

Description
-----------

`chklinks` is a non-threaded Perl link checker. It helps to find
broken links on your website.

`chklinks` differs from [linkchecker] in that `chklinks` is
non-threaded. It does not open many simultaneous connections to do
its job, so it will not exhaust resources and crash your system in a
moment. This is certainly more desirable for most webmasters and
users.

`chklinks` respects `robots.txt`. If you disallow robots from your
website and experience problems, you need to allow `chklinks`. Add
the following lines to your `robots.txt` file to allow `chklinks`:

    User-agent: chklinks
    Disallow:
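
As an illustration, a `robots.txt` that keeps all other robots out while letting `chklinks` in might look like the following. This is only a sketch that assumes the site otherwise disallows everything; a robot that finds a record matching its own name uses that record instead of the `*` record.

```text
User-agent: *
Disallow: /

User-agent: chklinks
Disallow:
```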

`chklinks` uses [LWP::RobotUA] and supports the following schemes:
`http`, `https`, `ftp`, `gopher` and `file`. You can also specify a
local file. (To use `https`, you need to install [Crypt::SSLeay].
This is a requirement of LWP::RobotUA.)

`chklinks` supports cookies.

[linkchecker]: https://wummel.github.io/linkchecker
[LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA
[Crypt::SSLeay]: https://metacpan.org/release/Crypt-SSLeay

System Requirements
-------------------

1. Perl, version 5.6 or above. I have not successfully run this on
earlier versions. Please tell me if you can.

You can run `perl -v` to check your current Perl version. If you
do not have Perl, or if you have an older version of Perl, you can
download and install/upgrade it from the [Perl website]. For
MS-Windows, you can download and install [Strawberry Perl] or
[ActivePerl].

2. Required Perl modules:

* [URI]

  This is used to parse and process the found URLs. You can
  download and install URI from the CPAN archive, or install it
  with the CPAN shell:

      cpan URI

  or with the CPANPLUS shell:

      cpanp i URI

  For Debian/Ubuntu:

      sudo apt install liburi-perl

  For Red Hat/Fedora/CentOS:

      sudo yum install perl-URI

  For FreeBSD:

      ports install p5-URI

  For ActivePerl:

      ppm install URI

* [HTML::LinkExtor]

  This is used to extract links from the web pages.
  HTML::LinkExtor is contained in the [HTML-Parser] distribution.
  You can download and install HTML::LinkExtor from the CPAN
  archive, or install it with the CPAN shell:

      cpan HTML::LinkExtor

  or with the CPANPLUS shell:

      cpanp i HTML::LinkExtor

  For Debian/Ubuntu:

      sudo apt install libhtml-parser-perl

  For Red Hat/Fedora/CentOS:

      sudo yum install perl-HTML-Parser

  For FreeBSD:

      ports install p5-HTML-Parser

  For ActivePerl:

      ppm install HTML-Parser

* [LWP::RobotUA]

  This is used to request web pages. LWP::RobotUA is contained in
  the [libwww-perl] distribution. You can download and install
  LWP::RobotUA from the CPAN archive, or install it with the CPAN
  shell:

      cpan LWP::RobotUA

  or with the CPANPLUS shell:

      cpanp i LWP::RobotUA

  For Debian/Ubuntu:

      sudo apt install libwww-perl

  For Red Hat/Fedora/CentOS:

      sudo yum install perl-libwww-perl

  For FreeBSD:

      ports install p5-libwww

  For ActivePerl:

      ppm install libwww-perl

3. Optional Perl modules:

* [Crypt::SSLeay]

  This is needed by LWP::RobotUA to support HTTPS. You can
  download and install Crypt::SSLeay from the CPAN archive, or
  install it with the CPAN shell:

      cpan Crypt::SSLeay

  or with the CPANPLUS shell:

      cpanp i Crypt::SSLeay

  For Debian/Ubuntu:

      sudo apt install libcrypt-ssleay-perl

  For Red Hat/Fedora/CentOS:

      sudo yum install perl-Crypt-SSLeay

  For FreeBSD:

      ports install p5-Crypt-SSLeay

  For ActivePerl:

      ppm install Crypt-SSLeay
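
To see at a glance which of the modules above are already installed, a small shell sketch like the following may help. It relies only on `perl -MModule -e 1`, which exits with a non-zero status when a module cannot be loaded. The `check_module` function name and the `PERL` override variable are illustrative and are not part of `chklinks`.

```sh
#!/bin/sh
# Report which of the modules named in this README can be loaded.
# PERL may be overridden; it defaults to the perl found in PATH.
PERL="${PERL:-perl}"

check_module() {
    # "perl -MModule -e 1" exits non-zero when the module cannot be loaded.
    if "$PERL" -M"$1" -e 1 2>/dev/null; then
        echo "ok       $1"
    else
        echo "MISSING  $1"
    fi
}

for module in URI HTML::LinkExtor LWP::RobotUA Crypt::SSLeay; do
    check_module "$module"
done
```

A `MISSING` line for Crypt::SSLeay only matters if you need to check `https` URLs.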

[Perl website]: https://www.perl.org
[Strawberry Perl]: https://strawberryperl.com
[ActivePerl]: https://www.activestate.com/products/perl/
[URI]: https://metacpan.org/pod/URI
[HTML::LinkExtor]: https://metacpan.org/pod/HTML::LinkExtor
[HTML-Parser]: https://metacpan.org/release/HTML-Parser
[LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA
[libwww-perl]: https://metacpan.org/release/libwww-perl
[Crypt::SSLeay]: https://metacpan.org/pod/Crypt::SSLeay

Download
--------

`chklinks` is hosted on…

* [chklinks project on GitHub]

* [chklinks project on SourceForge]

You can always download the newest version of `chklinks` from…

* [chklinks download on SourceForge]

* [Tavern IMACAT’s FTP directory]

imacat’s PGP public key is at…

* [imacat’s PGP key at Tavern IMACAT’s]

[chklinks project on GitHub]: https://github.com/imacat/chklinks
[chklinks project on SourceForge]: https://sf.net/p/chklinks
[chklinks download on SourceForge]: https://sourceforge.net/projects/chklinks/files
[Tavern IMACAT’s FTP directory]: https://ftp.imacat.idv.tw/pub/chklinks/
[imacat’s PGP key at Tavern IMACAT’s]: https://www.imacat.idv.tw/me/pgpkey.asc

Install
-------

    % perl Makefile.PL
    % make
    % make test
    % make install

When running `make install`, make sure you have the privilege to
write to the installation locations. This usually requires the `root`
privilege.

If you want to install into another location, you can set the
`PREFIX`. For example, to install into your home when you are not
`root`:

    % perl Makefile.PL PREFIX=/home/jessica

Refer to the documentation of ExtUtils::MakeMaker for more
installation options (by running `perldoc ExtUtils::MakeMaker`).

For MS-Windows, since `make` is not universally available,
Module::Build is preferred to ExtUtils::MakeMaker. See the
instructions below.

### Install with [Module::Build]

    % perl Build.PL
    % ./Build
    % ./Build test
    % ./Build install

When running `./Build install`, make sure you have the privilege to
write to the installation locations. This usually requires the `root`
privilege.

If you want to install into another location, you can set the
`--prefix`. For example, to install into your home when you are not
`root`:

    % perl Build.PL --prefix=/home/jessica

Refer to the documentation of Module::Build for more installation
options (by running `perldoc Module::Build`).

[ExtUtils::MakeMaker]: https://metacpan.org/release/ExtUtils-MakeMaker
[Module::Build]: https://metacpan.org/release/Module-Build

Options
-------

    ./chklinks [options] URL1 [URL2 [URL3 …]]
    ./chklinks [-h|-v]

* `-1`, `--onelevel`

Check the links on this page and stop.

* `-r`, `--recursive`

Recursively check through this website. This is the default.

* `-b`, `--below`

Only check the links below this directory. This is the default.

* `-p`, `--parent`

Trace back to the parent directories.

* `-l`, `--local`

Only check the links on this same host.

* `-s`, `--span`

Check the links to other hosts (without recursion). This is the
default.

* `-e`, `--exclude path`

Exclude this path. Pages under it are checked for existence, but the
links on them are not checked, as if they were on a foreign site.
Multiple `--exclude` options are allowed.

* `-i`, `--include path`

Include this path. This is the opposite of `--exclude` and cancels its
effect. When both apply, the one specified later has higher priority.

* `-d`, `--debug`

Display debug messages. Specify `--debug` multiple times for more detail.

* `-q`, `--quiet`

Disable debug messages. This is the opposite of `--debug` and cancels
its effect.

* `-h`, `--help`

Display the help message and exit.

* `-v`, `--version`

Output version information and exit.

* `URL1`, `URL2`, `URL3`

The URLs of the websites to check.
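
The options can be combined. As a rough sketch, the hypothetical shell function below runs a recursive, same-host check and passes each extra argument as an `--exclude` path. `check_site` and the `CHKLINKS` override variable are illustrative names, not part of `chklinks`, and the sketch assumes `chklinks` is on your `PATH`. The exact form of the path that `--exclude` expects is also an assumption here; run `chklinks --help` to confirm it for your version.

```sh
#!/bin/sh
# Hypothetical wrapper around chklinks; only options listed above are used.
# Usage: check_site URL [EXCLUDED_PATH ...]
check_site() {
    site=$1
    shift
    args="--recursive --local"
    for path in "$@"; do
        args="$args --exclude $path"
    done
    # $args is left unquoted on purpose so it splits into separate words;
    # this sketch therefore does not support paths containing whitespace
    # or shell wildcard characters.
    ${CHKLINKS:-chklinks} $args "$site"
}
```

For example, `check_site https://www.example.com/ /archive/ /download/` would run `chklinks --recursive --local --exclude /archive/ --exclude /download/ https://www.example.com/`.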

Notes
-----

* `chklinks` does not obey `Crawl-delay:` in `robots.txt` yet. This
is a limitation of [WWW::RobotRules], not of `chklinks` itself.

* If you encounter warnings like this:

      Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/share/perl5/LWP/Protocol.pm line 114.

  This is an issue in [LWP::Protocol] version ≤ 1.43 (in libwww-perl
  version ≤ 5.805) when working with HTML::Parser version ≥ 3.40 and
  Perl version ≥ 5.8. It is fixed in LWP::Protocol
  version ≥ 1.46 (in libwww-perl version ≥ 5.806). Upgrade
  LWP::Protocol to the current version if you can. If you cannot
  upgrade it, see [CPAN RT Bug#20274] for an LWP::Protocol patch.
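
To find out whether an installation is in the affected range, you can print the LWP::Protocol version and compare it with 1.46. The sketch below is one way to do that from a shell. The function names are illustrative, and treating every version below 1.46 as possibly affected is an assumption, since the note above only names versions ≤ 1.43 and ≥ 1.46.

```sh
#!/bin/sh
# Print the installed LWP::Protocol version, or nothing if it cannot be loaded.
lwp_protocol_version() {
    "${PERL:-perl}" -MLWP::Protocol -e 'print LWP::Protocol->VERSION, "\n"' 2>/dev/null
}

# Succeed (exit status 0) when the given version is older than 1.46.
lwp_protocol_may_warn() {
    awk -v v="$1" 'BEGIN { exit !(v + 0 < 1.46) }'
}

version=$(lwp_protocol_version) || version=""
if [ -z "$version" ]; then
    echo "LWP::Protocol is not installed"
elif lwp_protocol_may_warn "$version"; then
    echo "LWP::Protocol $version may show the warning; consider upgrading"
else
    echo "LWP::Protocol $version already includes the fix"
fi
```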

[WWW::RobotRules]: https://metacpan.org/pod/WWW::RobotRules
[LWP::Protocol]: https://metacpan.org/pod/LWP::Protocol
[CPAN RT Bug#20274]: https://rt.cpan.org/Public/Bug/Display.html?id=20274

Bugs
----

`chklinks` does not support authentication yet. [W3C-LinkChecker]
supports this. As a workaround, you can use the syntax
`http://user:pass@some.where.com/some/path` for Basic Authentication,
but this does not work with Digest Authentication. This practice is
not encouraged: your password would be visible to anyone on the
system using `ps`, including hidden intruders, and what you type
in your shell is saved to your shell history file.

`mailto:` URLs should be supported by checking the validity of their
DNS/MX records. Bastian Kleineidam’s [linkchecker] supports
this.

Local file checking has only been tested on Unix and MSWin32. More
platforms should be tested, especially VMS and Mac.

[W3C-LinkChecker]: https://validator.w3.org/checklink
[linkchecker]: https://wummel.github.io/linkchecker

See Also
--------

[LWP::UserAgent], [LWP::RobotUA], [WWW::RobotRules], [URI],
[HTML::LinkExtor], Bastian Kleineidam’s [linkchecker] and
W3C-LinkChecker [checklink].

[LWP::UserAgent]: https://metacpan.org/pod/LWP::UserAgent
[LWP::RobotUA]: https://metacpan.org/pod/LWP::RobotUA
[WWW::RobotRules]: https://metacpan.org/pod/WWW::RobotRules
[URI]: https://metacpan.org/release/URI
[HTML::LinkExtor]: https://metacpan.org/pod/HTML::LinkExtor
[linkchecker]: https://wummel.github.io/linkchecker
[checklink]: https://validator.w3.org/checklink

Release Notes
-------------

Please read the `NEWS` file for new features and bug fixes.

Support
-------

The `chklinks` project is hosted on GitHub. Report issues on the
GitHub issue tracker at https://github.com/imacat/chklinks/issues.

Thanks
------

* Thanks to [SourceForge] for providing a compile farm for projects to
test on different platforms.

* Thanks to [Stefan Seifert] for pointing
out the redirection loop problem when cookies are not enabled.
(2005-11-07)

* Thanks to [nsnake] for reporting warnings from [HTML::Parser]
version >= 3.40 when checking UTF-8 pages. (2007-06-06)

[SourceForge]: https://sf.net
[Stefan Seifert]: mailto:stefan.seifert@atikon.com
[nsnake]: mailto:loveme1314@gmail.com
[HTML::Parser]: https://metacpan.org/pod/HTML::Parser

License
-------

Copyright (C) 2003-2021 imacat.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

imacat ^_*'
2003/5/25

https://www.imacat.idv.tw