https://github.com/hexfusion/sitemapper
Fork of http://search.cpan.org/~awrigley/sitemapper-1.019/lib/WWW/Sitemap.pm
- Host: GitHub
- URL: https://github.com/hexfusion/sitemapper
- Owner: hexfusion
- Created: 2017-04-14T16:50:38.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-04-14T16:52:56.000Z (over 7 years ago)
- Last Synced: 2024-11-10T20:46:41.460Z (2 months ago)
- Language: Perl
- Homepage:
- Size: 32.2 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README
- Changelog: Changes
README
Sitemapper Version 1.008
========================

Description
-----------

sitemapper.pl is a simple Perl script which generates an HTML site map from a
given URL. It does this by traversing the site: getting the home page,
extracting links from it, getting all the pages linked, and so on.

The default sitemap generated is an HTML bulleted list. The first level
indented list item is the home page; the next level contains all the pages
linked from the home page. The next level contains all the pages linked from
each of these pages, and so on. If a page is linked from more than one page, it
is shown in the "highest" place in the tree it is linked from.

Alternative sitemap formats are:

* a dynamic HTML version (see below) which generates a collapsible folding
  tree.
* a text version, which generates a simple formatted text file
* an XML graph version, which prints out all the URLs and links in the site
  in an XML format

sitemapper.pl should correctly deal with framesets, client side image maps, and
tags. It ignores all "off site" links - i.e. all absolute URLs that do
not start with the original "base" URL of the home page.

Modules
-------

sitemapper.pl includes two modules that it requires in its distribution:

    WWW::Sitemap
    LWP::AuthenAgent

WWW::Sitemap is the module that is used to generate the sitemap structure from
which the various output formats are generated. The interface provides access
to the list of URLs for a site, and the links from each of these URLs. It also
supports a traverse method, which allows the caller to specify a callback, so
that other formats of sitemap can be generated, or other sitemap related
functionality implemented. See the documentation of this module for more
details.

LWP::AuthenAgent is a simple subclass of the LWP::UserAgent module, which
allows requests to be made for URLs that require authentication, by requiring
the user to type the username / password information for the relevant realm.
This information is stored in the LWP::AuthenAgent object, so that repeated
requests to the same realm can be made without re-typing the authentication
details (a bit like a web browser, in fact). tty echo is switched off for the
password.

Installation
------------

Just the basic Makefile.PL stuff; i.e.:

    > perl Makefile.PL
    > make
    > make test
    > make install

Usage
-----

To use sitemapper.pl, just type:

    ./sitemapper.pl -url http://www.mysite.com/

to get output to stdout, or

    ./sitemapper.pl -url http://www.mysite.com/ -output mysitemap.html

to output to a file. Type

    ./sitemapper.pl -help

to get full usage instructions, or

    ./sitemapper.pl -doc

to output the pod documentation.
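
The option style used in this README (single-dash long names such as -url, and
-o as a short form of -output) matches the default behaviour of Getopt::Long,
which is one of the modules sitemapper.pl depends on. The following is only a
minimal sketch of how such options could be parsed; parse_args and the default
format name 'html' are assumptions made for illustration, not code taken from
sitemapper.pl:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse a sitemapper-style argument list into a hash.
# Option names follow the usage text; the default format is an assumption.
sub parse_args {
    my @args = @_;
    my %opt  = ( format => 'html' );    # assumed default output format
    GetOptionsFromArray(
        \@args, \%opt,
        'url=s',       # home page of the site to map
        'output=s',    # output file; stdout when omitted
        'format=s',    # output format, e.g. js, xml or text
        'help',        # print usage instructions
    ) or die "bad options\n";
    return \%opt;
}

# "-o" works because Getopt::Long accepts unique abbreviations by default.
my $opt = parse_args(
    '-o',      'example.txt',
    '-url',    'http://www.mysite.com/',
    '-format', 'text',
);
print "$_ = $opt->{$_}\n" for sort keys %$opt;
```

Run `./sitemapper.pl -help` for the authoritative list of options; the sketch
covers only the ones mentioned in this README.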
Examples
--------

example.html contains an example of sitemapper.pl output, for the Canon Research
Europe Ltd Perl Pages (http://www.cre.canon.co.uk/perl/); i.e. by running:

    ./sitemapper.pl -o example.html -url http://www.cre.canon.co.uk/

example.js.html contains an example of a dynamic HTML version of the site map
for the CRE site. This is generated using Jef Pearlman's (jef@mit.edu)
javascript Tree class:

    http://developer.netscape.com/docs/examples/dynhtml/tree.html

Many thanks to Jef for allowing this to be distributed with sitemapper.pl! This
is generated by running:

    ./sitemapper.pl -o example.js.html -url http://www.cre.canon.co.uk/ -format js

example.xml contains the output from:

    ./sitemapper.pl -o example.xml -url http://www.cre.canon.co.uk/ -format xml

The XML format for this file is pretty ad hoc - probably not of interest to
anyone apart from me!

Finally, a plain text version can be generated using the -format text
option; for example:

    ./sitemapper.pl -o example.txt -url http://www.cre.canon.co.uk/ -format text
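
To illustrate how a page linked from several places ends up at the "highest"
place in the tree, and how off-site links are dropped, here is a small
self-contained sketch. It does not fetch anything and does not use
WWW::Sitemap; the %links graph, build_tree, render_text and the indented
layout are all assumptions made for illustration, and the real -format text
output may look different:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory link graph standing in for fetched pages:
# each key is a URL, each value the list of links found on that page.
my $base  = 'http://www.mysite.com/';
my %links = (
    $base           => [ "${base}a.html", "${base}b.html", 'http://elsewhere.example/' ],
    "${base}a.html" => [ "${base}c.html", "${base}b.html" ],
    "${base}b.html" => [ "${base}c.html" ],
    "${base}c.html" => [ $base ],
);

# Breadth-first walk from the home page. A page is attached to the first
# (shallowest) page that links to it; off-site links are skipped.
sub build_tree {
    my ( $root, $graph ) = @_;
    my %children = ( $root => [] );
    my @queue    = ($root);
    while ( defined( my $url = shift @queue ) ) {
        for my $link ( @{ $graph->{$url} || [] } ) {
            next unless index( $link, $root ) == 0;    # ignore off-site links
            next if exists $children{$link};           # already placed higher up
            $children{$link} = [];
            push @{ $children{$url} }, $link;
            push @queue, $link;
        }
    }
    return \%children;
}

# Render the tree as indented text, one URL per line.
sub render_text {
    my ( $tree, $url, $depth ) = @_;
    $depth ||= 0;
    my $out = ( '  ' x $depth ) . "$url\n";
    $out .= render_text( $tree, $_, $depth + 1 ) for @{ $tree->{$url} };
    return $out;
}

my $tree = build_tree( $base, \%links );
print render_text( $tree, $base );
```

Here b.html is linked from both the home page and a.html, so it appears once,
directly under the home page; the link to elsewhere.example is ignored because
it does not start with the base URL.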
CPAN Modules
------------

sitemapper.pl uses the following CPAN modules, which need to be installed
before it will work:

    WWW::Robot
    HTML::Summary
    Digest::MD5
    Date::Format
    Getopt::Long
    HTML::Entities
    IO::File
    LWP::UserAgent
    URI::URL
    Term::ReadKey

See http://www.perl.com/CPAN/ for details of how to download / install these
modules.

Bugs
----

Please send any bugs / comments / suggestions to Ave.Wrigley@itn.co.uk