https://github.com/perigrin/xml-sax-machines

manage collections of SAX processors

README for XML-SAX-Machines

XML::SAX::Machines is a collection of APIs that allow complex SAX machines
to be constructed without a huge amount of extra typing.

This distribution contains three kinds of modules: machines, helpers, and
filters. Here's how they are laid out:

- XML::SAX::* contains machines and helpers.
    - XML::SAX::Machines lets you import the "classic" constructor
      functions like Tap(), Pipeline(), Manifold(), and ByRecord().
    - Each machine type has a class that implements it, like
      XML::SAX::Tap, XML::SAX::Pipeline, etc.
    - There is currently only one available helper,
      XML::SAX::EventMethodMaker, which is most useful for building a
      collection of methods to handle different events in the same way,
      without having to know all of their names. It is also useful as a
      reference for all of the SAX events by looking at the source code,
      which contains simple tables of what events occur for what kind of
      handler (compiled by Robin Berjon).

- XML::Filter::* contains filters that are used by ByRecord and Manifold
  machines to handle SAX events (machines don't handle SAX events, they
  delegate to the generators/filters/handlers they contain).
    - XML::Filter::DocSplitter - splits one document into multiple
      documents, optionally coordinating with an aggregator like
      XML::Filter::Merger to reassemble them. Used by ByRecord.
    - XML::Filter::Distributor - buffers a document and re-emits it to
      each handler in turn. Used by Manifold.
    - XML::Filter::Tee - a dynamically reconfigurable tee fitting. Does
      not buffer. Used by Tap. Morally equivalent to
      XML::Filter::SAXT but more flexible.
    - XML::Filter::Merger - collects multiple documents and merges them,
      inserting all secondary documents into one master document.
      Used by both ByRecord and Manifold.
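
The idea behind XML::SAX::EventMethodMaker described above, giving many
SAX events the same handler body without spelling out each method by
hand, can be sketched in plain Perl. This is only an illustration of the
technique, not the helper's actual API: the class name My::EventCounter
and the hand-picked list of event names are assumptions made for the
example.

```perl
use strict;
use warnings;

package My::EventCounter;    # hypothetical class, not part of this distribution

# A hand-picked subset of SAX2 event names (an assumption for this sketch;
# the real helper keeps fuller tables in its source).
my @events = qw( start_document end_document start_element end_element characters );

sub new { return bless { seen => {} }, shift }

# Install one closure per event name so every event is handled the same way.
for my $event (@events) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$event" } = sub {
        my ( $self, $data ) = @_;
        $self->{seen}{$event}++;    # shared behaviour: count the event
        return;
    };
}

package main;

my $h = My::EventCounter->new;
$h->start_document( {} );
$h->start_element( { Name => 'doc' } );
$h->characters( { Data => 'hello' } );
$h->end_element( { Name => 'doc' } );
$h->end_document( {} );

# Report how often each generated method was called.
print join( ", ", map { "$_=$h->{seen}{$_}" } sort keys %{ $h->{seen} } ), "\n";
```

Generating the methods in a loop keeps the shared behaviour in one place;
adding another event is a one-word change to the list.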

All of the XML::Filter::* classes are useful outside of the machines
that use them. For instance, XML::Filter::DocSplitter has been used
(not by me) in a Pipeline to split a huge record-oriented file into
individual files containing single records (using a custom class derived
from XML::SAX::Writer). XML::Filter::Merger is useful as a general way
to implement include-style processing when XInclude is not a good
fit.

See the examples/ directory for, well, examples (and feel free to write
up creative examples; eventually I'd like to compile a cookbook).

To give a more concrete idea of how SAX machines are typically used,
here's how to build a pipeline of SAX processors:

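    use XML::SAX::Machines qw( Pipeline );
    use My::SAX::Filter2;

    my $output;    # receives the serialized XML

    my $p = Pipeline(
        "My::SAX::Filter1",              # class name: loaded and instantiated for you
        My::SAX::Filter2->new( ... ),    # pre-built instance: used as-is
        \$output                         # scalar ref: handed to an XML::SAX::Writer
    );

    $p->parse_uri( $ARGV[0] );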

That loads (if need be) XML::SAX::Writer and calls its new() function
with an Output => \$output option, takes the passed-in instance of
My::SAX::Filter2 and calls its set_handler() method to point it to the
XML::SAX::Writer that was just created, and then loads (if need be)
My::SAX::Filter1 and calls its new() function with a Handler => option
pointing to the My::SAX::Filter2 instance.
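
The wiring order described above (writer first, then the pre-built
instance, then the class given by name) can be sketched by hand. To keep
the sketch runnable without any CPAN modules, it uses a single
hypothetical stand-in class, Sketch::Stage, in place of the two filters
and the writer; it only mimics the new( Handler => ... ) and
set_handler() conventions and is not how the real classes are
implemented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for My::SAX::Filter1, My::SAX::Filter2 and
# XML::SAX::Writer; it only shows how events flow down the chain.
package Sketch::Stage;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub set_handler { $_[0]{Handler} = $_[1]; return }

sub characters {
    my ( $self, $data ) = @_;
    if ( ref $self->{Output} ) {
        # End of the chain: behave like a writer and append to the scalar.
        ${ $self->{Output} } .= $data->{Data};
    }
    else {
        # Pass the event downstream, tagged with this stage's name.
        my $tagged = $data->{Data} . '>' . $self->{Name};
        $self->{Handler}->characters( { Data => $tagged } );
    }
    return;
}

package main;

my $output = '';

# 1. The writer is created first, from the trailing \$output argument.
my $writer = Sketch::Stage->new( Name => 'writer', Output => \$output );

# 2. The pre-built instance is pointed at the writer via set_handler().
my $filter2 = Sketch::Stage->new( Name => 'filter2' );
$filter2->set_handler($writer);

# 3. The class given by name is constructed with Handler => the next stage.
my $filter1 = Sketch::Stage->new( Name => 'filter1', Handler => $filter2 );

# Send one event in at the head of the chain and show what arrived.
$filter1->characters( { Data => 'x' } );
print "$output\n";
```

Building the chain from the last stage backwards means every stage's
downstream handler already exists when that stage is set up.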