https://github.com/mons/xml-fast

Very fast simple xml to hash parser

Host: GitHub
URL: https://github.com/mons/xml-fast
Owner: Mons
License: other
Created: 2010-06-15T21:40:57.000Z (about 16 years ago)
Default Branch: master
Last Pushed: 2017-06-29T23:04:00.000Z (almost 9 years ago)
Last Synced: 2025-08-13T20:55:42.430Z (10 months ago)
Language: C
Homepage: http://search.cpan.org/dist/XML-Fast
Size: 1.68 MB
Stars: 4
Watchers: 1
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README
- Changelog: Changes
- License: LICENSE

README

NAME

    XML::Fast - Simple and very fast XML - hash conversion

SYNOPSIS

      use XML::Fast;

      my $hash = xml2hash $xml;
      my $hash2 = xml2hash $xml, attr => '.', text => '~';

DESCRIPTION

    This module implements a simple, state-machine-based XML parser written
    in C.

    It can parse, and in some cases recover from, certain kinds of broken
    XML. If you need an XML validator, use XML::LibXML.
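
    As a minimal sketch of what working with the parsed result looks like,
    the example below parses a small document and reads values back out of
    the hash. The sample XML and variable names are made up for
    illustration; the key layout assumes the default options described
    under OPTIONS (attribute prefix '-', text-only nodes stored as plain
    strings).

```perl
use strict;
use warnings;
use XML::Fast;

# Hypothetical sample document, used only for illustration.
my $xml = '<user id="42"><name>Alice</name><city>Paris</city></user>';

# Parse with default options.
my $hash = xml2hash($xml);

# Attributes are stored under the default '-' prefix;
# child nodes that contain only text become plain string values.
my $user = $hash->{user};
printf "id=%s name=%s city=%s\n", $user->{'-id'}, $user->{name}, $user->{city};
```

    Because the result is an ordinary nested Perl hash, no accessor API is
    needed; the usual hash and array dereferencing applies.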

RATIONALE

    Another similar module is XML::Bare. I used it for some time, but it
    has some shortcomings:

    *   If your XML has a node containing a text node, then a CDATA node,
        then another text node, you will get a broken value.

    *   It doesn't support charsets.

    *   It doesn't support any kind of entities.

    So, after a number of attempts to fix XML::Bare, I decided to write a
    parser from scratch.

    Here are some of its features and principles:

    *   It uses a minimal number of memory allocations.

    *   All XML is parsed in a single scan.

    *   All values are copied from the source XML only once (to the
        destination keys/values).

    *   If some types of nodes (for example, comments) are ignored, no
        memory is allocated or copied for them.

    I've removed the benchmark results, since they vary widely between
    different XML documents. Sometimes XML::Bare is faster, sometimes not.
    So XML::Fast should be considered mainly not as "faster-than-bare", but
    as "format-other-than-bare".

EXPORT

  xml2hash $xml, [ %options ]

  hash2xml $hash, [ %options ]

OPTIONS

    order [ = 0 ]
        Not implemented yet. Strictly keep the output order. When enabled,
        structures become more complex, but the original XML can be
        reconstructed exactly.

    attr [ = '-' ]
        Attribute prefix

            <node attr="test" />  =>  { node => { -attr => "test" } }

    text [ = '#text' ]
        Key name for storing text

        When undef, text nodes will be ignored

            <node>test<sub /></node>  =>  { node => { sub => '', '#text' => "test" } }

    join [ = '' ]
        Join separator for text nodes that are split by subnodes

        Ignored when "order" is in effect

            # default:
            xml2hash( '<item>Test1<sub />Test2</item>' )
            : { item => { sub => '', '#text' => 'Test1Test2' } };

            xml2hash( '<item>Test1<sub />Test2</item>', join => '+' )
            : { item => { sub => '', '#text' => 'Test1+Test2' } };

    trim [ = 1 ]

        Trim leading and trailing whitespace from text nodes

    cdata [ = undef ]
        When defined, CDATA sections will be stored under this key

            # cdata = undef
            <node><![CDATA[ test ]]></node>  =>  { node => 'test' }

            # cdata = '#'
            <node><![CDATA[ test ]]></node>  =>  { node => { '#' => 'test' } }

    comm [ = undef ]
        When defined, comments will be stored under this key

        When undef, comments will be ignored

            # comm = undef
            <node><!-- comm --><sub/></node>  =>  { node => { sub => '' } }

            # comm = '/'
            <node><!-- comm --><sub/></node>  =>  { node => { sub => '', '/' => 'comm' } }

    array => 1
        Force all nodes to be kept as arrays.

            # no array
            <node><sub/></node>  =>  { node => { sub => '' } }

            # array = 1
            <node><sub/></node>  =>  { node => [ { sub => [ '' ] } ] }

    array => [ 'node', 'names' ]
        Force nodes with the listed names to be stored as arrays

            # no array
            <node><sub/></node>  =>  { node => { sub => '' } }

            # array => ['sub']
            <node><sub/></node>  =>  { node => { sub => [ '' ] } }

    utf8decode => 1
        Force decoding of UTF-8 sequences instead of just upgrading them
        (may be useful for broken XML)
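
    The options above can be combined in a single call. The following is a
    small sketch, not taken from the module's own test suite: the sample
    documents and variable names are hypothetical, and the shapes noted in
    the comments follow from the option descriptions above rather than from
    a verified run.

```perl
use strict;
use warnings;
use XML::Fast;

# Hypothetical sample document, used only for illustration.
my $xml = '<list><item id="1">one</item></list>';

# Custom attribute prefix and text key, as in the SYNOPSIS.
# Per the attr/text descriptions, the item should look like
# { '.id' => '1', '~' => 'one' }.
my $custom = xml2hash($xml, attr => '.', text => '~');

# Force <item> to be stored as an array even though it occurs once,
# so calling code can always iterate over it.
my $forced = xml2hash($xml, array => ['item']);
for my $item (@{ $forced->{list}{item} }) {
    printf "%s: %s\n", $item->{'-id'}, $item->{'#text'};
}

# Text split by a subnode, joined with an explicit separator and
# stored under a custom text key.
my $joined = xml2hash('<item>Test1<sub />Test2</item>',
    text => '~', join => '+');
print $joined->{item}{'~'}, "\n";
```

    Forcing arrays for node names that may repeat avoids having to check
    whether a value is a single hash or a list of hashes before iterating.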

SEE ALSO

    *   XML::Bare

        Another fast parser

    *   XML::LibXML

        The most powerful XML parser for Perl, as long as you don't need to
        parse gigabytes of XML ;)

    *   XML::Hash::LX

        An XML parser that uses XML::LibXML for parsing and then constructs
        a hash structure identical to the one generated by this module (at
        least, it should ;)). Of course, it is much slower than XML::Fast.

LIMITATIONS

    *   Does not support wide charsets (UTF-16/32) (see RT71534)

TODO

    *   Ordered mode (as implemented in XML::Hash::LX)

    *   Create hash2xml, identical to the one in XML::Hash::LX

    *   Partial content event-based parsing (I need this for reading XML
        streams)

    Patches, suggestions, and bug reports are welcome ;)

AUTHOR

    Mons Anderson

COPYRIGHT AND LICENSE

    Copyright (C) 2010 Mons Anderson

    This library is free software; you can redistribute it and/or modify it

    under the same terms as Perl itself.
