https://github.com/librecat/catmandu-html
  
  
Catmandu modules to process HTML files
- Host: GitHub
- URL: https://github.com/librecat/catmandu-html
- Owner: LibreCat
- License: other
- Created: 2018-05-14T07:49:20.000Z (over 7 years ago)
- Default Branch: main
- Last Pushed: 2021-02-09T07:49:05.000Z (over 4 years ago)
- Last Synced: 2025-01-12T14:36:42.945Z (10 months ago)
- Language: HTML
- Size: 29.3 KB
- Stars: 1
- Watchers: 13
- Forks: 2
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
  - Changelog: Changes
  - License: LICENSE
 
README
# NAME
Catmandu::HTML - Modules for handling HTML data within the Catmandu framework
# SYNOPSIS
Command line client `catmandu`:
    catmandu convert HTML to JSON < ex/test.html > data/test.json
    catmandu convert JSON to HTML < data/test.json > ex/test.html
    # Keep only the text
    curl -s http://librecat.org |
      catmandu convert HTML to Text --fix "html_text()"
    # Extract the anchors
    curl -s http://librecat.org |
      catmandu convert HTML to Text --fix "html_filter_tag(a); html_text(join:\"\n\")"
    # Extract the meta tags, grouped by name, as YAML
    curl -s http://librecat.org |
      catmandu convert HTML to YAML --fix "html_filter_tag(meta, group_by:name)"
See the documentation of each module for more examples.
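Longer fix chains can become hard to quote on the command line. As a minimal sketch, the same fixes used in the anchor example can be kept in a fix file and passed to `--fix` by path, which the `catmandu` command accepts in place of an inline string. The file name `anchors.fix` is hypothetical, and the input `ex/test.html` is assumed to exist as in the first example.

```shell
# Write the fixes to a file, one per line.
# (anchors.fix is a hypothetical name chosen for this sketch.)
cat > anchors.fix <<'EOF'
html_filter_tag(a)
html_text(join:"\n")
EOF

# Run the conversion only when catmandu and the sample input are available.
if command -v catmandu >/dev/null 2>&1 && [ -f ex/test.html ]; then
    catmandu convert HTML to Text --fix anchors.fix < ex/test.html
fi
```

Because the heredoc is quoted, no shell escaping of the inner double quotes is needed, so the fix file reads the same as it would inside a Fix script.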
# AVAILABLE MODULES
- [Catmandu::Exporter::HTML](https://metacpan.org/pod/Catmandu::Exporter::HTML)
    Serialize HTML data
- [Catmandu::Importer::HTML](https://metacpan.org/pod/Catmandu::Importer::HTML)
    Parse HTML data
- [Catmandu::Fix::html\_filter\_tag](https://metacpan.org/pod/Catmandu::Fix::html_filter_tag)
    Filter HTML by tag name and optionally group the results
- [Catmandu::Fix::html\_filter\_type](https://metacpan.org/pod/Catmandu::Fix::html_filter_type)
    Filter HTML by type
- [Catmandu::Fix::html\_text](https://metacpan.org/pod/Catmandu::Fix::html_text)
    Select only the literal text values from the HTML
# SEE ALSO
This module is based on [Catmandu](https://metacpan.org/pod/Catmandu).
# AUTHORS
- Patrick Hochstenbach, `patrick.hochstenbach at ugent.be`
# COPYRIGHT AND LICENSE
The Perl software is copyright (c) 2018 by Patrick Hochstenbach.
This is free software; you can redistribute it and/or modify it under the same
terms as the Perl 5 programming language system itself.