https://github.com/dotpot/custom-string-parser

Custom string data parser written in python

Host: GitHub
URL: https://github.com/dotpot/custom-string-parser
Owner: dotpot
Created: 2012-02-05T17:56:48.000Z (over 14 years ago)
Default Branch: master
Last Pushed: 2013-03-07T10:43:01.000Z (about 13 years ago)
Last Synced: 2025-01-05T22:12:10.246Z (over 1 year ago)
Language: Python
Size: 129 KB
Stars: 4
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Custom String Parser


Use this component when you need to extract information from a string whose content follows a consistent syntax.


A simple way to parse data from a string in Python.


Overview


CustomStringParser: the missing simple string parser for Python developers.


Usage


Parsing HTML


Note: for HTML-based parsing, consider using XPath instead.
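For comparison, here is a minimal sketch of the XPath route mentioned in the note. It is not part of this library. It assumes the markup is well-formed XML and uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module; the sample string and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shortened from the kind of markup used in this README.
string_data = """
<div class="section-item">
    <div class="section-title">title1</div>
    <div class="section-comments">15</div>
</div>
<div class="section-item">
    <div class="section-title">title2</div>
    <div class="section-comments">16</div>
</div>
"""

# ElementTree needs a single root element, so wrap the fragments in one.
root = ET.fromstring("<root>" + string_data + "</root>")

items = []
# Select every item div, then read its title and comments children.
for item in root.findall(".//div[@class='section-item']"):
    title = item.find("div[@class='section-title']").text.strip()
    comments = int(item.find("div[@class='section-comments']").text.strip())
    items.append((title, comments))

print(items)  # [('title1', 15), ('title2', 16)]
```

Real-world HTML is often not well-formed XML, in which case a dedicated HTML parser or the string-based approach described below may be a better fit.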



Imagine your string_data contains the following content:


<div class="section-item">

    <div class="section-title">

        title1

    </div> <!-- end section-title -->

    <div class="section-comments">

        15

    </div> <!-- end section-comments -->

</div> <!--end section-item-->

<div class="section-item">

    <div class="section-title">

        title2

    </div> <!-- end section-title -->

    <div class="section-comments">

        16

    </div> <!-- end section-comments -->

</div> <!--end section-item-->

<div class="section-item">

    <div class="section-title">

            title3

    </div> <!-- end section-title -->

    <div class="section-comments">

        17

    </div> <!-- end section-comments -->

</div> <!--end section-item-->



We need to parse these items:

* title

* comments count



The code to parse them looks like this:


parser = CustomStringParserCore(string_data)

item_parser = ParsingNode('item', '<div class="section-item">', '</div> <!--end section-item-->')

title_parser = ParsingNode('title', '<div class="section-title">', '</div> <!-- end section-title -->')

comments_parser = ParsingNode('comments', '<div class="section-comments">', '</div> <!-- end section-comments -->')

# note: each item result contains a title and comments, so nest those parsers under the item parser:

item_parser.add_parser(title_parser)

item_parser.add_parser(comments_parser)

# add main parser to the parsing core

parser.add_parser(item_parser)

# call the parse

parser.parse()

<..>



Output of print_results(item_parser.results):


item:


<div class="section-title">

        title1

    </div> <!-- end section-title -->

    <div class="section-comments">

        15

    </div> <!-- end section-comments -->



title:


title1



comments:


15



item:


<div class="section-title">

        title2

    </div> <!-- end section-title -->

    <div class="section-comments">

        16

    </div> <!-- end section-comments -->



title:


title2



comments:


16



item:


<div class="section-title">

            title3

    </div> <!-- end section-title -->

    <div class="section-comments">

        17

    </div> <!-- end section-comments -->



title:


title3



comments:


17



This approach is generic: any text delimited by known start and end strings can be parsed the same way.
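To illustrate the idea on non-HTML input, here is a minimal, self-contained sketch of start/end marker extraction with nesting. It is not the library's implementation. The `extract_between` helper, the `BEGIN`/`END` record format, and the field names are all hypothetical and chosen only for this example.

```python
def extract_between(text, start, end):
    """Return every substring of text found between a start and an end marker."""
    results = []
    position = 0
    while True:
        begin = text.find(start, position)
        if begin == -1:
            break
        begin += len(start)
        finish = text.find(end, begin)
        if finish == -1:
            break  # unterminated block: stop rather than guess
        results.append(text[begin:finish])
        position = finish + len(end)
    return results


# Hypothetical record format, unrelated to HTML.
data = "BEGIN name=alice; age=30; END BEGIN name=bob; age=25; END"

parsed = []
# Outer pass finds each record, inner passes find fields inside that record.
for record in extract_between(data, "BEGIN", "END"):
    name = extract_between(record, "name=", ";")[0].strip()
    age = extract_between(record, "age=", ";")[0].strip()
    parsed.append({"name": name, "age": age})

print(parsed)
# [{'name': 'alice', 'age': '30'}, {'name': 'bob', 'age': '25'}]
```

The nesting mirrors the item/title/comments arrangement above: the outer markers narrow the search area, and the inner markers only ever see the text of one record.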


Unit tests

This library is supposed to be fully unit tested, so keep that in mind if you want to contribute.

Feature ideas (not yet implemented)

* Regex-based parsers.

* Grouped regex-based parsers.

* XPath-based parsers.

* Filtering results by parser name.
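As a rough idea of what a grouped regex-based parser could look like, here is a sketch built only on Python's standard `re` module. None of this exists in the library; the pattern, the group names, and the shortened sample are assumptions made for illustration.

```python
import re

# Hypothetical sample, shortened from the HTML example in the Usage section.
string_data = """
<div class="section-item">
    <div class="section-title"> title1 </div>
    <div class="section-comments"> 15 </div>
</div>
<div class="section-item">
    <div class="section-title"> title2 </div>
    <div class="section-comments"> 16 </div>
</div>
"""

# Named groups play the role of parser names; DOTALL lets '.' span newlines.
pattern = re.compile(
    r'<div class="section-title">\s*(?P<title>.*?)\s*</div>'
    r'.*?'
    r'<div class="section-comments">\s*(?P<comments>\d+)\s*</div>',
    re.DOTALL,
)

results = [match.groupdict() for match in pattern.finditer(string_data)]

# Filtering by name then becomes a dictionary lookup.
titles = [result["title"] for result in results]

print(results)
print(titles)
```

Named groups would give each captured value a label comparable to a parser name, which is one possible way to support the filtering idea in the list above.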
