https://github.com/teverett/htmlparser


[![Travis](https://travis-ci.org/teverett/HTMLParser.svg?branch=master)](https://travis-ci.org/teverett/HTMLParser)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/9ebea7ee219e4210bf17ac5f99b73303)](https://www.codacy.com/app/teverett/HTMLParser?utm_source=github.com&utm_medium=referral&utm_content=teverett/HTMLParser&utm_campaign=Badge_Grade)

HTMLParser
==========

A simple HTML parser built with [ANTLR4](http://www.antlr.org/).

Maven Coordinates
--------


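    <dependency>
        <groupId>com.khubla.htmlparser</groupId>
        <artifactId>htmlparser</artifactId>
        <version>1.0</version>
        <type>jar</type>
        <scope>compile</scope>
    </dependency>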

Fetching and Validating a Page
---------

HTMLParser can be run as a command-line jar file to fetch a single page and parse it. Parse errors are logged to the console. For example:

    sh fetch.sh http://www.slashdot.org
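The contents of `fetch.sh` are not shown here, so the following is only a rough sketch of the fetch step, assuming the page is read as a plain `InputStream` from its URL. `PageFetcher` and `open` are hypothetical names used for illustration and are not part of HTMLParser; the hand-off to the parser is left as a comment because it needs the library on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Hypothetical helper, not part of HTMLParser.
final class PageFetcher {

    /** Opens a stream over the resource at the given URL (for example http, https, or file). */
    static InputStream open(String url) throws IOException {
        return URI.create(url).toURL().openStream();
    }

    public static void main(String[] args) throws IOException {
        // The URL is taken from the first command-line argument.
        try (InputStream in = open(args[0])) {
            // With the library available, the stream could be handed to the parser here:
            // HTMLDocumentParser.parse(in, listener);
            System.out.println("Read " + in.readAllBytes().length + " bytes");
        }
    }
}
```

Closing the stream with try-with-resources keeps the connection from leaking if parsing fails partway through.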

Example Usage of the Library
---------

To parse an arbitrary HTML document using the callback parser, pass an implementation of [HTMLParserListener](https://github.com/teverett/HTMLParser/blob/master/src/main/java/com/khubla/htmlparser/grammar/HTMLParserListener.java) and an `InputStream` of HTML to [HTMLDocumentParser.parse](https://github.com/teverett/HTMLParser/blob/master/src/main/java/com/khubla/htmlparser/HTMLDocumentParser.java):


    // Load the HTML document from the classpath
    final InputStream inputStream = TestTreeWalk.class.getResourceAsStream("/example1.html");
    // ExampleListener is an implementation of HTMLParserListener
    final HTMLParserListener htmlParserListener = new ExampleListener();
    HTMLDocumentParser.parse(inputStream, htmlParserListener);
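The example above reads its HTML from a classpath resource, but any `InputStream` should work. As a minimal sketch, HTML that is already held in a `String` can be wrapped in a `ByteArrayInputStream`. `HtmlInput` and `fromString` are hypothetical names introduced for illustration, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HTMLParser.
final class HtmlInput {

    /** Wraps an in-memory HTML string in an InputStream (UTF-8 is an assumption). */
    static InputStream fromString(String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }
}
```

The resulting stream can then be passed as the first argument in place of the classpath resource, for example `HTMLDocumentParser.parse(HtmlInput.fromString(html), htmlParserListener)`.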

Licensing
---------

HTMLParser is licensed under the [GPLv2](https://github.com/teverett/HTMLParser/blob/master/LICENSE).