{"id":16842904,"url":"https://github.com/teverett/htmlparser","last_synced_at":"2025-10-15T18:50:36.253Z","repository":{"id":15971771,"uuid":"18714577","full_name":"teverett/HTMLParser","owner":"teverett","description":"HTML Parser","archived":false,"fork":false,"pushed_at":"2021-08-02T16:06:18.000Z","size":462,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-24T12:27:11.455Z","etag":null,"topics":["antlr","html-parser","java"],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/teverett.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-04-12T21:02:51.000Z","updated_at":"2021-08-02T16:06:13.000Z","dependencies_parsed_at":"2022-08-04T07:15:20.115Z","dependency_job_id":null,"html_url":"https://github.com/teverett/HTMLParser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teverett%2FHTMLParser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teverett%2FHTMLParser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teverett%2FHTMLParser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teverett%2FHTMLParser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/teverett","download_url":"https://codeload.github.com/teverett/HTMLParser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":2441
66639,"owners_count":20409177,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["antlr","html-parser","java"],"created_at":"2024-10-13T12:48:57.415Z","updated_at":"2025-10-15T18:50:31.223Z","avatar_url":"https://github.com/teverett.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Travis](https://travis-ci.org/teverett/HTMLParser.svg?branch=master)](https://travis-ci.org/teverett/HTMLParser)\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/9ebea7ee219e4210bf17ac5f99b73303)](https://www.codacy.com/app/teverett/HTMLParser?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=teverett/HTMLParser\u0026amp;utm_campaign=Badge_Grade)\n\nHTMLParser\n==========\n\nA simple HTML Parser using [ANTLR4](http://www.antlr.org/)\n\nMaven Coordinates\n--------\n\n\t\u003cdependency\u003e\n\t\t\u003cgroupId\u003ecom.khubla.htmlparser\u003c/groupId\u003e\n\t\t\u003cartifactId\u003ehtmlparser\u003c/artifactId\u003e\n\t\t\u003cversion\u003e1.0\u003c/version\u003e\n\t\t\u003ctype\u003ejar\u003c/type\u003e\n\t\t\u003cscope\u003ecompile\u003c/scope\u003e\n\t\u003c/dependency\u003e\n\n\nFetching and Validating a Page\n---------\n\nHTMLParser can be used as a command-line jar file to fetch a single page and parse it.  Parse errors will be logged to the console. 
For example:\n\n\u003cpre\u003e\nsh fetch.sh http://www.slashdot.org\n\u003c/pre\u003e\n\nExample Usage of the Library\n---------\n\nTo parse an arbitrary HTML document using the callback parser, pass an implementation of [HTMLParserListener](https://github.com/teverett/HTMLParser/blob/master/src/main/java/com/khubla/htmlparser/grammar/HTMLParserListener.java), along with an InputStream of HTML, to [HTMLDocumentParser.parse](https://github.com/teverett/HTMLParser/blob/master/src/main/java/com/khubla/htmlparser/HTMLDocumentParser.java):\n\n\u003cpre\u003e\n  final InputStream inputStream = TestTreeWalk.class.getResourceAsStream(\"/example1.html\");\n  final HTMLParserListener htmlParserListener = new ExampleListener();\n  HTMLDocumentParser.parse(inputStream, htmlParserListener);\n\u003c/pre\u003e\n\nLicensing\n---------\n\nHTMLParser is licensed under the [GPLv2](https://github.com/teverett/HTMLParser/blob/master/LICENSE).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteverett%2Fhtmlparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fteverett%2Fhtmlparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteverett%2Fhtmlparser/lists"}