https://github.com/z505/fast-html-parser

Fast HTML Parser for FPC and Delphi

# Fast HTML Parser
HTML Parser for FPC and Delphi, originally written by Jazarsoft.

* Modified for use as a pure command-line unit (no dialogs) for Free Pascal.
* Also added UPPERCASE tags, so that when you check for a tag (e.g. `<FONT>`) it
returns all case variants of it, such as `<FONT>`, `<FoNt>`, and `<font>`.
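
The case-insensitive tag check described above can be illustrated with a small sketch. It is written in Python only to show the idea; `tag_name` is a hypothetical helper, not a function of this unit, and the sketch assumes the raw tag text is available as a string.

```python
def tag_name(raw_tag):
    """Return the upper-cased element name of a raw tag such as '<FoNt size=2>'."""
    # Drop the angle brackets and any padding around the tag contents.
    inner = raw_tag.strip().lstrip("<").rstrip(">").strip()
    # Drop the leading slash of a closing tag such as '</font>'.
    inner = inner.lstrip("/").strip()
    if not inner:
        return ""
    # The name is the first word; attributes (if any) follow it.
    return inner.split()[0].rstrip("/").upper()


variants = ["<FONT>", "<FoNt size=2>", "< font >", "</font>"]
names = [tag_name(t) for t in variants]
print(names)  # every variant normalizes to 'FONT'
```

Comparing the normalized name against `"FONT"` then matches every spelling with a single check.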

## Versions
Revision 18 is Version 1 of this tool.

After revision 18, Version 2 of the tool is being worked on. It adds more object methods
for accessing elements, for example by name or ID, much like a DOM.

## Todo
* Keep the entire HTML file in arrays for later use: `htmltags[]` and `text[]`.
* Add a different kind of parser that reports sections like this:
`OnSection(opentag, text, closetag);`. The tag, text, and closing tag then
arrive together in the same procedure, so globals are not needed to keep track
of InTag booleans, etc.
* Associate a number (open tag) with the text label using a record or similar,
e.g. in `<body><b>some text</b></body>`, `<b>` is tag "2" and `some text` is
text "1".
* Turn it into a DLL using FPC or C, so that other languages (e.g. Go, Python)
can use a callback to parse HTML fast.
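
The `OnSection(opentag, text, closetag)` idea from the list above could look roughly like the sketch below. It is a Python illustration built on the standard-library `html.parser` module, not on this unit; `SectionParser` and `on_section` are hypothetical names, and the sketch assumes well-nested markup without void elements such as `<br>`.

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Reports open tag, text, and close tag together through one callback."""

    def __init__(self, on_section):
        super().__init__()
        self.on_section = on_section  # hypothetical callback: (opentag, text, closetag)
        self.stack = []               # one [tag, text pieces] entry per open tag

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <B> and <b> are treated alike.
        self.stack.append([tag, []])

    def handle_data(self, data):
        # Text belongs to the innermost tag that is currently open.
        if self.stack:
            self.stack[-1][1].append(data)

    def handle_endtag(self, tag):
        # Ignore a close tag that does not match the innermost open tag.
        if self.stack and self.stack[-1][0] == tag:
            open_tag, parts = self.stack.pop()
            self.on_section(open_tag, "".join(parts), tag)


sections = []
parser = SectionParser(lambda o, t, c: sections.append((o, t, c)))
parser.feed("<body><B>some text</B></body>")
parser.close()
print(sections)  # [('b', 'some text', 'b'), ('body', '', 'body')]
```

Because the parser keeps its own stack, the callback needs no global "in tag" flags: each call receives the tag, its direct text, and the closing tag at once.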

Reasons to use this parser:
* Make your own web browsers.
* Make your own text copies of web pages for caching purposes.
* Grab content from websites -without- using regular expressions.
* It seems to be much faster than regular expressions, since it is, after all,
a true parser.
* Convert website tables into spreadsheets, CSV, or database rows (parse TD and
TR).
* Convert websites into txt files.
* Find certain info in a web page, e.g. all the bold text or hyperlinks in
a page.
* Parse websites remotely from a CGI app, using something like Sockets or
Synapse and SynWrap to fetch the HTML first. This would allow you to
dynamically parse info from websites and display the data on your site in real
time.
* Build an HTML editor: WYSIWYG or partially WYSIWYG. Ambitious, but possible.
* Build an HTML property editor: not completely WYSIWYG, but able to edit the
properties of tags. Work would need to be done to parse each property in a tag.
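
As an illustration of the table-to-CSV use case listed above, here is a minimal sketch. It uses Python's standard-library `html.parser` and `csv` modules as a stand-in for this unit, so none of the names below (`TableToRows`, `table_to_csv`) belong to the Pascal API, and it assumes a single, non-nested table.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collects the text of each TD/TH cell, grouped by TR row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows
        self.row = None   # cells of the row being read, or None
        self.cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TR> and <tr> are handled alike.
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None and self.row is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
            self.cell = None


def table_to_csv(html):
    """Parse the table rows found in html and return them as CSV text."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


page = ("<table><TR><TD>Name</TD><TD>Qty</TD></TR>"
        "<tr><td>Nuts, bolts</td><td>3</td></tr></table>")
csv_text = table_to_csv(page)
print(csv_text)
```

The `csv` writer quotes the cell that contains a comma, which is the kind of detail that is easy to get wrong when tables are scraped with regular expressions.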