Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/mozilla/fathom

A framework for extracting meaning from web pages

# Fathom

Fathom is a supervised-learning system for recognizing parts of web pages—pop-ups, address forms, slideshows—or for classifying a page as a whole. A DOM flows in one side, and DOM nodes flow out the other, tagged with types and probabilities that those types are correct. A Prolog-like language makes it straightforward to specify the “smells” that suggest each type, and a neural-net-based trainer determines the optimal contribution of each smell. Finally, the FathomFox web extension lets you collect and label a corpus of web pages for training.
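
To make the idea of weighted "smells" concrete, here is a minimal, library-free sketch in Python. It is not Fathom's API or rule language. The `Node` class, the three smells, the weights, and the bias are all hypothetical, and the single weighted sum passed through a logistic function is an assumption chosen for illustration. In a real setup the weights would come from the trainer rather than being written by hand.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element (not Fathom's node type)."""
    tag: str
    class_name: str = ""
    z_index: int = 0


# Each "smell" is a cheap signal that hints a node might be a pop-up.
# These three are invented for illustration.
SMELLS: Dict[str, Callable[[Node], float]] = {
    "class_mentions_popup": lambda n: float(
        any(word in n.class_name.lower() for word in ("popup", "modal"))
    ),
    "high_z_index": lambda n: float(n.z_index >= 1000),
    "is_div": lambda n: float(n.tag == "div"),
}

# Hand-picked values for illustration only; a trainer would learn these
# from a labeled corpus of pages.
WEIGHTS: Dict[str, float] = {
    "class_mentions_popup": 2.5,
    "high_z_index": 1.5,
    "is_div": 0.5,
}
BIAS = -3.0


def popup_probability(node: Node) -> float:
    """Combine the smells into a probability with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * smell(node) for name, smell in SMELLS.items())
    return 1.0 / (1.0 + math.exp(-z))


def tag_nodes(nodes: List[Node]) -> List[Tuple[Node, float]]:
    """Nodes flow in; nodes tagged with a pop-up probability flow out."""
    return [(node, popup_probability(node)) for node in nodes]


if __name__ == "__main__":
    page = [Node("div", "modal-overlay", 2000), Node("p", "intro")]
    for node, prob in tag_nodes(page):
        print(f"<{node.tag} class='{node.class_name}'> popup probability: {prob:.2f}")
```

In this sketch, a node that shows all three smells scores well above 0.5, while a plain paragraph scores close to zero. Adjusting the weights changes how much each smell contributes, which is the part the trainer is meant to automate.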

Continue reading in the [documentation](https://mozilla.github.io/fathom).