Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/html-extract/hext
Domain-specific language for extracting structured data from HTML documents
cpp data-extraction dsl html html-extraction node php python ruby scraping
Last synced: 01 Jul 2024
https://github.com/miso-belica/sumy
Module for automatic summarization of text documents and HTML pages.
html-extraction html-extractor html-page lsa nlp pagerank-algorithm python reduction summarization summarizer summary sumy text-extraction textteaser
Last synced: 28 Apr 2024
https://github.com/bookieio/breadability
Reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the active alternative)
html-extraction html-extractor html-parsing python text-extraction text-mining
Last synced: 27 Mar 2024