Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/html-extract/hext

Domain-specific language for extracting structured data from HTML documents

cpp data-extraction dsl html html-extraction node php python ruby scraping

Last synced: 01 Jul 2024

https://github.com/bookieio/breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)

html-extraction html-extractor html-parsing python text-extraction text-mining

Last synced: 27 Mar 2024