Projects in Awesome Lists tagged with html-extraction
A curated list of projects in awesome lists tagged with html-extraction.
https://github.com/miso-belica/sumy
Module for automatic summarization of text documents and HTML pages.
html-extraction html-extractor html-page lsa nlp pagerank-algorithm python reduction summarization summarizer summary sumy text-extraction textteaser
Last synced: 14 Feb 2026
https://github.com/bookieio/breadability
A reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative).
html-extraction html-extractor html-parsing python text-extraction text-mining
Last synced: 21 Oct 2025
https://github.com/html-extract/hext
Domain-specific language for extracting structured data from HTML documents
cpp data-extraction dsl html html-extraction node php python ruby scraping
Last synced: 15 Apr 2025
https://github.com/whomrx666/xtract-html
Xtract-html is a tool for extracting a website's HTML display code, which you can then reuse for your own website.
html html-extraction html-extractor kali-linux linux termux termux-tool xtract-html
Last synced: 11 Mar 2026
https://github.com/whomrx666/xtract-htmlv2
Xtract-htmlV2 is a tool for retrieving the HTML code of any website you choose. It is the successor to the previous version.
extract html-extraction html-extractor kali-linux linux termux termux-tool xtract-htmlv2
Last synced: 16 Mar 2025
https://github.com/reasonkit/reasonkit-web
High-performance MCP server for browser automation, web capture, and content extraction. Rust-powered CDP client for AI agents.
agent-tools ai-agent async automation browser-automation cdp chrome-devtools-protocol chromium developer-tools headless-browser html-extraction llm-tools mcp model-context-protocol pdf rust screenshot tokio web-automation web-scraping
Last synced: 08 Jan 2026