Projects in Awesome Lists tagged with html2text
A curated list of projects in awesome lists tagged with html2text.
https://github.com/adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
article-extractor corpus corpus-builder corpus-tools crawler html-to-markdown html2text news news-aggregator news-crawler nlp readability rss-feed scraping tei text-cleaning text-extraction text-mining text-preprocessing web-scraping
Last synced: 24 Dec 2025
https://github.com/jaytaylor/html2text
Golang HTML to plaintext conversion library
go golang html-emails html2text plaintext
Last synced: 14 May 2025
https://github.com/weblyzard/inscriptis
A Python-based HTML-to-text conversion library, command-line client, and web service.
client converter html html2text library python web-service
Last synced: 14 May 2025
https://github.com/inaridiy/webforai
An ESM-native HTML-to-Markdown library with useful utilities, described by its author as simple, lightweight, and high quality.
article-extractor extractor html-to-markdown html2markdown html2md html2text readability scraping text-mining
Last synced: 06 Apr 2025
https://github.com/rxnlp/nlp-cloud-apis
RxNLP APIs for clustering sentences, extracting topics, counting words & n-grams, extracting text from html or URL, computing similarity between texts and more.
html2text mashape natural-language-processing nlp nlp-apis opinosis-summarization rxnlp-apis sentence-clustering text-mining topic-extraction
Last synced: 06 Feb 2026
https://github.com/thatxliner/unmarkd
An extremely configurable markdown reverser for Python3.
beautifulsoup flexible html html2text markdown markdown-reverser parser python python3 reverse-engineering reverse-markdown reverser
Last synced: 18 Mar 2025
https://github.com/deedy5/html2text_rs
Python library for converting HTML to markup or plain text
html-to-markdown html-to-text html2markdown html2md html2text markdown python
Last synced: 13 Feb 2026
https://github.com/pH-7/Html2Text
A very simple (but efficient) "HTML to plain text" converter ✍️
converter convertor email-text-parsing html-converter html-text-conversion html2text htmltotext php php7 plain-text symfony-mailer text text-converter text-convertor
Last synced: 18 Jul 2025
https://github.com/x28/inscriptis-java
inscriptis - HTML to text conversion library for Java
converter html2text java library
Last synced: 14 Jan 2026
https://github.com/andythefactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
article-extractor corpus corpus-builder corpus-tools dataset datasets html-to-markdown html2text news news-aggregator news-crawler readability scraping scraping-websites text-cleaning text-extraction text-mining text-preprocessing web-scraping
Last synced: 27 Jan 2026
https://github.com/importcjj/go-readability
Go package that cleans an HTML page for better readability.
extractor go golang html html-extractor html2text readability text text-extraction
Last synced: 14 Jan 2026
https://github.com/kr1shnasomani/webscrub
Python code that extracts HTML content, converts it to clean text, and preprocesses the text
beautifulsoup html2text natural-language-processing pypi scikit-learn selenium spacy
Last synced: 07 Apr 2025
https://github.com/masroore/php-html2text
A PHP package to convert HTML into a plain text format
Last synced: 21 Mar 2025
https://github.com/puhoy/readability_cli
A CLI tool to fetch a webpage's main content and print it as Markdown
fetch-webpages html-to-markdown html2text markdown python3 readability readability-cli readability-lxml
Last synced: 21 Apr 2026
https://github.com/luminati-io/rag-chatbot
A Python-based RAG chatbot leveraging GPT-4o and Bright Data's SERP API to deliver contextually rich and up-to-date AI responses using real-time search engine data.
ai api beautifulsoup4 bright-data chatbot chatbots chatgpt html2text json playwright python rag serp serp-api
Last synced: 08 Apr 2026
https://github.com/gemichelst/notesconverter
Converts every .html file in a specified folder into a .txt file and combines the resulting .txt files into one large text file
apple-notes bash bash-script export-notes google-keep html2text macos notes windows
Last synced: 14 Apr 2026
https://github.com/cerebnismus/web2pcat
html2text and pygments
eecs html2text lecture-notes osx pcat pygmentize pygments python python-3 python-script python3 sakarya sakarya-universitesi sistem-programlama system-programming
Last synced: 18 Mar 2025
https://github.com/cerebnismus/html2pcat
html2text and pygments
c eecs html2text lecture-notes makefile pcat pip pip3 pygmentize pygments python python-3 python3 sakarya sakarya-universitesi sistem-programlama system-programming
Last synced: 08 Jul 2025