Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aaronsw/html2text
Convert HTML to Markdown-formatted text.
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/aaronsw/html2text
- Owner: aaronsw
- License: gpl-3.0
- Created: 2011-01-28T15:06:09.000Z (almost 14 years ago)
- Default Branch: master
- Last Pushed: 2024-02-27T18:49:46.000Z (10 months ago)
- Last Synced: 2024-11-28T22:03:51.528Z (13 days ago)
- Language: Python
- Homepage: http://www.aaronsw.com/2002/html2text/
- Size: 505 KB
- Stars: 2,650
- Watchers: 79
- Forks: 413
- Open Issues: 68
Metadata Files:
- Readme: README.md
- License: COPYING
Awesome Lists containing this project
- project-awesome - aaronsw/html2text - Convert HTML to Markdown-formatted text. (Python)
README
# [html2text](http://www.aaronsw.com/2002/html2text/)
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Usage: `html2text.py [(filename|url) [encoding]]`
Options:

    --version             show program's version number and exit
    -h, --help            show this help message and exit
    --ignore-links        don't include any formatting for links
    --ignore-images       don't include any formatting for images
    -g, --google-doc      convert an html-exported Google Document
    -d, --dash-unordered-list
                          use a dash rather than a star for unordered list items
    -b BODY_WIDTH, --body-width=BODY_WIDTH
                          number of characters per output line, 0 for no wrap
    -i LIST_INDENT, --google-list-indent=LIST_INDENT
                          number of pixels Google indents nested lists
    -s, --hide-strikethrough
                          hide strike-through text. only relevant when -g is
                          specified as well

Or you can use it from within Python:

    import html2text
    print(html2text.html2text("<p>Hello, world.</p>"))

Or with some configuration options:

    import html2text
    h = html2text.HTML2Text()
    h.ignore_links = True
    print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))

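
To make the effect of an option like `ignore_links` concrete, here is a minimal sketch of the general idea, built only on the standard library's `html.parser`. It is not html2text's code: the class `TinyHTML2Markdown`, its `convert` method, and the sample URL are hypothetical names chosen for illustration, and the sketch handles only `<p>`, `<em>`, and `<a>`.

```python
from html.parser import HTMLParser


class TinyHTML2Markdown(HTMLParser):
    """Toy converter for illustration only (not html2text's implementation)."""

    def __init__(self, ignore_links=False):
        super().__init__()
        self.ignore_links = ignore_links  # mirrors the idea of --ignore-links
        self.out = []    # output fragments, joined at the end
        self.hrefs = []  # href values of currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.out.append("_")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            if not self.ignore_links:
                self.out.append("[")

    def handle_endtag(self, tag):
        if tag == "em":
            self.out.append("_")
        elif tag == "a" and self.hrefs:
            href = self.hrefs.pop()
            if not self.ignore_links:
                self.out.append("](%s)" % href)
        elif tag == "p":
            self.out.append("\n\n")  # paragraphs end with a blank line

    def handle_data(self, data):
        self.out.append(data)

    def convert(self, html):
        # Reset state so one instance can convert several documents.
        self.reset()
        self.out = []
        self.hrefs = []
        self.feed(html)
        self.close()
        return "".join(self.out).strip() + "\n"


if __name__ == "__main__":
    sample = "<p>Hello, <a href='http://example.com/'>world</a>!</p>"
    print(TinyHTML2Markdown().convert(sample))
    print(TinyHTML2Markdown(ignore_links=True).convert(sample))
```

With the flag off, the toy keeps the link as Markdown `[text](url)`; with it on, only the link text survives. The real script covers far more of HTML and Markdown than this sketch does.
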
_Originally written by Aaron Swartz. This code is distributed under the GPLv3._
## How to do a release
1. Update the version in `html2text.py`
2. Update the version in `setup.py`
3. Run `python setup.py sdist upload`

## How to run unit tests

    cd test/
    python run_tests.py

[![Build Status](https://secure.travis-ci.org/aaronsw/html2text.png)](http://travis-ci.org/aaronsw/html2text)