Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Alir3z4/html2text
Convert HTML to Markdown-formatted text.
- Host: GitHub
- URL: https://github.com/Alir3z4/html2text
- Owner: Alir3z4
- License: gpl-3.0
- Created: 2014-02-19T22:41:11.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2024-07-25T15:51:38.000Z (3 months ago)
- Last Synced: 2024-10-29T11:02:24.074Z (5 days ago)
- Topics: markdown, markdown-parser, python
- Language: Python
- Homepage: alir3z4.github.io/html2text/
- Size: 1.21 MB
- Stars: 1,830
- Watchers: 26
- Forks: 273
- Open Issues: 97
Metadata Files:
- Readme: README.md
- Changelog: ChangeLog.rst
- Contributing: docs/contributing.md
- License: COPYING
- Authors: AUTHORS.rst
Awesome Lists containing this project
- awesome-python-resources - GitHub - 39% open · ⏱️ 22.02.2022): (Network)
- starred-awesome - html2text - Convert HTML to Markdown-formatted text. (Python)
- project-awesome - Alir3z4/html2text - Convert HTML to Markdown-formatted text. (Python)
- best-of-web-python - GitHub - 39% open · ⏱️ 28.02.2024): (Markdown)
README
# html2text
[![CI](https://github.com/Alir3z4/html2text/actions/workflows/main.yml/badge.svg?branch=master)](https://github.com/Alir3z4/html2text/actions/workflows/main.yml)
[![codecov](https://codecov.io/gh/Alir3z4/html2text/graph/badge.svg?token=OoxiyymjgU)](https://codecov.io/gh/Alir3z4/html2text)

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Usage: `html2text [filename [encoding]]`
| Option | Description |
|---------------------|-------------|
| `--version` | Show program's version number and exit |
| `-h`, `--help` | Show this help message and exit |
| `--ignore-links` | Don't include any formatting for links |
| `--escape-all` | Escape all special characters. Output is less readable, but avoids corner case formatting issues. |
| `--reference-links` | Use reference links instead of inline links to create Markdown |
| `--mark-code` | Mark preformatted and code blocks with [code]...[/code] |

For a complete list of options see the [docs](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md).
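When the command-line tool is driven from another program, the flags in the table above can be assembled programmatically. The following is a minimal sketch and not part of html2text itself: `build_html2text_command` and `convert_file` are hypothetical helpers, and the sketch assumes the `html2text` executable is on `PATH` and accepts the usage line and flags listed above.

```python
import subprocess


def build_html2text_command(filename, encoding=None, ignore_links=False,
                            escape_all=False, reference_links=False,
                            mark_code=False):
    """Build an argument list for the html2text CLI (hypothetical helper)."""
    command = ["html2text"]
    # Each keyword maps to one flag from the options table.
    if ignore_links:
        command.append("--ignore-links")
    if escape_all:
        command.append("--escape-all")
    if reference_links:
        command.append("--reference-links")
    if mark_code:
        command.append("--mark-code")
    # Positional arguments follow the usage line: filename, then encoding.
    command.append(filename)
    if encoding is not None:
        command.append(encoding)
    return command


def convert_file(filename, **options):
    """Run the CLI on a file and return its standard output as text.

    Assumes the html2text executable is installed and on PATH.
    """
    result = subprocess.run(
        build_html2text_command(filename, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.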
Or you can use it from within `Python`:
```
>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.
```
Or with some configuration options:
```
>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!
>>> # Don't ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!
```
*Originally written by Aaron Swartz. This code is distributed under the GPLv3.*
## How to install
`html2text` is available on PyPI:
https://pypi.org/project/html2text/

```
$ pip install html2text
```

## How to run unit tests

    tox

To see the coverage results:

    coverage html

then open the `./htmlcov/index.html` file in your browser.
## Documentation
Documentation lives [here](https://github.com/Alir3z4/html2text/blob/master/docs/usage.md).