https://github.com/dlon/html2markdown
Conservatively convert html to markdown
- Host: GitHub
- URL: https://github.com/dlon/html2markdown
- Owner: dlon
- License: mit
- Created: 2017-01-21T21:22:12.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2020-09-17T20:46:01.000Z (over 4 years ago)
- Last Synced: 2025-01-14T13:13:08.415Z (11 days ago)
- Language: Python
- Size: 31.3 KB
- Stars: 98
- Watchers: 9
- Forks: 65
- Open Issues: 7
Metadata Files:
- Readme: README.rst
- License: LICENSE.txt
=============
html2markdown
=============

.. image:: https://travis-ci.com/dlon/html2markdown.svg?branch=master
    :target: https://travis-ci.com/dlon/html2markdown

**Experimental**

**Purpose**: Converts html to markdown while preserving unsupported html markup. The goal is to generate markdown that can be converted back into html. This is the major difference between html2markdown and html2text. The latter doesn't purport to be reversible.

Usage example
=============
::

    import html2markdown
    print(html2markdown.convert('<h2>Test</h2><pre><code>Here is some code</code></pre>'))

Output::

    ## Test

        Here is some code

Information and caveats
=======================

Does not convert the content of block-type tags other than ``<p>`` -- such as ``<div>`` tags -- into Markdown
-------------------------------------------------------------------------------------------------------------

It does convert to markdown the content of inline-type tags, e.g. ``<span>``.
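
**Input**: ``<div>this is stuff. <strong>stuff</strong></div>``

**Result**: ``<div>this is stuff. <strong>stuff</strong></div>``

**Input**: ``<p>this is stuff. <strong>stuff</strong></p>``

**Result**: ``this is stuff. __stuff__`` (surrounded by a newline on either side)

**Input**: ``<span style="text-decoration:line-through;">strike <strong>through</strong> some text here</span>``

**Result**: ``<span style="text-decoration:line-through;">strike __through__ some text here</span>``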
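
As a rough illustration of the rule in this section, the decision could be modelled as below. This is only a sketch: the function name ``content_is_converted`` and the ``OTHER_BLOCK_TAGS`` set are hypothetical and are not taken from html2markdown's source, whose actual tag lists may differ.

```python
# Hypothetical, incomplete set of block-type tags whose content is left as HTML.
# Assumption for illustration only; html2markdown's real list may differ.
OTHER_BLOCK_TAGS = {"div", "table", "form", "section"}


def content_is_converted(tag_name):
    """Return True if the tag's content would be converted to Markdown
    under the rule described in this section."""
    name = tag_name.lower()
    if name == "p":
        # Paragraphs are the block-type tag whose content is converted.
        return True
    # Other block-type tags are kept verbatim; everything else is
    # treated as inline-type here, so its content is converted.
    return name not in OTHER_BLOCK_TAGS


for tag in ("div", "p", "span"):
    print(tag, content_is_converted(tag))
```

Under these assumptions, ``div`` is reported as left alone, while ``p`` and ``span`` are reported as converted, which mirrors the three examples in this section.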

Except in unprocessed block-type tags, formatting characters are escaped
------------------------------------------------------------------------

**Input**: ``<p>**escape me?**</p>`` (in html, we would use ``<strong>`` here)

**Result**: ``\*\*escape me?\*\*``

**Input**: ``<span>**escape me?**</span>``

**Result**: ``<span>\*\*escape me?\*\*</span>``

**Input**: ``<div>**escape me?**</div>``

**Result**: ``<div>**escape me?**</div>`` (block-type)

Attributes not supported by Markdown are kept
---------------------------------------------

**Example**: ``<a href="http://myaddress" title="click me"><strong>link</strong></a>``

**Result**: ``[__link__](http://myaddress "click me")``

**Example**: ``<a onclick="javascript:dostuff()" href="http://myaddress" title="click me"><strong>link</strong></a>``

**Result**: ``<a onclick="javascript:dostuff()" href="http://myaddress" title="click me">__link__</a>`` (the attribute *onclick* is not supported, so the tag is left alone)
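
The escaping and attribute rules described above can be sketched in a few lines of plain Python. This is not html2markdown's implementation. The helper names ``escape_markdown`` and ``link_to_markdown``, the choice to escape only ``*`` and ``_``, and the assumption that ``href`` and ``title`` are the only link attributes Markdown can express are all made up for illustration.

```python
import re

# Assumption: the only <a> attributes that Markdown link syntax can carry.
MARKDOWN_LINK_ATTRS = {"href", "title"}


def escape_markdown(text):
    """Backslash-escape characters Markdown would read as emphasis.

    Only '*' and '_' are handled in this sketch.
    """
    return re.sub(r"([*_])", r"\\\1", text)


def link_to_markdown(attrs, inner_markdown):
    """Return Markdown for an <a> tag, or None if the tag must stay HTML.

    attrs is a dict of the tag's attributes; inner_markdown is the
    already-converted content of the tag.
    """
    # No href, or any attribute Markdown cannot express: keep the tag.
    if "href" not in attrs or set(attrs) - MARKDOWN_LINK_ATTRS:
        return None
    title = attrs.get("title")
    if title:
        return '[%s](%s "%s")' % (inner_markdown, attrs["href"], title)
    return "[%s](%s)" % (inner_markdown, attrs["href"])


print(escape_markdown("**escape me?**"))
print(link_to_markdown({"href": "http://myaddress", "title": "click me"}, "__link__"))
print(link_to_markdown({"href": "http://myaddress", "onclick": "dostuff()"}, "__link__"))
```

Returning ``None`` for an unsupported attribute models the "tag is left alone" behaviour: a caller would then emit the original ``<a ...>`` tag around the converted content instead of a Markdown link.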

Limitations
===========

- Tables are kept as html.

Changes
=======

0.1.7:

- Improved handling of inline tags.
- Fix: Ignore ``<a>`` tags without an href attribute.
- Improve escaping.

0.1.6: Added tests and support for Python versions below 2.7.

0.1.5: Fix Unicode issue in Python 3.

0.1.0: First version.