https://github.com/mysticmind/reversemarkdown-net
ReverseMarkdown.Net is an HTML to Markdown converter library in C#. Conversion is very reliable since the HtmlAgilityPack (HAP) library is used for traversing the HTML DOM
converter-library dotnet html markdown markdown-to-html netcore netstandard
- Host: GitHub
- URL: https://github.com/mysticmind/reversemarkdown-net
- Owner: mysticmind
- License: mit
- Created: 2015-07-06T06:16:38.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2024-06-29T00:57:12.000Z (over 1 year ago)
- Last Synced: 2024-10-23T12:22:56.815Z (over 1 year ago)
- Topics: converter-library, dotnet, html, markdown, markdown-to-html, netcore, netstandard
- Language: C#
- Homepage:
- Size: 464 KB
- Stars: 279
- Watchers: 6
- Forks: 66
- Open Issues: 13
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
- awesome-dotnet-core - ReverseMarkdown - Html to Markdown converter library. (Frameworks, Libraries and Tools / Misc)
- awesome-dotnet-core - ReverseMarkdown - Html to Markdown converter library, with some Unix shell terminal advantages. (Frameworks, Libraries and Tools / Misc)
- fucking-awesome-dotnet-core - ReverseMarkdown - Html to Markdown converter library. (Frameworks, Libraries and Tools / Misc)
- system-architecture-awesome - ReverseMarkdown - Html to Markdown converter library. (Misc)
README
# Meet ReverseMarkdown
[CI build](https://github.com/mysticmind/reversemarkdown-net/actions/workflows/ci.yaml) [NuGet package](https://www.nuget.org/packages/ReverseMarkdown/)
ReverseMarkdown is an HTML to Markdown converter library in C#. Conversion is very reliable because the HtmlAgilityPack (HAP) library is used to traverse the HTML DOM.
If you have used and benefited from this library, please feel free to sponsor me!

## Usage
Install the package from NuGet using `Install-Package ReverseMarkdown` or clone the repository and build it yourself.
```cs
var converter = new ReverseMarkdown.Converter();
string html = "This is a sample <strong>paragraph</strong> from <a href=\"http://test.com\">my site</a>";
string result = converter.Convert(html);
```
Will result in:
```txt
This is a sample **paragraph** from [my site](http://test.com)
```
The conversion can also be customized:
```cs
var config = new ReverseMarkdown.Config
{
// Include the unknown tag completely in the result (default as well)
UnknownTags = ReverseMarkdown.Config.UnknownTagsOption.PassThrough,
// generate GitHub flavoured markdown, supported for BR, PRE and table tags
GithubFlavored = true,
// will ignore all comments
RemoveComments = true,
// remove markdown output for links where appropriate
SmartHrefHandling = true
};
var converter = new ReverseMarkdown.Converter(config);
```
## Configuration options
* `DefaultCodeBlockLanguage` - Sets the default code block language for GitHub-style Markdown when class-based language markers are not available
* `GithubFlavored` - GitHub-style Markdown for `br`, `pre` and `table`. Default is false
* `SlackFlavored` - Slack style markdown formatting. When enabled, uses `*` for bold, `_` for italic, `~` for strikethrough, and `•` for list bullets. Default is false
* `CleanupUnnecessarySpaces` - Cleans up unnecessary spaces in the output. Default is true
* `SuppressDivNewlines` - Removes prefixed newlines from `div` tags. Default is false
* `ListBulletChar` - Changes the bullet character. Default value is `-`. Some systems expect the bullet character to be `*` rather than `-`; this option allows you to change it. Note: This option is ignored when `SlackFlavored` is enabled
* `RemoveComments` - Removes HTML comments along with their text. Default is false
* `SmartHrefHandling` - How to handle the `href` attribute of `<a>` tags
  * `false` - Outputs `[{name}]({href}{title})` even if the name and href are identical. This is the default option.
  * `true` - If the name and href are equal, outputs just the `name`. Note that if the URI is not well formed as per [`Uri.IsWellFormedUriString`](https://docs.microsoft.com/en-us/dotnet/api/system.uri.iswellformeduristring) (i.e. the string is not correctly escaped, like `http://example.com/path/file name.docx`), Markdown link syntax is used anyway.
    * If `href` contains the `http`/`https` protocol and `name` doesn't, but they are otherwise the same, outputs `href` only.
    * If `href` uses the `tel:` or `mailto:` scheme but is otherwise identical to `name`, outputs `name` only.
* `UnknownTags` - How to handle unknown tags
  * `UnknownTagsOption.PassThrough` - Include the unknown tag completely in the result. That is, the tag along with its text is left in the output. This is the default
  * `UnknownTagsOption.Drop` - Drop the unknown tag and its content
  * `UnknownTagsOption.Bypass` - Ignore the unknown tag but try to convert its content
  * `UnknownTagsOption.Raise` - Raise an error to let you know
* `PassThroughTags` - A list of tags to pass through as-is without any processing
* `WhitelistUriSchemes` - Specifies which schemes (without the trailing colon) are allowed for `<a>` and `<img>` tags. Others are bypassed (the tag outputs its text or nothing). By default, all schemes are allowed.
  If `string.Empty` is provided, `href` or `src` values whose scheme couldn't be determined are whitelisted.
  The scheme is determined by the `Uri` class, except when the URL begins with `/` (file scheme) or `//` (http scheme)
* `TableWithoutHeaderRowHandling` - How to handle tables without a header row
  * `TableWithoutHeaderRowHandlingOption.Default` - The first row is used as the header row (default)
  * `TableWithoutHeaderRowHandlingOption.EmptyRow` - An empty row is added as the header row
* `TableHeaderColumnSpanHandling` - Set this flag to handle table header columns that have column spans. Default is true
* `Base64Images` - Controls how base64-encoded images (inline data URIs) are handled during conversion
  * `Base64ImageHandling.Include` - Include base64-encoded images in the markdown output as-is (default behavior)
  * `Base64ImageHandling.Skip` - Skip/ignore base64-encoded images entirely
  * `Base64ImageHandling.SaveToFile` - Save base64-encoded images to disk and reference the saved file path in markdown. Requires `Base64ImageSaveDirectory` to be set
* `Base64ImageSaveDirectory` - When `Base64Images` is set to `SaveToFile`, specifies the directory path where images should be saved
* `Base64ImageFileNameGenerator` - When `Base64Images` is set to `SaveToFile`, this function generates a filename for each saved image. The function receives the image index (int) and MIME type (string), and should return a filename without extension. If not specified, images will be named as `image_0`, `image_1`, etc.
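Several of the options above have no example of their own. The following is a minimal sketch of combining a few of them in one `Config`, assuming a console app that references the ReverseMarkdown package and that `ListBulletChar` is a `char`. The HTML input and the `<custom-widget>` tag are made up for illustration; the expected results follow the option descriptions above rather than a verified run.

```cs
using System;

var config = new ReverseMarkdown.Config
{
    // Drop unknown tags (here the made-up <custom-widget>) together with their content
    UnknownTags = ReverseMarkdown.Config.UnknownTagsOption.Drop,
    // Emit just the link text when it is identical to the href
    SmartHrefHandling = true,
    // Use '*' instead of the default '-' for unordered list bullets
    ListBulletChar = '*'
};
var converter = new ReverseMarkdown.Converter(config);

string html = "<ul><li>one</li><li>two</li></ul>"
            + "<p><a href=\"https://example.com\">https://example.com</a></p>"
            + "<custom-widget>internal note</custom-widget>";

string markdown = converter.Convert(html);
Console.WriteLine(markdown);
```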
### Base64 Image Handling Examples
ReverseMarkdown provides flexible options for handling base64-encoded images (inline data URIs) during HTML to Markdown conversion.
**Include Base64 Images (Default)**
By default, base64-encoded images are included in the markdown output as-is:
```cs
var converter = new ReverseMarkdown.Converter();
string html = "<img src=\"data:image/png;base64,...\" />"; // base64 payload shortened here
string result = converter.Convert(html);
// Output: ![](data:image/png;base64,...)
```
**Skip Base64 Images**
To ignore base64-encoded images entirely:
```cs
var config = new ReverseMarkdown.Config
{
Base64Images = ReverseMarkdown.Config.Base64ImageHandling.Skip
};
var converter = new ReverseMarkdown.Converter(config);
string html = "<img src=\"data:image/png;base64,...\" />"; // base64 payload shortened here
string result = converter.Convert(html);
// Output: (empty - image is skipped)
```
**Save Base64 Images to Disk**
To extract and save base64-encoded images to disk:
```cs
var config = new ReverseMarkdown.Config
{
Base64Images = ReverseMarkdown.Config.Base64ImageHandling.SaveToFile,
Base64ImageSaveDirectory = "/path/to/images"
};
var converter = new ReverseMarkdown.Converter(config);
string html = "<img src=\"data:image/png;base64,...\" />"; // base64 payload shortened here
string result = converter.Convert(html);
// Output: a Markdown image that references the saved file
// Image file saved to: /path/to/images/image_0.png
```
**Custom Filename Generator**
You can provide a custom filename generator for saved images:
```cs
var config = new ReverseMarkdown.Config
{
Base64Images = ReverseMarkdown.Config.Base64ImageHandling.SaveToFile,
Base64ImageSaveDirectory = "/path/to/images",
Base64ImageFileNameGenerator = (index, mimeType) =>
{
var timestamp = DateTime.Now.ToString("yyyyMMdd_HHmmss");
return $"converted_{timestamp}_{index}";
}
};
var converter = new ReverseMarkdown.Converter(config);
// Images will be saved as: converted_20260108_143022_0.png, converted_20260108_143022_1.jpg, etc.
```
**Supported Image Formats:**
- PNG (`image/png`)
- JPEG (`image/jpeg`, `image/jpg`)
- GIF (`image/gif`)
- BMP (`image/bmp`)
- TIFF (`image/tiff`)
- WebP (`image/webp`)
- SVG (`image/svg+xml`)
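Because the filename generator also receives the MIME type, it can encode the image format in the name. This is a minimal sketch, assuming the generator is a `Func<int, string, string>` as described for `Base64ImageFileNameGenerator`, and that the library appends the extension itself. The temp folder, the naming scheme and the `aGVsbG8=` payload (the bytes of "hello", not a real PNG) are illustrative only.

```cs
using System;
using System.IO;

// Illustrative output folder under the system temp directory
string imageDir = Path.Combine(Path.GetTempPath(), "reversemarkdown-images");
Directory.CreateDirectory(imageDir);

var config = new ReverseMarkdown.Config
{
    Base64Images = ReverseMarkdown.Config.Base64ImageHandling.SaveToFile,
    Base64ImageSaveDirectory = imageDir,
    // Name files after the MIME subtype, e.g. "png_0" or "svg_xml_1" (no extension)
    Base64ImageFileNameGenerator = (index, mimeType) =>
    {
        string subtype = mimeType.Substring(mimeType.IndexOf('/') + 1);
        return $"{subtype.Replace('+', '_')}_{index}";
    }
};
var converter = new ReverseMarkdown.Converter(config);

string html = "<img src=\"data:image/png;base64,aGVsbG8=\" />";
string markdown = converter.Convert(html);
Console.WriteLine(markdown);
```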
## Features
* Supports all the established HTML tags, such as h1, h2, h3, h4, h5, h6, p, em, strong, i, b, blockquote, code, img, a, hr, li, ol, ul, table, tr, th, td, br
* Supports nested lists
* GitHub Flavored Markdown conversion is supported for br, pre, task lists and tables. Use `var config = new ReverseMarkdown.Config { GithubFlavored = true };`. Tables are always converted to GitHub-flavored Markdown, regardless of this flag
* Slack Flavored Markdown conversion supported. Use `var config = new ReverseMarkdown.Config { SlackFlavored = true };`
* Improved performance through an optimized `TextWriter`-based approach and O(1) ancestor lookups
* Support for nested tables (converted as HTML inside markdown)
* Support for table captions (rendered as paragraph above table)
* Base64-encoded image handling with options to include as-is, skip, or save to disk
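To see the Slack-flavored output described under `SlackFlavored`, a minimal sketch follows, assuming a console app that references the ReverseMarkdown package. The HTML input is made up, and the checks only cover the markers listed in the configuration options (`*` bold, `_` italic, `•` bullets).

```cs
using System;

var converter = new ReverseMarkdown.Converter(
    new ReverseMarkdown.Config { SlackFlavored = true });

string html = "<p><strong>Deploy</strong> finished <em>successfully</em></p>"
            + "<ul><li>api</li><li>web</li></ul>";

// Slack uses single-asterisk bold and bullet characters instead of '-'
string markdown = converter.Convert(html);
Console.WriteLine(markdown);
```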
## Breaking Changes
### v5.0.0
**Configuration Changes:**
* `WhitelistUriSchemes` - Changed from `string[]` to `HashSet<string>` (read-only property). Use the `.Add()` method to add schemes instead of array assignment
* `PassThroughTags` - Changed from `string[]` to `HashSet<string>`
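Migrating to the `HashSet<string>` properties might look like the minimal sketch below, assuming v5.0.0 or later and that `PassThroughTags` is initialized to an empty set by default (not stated above). A collection initializer on a read-only property calls `.Add()` on the existing set, so it works where array assignment no longer compiles. The schemes, the `table` tag and the HTML are arbitrary.

```cs
using System;

var config = new ReverseMarkdown.Config
{
    // Before v5.0.0 this was an array assignment, which no longer compiles:
    // WhitelistUriSchemes = new[] { "http", "https" },
    // Collection initializer syntax calls Add() on the existing read-only set
    WhitelistUriSchemes = { "http", "https" }
};

// Schemes and pass-through tags can also be added after construction
config.WhitelistUriSchemes.Add("mailto");
config.PassThroughTags.Add("table");

var converter = new ReverseMarkdown.Converter(config);
string markdown = converter.Convert(
    "<p><a href=\"ftp://example.com/file\">ftp link</a></p>"
    + "<table><tr><td>1</td></tr></table>");
Console.WriteLine(markdown);
```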
**API Changes:**
* `IConverter` interface signature changed from `string Convert(HtmlNode node)` to `void Convert(TextWriter writer, HtmlNode node)`. If you have custom converters, you'll need to update them to write to the TextWriter instead of returning a string
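A custom converter updated for the new signature writes to the `TextWriter` instead of returning a string. This is a minimal sketch that assumes `IConverter` lives in the `ReverseMarkdown.Converters` namespace and declares only this one method; `KbdConverter` is a hypothetical name, and registering it with a `Converter` instance is not shown. The converter is exercised directly on a node parsed with HtmlAgilityPack.

```cs
using System;
using System.IO;
using HtmlAgilityPack;
using ReverseMarkdown.Converters;

// Parse a fragment and run the converter on its first node directly
var doc = new HtmlDocument();
doc.LoadHtml("<kbd>Ctrl</kbd>");
HtmlNode kbd = doc.DocumentNode.FirstChild;

var writer = new StringWriter();
new KbdConverter().Convert(writer, kbd);
string output = writer.ToString();
Console.WriteLine(output); // `Ctrl`

// Hypothetical converter using the v5 signature: Markdown is written, not returned
public class KbdConverter : IConverter
{
    public void Convert(TextWriter writer, HtmlNode node)
    {
        // Render the key name as inline code, e.g. <kbd>Ctrl</kbd> -> `Ctrl`
        writer.Write('`');
        writer.Write(node.InnerText);
        writer.Write('`');
    }
}
```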
**Target Framework Changes:**
* Removed support for legacy and end-of-life .NET versions. Only actively supported .NET versions are now targeted, namely .NET 8, .NET 9 and .NET 10.
### v2.0.0
* `UnknownTags` config has been changed to an enumeration
## Acknowledgements
This library's initial implementation ideas came from the Ruby-based HTML to Markdown converter [xijo/reverse_markdown](https://github.com/xijo/reverse_markdown).
## Copyright
Copyright © Babu Annamalai
## License
ReverseMarkdown is licensed under [MIT](http://www.opensource.org/licenses/mit-license.php "Read more about the MIT license"). Refer to the [License file](https://github.com/mysticmind/reversemarkdown-net/blob/master/LICENSE) for more information.