{"id":13413950,"url":"https://github.com/JohannesKaufmann/html-to-markdown","last_synced_at":"2025-03-14T20:30:52.401Z","repository":{"id":39229231,"uuid":"133520039","full_name":"JohannesKaufmann/html-to-markdown","owner":"JohannesKaufmann","description":"⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.","archived":false,"fork":false,"pushed_at":"2024-11-10T14:30:50.000Z","size":1259,"stargazers_count":1733,"open_issues_count":10,"forks_count":99,"subscribers_count":12,"default_branch":"main","last_synced_at":"2024-11-11T09:22:24.783Z","etag":null,"topics":["cli","converter","go","golang","html","html-to-markdown","markdown"],"latest_commit_sha":null,"homepage":"https://html-to-markdown.com","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JohannesKaufmann.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-05-15T13:26:26.000Z","updated_at":"2024-11-11T09:15:44.000Z","dependencies_parsed_at":"2024-09-05T23:19:44.806Z","dependency_job_id":"455d5e55-bf12-444f-9125-6ad42347ab2c","html_url":"https://github.com/JohannesKaufmann/html-to-markdown","commit_stats":{"total_commits":110,"total_committers":12,"mean_commits":9.166666666666666,"dds":"0.24545454545454548","last_synced_commit":"8f50162e337b1efd7632c1fb31941b2f2bdd0c84"},"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohannesKaufmann%2Fhtml-to-markdown","tags_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/JohannesKaufmann%2Fhtml-to-markdown/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohannesKaufmann%2Fhtml-to-markdown/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohannesKaufmann%2Fhtml-to-markdown/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JohannesKaufmann","download_url":"https://codeload.github.com/JohannesKaufmann/html-to-markdown/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243642030,"owners_count":20323951,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","converter","go","golang","html","html-to-markdown","markdown"],"created_at":"2024-07-30T20:01:53.488Z","updated_at":"2025-03-14T20:30:52.394Z","avatar_url":"https://github.com/JohannesKaufmann.png","language":"Go","funding_links":[],"categories":["开源类库","Specific Formats","Text Processing","Go","Open source library","文本处理","Markdown parser","markdown","转换工具","Template Engines","文本处理`解析和操作文本的代码库`","Bot Building"],"sub_categories":["文本处理","Markup Languages","Word Processing","查询语","标记语言","HTTP Clients","转成图片"],"readme":"# html-to-markdown\n\nA robust html-to-markdown converter that transforms HTML (even entire websites) into clean, readable Markdown. It supports complex formatting, customizable options, and plugins for full control over the conversion process.\n\nUse the fully extendable [Golang library](#golang-library) or a quick [CLI command](#cli---using-it-on-the-command-line). 
Alternatively, try the [Online Demo](https://html-to-markdown.com/demo) or [REST API](https://html-to-markdown.com/api) to see it in action!\n\nHere are some _cool features_:\n\n- **Bold \u0026 Italic:** Supports bold and italic, even within single words.\n\n  ![](./.github/images/point_bold_italic.png)\n\n- **List:** Handles ordered and unordered lists with full nesting support.\n\n  ![](./.github/images/point_list.png)\n\n- **Blockquote:** Blockquotes can include other elements, with seamless support for nested quotes.\n\n  ![](./.github/images/point_blockquote.png)\n\n- **Inline Code \u0026 Code Block:** Correctly handles backticks and multi-line code blocks, preserving code structure.\n\n  ![](./.github/images/point_code.png)\n\n- **Link \u0026 Image:** Properly formats multi-line links, adding escapes for blank lines where needed.\n\n  ![](./.github/images/point_link_image.png)\n\n- **Smart Escaping:** Escapes special characters only when necessary, to avoid accidental Markdown rendering.\n  🗒️ [ESCAPING.md](/ESCAPING.md)\n\n  ![](./.github/images/point_escaping.png)\n\n- **Remove/Keep HTML:** Choose to strip or retain specific HTML tags for full control over the output.\n\n  ![](./.github/images/point_wrapper.png)\n\n- **Plugins:** Easily extend the converter with plugins, or create custom ones to add functionality.\n\n  ![](./.github/images/point_strikethrough.png)\n\n- **Table Plugin:** Converts tables with support for alignment, rowspan and colspan.\n\n  ![](./.github/images/point_table.png)\n\n---\n\n## Usage\n\n[💻 Golang library](#golang-library) | [📦 CLI](#cli---using-it-on-the-command-line) | [▶️ Hosted Demo](https://html-to-markdown.com/demo) | [🌐 Hosted REST API](https://html-to-markdown.com/api)\n\n\u003e [!TIP]\n\u003e Looking for an all-in-one cloud solution? 
We're _sponsored_ by [🔥 Firecrawl](https://html-to-markdown.com/sponsor/firecrawl), where you can scrape any website and turn it into AI-friendly Markdown with one API call.\n\n---\n\n## Golang Library\n\n### Installation\n\n```bash\ngo get -u github.com/JohannesKaufmann/html-to-markdown/v2\n```\n\n_Or, to install a specific commit, add the suffix `/v2@commithash`_\n\n\u003e [!NOTE]  \n\u003e This is the documentation for the v2 library. For the old version, switch to the [\"v1\" branch](https://github.com/JohannesKaufmann/html-to-markdown/tree/v1).\n\n### Usage\n\n[![Go V2 Reference](https://pkg.go.dev/badge/github.com/JohannesKaufmann/html-to-markdown/v2.svg)](https://pkg.go.dev/github.com/JohannesKaufmann/html-to-markdown/v2)\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\n\thtmltomarkdown \"github.com/JohannesKaufmann/html-to-markdown/v2\"\n)\n\nfunc main() {\n\tinput := `\u003cstrong\u003eBold Text\u003c/strong\u003e`\n\n\tmarkdown, err := htmltomarkdown.ConvertString(input)\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tfmt.Println(markdown)\n\t// Output: **Bold Text**\n}\n```\n\n- 🧑‍💻 [Example code, basics](/examples/basics/main.go)\n\nUse `WithDomain` to convert _relative_ links to _absolute_ links:\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\n\thtmltomarkdown \"github.com/JohannesKaufmann/html-to-markdown/v2\"\n\t\"github.com/JohannesKaufmann/html-to-markdown/v2/converter\"\n)\n\nfunc main() {\n\tinput := `\u003cimg src=\"/assets/image.png\" /\u003e`\n\n\tmarkdown, err := htmltomarkdown.ConvertString(\n\t\tinput,\n\t\tconverter.WithDomain(\"https://example.com\"),\n\t)\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tfmt.Println(markdown)\n\t// Output: ![](https://example.com/assets/image.png)\n}\n```\n\nThe function `htmltomarkdown.ConvertString()` is a _small wrapper_ around `converter.NewConverter()` and the _base_ and _commonmark_ plugins. 
If you want more control, use the following:\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\n\t\"github.com/JohannesKaufmann/html-to-markdown/v2/converter\"\n\t\"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/base\"\n\t\"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/commonmark\"\n)\n\nfunc main() {\n\tinput := `\u003cstrong\u003eBold Text\u003c/strong\u003e`\n\n\tconv := converter.NewConverter(\n\t\tconverter.WithPlugins(\n\t\t\tbase.NewBasePlugin(),\n\t\t\tcommonmark.NewCommonmarkPlugin(\n\t\t\t\tcommonmark.WithStrongDelimiter(\"__\"),\n\t\t\t\t// ...additional configurations for the plugin\n\t\t\t),\n\n\t\t\t// ...additional plugins (e.g. table)\n\t\t),\n\t)\n\n\tmarkdown, err := conv.ConvertString(input)\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tfmt.Println(markdown)\n\t// Output: __Bold Text__\n}\n```\n\n- 🧑‍💻 [Example code, options](/examples/options/main.go)\n\n\u003e [!NOTE]  \n\u003e If you use `NewConverter` directly, make sure to also **register the commonmark and base plugins**.\n\n---\n\n### Collapse \u0026 Tag Type\n\n![](./.github/images/tag_type_renderer.png)\n\nYou can specify how different HTML tags should be handled during conversion.\n\n- **Tag Types:** When _collapsing_ whitespace, it is useful to know whether a node is _block_ or _inline_.\n  - So if you use Web Components/Custom Elements, remember to register their type using `TagType` or `RendererFor`.\n  - Additionally, you can _remove_ tags completely from the output.\n- **Pre-built Renderers:** There are several pre-built renderers available. For example:\n  - `RenderAsHTML` will render the node (including children) as HTML.\n  - `RenderAsHTMLWrapper` will render the node as HTML and render the children as Markdown.\n\n\u003e [!NOTE]  \n\u003e By default, some tags are automatically removed (e.g. `\u003cstyle\u003e`). You can override the existing configuration by using a different _priority_. 
For example, you could keep `\u003cstyle\u003e` tags by registering them with `PriorityEarly`.\n\nHere are the examples for the screenshot above:\n\n```go\nconv.Register.TagType(\"nav\", converter.TagTypeRemove, converter.PriorityStandard)\n\nconv.Register.RendererFor(\"b\", converter.TagTypeInline, base.RenderAsHTML, converter.PriorityEarly)\n\nconv.Register.RendererFor(\"article\", converter.TagTypeBlock, base.RenderAsHTMLWrapper, converter.PriorityStandard)\n```\n\n### Plugins\n\n#### Published Plugins\n\nThese are the plugins located in the [plugin folder](/plugin):\n\n| Name                  | Description                                                                                        |\n| --------------------- | -------------------------------------------------------------------------------------------------- |\n| Base                  | Implements basic shared functionality (e.g. removing nodes)                                        |\n| Commonmark            | Implements Markdown according to the [Commonmark Spec](https://spec.commonmark.org/)               |\n|                       |                                                                                                    |\n| GitHubFlavored        | _planned_                                                                                          |\n| TaskListItems         | _planned_                                                                                          |\n| Strikethrough         | Converts `\u003cstrike\u003e`, `\u003cs\u003e`, and `\u003cdel\u003e` to the `~~` syntax.                                        
|\n| Table                 | Implements Tables according to the [GitHub Flavored Markdown Spec](https://github.github.com/gfm/) |\n|                       |                                                                                                    |\n| VimeoEmbed            | _planned_                                                                                          |\n| YoutubeEmbed          | _planned_                                                                                          |\n|                       |                                                                                                    |\n| ConfluenceCodeBlock   | _planned_                                                                                          |\n| ConfluenceAttachments | _planned_                                                                                          |\n\n\u003e [!NOTE]  \n\u003e Not all plugins from v1 have been ported to v2 yet. They will be implemented soon...\n\nThese are the plugins in other repositories:\n\n| Name                         | Description         |\n| ---------------------------- | ------------------- |\n| \\[Plugin Name\\]\\(Your Link\\) | A short description |\n\n#### Writing Plugins\n\nYou want to write custom logic?\n\n1. Write your logic and **register** it.\n\n   ![](./.github/images/autocomplete_register.png)\n\n   - 🧑‍💻 [Example code, register](/examples/register/main.go)\n\n2. _Optional:_ Package your logic into a **plugin** and publish it.\n\n   - 🗒️ [WRITING_PLUGINS.md](/WRITING_PLUGINS.md)\n\n---\n\n---\n\n## CLI - Using it on the command line\n\nUsing the Golang library provides the most customization, while the CLI is the simplest way to get started.\n\n### Installation\n\n#### Homebrew Tap\n\n```bash\nbrew install JohannesKaufmann/tap/html2markdown\n```\n\n#### Debian\n\nA `deb` package is available. 
See the [Setup Instructions](https://cloudsmith.io/~html-to-markdown/repos/stable/setup/#formats-deb).\n\n_Note: Support for other Linux distributions is tracked in [#119](https://github.com/JohannesKaufmann/html-to-markdown/issues/119)_\n\n#### Pre-compiled Binaries\n\nDownload pre-compiled binaries for Linux, macOS or Windows from the [releases page](https://github.com/JohannesKaufmann/html-to-markdown/releases). Extract the archive and copy the executable to a location in your system PATH (e.g. `/usr/local/bin`).\n\n#### Installation via Go\n\nIf you have Go installed, you can install the CLI directly using:\n\n```bash\ngo install github.com/JohannesKaufmann/html-to-markdown/v2/cli/html2markdown@latest\n```\n\nThis will download the source code and compile it into an executable in your Go binary directory (typically `$GOPATH/bin`).\n\n#### Build from Source\n\nBinaries are automatically built via [GoReleaser](https://goreleaser.com/) and attached to each [release](https://github.com/JohannesKaufmann/html-to-markdown/releases).\n\nTo build locally (requires Go):\n\n```bash\ngo build ./cli/html2markdown\n```\n\n### Version\n\n```bash\nhtml2markdown --version\n```\n\n\u003e [!NOTE]  \n\u003e Make sure that `--version` prints `2.X.X`, since v2 of the converter has its own, different CLI.\n\n### Usage\n\n```bash\n$ echo \"\u003cstrong\u003eimportant\u003c/strong\u003e\" | html2markdown\n\n**important**\n```\n\n```text\n$ curl --no-progress-meter http://example.com | html2markdown\n\n# Example Domain\n\nThis domain is for use in illustrative examples in documents. 
You may use this domain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)\n```\n\n```bash\n$ html2markdown --input file.html --output file.md\n\n$ html2markdown --input \"src/*.html\" --output \"dist/\"\n```\n\nUse `--help` to learn about the available options, for example:\n\n- `--domain=\"https://example.com\"` to convert _relative_ links to _absolute_ links.\n- `--exclude-selector=\".ad\"` to exclude HTML elements with `class=\"ad\"` from the conversion.\n- `--include-selector=\"article\"` to only include `\u003carticle\u003e` HTML elements in the conversion.\n- `--plugin-strikethrough` or `--plugin-table` to enable plugins.\n\n_(The CLI does not support every option yet. More customization will be added over time.)_\n\n---\n\n---\n\n## FAQ\n\n### Extending with Plugins\n\n- Need your own logic? Write your own code and then **register** it.\n\n  - Don't like the **defaults** that the library uses? You can use `PriorityEarly` to run your logic _earlier_ than others.\n\n  - 🧑‍💻 [Example code, register](/examples/register/main.go)\n\n- If you believe that your logic could also benefit others, you can package it up into a **plugin**.\n\n  - 🗒️ [WRITING_PLUGINS.md](/WRITING_PLUGINS.md)\n\n### Bugs\n\nYou found a bug?\n\n[Open an issue](https://github.com/JohannesKaufmann/html-to-markdown/issues/new/choose) with the HTML snippet that does not produce the expected results. Please, please, please _submit the HTML snippet_ that caused the problem. Otherwise, it is very difficult to reproduce and fix...\n\n### Security\n\nThis library produces Markdown that is readable and can be edited by humans.\n\nOnce you convert this Markdown back to HTML (e.g. using [goldmark](https://github.com/yuin/goldmark) or [blackfriday](https://github.com/russross/blackfriday)), you need to be careful of malicious content.\n\nThis library does NOT sanitize untrusted content. 
Use an HTML sanitizer such as [bluemonday](https://github.com/microcosm-cc/bluemonday) before displaying the HTML in the browser.\n\n🗒️ See [SECURITY.md](/SECURITY.md) if you find a security vulnerability.\n\n### Goroutines\n\nYou can use the same `Converter` from multiple goroutines. Internally, a mutex is used, and a test verifies that behaviour.\n\n### Escaping \u0026 Backslash\n\nSome characters have a special meaning in Markdown (e.g. \"\\*\" for emphasis). The backslash `\\` character is used to \"escape\" those characters. This is perfectly safe: the backslash won't be displayed in the final render.\n\n🗒️ [ESCAPING.md](/ESCAPING.md)\n\n### Contributing\n\nYou want to contribute? That's great to hear! There are many ways to help:\n\nHelping to answer questions, triaging issues, writing documentation, writing code, ...\n\nIf you want to make a code change, please first discuss the change you wish to make by opening an issue. I'm also happy to guide you to where a change is most likely needed. There are also extensive tests (see below), so you can freely experiment 🧑‍🔬\n\n_Note: The public API should not change, to preserve backwards compatibility..._\n\n### Testing\n\nYou don't have to be afraid of breaking the converter, since there are many \"Golden File\" tests:\n\nAdd your problematic HTML snippet to one of the `.in.html` files in the `testdata` folders. 
Then run `go test -update` and have a look at which `.out.md` files changed in Git.\n\nYou can now change the internal logic and inspect what impact your change has by running `go test -update` again.\n\n_Note: Before submitting your change as a PR, make sure that you run those tests and commit the updated files to Git..._\n\n### License\n\nUnless otherwise specified, the project is licensed under the terms of the MIT license.\n\n🗒️ [LICENSE](/LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJohannesKaufmann%2Fhtml-to-markdown","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJohannesKaufmann%2Fhtml-to-markdown","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJohannesKaufmann%2Fhtml-to-markdown/lists"}