{"id":35143291,"url":"https://github.com/kreuzberg-dev/html-to-markdown","last_synced_at":"2026-05-28T20:01:12.592Z","repository":{"id":284809753,"uuid":"926648217","full_name":"kreuzberg-dev/html-to-markdown","owner":"kreuzberg-dev","description":"High performance and CommonMark compliant HTML to Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR.","archived":false,"fork":false,"pushed_at":"2026-05-23T21:11:57.000Z","size":118100,"stargazers_count":732,"open_issues_count":1,"forks_count":58,"subscribers_count":7,"default_branch":"main","last_synced_at":"2026-05-23T22:14:45.246Z","etag":null,"topics":["hocr","html","html-converter","markdown","markdown-converter","rag","text-extraction","text-processing"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kreuzberg-dev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-02-03T16:18:12.000Z","updated_at":"2026-05-23T21:12:01.000Z","dependencies_parsed_at":"2025-04-24T02:38:10.368Z","dependency_job_id":"c703b28f-8bb2-44b7-b873-460a7b6063ff","html_url":"https://github.com/kreuzberg-dev/html-to-markdown","commit_stats":null,"previous_names":["goldziher/html-to-markdown","kreuzberg-dev/html-to-markdown"],"tags_coun
t":348,"template":false,"template_full_name":null,"purl":"pkg:github/kreuzberg-dev/html-to-markdown","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kreuzberg-dev%2Fhtml-to-markdown","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kreuzberg-dev%2Fhtml-to-markdown/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kreuzberg-dev%2Fhtml-to-markdown/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kreuzberg-dev%2Fhtml-to-markdown/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kreuzberg-dev","download_url":"https://codeload.github.com/kreuzberg-dev/html-to-markdown/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kreuzberg-dev%2Fhtml-to-markdown/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33624221,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hocr","html","html-converter","markdown","markdown-converter","rag","text-extraction","text-processing"],"created_at":"2025-12-28T12:55:33.044Z","updated_at":"2026-05-28T20:01:12.549Z","avatar_url":"https://github.com/kreuzberg-dev.png","language":"HTML","funding_links":[],"categories":[],"sub_c
ategories":[],"readme":"# html-to-markdown\n\n\u003cdiv align=\"center\" style=\"display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin: 20px 0;\"\u003e\n  \u003ca href=\"https://github.com/kreuzberg-dev/alef\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Bindings-alef%20%D7%90-007ec6\" alt=\"Bindings\"\u003e\n  \u003c/a\u003e\n  \u003c!-- Language Bindings --\u003e\n  \u003ca href=\"https://crates.io/crates/html-to-markdown-rs\"\u003e\n    \u003cimg src=\"https://img.shields.io/crates/v/html-to-markdown-rs?label=Rust\u0026color=007ec6\" alt=\"Rust\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/html-to-markdown/\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/html-to-markdown?label=Python\u0026color=007ec6\" alt=\"Python\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node\"\u003e\n    \u003cimg src=\"https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-node?label=Node.js\u0026color=007ec6\" alt=\"Node.js\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.npmjs.com/package/@kreuzberg/html-to-markdown-wasm\"\u003e\n    \u003cimg src=\"https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-wasm?label=WASM\u0026color=007ec6\" alt=\"WASM\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown\"\u003e\n    \u003cimg src=\"https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown?label=Java\u0026color=007ec6\" alt=\"Java\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v3/htmltomarkdown\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/v/tag/kreuzberg-dev/html-to-markdown?label=Go\u0026color=007ec6\u0026filter=v3*\" alt=\"Go\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.nuget.org/packages/KreuzbergDev.HtmlToMarkdown/\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/nuget/v/KreuzbergDev.HtmlToMarkdown?label=C%23\u0026color=007ec6\" alt=\"C#\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://packagist.org/packages/kreuzberg-dev/html-to-markdown\"\u003e\n    \u003cimg src=\"https://img.shields.io/packagist/v/kreuzberg-dev/html-to-markdown?label=PHP\u0026color=007ec6\" alt=\"PHP\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://rubygems.org/gems/html-to-markdown\"\u003e\n    \u003cimg src=\"https://img.shields.io/gem/v/html-to-markdown?label=Ruby\u0026color=007ec6\" alt=\"Ruby\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://hex.pm/packages/html_to_markdown\"\u003e\n    \u003cimg src=\"https://img.shields.io/hexpm/v/html_to_markdown?label=Elixir\u0026color=007ec6\" alt=\"Elixir\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://kreuzberg-dev.r-universe.dev/htmltomarkdown\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/R-htmltomarkdown-007ec6\" alt=\"R\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pub.dev/packages/h2m\"\u003e\n    \u003cimg src=\"https://img.shields.io/pub/v/h2m?label=Dart\u0026color=007ec6\" alt=\"Dart\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown-android\"\u003e\n    \u003cimg src=\"https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown-android?label=Kotlin\u0026color=007ec6\" alt=\"Kotlin\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/swift\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Swift-SPM-007ec6\" alt=\"Swift\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/zig\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Zig-package-007ec6\" alt=\"Zig\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/kreuzberg-dev/html-to-markdown/releases\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/C-FFI-007ec6\" alt=\"C 
FFI\"\u003e\n  \u003c/a\u003e\n\n  \u003c!-- Project Info --\u003e\n  \u003ca href=\"https://github.com/kreuzberg-dev/html-to-markdown/blob/main/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/License-MIT-007ec6\" alt=\"License\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://docs.html-to-markdown.kreuzberg.dev\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Docs-html--to--markdown-007ec6\" alt=\"Documentation\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\" style=\"margin: 24px 0 0;\"\u003e\n  \u003ca href=\"https://kreuzberg.dev\"\u003e\n    \u003cimg alt=\"html-to-markdown\" src=\"https://github.com/user-attachments/assets/478a83da-237b-446b-b3a8-e564c13e00a8\" /\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\" style=\"display: flex; flex-wrap: wrap; gap: 12px; justify-content: center; margin: 28px 0 24px;\"\u003e\n  \u003ca href=\"https://discord.gg/xt9WY3GnKR\"\u003e\n    \u003cimg height=\"22\" src=\"https://img.shields.io/badge/Discord-Chat-007ec6?logo=discord\u0026logoColor=white\" alt=\"Join Discord\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://docs.html-to-markdown.kreuzberg.dev/demo/\"\u003e\n    \u003cimg height=\"22\" src=\"https://img.shields.io/badge/Live%20Demo-Open-007ec6?logo=webassembly\u0026logoColor=white\" alt=\"Live Demo\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\nHigh-performance HTML to Markdown conversion powered by Rust. 
Ships as native bindings for **Rust, Python, TypeScript/Node.js, Ruby, PHP, Go, Java, C#, Elixir, R, C (FFI), and WebAssembly** with identical rendering across all runtimes.\n\n**[Documentation](https://docs.html-to-markdown.kreuzberg.dev)** | **[API Reference](https://docs.rs/html-to-markdown-rs/)**\n\n## Highlights\n\n- **Rust-native throughput** with html5ever parsing\n- **12 language bindings** with consistent output across all runtimes\n- **Structured result** — `convert()` returns `ConversionResult` with `content`, `metadata`, `tables`, `images`, and `warnings`\n- **Metadata extraction** — title, headers, links, images, structured data (JSON-LD, Microdata, RDFa)\n- **Visitor pattern** — custom callbacks for content filtering, URL rewriting, domain-specific dialects\n- **Table extraction** — extract structured table data (cells, headers, rendered markdown) during conversion\n- **Secure by default** — built-in HTML sanitization via ammonia\n\n## Quick Start\n\n```bash\n# Rust\ncargo add html-to-markdown-rs\n\n# Python\npip install html-to-markdown\n\n# TypeScript / Node.js\nnpm install @kreuzberg/html-to-markdown-node\n\n# Ruby\ngem install html-to-markdown\n\n# CLI\ncargo install html-to-markdown-cli\n# or\nbrew install kreuzberg-dev/tap/html-to-markdown\n```\n\nSee the package READMEs for all languages including PHP, Go, Java, C#, Elixir, R, and WASM.\n\n### Usage\n\n`convert()` is the single entry point. 
It returns a structured `ConversionResult`:\n\n```python\n# Python\nfrom html_to_markdown import convert\n\nresult = convert(\"\u003ch1\u003eHello\u003c/h1\u003e\u003cp\u003eWorld\u003c/p\u003e\")\nprint(result.content)        # # Hello\\n\\nWorld\nprint(result.metadata)       # title, links, headings, …\n```\n\n```typescript\n// TypeScript / Node.js\nimport { convert } from \"@kreuzberg/html-to-markdown-node\";\n\nconst result = convert(\"\u003ch1\u003eHello\u003c/h1\u003e\u003cp\u003eWorld\u003c/p\u003e\");\nconsole.log(result.content); // # Hello\\n\\nWorld\nconsole.log(result.metadata); // title, links, headings, …\n```\n\n```rust\n// Rust\nuse html_to_markdown_rs::convert;\n\nlet result = convert(\"\u003ch1\u003eHello\u003c/h1\u003e\u003cp\u003eWorld\u003c/p\u003e\", None)?;\nprintln!(\"{}\", result.content.unwrap_or_default());\n```\n\n## Language Bindings\n\n| Language             | Package                                                                                                      | Install                                                           |\n| -------------------- | ------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------- |\n| Rust                 | [html-to-markdown-rs](https://crates.io/crates/html-to-markdown-rs)                                          | `cargo add html-to-markdown-rs`                                   |\n| Python               | [html-to-markdown](https://pypi.org/project/html-to-markdown/)                                               | `pip install html-to-markdown`                                    |\n| TypeScript / Node.js | [@kreuzberg/html-to-markdown-node](https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node)           | `npm install @kreuzberg/html-to-markdown-node`                    |\n| WebAssembly          | 
[@kreuzberg/html-to-markdown-wasm](https://www.npmjs.com/package/@kreuzberg/html-to-markdown-wasm)           | `npm install @kreuzberg/html-to-markdown-wasm`                    |\n| Ruby                 | [html-to-markdown](https://rubygems.org/gems/html-to-markdown)                                               | `gem install html-to-markdown`                                    |\n| PHP                  | [kreuzberg-dev/html-to-markdown](https://packagist.org/packages/kreuzberg-dev/html-to-markdown)              | `composer require kreuzberg-dev/html-to-markdown`                 |\n| Go                   | [htmltomarkdown](https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v3/htmltomarkdown) | `go get github.com/kreuzberg-dev/html-to-markdown/packages/go/v3` |\n| Java                 | [dev.kreuzberg:html-to-markdown](https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown)       | Maven / Gradle                                                    |\n| C#                   | [KreuzbergDev.HtmlToMarkdown](https://www.nuget.org/packages/KreuzbergDev.HtmlToMarkdown/)                   | `dotnet add package KreuzbergDev.HtmlToMarkdown`                  |\n| Elixir               | [html_to_markdown](https://hex.pm/packages/html_to_markdown)                                                 | `mix deps.get html_to_markdown`                                   |\n| R                    | [htmltomarkdown](https://kreuzberg-dev.r-universe.dev/htmltomarkdown)                                        | `install.packages(\"htmltomarkdown\")`                              |\n| C (FFI)              | [releases](https://github.com/kreuzberg-dev/html-to-markdown/releases)                                       | Pre-built `.so` / `.dll` / `.dylib`                               |\n\n## Part of Kreuzberg.dev\n\n- [Kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) — document intelligence: text, tables, metadata from 90+ formats with optional OCR.\n- 
[Kreuzberg Cloud](https://github.com/kreuzberg-dev/kreuzberg-cloud) — managed extraction API with SDKs, dashboards, and observability.\n- [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.\n- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.\n- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.\n- [alef](https://github.com/kreuzberg-dev/alef) — the polyglot binding generator that produces all per-language bindings.\n- [Discord](https://discord.gg/xt9WY3GnKR) — community, roadmap, announcements.\n\n## Contributing\n\nContributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions and guidelines.\n\n## License\n\nMIT License — see [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkreuzberg-dev%2Fhtml-to-markdown","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkreuzberg-dev%2Fhtml-to-markdown","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkreuzberg-dev%2Fhtml-to-markdown/lists"}