https://github.com/pankaj28843/html2latex
Convert WYSIWYG HTML to LaTeX with typed ASTs, full table support, and 100% test coverage
ast converter document-conversion html html-to-latex latex pdf-generation python typesafe wysiwyg
- Host: GitHub
- URL: https://github.com/pankaj28843/html2latex
- Owner: pankaj28843
- License: mit
- Created: 2014-07-02T07:25:27.000Z (almost 12 years ago)
- Default Branch: main
- Last Pushed: 2026-02-20T14:10:49.000Z (2 months ago)
- Last Synced: 2026-02-20T15:51:25.453Z (2 months ago)
- Topics: ast, converter, document-conversion, html, html-to-latex, latex, pdf-generation, python, typesafe, wysiwyg
- Language: Python
- Size: 1.58 MB
- Stars: 17
- Watchers: 2
- Forks: 13
- Open Issues: 8
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Authors: AUTHORS.txt
# HTML2LaTeX

HTML2LaTeX converts WYSIWYG HTML fragments into LaTeX. The pipeline is built on
`justhtml`, typed ASTs, and deterministic serialization to produce reliable
output suitable for PDFs and reports.
## Highlights
- **Block elements**: headings (`h1`-`h5`), paragraphs, divs, lists (`ul`/`ol`/`dl`),
blockquotes, `pre` blocks, `hr`, `figure`/`figcaption`, and semantic containers.
- **Tables**: Full support including `thead`/`tbody`/`tfoot`, `th`/`td`, `colspan`,
`rowspan` (via `\multirow`), cell alignment (via `align` attribute or CSS `text-align`),
and `caption`.
- **Inline formatting**: bold, italic, underline, code, superscript, subscript,
strikethrough (`del`/`s`/`strike`), highlighted text (`mark`), font sizes (`small`/`big`),
inline quotes (`q` with proper nesting), and semantic tags (`kbd`, `samp`, `var`, `cite`, `ins`).
- **Links and images**: `\href`/`\url` for links, `\includegraphics` for images with
`width`/`height` attribute support.
- **Math passthrough**: via ``, `data-latex`, or `data-math` attributes.
- **Text alignment**: `text-align` CSS on `p`/`div` maps to `center`/`flushleft`/`flushright`.
- **Thread-safe**: Immutable options with diagnostics for invalid input.
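
The text-alignment bullet can be illustrated with a small standalone sketch. This is not the library's implementation: `wrap_aligned` and `ALIGN_ENVIRONMENTS` are hypothetical names, and the pairing of CSS values with environments (`left` to `flushleft`, `right` to `flushright`) is an assumption based on the environment names listed above.

```python
import re

# Hypothetical mapping from CSS text-align values to LaTeX environments.
# The environment names come from the Highlights; the pairing is assumed.
ALIGN_ENVIRONMENTS = {
    "center": "center",
    "left": "flushleft",
    "right": "flushright",
}

_TEXT_ALIGN = re.compile(r"text-align\s*:\s*([a-zA-Z]+)", re.IGNORECASE)


def wrap_aligned(body: str, style: str) -> str:
    """Wrap a LaTeX body in the environment matching an inline style."""
    match = _TEXT_ALIGN.search(style)
    env = ALIGN_ENVIRONMENTS.get(match.group(1).lower()) if match else None
    if env is None:
        return body  # unknown or missing alignment: leave the body untouched
    return f"\\begin{{{env}}}\n{body}\n\\end{{{env}}}"


print(wrap_aligned("Hello World", "text-align: center"))
```

Styles without a recognized `text-align` value fall through unchanged, so the helper never wraps content in an environment it does not know.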
## Requirements
- Python 3.10+
- Dependencies managed with `uv` via `pyproject.toml`
## Quick Start
```bash
uv sync
```
```bash
uv run python - <<'PY'
from html2latex import html2latex
print(html2latex("<p>Hello World</p>"))
PY
```
Expected output (fragment):
```
Hello World\par
```
## Usage
### Convert HTML to a LaTeX fragment
```python
from html2latex import html2latex
# Raw string so the backslashes in \( ... \) are not treated as escapes.
html = r"""
<p>Hello World</p>
<p>Inline math: \( x^2 + y^2 = z^2 \)</p>
"""
fragment = html2latex(html)
print(fragment)
```
### Convert HTML and inspect packages/diagnostics
```python
from html2latex import Converter, ConvertOptions
converter = Converter(ConvertOptions(strict=False))
# https://example.com is a placeholder URL.
result = converter.convert('<a href="https://example.com">Link</a>')
print(result.body) # LaTeX fragment
print(result.packages) # e.g., ("hyperref",)
print(result.diagnostics)
```
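
Since the Highlights describe the options as immutable and the converter as thread-safe, a batch of fragments could plausibly be converted in a thread pool. The following is a minimal sketch under that assumption. `convert_all` is a hypothetical helper, and `fake_convert` is a stand-in so the block runs without the package installed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def convert_all(
    fragments: Iterable[str],
    convert: Callable[[str], str],
    max_workers: int = 4,
) -> list[str]:
    """Apply `convert` to every fragment in a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(convert, fragments))


def fake_convert(html: str) -> str:
    """Stand-in converter used only to keep this sketch self-contained."""
    return html.upper()


print(convert_all(["<p>a</p>", "<p>b</p>"], fake_convert))
```

In real use, `html2latex` would be passed in place of `fake_convert`. Because conversion is likely CPU-bound, threads mainly help when it is interleaved with I/O such as reading files or serving requests.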
### Render a full LaTeX document
```python
from html2latex import render
print(render("<p>Full document</p>"))
```
To add preamble content:
```python
from html2latex import render, ConvertOptions
options = ConvertOptions(metadata={"preamble": "\\usepackage{amsmath}"})
print(render("<p>Math</p>", options=options))
```
## LaTeX Packages
Package requirements are inferred from the output. If you need a full document,
use `render()` or `render_document()` and include `result.preamble`.
Common packages:
- `hyperref` (links)
- `graphicx` (images)
- `xcolor` (highlighted text via `mark`)
- `ulem` (strikethrough via `del`/`s`/`strike`)
- `multirow` (table cells spanning multiple rows)
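
When assembling a document by hand instead of calling `render()`, the package names reported in `result.packages` need to become `\usepackage` lines. The sketch below shows one way to do that. `usepackage_lines` is a hypothetical helper, not part of the library, and it assumes the packages arrive as plain name strings such as `("hyperref",)`.

```python
from typing import Iterable


def usepackage_lines(packages: Iterable[str]) -> str:
    """Build \\usepackage lines from package names, dropping duplicates."""
    seen: dict[str, None] = {}
    for name in packages:
        seen.setdefault(name, None)  # dicts preserve first-seen order
    return "\n".join(f"\\usepackage{{{name}}}" for name in seen)


print(usepackage_lines(("hyperref", "graphicx", "hyperref")))
```

The result can be pasted into a preamble or merged with packages collected from several conversions.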
## Demo App
A demo Flask app with a rich text editor is available:
```bash
docker compose build
docker compose up
```
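Once the containers are running, open the demo in a browser at the host port published in the repository's Compose file.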
## History
The original implementation (2014-2016) powered the ClassKlap publishing
workflow. This version targets Python 3.10+ and general-purpose use.
## CI
CI runs `ruff` (lint + format) and `pytest` (465 tests, 100% coverage) across
Python 3.10-3.14, plus LaTeX validity checks (Tectonic) and Playwright E2E
smoke tests.
## License
MIT License - see [LICENSE](LICENSE).