https://github.com/jspast/cells2table
Table image parsing with cell detection models
- Host: GitHub
- URL: https://github.com/jspast/cells2table
- Owner: jspast
- License: MIT
- Created: 2025-12-30T02:11:01.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2026-05-15T00:25:44.000Z (29 days ago)
- Last Synced: 2026-05-15T02:44:18.869Z (29 days ago)
- Topics: docling, docling-plugin, table-structure
- Language: Jupyter Notebook
- Homepage:
- Size: 1.74 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# cells2table
Parsing tables in document images with cell detection models
## Implemented pipelines
### PaddlePaddle
- Classification model (wired / wireless)
- Cell detection model with different weights for each class
Uses ONNX weights downloaded automatically from [Hugging Face](https://huggingface.co/jspast/paddlepaddle-table-models-onnx) on first use.
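The two-stage flow above (classify the table as wired or wireless, then run the cell detector whose weights match that class) can be sketched as follows. This is a minimal, hypothetical sketch and not the cells2table API: `TwoStagePipeline`, the `classify` and `load_detector` callables, and the `(x0, y0, x1, y1)` box convention are all assumptions. Detectors are loaded lazily and cached, loosely mirroring the "downloaded on first use" behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical conventions (not from cells2table): a box is (x0, y0, x1, y1).
Box = Tuple[float, float, float, float]
Classifier = Callable[[object], str]      # returns "wired" or "wireless"
Detector = Callable[[object], List[Box]]  # returns detected cell boxes
Loader = Callable[[str], Detector]        # builds the detector for one class


@dataclass
class TwoStagePipeline:
    """Hypothetical sketch: classify the table, then run the class-specific detector."""

    classify: Classifier
    load_detector: Loader
    _cache: Dict[str, Detector] = field(default_factory=dict, init=False, repr=False)

    def detector_for(self, table_class: str) -> Detector:
        # Load each class's detector only the first time that class is seen,
        # then reuse it for later tables of the same class.
        if table_class not in self._cache:
            self._cache[table_class] = self.load_detector(table_class)
        return self._cache[table_class]

    def __call__(self, image: object) -> List[Box]:
        table_class = self.classify(image)
        if table_class not in ("wired", "wireless"):
            raise ValueError(f"unexpected table class: {table_class!r}")
        return self.detector_for(table_class)(image)
```

Injecting the classifier and loader as plain callables keeps the sketch independent of any particular runtime; in practice they would wrap the ONNX models.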
## Installation
With [uv](https://docs.astral.sh/uv/), add to your project with:
```sh
uv add cells2table
```
| Optional extra | Description |
| --------------- | ---------------------------------------------- |
| `docling` | Use cells2table as a docling plugin |
| `huggingface` | Download the model weights from Hugging Face |
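With uv, an extra can be requested when adding the dependency, e.g. `uv add cells2table --extra docling`. To check at runtime which optional dependencies are importable, a small helper like the one below can be used. It is a sketch under an assumption: the mapping from extra name to top-level module (`docling` and `huggingface_hub`) is illustrative and not read from cells2table's package metadata.

```python
import importlib.util
from typing import Dict

# Assumption: each extra installs a package importable under this top-level name.
# This mapping is illustrative, not taken from cells2table's metadata.
OPTIONAL_MODULES: Dict[str, str] = {
    "docling": "docling",
    "huggingface": "huggingface_hub",
}


def available_extras(modules: Dict[str, str] = OPTIONAL_MODULES) -> Dict[str, bool]:
    """Report, per extra, whether its assumed top-level module can be found."""
    # find_spec returns None (without importing) when a top-level module is missing.
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in modules.items()
    }
```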
## Usage
cells2table only extracts structural information from tables. A separate library is needed to extract the content of the cells.
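The division of labor described above can be illustrated by merging a table's structure with text extracted elsewhere (for example by OCR or a PDF text layer). This is a hypothetical sketch: `Cell`, `TextSpan` and `fill_table` are illustrative names, not the cells2table output format, and the "text center inside cell box" rule is a simple generic heuristic, not necessarily what any particular integration uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1) convention


@dataclass
class Cell:
    """Hypothetical structural output: grid position plus bounding box."""
    row: int
    col: int
    bbox: Box


@dataclass
class TextSpan:
    """Hypothetical text output from a separate OCR or PDF library."""
    text: str
    bbox: Box


def _center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def _contains(b: Box, point: Tuple[float, float]) -> bool:
    x, y = point
    return b[0] <= x <= b[2] and b[1] <= y <= b[3]


def fill_table(cells: List[Cell], spans: List[TextSpan]) -> List[List[str]]:
    """Assign each span to the first cell containing its center; return a text grid."""
    n_rows = max((c.row for c in cells), default=-1) + 1
    n_cols = max((c.col for c in cells), default=-1) + 1
    texts: Dict[Tuple[int, int], List[str]] = {}
    for span in spans:
        center = _center(span.bbox)
        for cell in cells:
            if _contains(cell.bbox, center):
                texts.setdefault((cell.row, cell.col), []).append(span.text)
                break  # spans whose center falls in no cell are dropped
    return [
        [" ".join(texts.get((r, c), [])) for c in range(n_cols)]
        for r in range(n_rows)
    ]
```

For a 2×2 table, spans "Name", "Qty", "Apples" and "3" placed inside the four cells yield `[["Name", "Qty"], ["Apples", "3"]]`, while a span outside every cell is ignored.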
### Docling
A [docling plugin](https://docling-project.github.io/docling/concepts/plugins/) is provided to allow integrating cells2table in a complete pipeline.
Usage example:
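```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import (
    DocumentConverter,
    ImageFormatOption,
    PdfFormatOption,
)

from cells2table.docling import CustomDoclingTableStructureOptions

# allow_external_plugins lets docling load third-party plugins such as cells2table;
# the custom options select its table structure model.
pipeline_options = PdfPipelineOptions(
    allow_external_plugins=True,
    table_structure_options=CustomDoclingTableStructureOptions(),
)

# Reuse the same pipeline options for both PDF and image inputs.
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
        InputFormat.IMAGE: ImageFormatOption(pipeline_options=pipeline_options),
    }
)

result = converter.convert("path/to/document.pdf")
print(result.document.export_to_markdown())
```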