https://github.com/docling-project/docling-haystack
Docling Haystack integration
- Host: GitHub
- URL: https://github.com/docling-project/docling-haystack
- Owner: docling-project
- License: MIT
- Created: 2024-12-13T10:40:08.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-13T14:44:26.000Z (over 1 year ago)
- Last Synced: 2025-03-14T08:37:38.802Z (over 1 year ago)
- Language: Python
- Homepage: https://ds4sd.github.io/docling/integrations/haystack/
- Size: 436 KB
- Stars: 14
- Watchers: 2
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# Haystack Docling integration
A [Docling](https://github.com/DS4SD/docling) integration for
[Haystack](https://github.com/deepset-ai/haystack/).
## Installation
Install `docling-haystack` with your package manager, e.g. pip:
```bash
pip install docling-haystack
```
## Usage
### Basic usage
Basic usage of `DoclingConverter` looks as follows:
```python
from haystack import Pipeline
from docling_haystack.converter import DoclingConverter

# Indexing pipeline that will host the converter
idx_pipe = Pipeline()
# ...
# With no arguments, the converter uses its defaults
converter = DoclingConverter()
idx_pipe.add_component("converter", converter)
# ...
```
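The snippet above only registers the component. As a rough sketch of how a conversion might then be triggered, the helper below builds a one-component pipeline and runs it. The helper name `convert_paths` is hypothetical, and the input name `paths` is an assumption about the converter's `run()` signature that should be checked against the installed version.

```python
from typing import Any


def convert_paths(paths: list[str]) -> dict[str, Any]:
    """Run a one-component pipeline over the given file paths or URLs.

    Hypothetical helper for illustration; not part of docling-haystack.
    """
    # Imported inside the function so this sketch can be loaded even
    # when the packages are not installed yet.
    from haystack import Pipeline
    from docling_haystack.converter import DoclingConverter

    pipe = Pipeline()
    pipe.add_component("converter", DoclingConverter())

    # Assumption: the converter's run() input is named "paths".
    return pipe.run({"converter": {"paths": paths}})
```

In a real indexing pipeline the converter would typically be connected to further components (for example an embedder and a document writer) rather than run on its own.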
### Advanced usage
When initializing a `DoclingConverter`, you can use the following parameters:
- `converter` (optional): any specific Docling `DocumentConverter` instance to use
- `convert_kwargs` (optional): any specific kwargs for conversion execution
- `export_type` (optional): export mode to use: `ExportType.DOC_CHUNKS` (default) or
`ExportType.MARKDOWN`
- `md_export_kwargs` (optional): any specific Markdown export kwargs (for Markdown mode)
- `chunker` (optional): any specific Docling chunker instance to use (for doc-chunk
mode)
- `meta_extractor` (optional): any specific metadata extractor to use
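As an illustration of how these parameters might be combined, the sketch below configures the converter for Markdown export. The factory name `make_markdown_converter` is hypothetical. Importing `ExportType` from `docling_haystack.converter` and passing `image_placeholder` through `md_export_kwargs` are assumptions about the installed versions, so treat this as a starting point rather than the definitive configuration.

```python
def make_markdown_converter():
    """Build a DoclingConverter that exports each document as Markdown.

    Hypothetical factory for illustration; not part of docling-haystack.
    """
    # Imported inside the function so this sketch can be loaded even
    # when the packages are not installed yet.
    from docling.document_converter import DocumentConverter
    # Assumption: ExportType is exposed by the same module as DoclingConverter.
    from docling_haystack.converter import DoclingConverter, ExportType

    return DoclingConverter(
        # Explicit Docling converter instance; customize it here if needed
        converter=DocumentConverter(),
        # Markdown mode instead of the default doc-chunk mode
        export_type=ExportType.MARKDOWN,
        # Assumption: forwarded to Docling's Markdown export; here image
        # placeholders are replaced with an empty string
        md_export_kwargs={"image_placeholder": ""},
    )
```

In Markdown mode the `chunker` parameter does not apply, so any splitting would have to be handled by a downstream pipeline component.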
### Example
For an end-to-end usage example, check out
[this notebook](https://ds4sd.github.io/docling/examples/rag_haystack/).