https://github.com/docling-project/docling-java

A Java API for Docling

# Welcome to the Docling Java Project!

![Docling Java](docs/src/doc/docs/assets/img/docling-java.png)

This is the repository for Docling Java, a Java API for using [Docling](https://github.com/docling-project).

[![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://docling-project.github.io/docling-java/)
[![docling-core version](https://img.shields.io/maven-central/v/ai.docling/docling-core?label=docling-core)](https://docling-project.github.io/docling-java/dev/core)
[![docling-serve-api version](https://img.shields.io/maven-central/v/ai.docling/docling-serve-api?label=docling-serve-api)](https://docling-project.github.io/docling-java/dev/docling-serve/serve-api/)
[![docling-serve-client version](https://img.shields.io/maven-central/v/ai.docling/docling-serve-client?label=docling-serve-client)](https://docling-project.github.io/docling-java/dev/docling-serve/serve-client/)
[![docling-testcontainers version](https://img.shields.io/maven-central/v/ai.docling/docling-testcontainers?label=docling-testcontainers)](https://docling-project.github.io/docling-java/dev/testcontainers/)
[![License MIT](https://img.shields.io/github/license/docling-project/docling-java)](https://opensource.org/licenses/MIT)
[![Discord](https://img.shields.io/discord/1399788921306746971?color=6A7EC2&logo=discord&logoColor=ffffff)](https://docling.ai/discord)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/11397/badge)](https://www.bestpractices.dev/projects/11397)

[Docling](https://github.com/docling-project) simplifies document processing: it parses diverse formats, including advanced PDF understanding, and integrates with the generative AI ecosystem.

## Features

* πŸ—‚οΈ Parsing of [multiple document formats][supported_formats] incl. PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, VTT, images (PNG, TIFF, JPEG, ...), and more
* πŸ“‘ Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
* 🧬 Unified, expressive [DoclingDocument][docling_document] representation format
* β†ͺ️ Various [export formats][supported_formats] and options, including Markdown, HTML, [DocTags](https://arxiv.org/abs/2503.11576) and lossless JSON
* πŸ”’ Local execution capabilities for sensitive data and air-gapped environments
* πŸ€– Plug-and-play [integrations][integrations] including [LangChain4j](https://docs.langchain4j.dev/)
* πŸ” Extensive OCR support for scanned PDFs and images
* πŸ‘“ Support of several Visual Language Models ([GraniteDocling](https://huggingface.co/ibm-granite/granite-docling-258M))
* πŸŽ™οΈ Audio support with Automatic Speech Recognition (ASR) models

## Documentation

[See the documentation](https://docling-project.github.io/docling-java/) for complete information on the [various artifacts](#artifacts) that are provided by this project.

## Artifacts

This project provides the following artifacts:

* [`docling-core`](docling-core): Java API for working with the data types used by Docling for document representation (see [Docling Core](https://github.com/docling-project/docling-core)).
* [`docling-serve-api`](docling-serve/docling-serve-api): Framework-agnostic Java API for interacting with a [Docling Serve](https://github.com/docling-project/docling-serve) backend.
* [`docling-serve-client`](docling-serve/docling-serve-client): A reference implementation of the [`docling-serve-api`](docling-serve/docling-serve-api) using Java's [`HttpClient`](https://openjdk.org/groups/net/httpclient/intro.html) and [Jackson](https://github.com/FasterXML/jackson) to connect to a [Docling Serve](https://github.com/docling-project/docling-serve) endpoint.
* [`docling-testing`](docling-testing): Utilities for testing Docling integration.
* [`docling-testcontainers`](docling-testcontainers): A [Testcontainers module](https://testcontainers.com/) for running Docling in a Docker container.

## Getting started

Use `DoclingServeApi.convertSource()` to convert individual documents (make sure both `docling-serve-api` and `docling-serve-client` are on your classpath).

For example:

```java
import java.net.URI;

import ai.docling.serve.api.DoclingServeApi;
import ai.docling.serve.api.convert.request.ConvertDocumentRequest;
import ai.docling.serve.api.convert.request.source.HttpSource;
import ai.docling.serve.api.convert.response.ConvertDocumentResponse;

// Point the client at your Docling Serve instance (the base URL is left blank here).
DoclingServeApi doclingServeApi = DoclingServeApi.builder()
    .baseUrl("")
    .build();

// Describe the document to convert: here, a PDF fetched over HTTP.
ConvertDocumentRequest request = ConvertDocumentRequest.builder()
    .source(
        HttpSource.builder()
            .url(URI.create("https://arxiv.org/pdf/2408.09869"))
            .build()
    )
    .build();

// Send the request and print the converted document as Markdown.
ConvertDocumentResponse response = doclingServeApi.convertSource(request);
System.out.println(response.getDocument().getMarkdownContent());
```

More [usage information](https://docling-project.github.io/docling-java) is available in the docs.
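
As noted under Artifacts, `docling-serve-client` talks to a Docling Serve endpoint using Java's `HttpClient` and Jackson. To give a rough idea of what such a call involves, the following is a minimal sketch that builds (but does not send) an HTTP request using only the JDK. It is not the project's implementation: the class name `DoclingServeRequestSketch`, the `/v1/convert/source` path, the JSON payload shape, and the `http://localhost:5001` base URL are all assumptions made for illustration, so check them against your Docling Serve instance before relying on them.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical class, for illustration only; not part of docling-java.
public class DoclingServeRequestSketch {

    // Assumption: conversion endpoint path; verify against your Docling Serve version.
    static final String CONVERT_PATH = "/v1/convert/source";

    // Escapes backslashes and double quotes so a value can be embedded in a JSON string.
    static String escapeJson(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the JSON payload by hand to stay dependency-free
    // (the real client uses Jackson). Assumption: the payload shape is illustrative.
    static String buildRequestBody(String documentUrl) {
        return "{\"sources\":[{\"kind\":\"http\",\"url\":\""
            + escapeJson(documentUrl) + "\"}]}";
    }

    // Assembles a POST request carrying the JSON payload; nothing is sent here.
    static HttpRequest buildConvertRequest(String baseUrl, String documentUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + CONVERT_PATH))
            .timeout(Duration.ofMinutes(2))
            .header("Content-Type", "application/json")
            .header("Accept", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(buildRequestBody(documentUrl)))
            .build();
    }

    public static void main(String[] args) {
        // Assumption: a Docling Serve instance listening on localhost:5001.
        HttpRequest request = buildConvertRequest(
            "http://localhost:5001", "https://arxiv.org/pdf/2408.09869");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(buildRequestBody("https://arxiv.org/pdf/2408.09869"));
    }
}
```

In practice, prefer `DoclingServeApi` as shown above: it hides the endpoint details and maps the response onto typed objects, whereas this sketch only shows the kind of request a client has to assemble.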

## Get help and support

Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling-java/discussions).

## Contributing

Please read [Contributing to Docling Java](CONTRIBUTING.md) for details.

## License

The Docling codebase is licensed under the MIT license.
For individual model usage, please refer to the model licenses found in the original packages.

### IBM ❤️ Open Source AI

The project was started by the AI for knowledge team at IBM Research Zurich.

[supported_formats]: https://docling-project.github.io/docling/usage/supported_formats/
[docling_document]: https://docling-project.github.io/docling/concepts/docling_document/
[integrations]: https://docling-project.github.io/docling/integrations/

## Contributors ✨

Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

* Eric Deandrea: 💻 🖋 📖 🤔 🚇 🚧 📆 ⚠️ 👀
* Thomas Vitale: 💻 🖋 📖 🤔 🚇 🚧 📆 ⚠️ 👀
* Alex Soto: 🤔 📆
* Cesar Berrospi Ramis: 🤔
* Michele Dolfi: 🎨 🤔 🚇 💬
* Andrea Cosentino: 🎨 📣 🤔 💻 📖
* jmb-streamsets: 🤔 🎨
* insectengine: 🖋 🎨
* Maxim Lysak: 🖋 🎨
* warnulf: 🐛

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind are welcome!