https://github.com/parthapray/docling_colab

# Docling_Colab
This repo contains a Google Colab notebook that uses Docling to extract data such as text, images, and tables from documents.

![image](https://github.com/user-attachments/assets/779f8a40-3220-4d37-96c8-43c50b88916f)

# Docling

https://github.com/DS4SD/docling

https://ds4sd.github.io/docling/examples/

---

The Colaboratory notebook shows how to use Docling to extract content from popular document formats (PDF, DOCX, PPTX, XLSX, images, HTML, AsciiDoc, and Markdown) and export it to HTML, Markdown, and JSON (with embedded or referenced images).
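
The conversion and export step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `output_paths` and `convert_and_export`, the `output` directory, and the choice of image mode per format are assumptions made for the example. It assumes `docling` is installed (`pip install docling`).

```python
from pathlib import Path


def output_paths(source: str, out_dir: str = "output") -> dict:
    """Map each export format to a file path derived from the source name.

    Hypothetical helper for this sketch; not part of Docling.
    """
    stem = Path(source).stem or "document"
    base = Path(out_dir)
    return {
        "markdown": base / f"{stem}.md",
        "html": base / f"{stem}.html",
        "json": base / f"{stem}.json",
    }


def convert_and_export(source: str, out_dir: str = "output") -> dict:
    """Convert one document with Docling and save Markdown, HTML, and JSON."""
    # Imported lazily so the helper above works without Docling installed.
    from docling.document_converter import DocumentConverter
    from docling_core.types.doc import ImageRefMode

    paths = output_paths(source, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    # `source` may be a local file path or a URL.
    doc = DocumentConverter().convert(source).document

    # REFERENCED links to separate image files; EMBEDDED inlines the images.
    # Which mode goes with which format is an arbitrary choice for this sketch.
    doc.save_as_markdown(paths["markdown"], image_mode=ImageRefMode.REFERENCED)
    doc.save_as_html(paths["html"], image_mode=ImageRefMode.EMBEDDED)
    doc.save_as_json(paths["json"], image_mode=ImageRefMode.EMBEDDED)
    return paths
```

Calling `convert_and_export("sample.pdf")` (a hypothetical input file) would write `output/sample.md`, `output/sample.html`, and `output/sample.json`. For PDFs, picture images may need to be enabled in the PDF pipeline options before they show up in the exports.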

The notebook also demonstrates hybrid chunking using transformers, embedding, and a vector database.
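
A chunk, embed, and search pipeline along these lines might look like the sketch below. It is an illustration under stated assumptions rather than the notebook's implementation: LanceDB is assumed as the vector store, `sentence-transformers/all-MiniLM-L6-v2` as the embedding model, and the function names, table name, and database path are hypothetical. It assumes `docling`, `sentence-transformers`, and `lancedb` are installed.

```python
from typing import Dict, List, Sequence


def build_rows(texts: Sequence[str], vectors: Sequence[Sequence[float]]) -> List[Dict]:
    """Pair each chunk text with its embedding as a record for the vector table."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    return [
        {"text": text, "vector": [float(x) for x in vector]}
        for text, vector in zip(texts, vectors)
    ]


def index_document(
    source: str,
    db_uri: str = "./lancedb",           # assumed local database path
    table_name: str = "docling_chunks",  # assumed table name
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2",  # assumed model
):
    """Chunk a document with Docling, embed the chunks, and store them in LanceDB."""
    # Imported lazily so build_rows can be used without the heavy dependencies.
    import lancedb
    from docling.chunking import HybridChunker
    from docling.document_converter import DocumentConverter
    from sentence_transformers import SentenceTransformer

    doc = DocumentConverter().convert(source).document

    # HybridChunker follows the document structure and then splits or merges
    # chunks to fit its tokenizer's limit; library defaults are used here.
    chunker = HybridChunker()
    texts = [chunk.text for chunk in chunker.chunk(dl_doc=doc)]

    model = SentenceTransformer(model_name)
    vectors = model.encode(texts)

    db = lancedb.connect(db_uri)
    table = db.create_table(table_name, data=build_rows(texts, vectors), mode="overwrite")
    return table, model


def search(table, model, query: str, k: int = 3) -> List[Dict]:
    """Return the k stored chunks closest to the query embedding."""
    query_vector = model.encode(query).tolist()
    return table.search(query_vector).limit(k).to_list()
```

After `table, model = index_document("sample.pdf")` (a hypothetical input file), `search(table, model, "What are the main findings?")` returns the nearest chunks as dictionaries that include the stored `text` field. Using the same model for indexing and querying keeps the vectors comparable.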