https://github.com/parthapray/docling_colab
This repo contains a Google Colab notebook for handling Docling to extract data such as text, images, and tables.
- Host: GitHub
- URL: https://github.com/parthapray/docling_colab
- Owner: ParthaPRay
- License: mit
- Created: 2024-12-24T10:01:09.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-24T10:06:50.000Z (10 months ago)
- Last Synced: 2025-06-29T15:01:57.152Z (3 months ago)
- Topics: chunk, chunking, colab-notebook, docling, docx, embed, extraction-data, image, lancedb, markdown, pdf, pptx, retrieval-augmented-generation, table, text, transformers
- Language: Jupyter Notebook
- Homepage:
- Size: 697 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Docling_Colab
This repo contains a Google Colab notebook for handling Docling to extract data such as text, images, and tables.
# Docling
https://github.com/DS4SD/docling
https://ds4sd.github.io/docling/examples/
---
The Colaboratory notebook shows how to use Docling to extract content from popular document formats (PDF, DOCX, PPTX, XLSX, images, HTML, AsciiDoc, and Markdown) and export it to HTML, Markdown, and JSON (with embedded or referenced images).
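As a rough illustration of the convert-then-export flow described above, here is a minimal sketch. It is not taken from the notebook. It assumes a recent `docling` release that provides `DocumentConverter` and the `export_to_markdown`, `export_to_html`, and `export_to_dict` methods on the converted document. The helper names `export_targets` and `convert_and_export` and the `output` directory are hypothetical.

```python
from pathlib import Path


def export_targets(source: str, out_dir: str = "output") -> dict:
    """Map each export format to an output path named after a local source file."""
    stem = Path(source).stem
    base = Path(out_dir)
    return {
        "markdown": base / f"{stem}.md",
        "html": base / f"{stem}.html",
        "json": base / f"{stem}.json",
    }


def convert_and_export(source: str, out_dir: str = "output") -> dict:
    """Convert one document with Docling and write Markdown, HTML and JSON files."""
    import json

    # Imported lazily so the path helper above works without Docling installed.
    from docling.document_converter import DocumentConverter

    targets = export_targets(source, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    # The converter picks the input format (PDF, DOCX, PPTX, ...) from the source.
    document = DocumentConverter().convert(source).document

    targets["markdown"].write_text(document.export_to_markdown(), encoding="utf-8")
    targets["html"].write_text(document.export_to_html(), encoding="utf-8")
    targets["json"].write_text(json.dumps(document.export_to_dict()), encoding="utf-8")
    return targets
```

In a Colab cell, after `pip install docling`, calling `convert_and_export("sample.pdf")` (a hypothetical file name) should leave three files under `output/`. Image handling (embedded versus referenced) is controlled by additional export options that this sketch leaves at their defaults.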
It also demonstrates hybrid chunking with transformers, followed by embedding the chunks and storing them in a vector database.
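The chunk, embed, and store steps might look roughly like the sketch below. It is an assumption-laden outline rather than the notebook's actual code. It assumes `docling`, `sentence-transformers`, and `lancedb` are installed, uses Docling's `HybridChunker` with its default tokenizer, and picks `sentence-transformers/all-MiniLM-L6-v2` as an example embedding model. The function names, table name, and database path are hypothetical.

```python
def build_rows(texts, vectors):
    """Pair each chunk text with its embedding as a row ready for a vector table."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    return [
        {"id": i, "text": text, "vector": [float(x) for x in vector]}
        for i, (text, vector) in enumerate(zip(texts, vectors))
    ]


def index_document(source, db_uri="./lancedb", table_name="chunks",
                   model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Chunk a document, embed the chunks, and store them in a LanceDB table."""
    # Heavy dependencies are imported lazily; build_rows works without them.
    import lancedb
    from docling.chunking import HybridChunker
    from docling.document_converter import DocumentConverter
    from sentence_transformers import SentenceTransformer

    document = DocumentConverter().convert(source).document

    # Hybrid chunking: structure-aware chunks, split or merged to fit a token budget.
    chunker = HybridChunker()
    texts = [chunk.text for chunk in chunker.chunk(dl_doc=document)]

    # Embed every chunk with the (assumed) sentence-transformers model.
    model = SentenceTransformer(model_name)
    vectors = model.encode(texts)

    # Store text and vectors together; "overwrite" replaces an existing table.
    db = lancedb.connect(db_uri)
    table = db.create_table(table_name, data=build_rows(texts, vectors), mode="overwrite")
    return table, model
```

With the returned `table` and `model`, a retrieval query could then be run as `table.search(model.encode("your question")).limit(3).to_list()`, which returns the closest chunks along with their stored text.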