https://github.com/callbacked/smoldocling256m-webgpu

Document Understanding in the Browser!

# SmolDocling 256M WebGPU Demo

This project is a demo that uses the **SmolDocling-256M** model to perform document understanding tasks in the browser. It lets you transform document images into structured formats such as Markdown and JSON.

https://github.com/user-attachments/assets/b9c06328-a194-4316-ae44-c8ab030c480b

[Try it out on Hugging Face!](https://huggingface.co/spaces/callbacked/smoldocling256M-webgpu)

## Features

- **Intelligent Content Extraction**: Extracts structures from documents, such as:
  - Tables
  - Math formulas (converted to LaTeX)
  - Code blocks
- **Structured Output**: Converts document content into Markdown and JSON.
- **Region-Specific Processing**: Select a specific area of the document to process only the content you need.
- **Fully Offline**: All processing happens on your device in the browser. Your data never leaves your computer.

## 🚀 Getting Started
To run this project locally, follow these steps:

1. Clone the repo
```bash
git clone https://github.com/callbacked/smoldocling256M-webgpu
```
2. Navigate to the project directory
```bash
cd smoldocling256M-webgpu
```
3. Install the npm packages
```bash
npm install
```
4. Start the development server
```bash
npm run dev
```

This will start the Vite development server, and you can view the application at `http://localhost:5173` (or another port if 5173 is in use).
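The steps above assume that `git`, Node.js, and `npm` are already installed. As a minimal sketch, a preflight check like the following could confirm that before you start. The `require_cmd` helper is hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of the repository).
# Succeeds if the named command is available on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report which of the tools used in the steps above are available.
for tool in git node npm; do
  if require_cmd "$tool"; then
    echo "found: $tool"
  else
    echo "missing: $tool" >&2
  fi
done
```

Any tool reported as missing needs to be installed before the clone, install, and run steps will work.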

### Building for Production

To create a production build:

```bash
npm run build
```
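As a rough sketch, you could sanity-check the build output before serving it. This assumes the project keeps Vite's default output directory, `dist`, and that the build emits an `index.html` entry point; neither is confirmed by this README. The `check_build` helper is hypothetical.

```bash
# Hypothetical helper (not part of the repository).
# Succeeds only if the build output directory contains index.html.
# "dist" is Vite's default output directory; pass another path if the
# project's Vite config overrides build.outDir.
check_build() {
  local out_dir="${1:-dist}"
  [ -f "$out_dir/index.html" ]
}
```

After `npm run build`, running `check_build && npx vite preview` from the project root would serve the built files locally, assuming Vite is installed as a project dependency.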