https://github.com/google-research/inksight
- Host: GitHub
- URL: https://github.com/google-research/inksight
- Owner: google-research
- License: apache-2.0
- Created: 2024-10-23T15:32:44.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-12-09T18:31:16.000Z (6 months ago)
- Last Synced: 2025-04-07T20:08:07.194Z (2 months ago)
- Language: Jupyter Notebook
- Size: 13.1 MB
- Stars: 672
- Watchers: 12
- Forks: 31
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write
Blagoj Mitrevski† •
Arina Rak† •
Julian Schnitzler† •
Chengkun Li† •
Andrii Maksai‡ •
Jesse Berent •
Claudiu Musat
† First authors (random order) | ‡ Corresponding author: [email protected]

---
*Animated teaser*

## Overview
InkSight is an offline-to-online handwriting conversion system that transforms photos of handwritten text into digital ink through a Vision Transformer (ViT) and mT5 encoder-decoder architecture. By combining reading and writing priors in a multi-task training framework, our models process handwritten content without requiring specialized equipment, handling diverse writing styles and backgrounds. The system supports both word-level and full-page conversion, enabling practical digitization of physical notes into searchable, editable digital formats. In this repository we provide the Small-p model weights, the dataset, and example inference code (listed in the [releases section](#releases)).
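As background for the conversion described above, "digital ink" (online handwriting) is usually represented as an ordered sequence of strokes, each an ordered list of points, rather than as pixels. The toy structure below is only an illustrative sketch of that representation; the coordinates are made up and it does not mirror the exact output format of the released models.

```python
# Illustrative only: a toy "digital ink" structure (not the repository's exact format).
# Offline handwriting = a raster image; online handwriting = ordered strokes of points.
import numpy as np

# Each stroke is an (N, 2) array of (x, y) points in drawing order;
# an ink is simply a list of strokes. Real data may also carry timestamps.
ink = [
    np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]),  # first stroke (made-up points)
    np.array([[2.5, 0.0], [2.5, 1.5]]),              # second stroke
]

total_points = sum(len(stroke) for stroke in ink)
print(f"{len(ink)} strokes, {total_points} points")
```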
*InkSight system diagram (GIF version)*

## Releases
> **:warning: Notice:** Please use TensorFlow and tensorflow-text versions between 2.15.0 and 2.17.0. Versions later than 2.17.0 may lead to unexpected behavior; we are currently investigating these issues.

We provide open resources for the public version of the InkSight model. Choose the options that best fit your needs (a minimal model-loading sketch follows the list):
- Model weights:
- Public version [Small-p model for CPU/GPU inference](https://huggingface.co/Derendering/InkSight-Small-p)
- Public version [Small-p model for TPU inference](https://storage.googleapis.com/derendering_model/small-p-tpu.zip)
- A [dataset](docs/dataset.md) containing subsets of:
- Model-generated samples in the universal `InkML` format
- Human expert digital ink traces in `npy` format
- [Example inference code](colab.ipynb): Demonstrates both word-level and full-page text inference using free, open-source alternatives to the [Google Cloud Vision Handwriting Text Detection API](https://cloud.google.com/vision/docs/handwriting). The implementation supports [docTR](https://github.com/mindee/doctr) and [Tesseract OCR](https://github.com/tesseract-ocr/tesseract).
- [Samples](figures/) of model outputs.
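As referenced above, here is a minimal sketch of fetching the public Small-p weights from Hugging Face and loading them with TensorFlow. It assumes `huggingface_hub` is installed and that the Hugging Face repository contains a TF SavedModel at the snapshot root; see [`colab.ipynb`](colab.ipynb) for the full, supported inference flow.

```python
# Minimal loading sketch (assumptions noted in comments); see colab.ipynb for full inference.
import tensorflow as tf
import tensorflow_text  # noqa: F401  -- registers custom ops the SavedModel may need
from huggingface_hub import snapshot_download  # assumes `pip install huggingface_hub`

# Download the public Small-p weights from Hugging Face.
model_dir = snapshot_download("Derendering/InkSight-Small-p")

# Assumption: the snapshot root is a TF SavedModel directory.
model = tf.saved_model.load(model_dir)

# Inspect the available serving signatures before calling the model.
print(list(model.signatures.keys()))
```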
## News

- **October 2024**: We release [**Small-p model weights**](https://huggingface.co/Derendering/InkSight-Small-p) and our [**dataset**](https://huggingface.co/datasets/Derendering/InkSight-Derenderings) on Hugging Face.
- **October 2024**: Our work is now featured on the **[Google Research Blog](https://research.google/blog/a-return-to-hand-written-notes-by-learning-to-read-write/)**!
- **February 2024**: The **[InkSight Demo on Hugging Face](https://huggingface.co/spaces/Derendering/Model-Output-Playground)** is live!
## GPU Inference Environment Setup with Conda
To set up the environment and run model inference locally on a GPU, you can use the following steps:
```bash
# Clone the repository
git clone https://github.com/google-research/inksight.git
cd inksight

# Create and activate conda environment
conda env create -f environment.yml
conda activate inksight
```
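After activating the environment, a quick sanity check (a sketch, not part of the official setup) can confirm that the installed TensorFlow version falls in the supported range, that tensorflow-text imports cleanly, and that a GPU is visible:

```python
# Optional environment sanity check.
import tensorflow as tf
import tensorflow_text  # noqa: F401  -- verifies tensorflow-text imports cleanly

print("TensorFlow:", tf.__version__)                     # should be between 2.15.0 and 2.17.0
print("GPUs:", tf.config.list_physical_devices("GPU"))   # expect at least one entry
```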
If you encounter any issues during setup or running the model, please open an issue with details about your environment and the error message.

## Run Gradio 🤗 Playground Locally
To set up and run the Gradio Playground locally, you can use the following steps:
```bash
# Clone the huggingface space
git clone https://huggingface.co/spaces/Derendering/Model-Output-Playground

# Install the dependencies
cd Model-Output-Playground
pip install -r requirements.txt
```

Then you can run the following command to interact with the playground:
```bash
# Run the Gradio Playground
python app.py
```
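Once the app starts, Gradio prints a local URL in the terminal (typically http://127.0.0.1:7860 by default); open it in a browser to interact with the playground.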

## Licenses

The code in this repository is released under the [Apache 2 license](https://github.com/google-research/google-research/blob/master/LICENSE).

## Disclaimer
*Please note: This is not an officially supported Google product.*
## Citation
If you find our code or dataset useful for your research and applications, please cite using BibTeX:
```bibtex
@article{mitrevski2024inksight,
  title={InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write},
  author={Mitrevski, Blagoj and Rak, Arina and Schnitzler, Julian and Li, Chengkun and Maksai, Andrii and Berent, Jesse and Musat, Claudiu},
  journal={arXiv preprint arXiv:2402.05804},
  year={2024}
}
```