https://github.com/google-research/inksight
- Host: GitHub
- URL: https://github.com/google-research/inksight
- Owner: google-research
- License: apache-2.0
- Created: 2024-10-23T15:32:44.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-12-09T18:31:16.000Z (6 months ago)
- Last Synced: 2025-04-07T20:08:07.194Z (2 months ago)
- Language: Jupyter Notebook
- Size: 13.1 MB
- Stars: 672
- Watchers: 12
- Forks: 31
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write
Blagoj Mitrevski† •
Arina Rak† •
Julian Schnitzler† •
Chengkun Li† •
Andrii Maksai‡ •
Jesse Berent •
Claudiu Musat
† First authors (random order) | ‡ Corresponding author: [email protected]

---
*Animated teaser*

## Overview
InkSight is an offline-to-online handwriting conversion system that transforms photos of handwritten text into digital ink through a Vision Transformer (ViT) and mT5 encoder-decoder architecture. By combining reading and writing priors in a multi-task training framework, our models process handwritten content without requiring specialized equipment, handling diverse writing styles and backgrounds. The system supports both word-level and full-page conversion, enabling practical digitization of physical notes into searchable, editable digital formats. In this repository we provide the Small-p model weights, the dataset, and example inference code (listed in the [releases section](#releases)).
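As background for the conversion described above, "digital ink" (online handwriting) is usually represented as an ordered sequence of strokes, each an ordered list of points, rather than as pixels. The toy structure below is only an illustrative sketch of that representation; the coordinates are made up and it does not mirror the exact output format of the released models.

```python
# Illustrative only: a toy "digital ink" structure (not the repository's exact format).
# Offline handwriting = a raster image; online handwriting = ordered strokes of points.
import numpy as np

# Each stroke is an (N, 2) array of (x, y) points in drawing order;
# an ink is simply a list of strokes. Real data may also carry timestamps.
ink = [
    np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]),  # first stroke (made-up points)
    np.array([[2.5, 0.0], [2.5, 1.5]]),              # second stroke
]

total_points = sum(len(stroke) for stroke in ink)
print(f"{len(ink)} strokes, {total_points} points")
```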
*InkSight system diagram (GIF version)*

## Releases
> **:warning: Notice:** Please use TensorFlow and tensorflow-text versions between 2.15.0 and 2.17.0. Versions later than 2.17.0 may lead to unexpected behavior; we are currently investigating these issues.

We provide open resources for the public version of the InkSight model. Choose the options that best fit your needs (a minimal model-loading sketch follows the list):
- Model weights:
- Public version [Small-p model for CPU/GPU inference](https://huggingface.co/Derendering/InkSight-Small-p)
- Public version [Small-p model for TPU inference](https://storage.googleapis.com/derendering_model/small-p-tpu.zip)
- A [dataset](docs/dataset.md) containing subsets of:
- Model-generated samples in the universal `InkML` format
- Human expert digital ink traces in `npy` format
- [Example inference code](colab.ipynb): Demonstrates both word-level and full-page text inference using free, open-source alternatives to the [Google Cloud Vision Handwriting Text Detection API](https://cloud.google.com/vision/docs/handwriting). The implementation supports [docTR](https://github.com/mindee/doctr) and [Tesseract OCR](https://github.com/tesseract-ocr/tesseract).
- [Samples](figures/) of model outputs.
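As referenced above, here is a minimal sketch of fetching the public Small-p weights from Hugging Face and loading them with TensorFlow. It assumes `huggingface_hub` is installed and that the Hugging Face repository contains a TF SavedModel at the snapshot root; see [`colab.ipynb`](colab.ipynb) for the full, supported inference flow.

```python
# Minimal loading sketch (assumptions noted in comments); see colab.ipynb for full inference.
import tensorflow as tf
import tensorflow_text  # noqa: F401  -- registers custom ops the SavedModel may need
from huggingface_hub import snapshot_download  # assumes `pip install huggingface_hub`

# Download the public Small-p weights from Hugging Face.
model_dir = snapshot_download("Derendering/InkSight-Small-p")

# Assumption: the snapshot root is a TF SavedModel directory.
model = tf.saved_model.load(model_dir)

# Inspect the available serving signatures before calling the model.
print(list(model.signatures.keys()))
```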
## News

- **October 2024**: We release [**Small-p model weights**](https://huggingface.co/Derendering/InkSight-Small-p) and our [**dataset**](https://huggingface.co/datasets/Derendering/InkSight-Derenderings) on Hugging Face.
- **October 2024**: Our work is now featured on the **[Google Research Blog](https://research.google/blog/a-return-to-hand-written-notes-by-learning-to-read-write/)**!
- **February 2024**: The **[InkSight Demo on Hugging Face](https://huggingface.co/spaces/Derendering/Model-Output-Playground)** is live!
## GPU Inference Environment Setup with Conda
To set up the environment and run model inference locally on a GPU, you can use the following steps:
```bash
# Clone the repository
git clone https://github.com/google-research/inksight.git
cd inksight

# Create and activate conda environment
conda env create -f environment.yml
conda activate inksight
```
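After activating the environment, a quick sanity check (a sketch, not part of the official setup) can confirm that the installed TensorFlow version falls in the supported range, that tensorflow-text imports cleanly, and that a GPU is visible:

```python
# Optional environment sanity check.
import tensorflow as tf
import tensorflow_text  # noqa: F401  -- verifies tensorflow-text imports cleanly

print("TensorFlow:", tf.__version__)                     # should be between 2.15.0 and 2.17.0
print("GPUs:", tf.config.list_physical_devices("GPU"))   # expect at least one entry
```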
If you encounter any issues during setup or running the model, please open an issue with details about your environment and the error message.

## Run Gradio 🤗 Playground Locally
To set up and run the Gradio Playground locally, you can use the following steps:
```bash
# Clone the huggingface space
git clone https://huggingface.co/spaces/Derendering/Model-Output-Playground

# Install the dependencies
cd Model-Output-Playground
pip install -r requirements.txt
```

Then you can run the following command to interact with the playground:
```bash
# Run the Gradio Playground
python app.py
```
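Once the app starts, Gradio prints a local URL in the terminal (typically http://127.0.0.1:7860 by default); open it in a browser to interact with the playground.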

## Licenses

The code in this repository is released under the [Apache 2 license](https://github.com/google-research/google-research/blob/master/LICENSE).

## Disclaimer
*Please note: This is not an officially supported Google product.*
## Citation
If you find our code or dataset useful for your research and applications, please cite using BibTeX:
```bibtex
@article{mitrevski2024inksight,
  title={InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write},
  author={Mitrevski, Blagoj and Rak, Arina and Schnitzler, Julian and Li, Chengkun and Maksai, Andrii and Berent, Jesse and Musat, Claudiu},
  journal={arXiv preprint arXiv:2402.05804},
  year={2024}
}
```