https://github.com/guillaumegenthial/im2latex

Image to LaTeX (Seq2seq + Attention with Beam Search) - Tensorflow

Host: GitHub
URL: https://github.com/guillaumegenthial/im2latex
Owner: guillaumegenthial
License: apache-2.0
Created: 2017-09-16T23:38:58.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-08-19T07:51:03.000Z (almost 5 years ago)
Last Synced: 2025-03-29T23:09:10.152Z (about 2 months ago)
Topics: attention-seq2seq, beam-search, im2latex, imagecaptioning, seq2seq, seq2seq-attn, show-and-tell, tensorflow
Language: Python
Homepage:
Size: 3.75 MB
Stars: 461
Watchers: 9
Forks: 129
Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

README

# Im2Latex

Seq2Seq model with Attention + Beam Search for Image to LaTeX, similar to [Show, Attend and Tell](https://arxiv.org/abs/1502.03044) and [Harvard's paper and dataset](http://lstm.seas.harvard.edu/latex/).

For more details, see the [blog post](https://guillaumegenthial.github.io/image-to-latex.html).
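Beam search keeps the `beam_size` highest-scoring partial sequences at each decoding step instead of committing to the single most likely token. The following is a minimal, framework-free sketch of that idea in plain Python, not the repository's TensorFlow implementation. `step_fn` is a hypothetical stand-in for one attention-decoder step that returns log-probabilities over the vocabulary, and `toy_step` is an invented toy model used only to exercise the search.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical decoder step: given the token ids generated so far, return
# log-probabilities over the whole vocabulary (list index = token id).
StepFn = Callable[[Sequence[int]], List[float]]
Hypothesis = Tuple[List[int], float]


def beam_search(step_fn: StepFn, start_id: int, end_id: int,
                beam_size: int = 3, max_len: int = 20) -> List[Hypothesis]:
    """Return hypotheses as (token ids, summed log-prob), best first."""
    beams: List[Hypothesis] = [([start_id], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for token_id, log_p in enumerate(step_fn(tokens)):
                candidates.append((tokens + [token_id], score + log_p))
        # Keep the beam_size best extensions; those ending in end_id are set aside.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == end_id:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Hypotheses still open at max_len are returned too, so the result is never empty.
    return sorted(finished + beams, key=lambda c: c[1], reverse=True)


def toy_step(prefix: Sequence[int]) -> List[float]:
    """Toy model over ids {0: <start>, 1: <end>, 2: 'x', 3: 'y'}."""
    probs = [0.0, 0.1, 0.6, 0.3] if len(prefix) == 1 else [0.0, 0.7, 0.2, 0.1]
    return [math.log(p) if p > 0 else -math.inf for p in probs]


hyps = beam_search(toy_step, start_id=0, end_id=1, beam_size=2, max_len=3)
print(hyps[0][0])  # [0, 2, 1], i.e. <start> x <end>
```

In this simple variant, finished hypotheses leave the beam, so the beam can shrink; scores are raw summed log-probabilities with no length normalization, which real decoders often add.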

## Install

Install pdflatex (LaTeX to PDF), Ghostscript, and [magick](https://www.imagemagick.org/script/install-source.php) (PDF to PNG) on Linux:

```
make install-linux
```

(This takes about 10 minutes, since it installs from source.)

On Mac, if you already have a LaTeX distribution installed, you should already have pdflatex and Ghostscript, so you only need to install magick. You can try:

```
make install-mac
```

## Getting Started

We provide a small dataset just to check the pipeline. To build the images, train the model, and evaluate it, run:

```
make small
```

You should observe that the model starts to produce reasonable patterns of LaTeX after a few minutes.

## Data

We provide the pre-processed formulas from [Harvard](https://zenodo.org/record/56198#.V2p0KTXT6eA), but you'll need to produce the images from those formulas yourself (this takes a few hours on a laptop):

```
make build
```

Alternatively, you can download the [prebuilt dataset from Harvard](https://zenodo.org/record/56198#.V2p0KTXT6eA) and use their preprocessing scripts from [harvardnlp/im2markup](https://github.com/harvardnlp/im2markup).
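Producing an image from a formula means compiling it with pdflatex and rasterizing the resulting PDF with magick (ImageMagick reads PDFs through Ghostscript). Below is a sketch of that conversion for a single formula. The `standalone` document template, the 200 dpi density, and the helpers `make_tex_document` and `render_formula` are assumptions for illustration and are not taken from `build.py`.

```python
import subprocess
import tempfile
from pathlib import Path


def make_tex_document(formula: str) -> str:
    """Wrap one formula in a minimal standalone LaTeX document (hypothetical template)."""
    return "\n".join([
        r"\documentclass[preview]{standalone}",
        r"\usepackage{amsmath}",
        r"\begin{document}",
        r"$\displaystyle " + formula + "$",
        r"\end{document}",
        "",
    ])


def render_formula(formula: str, png_path: str, density: int = 200) -> None:
    """Render one formula to a PNG file (requires pdflatex, Ghostscript and ImageMagick)."""
    with tempfile.TemporaryDirectory() as tmp:
        tex_path = Path(tmp) / "formula.tex"
        tex_path.write_text(make_tex_document(formula))
        # LaTeX -> PDF; nonstopmode + halt-on-error make a bad formula fail instead of prompting.
        subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", "-halt-on-error",
             "-output-directory=" + tmp, str(tex_path)],
            check=True, capture_output=True,
        )
        # PDF -> PNG; -density sets the rasterization resolution in dpi.
        subprocess.run(
            ["magick", "-density", str(density), str(Path(tmp) / "formula.pdf"), png_path],
            check=True,
        )


# Example (only works once the tools from the Install section are available):
# render_formula(r"\frac{a}{b}", "frac.png")
```

A real build over the whole dataset would also need to handle formulas that fail to compile, since `check=True` raises `subprocess.CalledProcessError` on any non-zero exit.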

## Training on the full dataset

If you have already run `make build`, you can train and evaluate the model with the following commands:

```
make train
make eval
```

Or, to build the images from the formulas, train the model, and evaluate it in one step, run:

```
make full
```

## Details

1. Build the images from the formulas, write the matching file, and extract the vocabulary. __Run this only once__ per dataset.
```
python build.py --data=configs/data.json --vocab=configs/vocab.json
```

2. Train
```
python train.py --data=configs/data.json --vocab=configs/vocab.json --training=configs/training.json --model=configs/model.json --output=results/full/
```

3. Evaluate the text metrics
```
python evaluate_txt.py --results=results/full/
```

4. Evaluate the image metrics
```
python evaluate_img.py --results=results/full/
```
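Step 1 also extracts the vocabulary. A minimal sketch of that idea follows, assuming the formulas are already space-tokenized (as the pre-processed Harvard formulas are). The special tokens `_PAD`, `_UNK`, and `_END`, the `min_count` threshold, and the helpers `build_vocab` and `encode` are hypothetical and may not match what `build.py` actually does.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical special tokens; build.py may use different names or none at all.
SPECIAL_TOKENS = ["_PAD", "_UNK", "_END"]


def build_vocab(formulas: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map every token seen at least `min_count` times to an integer id."""
    counts = Counter(tok for formula in formulas for tok in formula.split())
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for tok in kept:
        vocab.setdefault(tok, len(vocab))  # never duplicate a special token
    return vocab


def encode(formula: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a tokenized formula to ids; unseen tokens become _UNK, then _END is appended."""
    unk = vocab["_UNK"]
    return [vocab.get(tok, unk) for tok in formula.split()] + [vocab["_END"]]


formulas = ["x ^ { 2 }", "x + y", "\\frac { a } { b }"]
vocab = build_vocab(formulas, min_count=2)
print(vocab)                    # {'_PAD': 0, '_UNK': 1, '_END': 2, 'x': 3, '{': 4, '}': 5}
print(encode("x + y", vocab))   # [3, 1, 1, 2]
```

Sorting the kept tokens makes the ids deterministic across runs, which matters because the same vocabulary must be reused at training and evaluation time.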

To get more information on a script's arguments, run:

```
python file.py --help
```
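The text metrics from step 3 compare each predicted formula with its reference as token sequences. As an illustration only (the exact metrics and normalization used by `evaluate_txt.py` are not stated here), the sketch below computes a token-level Levenshtein edit distance and an exact-match rate. Normalizing the summed distance by the summed per-pair maximum length is an assumption chosen so the score stays in [0, 1].

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between token sequences (unit-cost insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # delete r
                          curr[j - 1] + 1,          # insert h
                          prev[j - 1] + (r != h))   # substitute, free if tokens match
        prev = curr
    return prev[-1]


def edit_score(refs: List[str], hyps: List[str]) -> float:
    """1 - summed edit distance / summed max(len(ref), len(hyp))."""
    pairs = [(r.split(), h.split()) for r, h in zip(refs, hyps)]
    dist = sum(edit_distance(r, h) for r, h in pairs)
    norm = sum(max(len(r), len(h)) for r, h in pairs)
    return 1.0 - dist / norm if norm else 1.0


def exact_match(refs: List[str], hyps: List[str]) -> float:
    """Fraction of predictions whose tokens equal the reference exactly."""
    hits = sum(r.split() == h.split() for r, h in zip(refs, hyps))
    return hits / len(refs) if refs else 0.0


refs = ["x ^ { 2 }", "a + b"]
hyps = ["x ^ { 2 }", "a - b"]
print(edit_score(refs, hyps), exact_match(refs, hyps))  # 0.875 0.5
```

The image metrics in step 4 address a different weakness: two token sequences can differ while rendering to the same picture, which a purely text-based score like this one penalizes.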