Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shashi456/im2latex
A solution for im2Latex problem.
- Host: GitHub
- URL: https://github.com/shashi456/im2latex
- Owner: Shashi456
- License: gpl-3.0
- Created: 2018-10-07T08:23:29.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2018-11-16T10:17:23.000Z (about 6 years ago)
- Last Synced: 2024-10-30T02:54:40.661Z (2 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 10.8 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Im2Latex
- A solution to the Im2Latex problem from OpenAI's Requests for Research.
- The dataset has to be downloaded from [here](https://zenodo.org/record/56198#.V2p0KTXT6eA)
- The dataset has to be preprocessed to generate vocabulary and tokenize the labels.
- The preprocessing scripts are taken from HarvardNLP's [solution](https://github.com/harvardnlp/im2markup); all credit for them is due to that project.

## Preprocessing
- The following commands were run to generate the files. The tokenize step is listed before the filter step because the filter reads `formulas.token.lst`:

```shell
python preprocessing/preprocess_images.py --input-dir formula_images --output-dir images_processed
python preprocessing/preprocess_formulas.py --mode tokenize --input-file im2latex_formulas.lst --output-file formulas.token.lst
python preprocessing/preprocess_filter.py --filter --image-dir images_processed --label-path formulas.token.lst --data-path im2latex_train.lst --output-path train_filter.lst
python scripts/preprocessing/generate_latex_vocab.py --data-path train_filter.lst --label-path formulas.token.lst --output-file latex_vocab.txt
```
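To make the vocabulary and tokenization steps more concrete, here is a minimal sketch of what building a vocabulary from tokenized formulas can look like. It is not the HarvardNLP script itself: the function names `build_vocab` and `encode`, the `min_count` parameter, and the assumption that each line holds one formula with whitespace-separated tokens are all illustrative.

```python
from collections import Counter


def build_vocab(tokenized_formulas, min_count=1):
    """Return a sorted list of tokens seen at least `min_count` times.

    `tokenized_formulas` is an iterable of strings, one formula per entry,
    with tokens separated by whitespace (an assumed file format).
    """
    counts = Counter()
    for line in tokenized_formulas:
        counts.update(line.split())
    return sorted(tok for tok, n in counts.items() if n >= min_count)


def encode(formula, vocab):
    """Map a tokenized formula to integer ids.

    Tokens missing from `vocab` all share one extra id, len(vocab).
    """
    index = {tok: i for i, tok in enumerate(vocab)}
    unk_id = len(vocab)  # id reserved for out-of-vocabulary tokens
    return [index.get(tok, unk_id) for tok in formula.split()]


# Two hand-written sample formulas standing in for formulas.token.lst.
sample = [
    r"\frac { a } { b }",
    r"a ^ { 2 } + b ^ { 2 }",
]
vocab = build_vocab(sample)
ids = encode(r"\frac { a } { c }", vocab)
```

Here `c` never appears in the sample, so it receives the reserved out-of-vocabulary id. Raising `min_count` drops rare tokens from the vocabulary in the same way.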
## Methodology
- We will use the Top-down Bottom-up Attention Model paper as the basis and adapt its architecture to this task.
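As a rough illustration of the attention idea, the sketch below computes a soft attention step over a set of image-region feature vectors. It is a simplified stand-in, not the architecture from the paper or from this repository: it scores regions with a plain dot product where a trained model would typically use a learned scoring function, and the names `soft_attention`, `regions`, and `decoder_state` are hypothetical.

```python
import math


def soft_attention(features, query):
    """Return (weights, context) for dot-product soft attention.

    `features` is a list of equal-length vectors (one per image region);
    `query` is a vector of the same length (e.g. a decoder state).
    """
    # Dot-product score between each feature vector and the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the feature vectors.
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


# Toy inputs: three 2-dimensional region features and one query vector.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = soft_attention(regions, decoder_state)
```

The weights sum to one, and regions that align with the query receive more weight. In a decoder, the context vector would be fed into the step that predicts the next LaTeX token.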