Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Schema-Driven Information Extraction from Heterogeneous Tables
https://github.com/bflashcp3f/schema-to-json
JSON representation
- Host: GitHub
- URL: https://github.com/bflashcp3f/schema-to-json
- Owner: bflashcp3f
- License: gpl-3.0
- Created: 2023-05-23T04:30:11.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-04-01T02:03:49.000Z (9 months ago)
- Last Synced: 2024-04-02T02:42:38.397Z (9 months ago)
- Language: Python
- Homepage: https://arxiv.org/abs/2305.14336
- Size: 11.6 MB
- Stars: 17
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - bflashcp3f/schema-to-json - EMNLP 2024 Findings "Schema-Driven Information Extraction from Heterogeneous Tables" (Python)
README
Schema-Driven Information Extraction from Heterogeneous Tables
===============================================================================

This repo contains code and data associated with the paper ["Schema-Driven Information Extraction from Heterogeneous Tables"](https://arxiv.org/abs/2305.14336).
```
@misc{bai2023schemadriven,
title={Schema-Driven Information Extraction from Heterogeneous Tables},
author={Fan Bai and Junmo Kang and Gabriel Stanovsky and Dayne Freitag and Alan Ritter},
year={2023},
eprint={2305.14336},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

# Task: Schema-to-JSON
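The task maps a table plus a user-provided schema (a set of attribute names) to a populated JSON record. A minimal illustrative sketch, not the repo's code: the prompt wording, attribute names, and model reply below are all hypothetical.

```python
import json

def build_prompt(table_text: str, attributes: list[str]) -> str:
    """Assemble an extraction prompt from a table and a schema (attribute names)."""
    schema = ", ".join(f'"{a}"' for a in attributes)
    return (
        f"Extract one JSON object with the keys [{schema}] from the table below. "
        "Use null for missing values.\n\n"
        f"{table_text}\n\nJSON:"
    )

def parse_response(text: str, attributes: list[str]) -> dict:
    """Pull the first JSON object out of the model's reply, keeping only schema keys."""
    start, end = text.find("{"), text.rfind("}") + 1
    record = json.loads(text[start:end])
    return {a: record.get(a) for a in attributes}

# Hypothetical model reply for one row of an ML results table.
reply = 'Here is the record: {"model": "BERT-base", "dataset": "SQuAD", "f1": 88.5}'
record = parse_response(reply, ["model", "dataset", "f1"])
print(record)  # {'model': 'BERT-base', 'dataset': 'SQuAD', 'f1': 88.5}
```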
# Installation
1. Create conda environment.
```
git clone https://github.com/bflashcp3f/schema-to-json.git
cd schema-to-json
conda env create -f environment.yml
conda activate s2j
```

2. Set up your OpenAI API key with the environment variable `OPENAI_API_KEY`. If you want to use Azure, set the environment variable `AZURE_API_KEY` instead.
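For example (the key values below are placeholders):

```shell
# For the OpenAI API (placeholder value):
export OPENAI_API_KEY="sk-placeholder"

# Or, for Azure (placeholder value):
export AZURE_API_KEY="azure-placeholder"
```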
3. Install from source.
```
pip install -e .
```

# Data
Four datasets (MlTables, ChemTables, DiSCoMat and SWDE) in our benchmark are available under the `data` directory.

# Experiments
Below are the commands to reproduce the paper's results. Make sure to set `API_SOURCE` (`openai` or `azure`) and `BACKEND` (model name) in each script. For open-source models, use the scripts with the suffix `_os.sh`.
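For instance, the two variables might be set like this at the top of a script (illustrative values only; the model name is a placeholder):

```shell
# Edit these inside the script before running it.
API_SOURCE="openai"   # or "azure"
BACKEND="gpt-4"       # model name (placeholder)
echo "Running with ${API_SOURCE} / ${BACKEND}"
```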
## MlTables
```
# Prompt (w/ error recovery)
bash scripts/mltables/prompt_error_recovery.sh

# Evaluation
bash scripts/mltables/eval.sh
```

## ChemTables
```
# Prompt (w/ error recovery)
bash scripts/chemtables/prompt_error_recovery.sh

# Evaluation
bash scripts/chemtables/eval.sh
```

## DiSCoMat
```
# Prompt
bash scripts/discomat/prompt_error_recovery.sh

# Evaluation
bash scripts/discomat/eval.sh
```

## SWDE
```
# Prompt
bash scripts/swde/prompt.sh

# Evaluation
bash scripts/swde/eval.sh
```