https://github.com/debatelab/deepa2
Resources for creating, importing and using DeepA2 Argument Analysis Framework datasets
- Host: GitHub
- URL: https://github.com/debatelab/deepa2
- Owner: debatelab
- License: apache-2.0
- Created: 2022-01-10T15:27:58.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-01-27T14:37:36.000Z (over 3 years ago)
- Last Synced: 2026-01-03T20:55:34.330Z (5 months ago)
- Topics: argumentation, datasets, machine-learning, natural-language-processing
- Language: Python
- Homepage:
- Size: 1.11 MB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 16
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README
# Deep Argument Analysis (`deepa2`)
This project provides `deepa2`, which
* 🥚 takes NLP data (e.g. NLI, argument mining) as ingredients;
* 🎂 bakes DeepA2 datasets conforming to the [Deep Argument Analysis Framework](https://arxiv.org/abs/2110.01509);
* 🍰 serves DeepA2 data as text2text datasets suitable for training language models.
There's a public collection of 🎂 DeepA2 datasets baked with `deepa2` at the [HF hub](https://huggingface.co/datasets/debatelab/deepa2).
The [Documentation](docs/) describes usage options and gives background info on the Deep Argument Analysis Framework.
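The 🍰 step, serving DeepA2 data as text2text datasets, can be pictured roughly as follows. This is a minimal, hypothetical sketch and not how `deepa2 serve` is implemented: the field names (`source_text`, `argdown_reconstruction`, `conclusion`), the list of generative modes, and the `field: value` prompt layout are all assumptions for illustration. It only shows the idea that one DeepA2 item can yield several training pairs, each with a `text` and a `target` column.

```python
# Hypothetical sketch: field names, modes and prompt layout are
# illustrative assumptions, not deepa2's actual serialization.
from typing import Dict, List, Tuple

# Each (assumed) mode maps some input fields to one target field.
MODES: List[Tuple[List[str], str]] = [
    (["source_text"], "argdown_reconstruction"),
    (["source_text"], "conclusion"),
    (["argdown_reconstruction"], "conclusion"),
]


def to_text2text(item: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn one DeepA2-style item into text/target pairs, one per mode."""
    records = []
    for inputs, target in MODES:
        # Skip modes whose fields are missing or empty in this item.
        if not all(item.get(key) for key in inputs + [target]):
            continue
        prompt = " ".join(f"{key}: {item[key]}" for key in inputs)
        records.append({"text": f"{prompt} {target}:", "target": item[target]})
    return records


if __name__ == "__main__":
    example = {
        "source_text": "All birds fly. Tweety is a bird. So Tweety flies.",
        "argdown_reconstruction": (
            "(1) All birds fly.\n(2) Tweety is a bird.\n----\n(3) Tweety flies."
        ),
        "conclusion": "Tweety flies.",
    }
    for record in to_text2text(example):
        print(record)
```

In this sketch, a complete item yields three training pairs; items with missing fields simply contribute fewer.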
## Quickstart
### Integrating `deepa2` into Your Training Pipeline
1. Install `deepa2` into your ML project's virtual environment, e.g.:
```bash
source my-projects-venv/bin/activate
python --version # should be ^3.7
python -m pip install deepa2
```
2. Add the `deepa2` preprocessor to your training pipeline. Your training script might, for example, look like this:
```sh
#!/bin/bash

# configure and activate environment
...

# download deepa2 datasets and
# prepare for text2text training:
# 🎂 in (--path), 🍰 out (--export_path)
deepa2 serve \
    --path some-deepa2-dataset \
    --export_format csv \
    --export_path t2t

# run default training script,
# e.g., with 🤗 Transformers:
# 🍰 in (--train_file)
python .../run_summarization.py \
    --train_file t2t/train.csv \
    --text_column "text" \
    --summary_column "target" \
    --...

# clean-up
rm -r t2t
```
3. That's it.
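Before handing the exported file to a training script, it can help to sanity-check it. The following minimal sketch uses only Python's standard `csv` module and assumes that `t2t/train.csv` has a header row containing the `text` and `target` columns passed to `run_summarization.py` above. The helper name `check_t2t_csv` and the sample row are hypothetical.

```python
# Minimal sketch: validate a text2text CSV before training.
# Assumes a header row with "text" and "target" columns (as passed via
# --text_column / --summary_column); check_t2t_csv is a hypothetical helper.
import csv
import io
from typing import IO, Iterable


def check_t2t_csv(handle: IO[str], required: Iterable[str] = ("text", "target")) -> int:
    """Raise ValueError on missing columns or empty cells; return the row count."""
    cols = tuple(required)
    reader = csv.DictReader(handle)
    # fieldnames is None for an empty file, hence the fallback to [].
    missing = [col for col in cols if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")
    n_rows = 0
    for n_rows, row in enumerate(reader, start=1):
        # An empty (or absent) input or target gives the model nothing to learn from.
        if any(not row[col] for col in cols):
            raise ValueError(f"empty value in data row {n_rows}")
    return n_rows


if __name__ == "__main__":
    # For the real file: with open("t2t/train.csv", newline="") as f: ...
    sample = io.StringIO(
        'text,target\n"source_text: Tweety is a bird. conclusion:",Tweety flies.\n'
    )
    print(check_t2t_csv(sample))  # prints 1
```

Failing early on a malformed file is cheaper than discovering the problem after the training script has started.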
### Create DeepA2 datasets with `deepa2` from existing NLP data
Install [poetry](https://python-poetry.org/docs/#installation).
Clone the repository:
```bash
git clone https://github.com/debatelab/deepa2.git
cd deepa2
```
Install this package from within the repo's root folder:
```bash
poetry install
```
Bake a DeepA2 dataset, e.g.:
```bash
# 🥚 in (--name), 🎂 out (--export-path)
poetry run deepa2 bake \
    --name esnli \
    --debug-size 100 \
    --export-path ./data/processed
```
## Contribute a DeepA2Builder for another Dataset
We welcome contributions to this repository, especially scripts that port existing datasets to the DeepA2 Framework. Within this repo, a code module that transforms data into the DeepA2 format contains:
1. a Builder class that describes how DeepA2 examples are constructed and that implements the abstract `builder.Builder` interface (e.g., `builder.entailmentbank_builder.EnBankBuilder`);
2. a DataLoader that provides a method for loading the raw data as a 🤗 Dataset object (e.g., `builder.entailmentbank_builder.EnBankLoader`); if the raw data is already available in a format compatible with 🤗 Datasets, you may use `deepa2.DataLoader` as is;
3. dataclasses that describe the features of the raw data and of the preprocessed data, and that extend the dummy classes `deepa2.RawExample` and `deepa2.PreprocessedExample`;
4. a collection of unit tests that check the concrete Builder's methods (e.g., `tests/test_enbank.py`);
5. documentation of the pipeline (e.g., `docs/esnli.md`).
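To make the division of labour between these components more concrete, here is a hypothetical, self-contained sketch of items 1 and 3, with a plain list standing in for the data returned by a DataLoader. Everything in it is an assumption for illustration: `RawExample`, `PreprocessedExample` and `Builder` are local stand-ins rather than imports from `deepa2`, the method names `preprocess`, `build` and `run` are made up, and the output keys only loosely follow DeepA2 dimensions. The real `builder.Builder` interface defines its own abstract methods, so an existing builder such as `builder.entailmentbank_builder.EnBankBuilder` is the authoritative template.

```python
# Hypothetical sketch of a builder module. RawExample, PreprocessedExample
# and Builder are local stand-ins, NOT the real deepa2 classes, and the
# method names preprocess/build/run are invented for illustration.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class RawExample:  # stand-in for deepa2.RawExample
    pass


@dataclass
class PreprocessedExample:  # stand-in for deepa2.PreprocessedExample
    pass


@dataclass
class NLIRawExample(RawExample):
    """Features of a raw NLI record (item 3)."""

    premise: str
    hypothesis: str
    label: str  # e.g. "entailment", "neutral", "contradiction"


@dataclass
class NLIPreprocessedExample(PreprocessedExample):
    """Features kept after preprocessing (item 3)."""

    premise: str
    hypothesis: str


class Builder(ABC):  # stand-in for builder.Builder
    @abstractmethod
    def preprocess(self, raw: RawExample) -> List[PreprocessedExample]:
        """Filter or split one raw example."""

    @abstractmethod
    def build(self, example: PreprocessedExample) -> Dict[str, str]:
        """Construct one DeepA2-style item from a preprocessed example."""

    def run(self, raw_examples: Iterable[RawExample]) -> Iterator[Dict[str, str]]:
        """Preprocess every raw example and build items from the results."""
        for raw in raw_examples:
            for example in self.preprocess(raw):
                yield self.build(example)


class NLIBuilder(Builder):  # concrete builder (item 1)
    def preprocess(self, raw: NLIRawExample) -> List[NLIPreprocessedExample]:
        # In this toy builder, only entailment pairs become arguments.
        if raw.label != "entailment":
            return []
        return [NLIPreprocessedExample(raw.premise, raw.hypothesis)]

    def build(self, example: NLIPreprocessedExample) -> Dict[str, str]:
        # Keys loosely follow DeepA2 dimensions; the Argdown-like layout
        # (numbered statements, "----" inference line) is illustrative.
        return {
            "source_text": f"{example.premise} Therefore: {example.hypothesis}",
            "argdown_reconstruction": (
                f"(1) {example.premise}\n----\n(2) {example.hypothesis}"
            ),
            "conclusion": example.hypothesis,
        }


if __name__ == "__main__":
    # A plain list stands in for the 🤗 Dataset a DataLoader would return.
    raw_data = [
        NLIRawExample("A man plays a guitar.", "A man plays an instrument.", "entailment"),
        NLIRawExample("A dog sleeps.", "A dog runs.", "contradiction"),
    ]
    for item in NLIBuilder().run(raw_data):
        print(item["argdown_reconstruction"])
```

Keeping the per-example logic in small methods like these is also what makes item 4, unit tests for the concrete Builder, straightforward to write.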
You can also **suggest** building such a pipeline collaboratively by opening a [new issue](https://github.com/debatelab/deepa2/issues/new?assignees=&labels=enhancement&template=new_dataset.md&title=%5BDATASET+NAME%5D).
## Citation
This repository builds on and extends the DeepA2 Framework originally presented in:
```bibtex
@article{betz2021deepa2,
title={DeepA2: A Modular Framework for Deep Argument Analysis with Pretrained Neural Text2Text Language Models},
author={Gregor Betz and Kyle Richardson},
year={2021},
eprint={2110.01509},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```