# Dissecting vocabulary biases in Natural Language Inference datasets through a statistical testing approach and automated data augmentation for artifact mitigation
## Getting Started
You'll need Python >= 3.6 to run the code in this repo. First, clone the repository:
`git clone git@github.com:datngu/nli-artifacts.git`
and follow the README.md file in the __run__ directory to install the base version.
You then need to install the __nlpaug__ package from: https://github.com/makcedward/nlpaug
Please check its homepage to install it properly; a typical installation sketch is shown below.
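As an assumed typical setup (not the repo's official instructions; defer to the nlpaug homepage if anything differs):

```bash
# Assumed typical setup; check the nlpaug homepage if this fails.
pip install nlpaug
# Some augmenters (e.g., WordNet synonyms) also need NLTK data:
pip install nltk
python -m nltk.downloader wordnet averaged_perceptron_tagger
```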
## Obtain the baseline dataset
Please follow the steps in the __data_writer.ipynb__ notebook to write out the original dataset and generate the hypothesis-only data.
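For orientation, here is a minimal sketch of what hypothesis-only data looks like, using the HuggingFace `datasets` library and SNLI as assumed examples (the repo's notebook may use different datasets and output formats):

```python
# Minimal sketch of generating hypothesis-only NLI data.
# Assumptions: SNLI via the HuggingFace `datasets` library and a JSONL
# output file; the repo's data_writer.ipynb may differ.
from datasets import load_dataset

snli = load_dataset("snli", split="train")
snli = snli.filter(lambda ex: ex["label"] != -1)  # drop unlabeled pairs

# Hypothesis-only copy: with the premise removed, a model can only
# exploit artifacts in the hypothesis text.
hypothesis_only = snli.remove_columns(["premise"])
hypothesis_only.to_json("snli_train_hypothesis_only.jsonl")
```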
## Statistical testing
The proposed statistical test is implemented in __statistical_test.ipynb__; running the notebook reproduces it.
Related figures and data are generated automatically.
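The exact test lives in the notebook; as a generic illustration of the idea, the sketch below runs a one-proportion z-test asking whether a word co-occurs with one NLI label more often than chance (the function, the uniform null, and the data format are all illustrative assumptions, not the repo's implementation):

```python
# Illustrative sketch only -- NOT the repo's exact test. It asks whether a
# word is over-represented under one label via a one-proportion z-test,
# a common way to surface vocabulary artifacts in NLI datasets.
from math import sqrt
from scipy.stats import norm

def word_label_ztest(examples, word, label, n_labels=3):
    """examples: iterable of (hypothesis_tokens, label) pairs (assumed format)."""
    labels_with_word = [lab for toks, lab in examples if word in toks]
    n = len(labels_with_word)
    if n == 0:
        return None  # the word never occurs
    p_hat = sum(lab == label for lab in labels_with_word) / n
    p0 = 1.0 / n_labels  # null hypothesis: the word is uninformative
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided
    return z, p_value

# Toy usage: "sleeping" skews heavily toward label 2 (contradiction).
toy = [(["a", "man", "is", "sleeping"], 2)] * 8 + [(["a", "man", "runs"], 0)] * 4
print(word_label_ztest(toy, "sleeping", 2))  # large z, tiny p-value
```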
### Note:
Please make sure that the related packages are installed before running the notebook.
## Obtain augmented data
We provide customized Python and bash scripts in the __aug__ subdirectory to generate the augmented data.
Please remember to download the related models/databases if you would like to use synonym and embedding augmentation.
Links and download instructions are provided in the Python scripts; a small usage sketch follows below.
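For orientation, here is a minimal nlpaug sketch of both augmentation types (the repo's __aug__ scripts may configure these differently; the word2vec model path below is an assumption, and the file must be downloaded first per the links in the scripts):

```python
# Illustrative nlpaug usage; the repo's aug/ scripts may differ.
import nlpaug.augmenter.word as naw

text = "A man is playing a guitar on stage."

# Synonym substitution via WordNet (needs the NLTK data installed above).
syn_aug = naw.SynonymAug(aug_src="wordnet")
print(syn_aug.augment(text))

# Embedding-based substitution; requires pretrained word vectors on disk.
# The model path is an assumed local file, not shipped with the repo.
emb_aug = naw.WordEmbsAug(
    model_type="word2vec",
    model_path="GoogleNews-vectors-negative300.bin",
    action="substitute",
)
print(emb_aug.augment(text))
```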
## Training and optimizing the models
In the __run__ subdirectory, we provide Python scripts for training and optimizing the ELECTRA-small model.
The Slurm scripts that we used to run the experiments are provided in the __run_experiments__ directory.
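The repo's own scripts define the actual training setup; the sketch below shows one minimal way to fine-tune ELECTRA-small on an NLI dataset with the HuggingFace Trainer (the SNLI choice and all hyperparameters are assumptions for illustration):

```python
# Minimal fine-tuning sketch -- the repo's run/ scripts may differ.
# Assumptions: SNLI as the dataset, the standard hub checkpoint, and
# toy hyperparameters chosen only for illustration.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

model_name = "google/electra-small-discriminator"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# SNLI has 3 labels; -1 marks examples without a gold label.
snli = load_dataset("snli").filter(lambda ex: ex["label"] != -1)

def encode(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

snli = snli.map(encode, batched=True)

args = TrainingArguments(output_dir="electra_nli",
                         per_device_train_batch_size=32,
                         num_train_epochs=3,
                         learning_rate=3e-5)

Trainer(model=model, args=args,
        data_collator=DataCollatorWithPadding(tokenizer),
        train_dataset=snli["train"],
        eval_dataset=snli["validation"]).train()
```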