https://github.com/thesofakillers/infoshare

Official repository for the paper: "Probing LLMs for Joint Encoding of Linguistic Categories." Findings of EMNLP 2023.

deep-learning dependency-parsing interpretability machine-learning parts-of-speech probing syntax transformers

# Probing LLMs for Joint Encoding of Linguistic Categories

[![Paper](https://img.shields.io/static/v1.svg?logo=arxiv&label=Paper&message=Open%20Paper&color=green)](https://arxiv.org/abs/2310.18696)

Official repository for the paper: "Probing LLMs for Joint Encoding of
Linguistic Categories." Findings of EMNLP 2023.

https://arxiv.org/abs/2310.18696

## Requirements and Setup

Details such as the Python and package versions can be found in the generated
[pyproject.toml](pyproject.toml) and [poetry.lock](poetry.lock) files.
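
If you would rather read the declared Python constraint programmatically than open the file, the following is a minimal sketch. It assumes the Poetry layout, where dependencies sit under `[tool.poetry.dependencies]`; the helper name `python_constraint` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import Optional

try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:  # older interpreters: needs `pip install tomli`
    import tomli as tomllib


def python_constraint(pyproject_text: str) -> Optional[str]:
    """Return the Python version constraint declared in pyproject.toml text.

    Assumption: the file follows the Poetry layout, i.e. the constraint is
    stored under [tool.poetry.dependencies]. Returns None if it is absent.
    """
    data = tomllib.loads(pyproject_text)
    deps = data.get("tool", {}).get("poetry", {}).get("dependencies", {})
    return deps.get("python")


if __name__ == "__main__":
    path = Path("pyproject.toml")
    if path.exists():
        # Prints the constraint string found in the local pyproject.toml
        print(python_constraint(path.read_text(encoding="utf-8")))
```

Run it from the repository root and use the printed constraint to pick the Python version for your environment.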

We recommend using an environment manager such as
[conda](https://docs.conda.io/en/latest/). After setting up your environment
with the correct Python version, install the required packages. We provide a
[requirements.txt](requirements.txt) file for this.

```terminal
pip install -r requirements.txt
```

This `requirements.txt` file is generated by running the following script:

```terminal
sh gen_pip_reqs.sh
```

## Repository contents

```bash
.
├── data/ # Where data is kept
├── experiments/ # arrays of images
├── images/ # more individual images
├── lisa/ # SLURM jobs and configs
├── infoshare/
│   ├── datamodules/ # handle data loading, processing
│   ├── models/ # Model implementations
│   ├── run/
│   │   ├── test.py # run testing
│   │   ├── test_xlingual.py # run testing across languages
│   │   └── train.py # run training
│   ├── __init__.py
│   └── utils.py # general utils
├── notebooks/ # see notebooks/README.md
├── reports/ # LaTeX and more
├── README.md # you are here
├── lswsd_lemmas.txt # lemmas used for LSWSD
├── poetry.lock # dependencies metadata
├── pyproject.toml # project metadata
├── gen_pip_reqs.sh # script for generating requirements.txt
└── requirements.txt # required packages for pip
```

The above was generated with

```bash
tree . -L 3 --dirsfirst -I "*.eps|*.png|*.pdf|lightning_logs|*pycache*|backup"
```

followed by some manual edits.
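
If the `tree` utility is not available, a similar listing can be approximated in Python. This is only a sketch: the function name `render_tree` is hypothetical, the ignore patterns are copied from the command above, entries are sorted by plain string order rather than by locale, and the trailing `/` on directories is added to mirror the listing shown here rather than the default output of `tree`.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List, Sequence

# Patterns copied from the -I option of the tree command above.
IGNORE = ("*.eps", "*.png", "*.pdf", "lightning_logs", "*pycache*", "backup")


def render_tree(
    root: Path, max_depth: int = 3, ignore: Sequence[str] = IGNORE
) -> List[str]:
    """Return tree-style lines for `root`, directories first, up to `max_depth` levels."""
    lines = ["."]

    def walk(directory: Path, prefix: str, depth: int) -> None:
        if depth > max_depth:
            return
        entries = [
            p
            for p in directory.iterdir()
            # Skip hidden entries (as tree does by default) and ignored patterns.
            if not p.name.startswith(".")
            and not any(fnmatch(p.name, pattern) for pattern in ignore)
        ]
        # Directories first, then files; each group sorted by name.
        entries.sort(key=lambda p: (not p.is_dir(), p.name))
        for index, entry in enumerate(entries):
            last = index == len(entries) - 1
            connector = "└── " if last else "├── "
            suffix = "/" if entry.is_dir() else ""
            lines.append(f"{prefix}{connector}{entry.name}{suffix}")
            if entry.is_dir():
                walk(entry, prefix + ("    " if last else "│   "), depth + 1)

    walk(root, "", 1)
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(Path("."))))
```

The inline `#` descriptions in the listing above are not produced by either approach; they belong to the manual edits.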

## Citation

If you use this code or find our work otherwise useful, please consider citing
our paper:

```bibtex
@inproceedings{starace2023probing,
title={Probing LLMs for Joint Encoding of Linguistic Categories},
author={Starace, Giulio and Papakostas, Konstantinos and Choenni, Rochelle and Panagiotopoulos, Apostolos and Rosati, Matteo and Leidinger, Alina and Shutova, Ekaterina},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
pages={7158--7179},
year={2023}
}
```