https://github.com/thesofakillers/infoshare
Official repository for the paper: "Probing LLMs for Joint Encoding of Linguistic Categories." Findings of EMNLP 2023.
deep-learning dependency-parsing interpretability machine-learning parts-of-speech probing syntax transformers
- Host: GitHub
- URL: https://github.com/thesofakillers/infoshare
- Owner: thesofakillers
- Created: 2022-04-29T21:22:33.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-30T19:20:45.000Z (about 1 year ago)
- Last Synced: 2023-12-30T20:24:10.294Z (about 1 year ago)
- Topics: deep-learning, dependency-parsing, interpretability, machine-learning, parts-of-speech, probing, syntax, transformers
- Language: Python
- Homepage:
- Size: 31.5 MB
- Stars: 6
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Probing LLMs for Joint Encoding of Linguistic Categories
[![Paper](https://img.shields.io/static/v1.svg?logo=arxiv&label=Paper&message=Open%20Paper&color=green)](https://arxiv.org/abs/2310.18696)
Official repository for the paper: "Probing LLMs for Joint Encoding of
Linguistic Categories." Findings of EMNLP 2023.

https://arxiv.org/abs/2310.18696
## Requirements and Setup
Details such as Python and package versions can be found in the generated
[pyproject.toml](pyproject.toml) and [poetry.lock](poetry.lock) files.

We recommend using an environment manager such as
[conda](https://docs.conda.io/en/latest/). After setting up your environment
with the correct Python version, please proceed with the installation of the
required packages. We provide a [requirements.txt](requirements.txt) file for
this.

```terminal
pip install -r requirements.txt
```

This `requirements.txt` file is generated by running the following:

```terminal
sh gen_pip_reqs.sh
```

## Repository contents
```bash
.
├── data/ # Where data is kept
├── experiments/ # arrays of images
├── images/ # more individual images
├── lisa/ # SLURM jobs and configs
├── infoshare/
│ ├── datamodules/ # handle data loading, processing
│ ├── models/ # Model implementations
│ ├── run
│ │ ├── test.py # run testing
│ │ ├── test_xlingual.py # run testing across languages
│ │ └── train.py # run training
│ ├── __init__.py
│ └── utils.py # general utils
├── notebooks/ # see notebooks/README.md
├── reports/ # LaTeX and more
├── README.md # you are here
├── lswsd_lemmas.txt # lemmas used for LSWSD
├── poetry.lock # dependencies metadata
├── pyproject.toml # project metadata
├── gen_pip_reqs.sh # script for generating requirements.txt
└── requirements.txt # required packages for PIP
```

The above was generated with

```bash
tree . -L 3 --dirsfirst -I "*.eps|*.png|*.pdf|lightning_logs|*pycache*|backup"
```

followed by some manual edits.
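
For readers without the `tree` utility, a similar listing can be approximated in Python. The sketch below is not part of the repository: `render_tree` and `IGNORE` are hypothetical names, the ignore patterns are copied from the `-I` option above, and the trailing `/` on directories is added only to mimic the listing shown here. Sorting is by plain name, so ordering may differ slightly from what `tree` prints.

```python
from fnmatch import fnmatch
from pathlib import Path

# Patterns copied from the -I option of the tree command above.
IGNORE = ["*.eps", "*.png", "*.pdf", "lightning_logs", "*pycache*", "backup"]


def render_tree(root, max_depth=3, ignore=IGNORE):
    """Return tree-like listing lines for root, directories first."""
    lines = ["."]

    def walk(directory, prefix, depth):
        # Stop descending once the depth limit (tree's -L) is exceeded.
        if depth > max_depth:
            return
        entries = [
            p
            for p in directory.iterdir()
            # Skip hidden entries (tree's default) and ignored patterns.
            if not p.name.startswith(".")
            and not any(fnmatch(p.name, pat) for pat in ignore)
        ]
        # Directories first, then files, each group sorted by name.
        entries.sort(key=lambda p: (not p.is_dir(), p.name))
        for i, entry in enumerate(entries):
            last = i == len(entries) - 1
            branch = "└── " if last else "├── "
            suffix = "/" if entry.is_dir() else ""
            lines.append(f"{prefix}{branch}{entry.name}{suffix}")
            if entry.is_dir():
                child_prefix = prefix + ("    " if last else "│   ")
                walk(entry, child_prefix, depth + 1)

    walk(Path(root), "", 1)
    return lines
```

Running `print("\n".join(render_tree(".")))` from the repository root should give a listing close to the one above, without the manually added comments.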
## Citation
If you use this code or find our work otherwise useful, please consider citing
our paper:

```bibtex
@inproceedings{starace2023probing,
title={Probing LLMs for Joint Encoding of Linguistic Categories},
author={Starace, Giulio and Papakostas, Konstantinos and Choenni, Rochelle and Panagiotopoulos, Apostolos and Rosati, Matteo and Leidinger, Alina and Shutova, Ekaterina},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
pages={7158--7179},
year={2023}
}
```