https://github.com/raphaelmerx/llm-bilingual-examples
Using LLMs to generate bilingual dictionary examples
Last synced: 28 days ago
- Host: GitHub
- URL: https://github.com/raphaelmerx/llm-bilingual-examples
- Owner: raphaelmerx
- Created: 2024-11-13T07:04:13.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-11-13T07:24:05.000Z (about 1 month ago)
- Last Synced: 2024-11-13T08:24:20.285Z (about 1 month ago)
- Language: Jupyter Notebook
- Homepage: https://arxiv.org/abs/2410.03182v1
- Size: 55.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Generating bilingual example sentences with large language models as lexicography assistants
Paper: https://arxiv.org/abs/2410.03182
## Getting started
- Create a Python (3.12) venv, then `pip install -r requirements.txt`
- Set up API keys: run `cp .env.example .env`, then populate your `.env` file with `OPENAI_API_KEY` and `REPLICATE_API_KEY`
- Run the notebooks
## Notebooks
### 1. `generate_dict_examples.ipynb`
From a list of candidate words, generate example sentences using GPT-4o and Llama-3.1-405B.
### 2. `rate_examples.ipynb`
Interpret annotation ratings: inter-annotator agreement, performance per model and per language.
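To make "inter-annotator agreement" concrete, here is a minimal sketch of Cohen's kappa for two annotators who rated the same items. This is an illustration only: the notebook may compute a different agreement statistic, and the function name and sample ratings below are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two annotators who rated the same items.

    Assumption: ratings are categorical labels; the notebook may use
    another agreement statistic.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items where both labels match.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement under independence, from each annotator's label counts.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical binary ratings from two annotators (not taken from the repo).
a1 = [1, 1, 0, 0, 1, 0]
a2 = [1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(a1, a2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so it is lower than the plain match rate whenever the annotators disagree at all.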
### 3. `pretrained_metrics_corel.ipynb`
Calculate correlations between example GDEX ratings and pre-trained metrics (perplexity, mask probability, entropy).
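As a rough sketch of the computation described above, the snippet below derives perplexity from per-token log-probabilities and correlates it with ratings. It assumes natural-log probabilities and uses Pearson correlation purely for illustration; the notebook may use a different coefficient (for example a rank correlation), and all function names and numbers here are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity as exp of the negative mean natural-log token probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one list of token log-probs per example, plus a rating.
example_logprobs = [
    [math.log(0.5)] * 4,
    [math.log(0.25)] * 4,
    [math.log(0.1)] * 4,
]
ratings = [5, 3, 2]
ppl = [perplexity(lp) for lp in example_logprobs]
print(pearson(ppl, ratings))
```

A negative coefficient here would mean that examples the language model finds more surprising tend to receive lower ratings.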
### 4. `llm_rate_example.ipynb`
Rate an example with an LLM, supplying 10 previous ratings as in-context learning (ICL) demonstrations to align the LLM with a specific annotator.
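The ICL setup can be sketched as a prompt builder that prepends one annotator's earlier ratings to the new example. The prompt wording, the function name `build_icl_prompt`, and the sample data are assumptions made for illustration; the notebook's actual prompt may differ, and no API call is made here.

```python
def build_icl_prompt(rated_examples, new_example, max_shots=10):
    """Build a few-shot rating prompt from one annotator's past ratings.

    rated_examples: list of (sentence, rating) pairs from a single annotator.
    The prompt format is hypothetical, not the notebook's.
    """
    if max_shots < 1:
        raise ValueError("max_shots must be at least 1")
    shots = rated_examples[-max_shots:]  # keep the most recent ratings
    lines = ["Rate each dictionary example sentence as the annotator would."]
    for sentence, rating in shots:
        lines.append(f"Example: {sentence}")
        lines.append(f"Rating: {rating}")
    # The new example is left unrated so the model completes the rating.
    lines.append(f"Example: {new_example}")
    lines.append("Rating:")
    return "\n".join(lines)


# Hypothetical history of 12 rated sentences; only the last 10 are used.
history = [(f"sentence {i}", i % 5 + 1) for i in range(12)]
prompt = build_icl_prompt(history, "a new sentence to rate")
print(prompt)
```

The resulting string would then be sent to the chosen LLM, whose completion is read as the predicted rating.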
## Data
Annotated examples are in `select_examples_[gpt4,llama]_[fra,ind,tdt]_eng_rated_A[1,2].tsv`
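The filename pattern can be expanded programmatically. The sketch below assumes each bracketed group means "pick one option" and that `A[1,2]` stands for annotators `A1` and `A2`; whether all 12 combinations actually exist in the repository is not confirmed here.

```python
from itertools import product

# Options read off the bracketed pattern above (interpretation is an assumption).
MODELS = ["gpt4", "llama"]
LANGS = ["fra", "ind", "tdt"]
ANNOTATORS = ["A1", "A2"]


def annotated_filenames():
    """Enumerate every filename the pattern could denote."""
    return [
        f"select_examples_{model}_{lang}_eng_rated_{annotator}.tsv"
        for model, lang, annotator in product(MODELS, LANGS, ANNOTATORS)
    ]


filenames = annotated_filenames()
for name in filenames:
    print(name)
```

Each file is tab-separated, so it can be read with Python's `csv` module using `delimiter="\t"` once its presence has been checked.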