https://github.com/poloclub/llm-attributor

LLM Attributor: Attribute LLM's Generated Text to Training Data
attribution generative-ai llm notebook-jupyter visualization


Host: GitHub
URL: https://github.com/poloclub/llm-attributor
Owner: poloclub
License: mit
Created: 2024-02-06T19:50:30.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2024-06-21T20:34:53.000Z (over 1 year ago)
Last Synced: 2025-04-21T09:52:42.216Z (6 months ago)
Topics: attribution, generative-ai, llm, notebook-jupyter, visualization
Language: Jupyter Notebook
Homepage:
Size: 22.5 MB
Stars: 41
Watchers: 9
Forks: 9
Open Issues: 1
Metadata Files:
- Readme: Readme.md
- License: LICENSE


# LLM Attributor: Attribute LLM's Generated Text to Training Data

LLM Attributor helps you visualize the training data attribution of text generated by your large language models (LLMs). Interactively select text phrases and visualize the training data points responsible for generating them. You can also edit the model-generated text and see how your changes affect the attribution in a side-by-side comparison.

[![license](https://img.shields.io/badge/License-MIT-success)]()

[![pypi](https://img.shields.io/pypi/v/llm-attributor?color=blue)](https://pypi.org/project/llm-attributor/)

[![arxiv badge](https://img.shields.io/badge/arXiv-2404.01361-red)](https://arxiv.org/abs/2404.01361)

    

🎬 Demo YouTube Video

✍️ Technical Report

## Feature Highlights

## Getting Started

### Installation

LLM Attributor is published on the Python Package Index (PyPI). To install it, use `pip`:

```bash
pip install llm-attributor
```
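
After installing, it can be handy to confirm from inside a notebook that the package is visible to the kernel's Python environment. The following is a minimal sketch using only the standard library. The distribution name `llm-attributor` is taken from the `pip` command above; `installed_version` is a hypothetical helper name and is not part of LLM Attributor.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="llm-attributor"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


found = installed_version()
if found is None:
    print("llm-attributor is not installed in this environment")
else:
    print(f"llm-attributor {found} is installed")
```

If the check reports that the package is missing even though `pip install` succeeded, the notebook kernel is probably using a different Python environment than the one `pip` installed into.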

### Initialization

You can import LLM Attributor into a computational notebook (e.g., Jupyter Notebook/Lab) and initialize it with your model and data configurations.

```python
from LLMAttributor import LLMAttributor

attributor = LLMAttributor(
    llama2_dir=LLAMA2_DIR,
    tokenizer_dir=TOKENIZER_DIR,
    model_save_dir=MODEL_SAVE_DIR,
    train_dataset=TRAIN_DATASET
)
```

For `LLAMA2_DIR` and `TOKENIZER_DIR`, pass the path to the base LLaMA2 model. Both are required when your model has not been fine-tuned yet.

`MODEL_SAVE_DIR` is the directory where your fine-tuned model is (or will be) saved.
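
Since the base-model paths only matter before fine-tuning, it may help to check the directories up front, before loading anything heavy. The sketch below is one possible way to do that and is not part of LLM Attributor. `build_attributor_config` is a hypothetical helper, and the rule "a non-empty `MODEL_SAVE_DIR` means a fine-tuned model already exists" is an assumption made for illustration, not the library's documented logic. The helper only validates paths and returns the four keyword arguments unchanged.

```python
from pathlib import Path


def build_attributor_config(llama2_dir, tokenizer_dir, model_save_dir, train_dataset):
    """Validate directories and return keyword arguments for the constructor.

    Hypothetical helper: the values are passed through unchanged.
    """
    save_path = Path(model_save_dir)
    # Assumption: a non-empty save directory holds an already fine-tuned model.
    already_fine_tuned = save_path.is_dir() and any(save_path.iterdir())

    if not already_fine_tuned:
        # The base model and tokenizer are needed when no fine-tuned model exists yet.
        for name, path in (("llama2_dir", llama2_dir), ("tokenizer_dir", tokenizer_dir)):
            if path is None or not Path(path).is_dir():
                raise FileNotFoundError(
                    f"{name}={path!r} must be an existing directory "
                    "because no fine-tuned model was found"
                )

    return {
        "llama2_dir": llama2_dir,
        "tokenizer_dir": tokenizer_dir,
        "model_save_dir": model_save_dir,
        "train_dataset": train_dataset,
    }
```

With the placeholders from the snippet above defined, the result could be unpacked into the constructor as `LLMAttributor(**build_attributor_config(LLAMA2_DIR, TOKENIZER_DIR, MODEL_SAVE_DIR, TRAIN_DATASET))`.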

## Demo

Run `disaster-demo.ipynb` and `finance-demo.ipynb` to explore the interactive visualizations of LLM Attributor.

## Credits

LLM Attributor was created by [Seongmin Lee](https://seongmin.xyz), [Jay Wang](https://zijie.wang), Aishwarya Chakravarthy, [Alec Helbling](https://alechelbling.com), [Anthony Peng](https://shengyun-peng.github.io), [Mansi Phute](https://mphute.github.io), [Polo Chau](https://poloclub.github.io/polochau/), and [Minsuk Kahng](https://minsuk.com).

## License

The software is available under the MIT License.

## Contact

If you have any questions, feel free to [open an issue](https://github.com/poloclub/LLM-Attribution/issues) or contact [Seongmin Lee](https://seongmin.xyz).
