https://github.com/myousefi/kaggle-llm-prompt-recovery

My scripts and models for the LLM Prompt Recovery Competition on Kaggle
https://github.com/myousefi/kaggle-llm-prompt-recovery

Last synced: 4 months ago
JSON representation

My scripts and models for the LLM Prompt Recovery Competition on Kaggle

Host: GitHub
URL: https://github.com/myousefi/kaggle-llm-prompt-recovery
Owner: myousefi
License: other
Created: 2024-04-08T22:56:25.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2024-04-20T07:49:20.000Z (about 1 year ago)
Last Synced: 2025-01-29T22:19:18.750Z (6 months ago)
Language: Jupyter Notebook
Homepage:
Size: 32 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Kaggle LLM Prompt Recovery

This repository contains code for the Kaggle competition "LLM Prompt Recovery". The goal of this competition is to develop models that can recover the original prompt given the response generated by a large language model (LLM).

I have utilized Slurm to submit jobs to the Discovery computing cluster, a generous cloud center provided by Northeastern University. The datasets library has been used to generate a large sample of prompt-response pairs from the Gemma-7b-it models. I have used QLoRA for Supervised Finetuning of Gemma-2b-it on the rewritten texts. I have explored the embedding space of the sentence-t5-base model, which was used to determine the semantic distance between the original prompt and the recovered prompt. Please note that the project files are not yet fully organized, and a comprehensive README is still in progress.

To better understand the Gemma family of models, I have developed two tools:

![t-SNE Visualization](fig/dash-tsne.gif)

This tool visualizes the 3-component t-SNE of the "Rewriting Prompts" in the embedding space of the sentence-t5-base model.

![Streamlit Dashboard](fig/streamlit.gif)

This Streamlit dashboard allows users to input text and a "Rewriting Prompt". It then queries the Gemma-7b-it model and outputs the rewritten text.

## Contact

If you have any questions or suggestions, please feel free to reach out to me at [[email protected]](mailto:[email protected]?subject=Kaggle%20LLM%20Prompt%20Recovery%20REPO).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/myousefi/kaggle-llm-prompt-recovery

Awesome Lists containing this project

README