{"id":20663717,"url":"https://github.com/vita-group/dp-opt","last_synced_at":"2025-04-19T15:56:06.622Z","repository":{"id":211497923,"uuid":"717887312","full_name":"VITA-Group/DP-OPT","owner":"VITA-Group","description":"[ICLR'24] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer","archived":false,"fork":false,"pushed_at":"2024-01-17T15:19:26.000Z","size":41,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":12,"default_branch":"main","last_synced_at":"2024-01-18T06:20:01.994Z","etag":null,"topics":["llm","privacy","prompt-engineering"],"latest_commit_sha":null,"homepage":"https://jyhong.gitlab.io/publication/2023dp_opt/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VITA-Group.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.bib","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-11-12T22:18:31.000Z","updated_at":"2024-04-14T17:55:27.758Z","dependencies_parsed_at":"2024-04-19T11:31:33.829Z","dependency_job_id":null,"html_url":"https://github.com/VITA-Group/DP-OPT","commit_stats":null,"previous_names":["vita-group/dp-opt"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FDP-OPT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FDP-OPT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FDP-OPT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FDP-OPT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/owners/VITA-Group","download_url":"https://codeload.github.com/VITA-Group/DP-OPT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249731224,"owners_count":21317341,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","privacy","prompt-engineering"],"created_at":"2024-11-16T19:19:29.765Z","updated_at":"2025-04-19T15:56:06.587Z","avatar_url":"https://github.com/VITA-Group.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer\n====================================================\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n\nOfficial PyTorch Code for Paper: \"DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer\" [Junyuan Hong](https://jyhong.gitlab.io/), [Jiachen T. 
Wang](https://tianhaowang.netlify.app/), [Chenhui Zhang](https://scholar.google.com/citations?user=UYxdrBsAAAAJ\u0026hl=en), [Zhangheng Li](https://scholar.google.com/citations?user=NZCLqZMAAAAJ), [Bo Li](https://aisecure.github.io/), [Zhangyang Wang](https://vita-group.github.io/), *ICLR (Spotlight, top-5%)* 2024.\n\n[paper](https://arxiv.org/abs/2312.03724) / [code](https://github.com/VITA-Group/DP-OPT) / [blog](https://jyhong.gitlab.io/publication/2023dp_opt/)\n\n**TL;DR**: We proposed the first end-to-end privacy-preserving automatic prompt engineering method.\n\n## Overview\n\n\n![featured](https://jyhong.gitlab.io/publication/2023dp_opt/featured.png)\n\nLarge Language Models (LLMs) have emerged as dominant tools for various tasks, particularly when tailored for a specific target by prompt tuning. Nevertheless, concerns surrounding data privacy present obstacles due to the tuned prompts' dependency on sensitive private information. A practical solution is to host a local LLM and optimize a soft prompt privately using data. Yet, hosting a local model becomes problematic when model ownership is protected. Alternative methods, like sending data to the model’s provider for training, intensify these privacy issues facing an untrusted provider. In this paper, we present a novel solution called Differentially-Private Offsite Prompt Tuning (DP-OPT) to address this challenge. Our approach involves tuning a discrete prompt on the client side and then applying it to the desired cloud models. We demonstrate that prompts suggested by LLMs themselves can be transferred without compromising performance significantly. To ensure that the prompts do not leak private information, we introduce the first private prompt generation mechanism, by a differentially-private (DP) ensemble of in-context learning with private demonstrations. 
With DP-OPT, generating privacy-preserving prompts by Vicuna-7b can yield competitive performance compared to non-private in-context learning on GPT3.5 or local private prompt tuning.\n\n## Get Started\n\nPrepare the conda environment.\n```shell\nconda create --name dp-opt python=3.8 -y\nconda activate dp-opt\npip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\npip install transformers datasets accelerate sentencepiece scikit-learn wandb autodp\n# transformers==4.28.1\n```\n\nPrepare the DLN datasets.\n```shell\nbash setup_data.sh\n```\n\nTo use OpenAI models, create `openai_config.py` in the root folder. This file is used only for evaluation.\n```python\nimport openai\n\nopenai.api_key = \"\u003cyour-key\u003e\"\n# openai.organization = \"\u003cyour-org\u003e\"\nopenai.api_base = \"https://api.openai.com/v1\"\nopenai_model_types = ['text-davinci-003']\n```\n\u003e :warning: **Warning:** Setting `echo` and `logprobs` simultaneously is no longer supported for certain OpenAI models.\n\u003e However, classification inference with OpenAI models requires both settings. 
Consider hosting your own models, e.g., through vLLM, instead.\n\n**Example**: Run prompt engineering in the web demo:\n```shell\npip install gradio\npython web_demo.py\n# open http://127.0.0.1:7860\n```\n\n| Train | Test |\n| :---- | :--- |\n| ![](https://github.com/VITA-Group/DP-OPT/assets/6964516/fabcb8a1-65c9-4d6f-ac4f-e1274ad85b51) | ![](https://github.com/VITA-Group/DP-OPT/assets/6964516/12982a66-dcb0-4e77-ae69-be9eec87db07) |\n\n**Example**: Use a local model (`lmsys/vicuna-7b-v1.3`) to generate an instruction and test the instruction with an OpenAI model (`text-davinci-003`).\n* OPT:\n```shell\n# generate an instruction\npython train_opt.py --ape_mode=iid_ibwd --ensemble_gen=True --gen_temp=1.1 --num_prompt=40 --max_new_tokens=50 \\\n--data=sst2 --holdout_ratio=0.01\n# evaluate the instruction\npython eval_opt.py --ape_mode=iid_ibwd --ensemble_gen=True --gen_temp=1.1 --num_prompt=40 --max_new_tokens=50 \\\n--data=sst2 \\\n--test_model=text-davinci-003\n```\n* DP-OPT:\n```shell\n# generate an instruction\npython train_opt.py --ape_mode=iid_ibwd --ensemble_gen=True --gen_temp=1.1 --num_prompt=40 --max_new_tokens=50 \\\n--data=sst2 --holdout_ratio=0.01 \\\n--target_eps=8. --dp_eps=1.8 --dp_delta=5e-7 --tokenwise_gen=True\n# evaluate the instruction\npython eval_opt.py --ape_mode=iid_ibwd --ensemble_gen=True --gen_temp=1.1 --num_prompt=40 --max_new_tokens=50 \\\n--data=sst2 \\\n--target_eps=8. 
--dp_eps=1.8 --dp_delta=5e-7 --tokenwise_gen=True \\\n--test_model=text-davinci-003\n```\n\n## Experiments\n\nWandb sweep files are under `sweeps/\u003cdata_name\u003e/\u003cmethod\u003e.yml`.\n`sweeps/\u003cdata_name\u003e/\u003cmethod\u003e.yml` is used for tuning prompts.\nWe use `sweeps/\u003cdata_name\u003e/\u003cmethod\u003e_test.yml` to test prompts on different models.\n\nSupported datasets: `sst2`, `trec`, `mpqa`, `disaster`.\n\n![image](https://github.com/VITA-Group/DP-OPT/assets/6964516/8040b268-1c19-4d5a-8583-44ed23a0a090)\n\nMethods (exemplified on `sst2`):\n* 5-shot In-Context Learning (ICL)\n```shell\nwandb sweep sweeps/sst2/icl.yml\n```\n* Deep Language Network with One-layer (DLN-1)\n```shell\nwandb sweep sweeps/sst2/dln1.yml\nwandb sweep sweeps/sst2/dln1_test.yml\n```\n* Offsite Prompt Tuning (OPT)\n```shell\nwandb sweep sweeps/sst2/opt.yml\nwandb sweep sweeps/sst2/opt_test.yml\n```\n* Differentially-Private Offsite Prompt Tuning (DP-OPT)\n```shell\nwandb sweep sweeps/sst2/dp-opt.yml\nwandb sweep sweeps/sst2/dp-opt_test.yml\n```\n-----\nPart of the code is based on [deep-language-networks](https://github.com/microsoft/deep-language-networks).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvita-group%2Fdp-opt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvita-group%2Fdp-opt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvita-group%2Fdp-opt/lists"}