{"id":21823065,"url":"https://github.com/joisino/prism","last_synced_at":"2025-10-25T11:05:10.125Z","repository":{"id":243617203,"uuid":"812893854","full_name":"joisino/prism","owner":"joisino","description":"Code for \"Making Translators Privacy-aware on the User's Side\" (TMLR 2024)","archived":false,"fork":false,"pushed_at":"2024-06-10T05:33:46.000Z","size":489,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-21T11:31:25.663Z","etag":null,"topics":["llm","machine-learning","privacy","translation"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2312.04068","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/joisino.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-10T05:30:47.000Z","updated_at":"2024-08-15T11:40:55.000Z","dependencies_parsed_at":"2024-06-10T08:36:01.001Z","dependency_job_id":"cf783cc3-a6b0-45ef-a794-9c7ddff1f721","html_url":"https://github.com/joisino/prism","commit_stats":null,"previous_names":["joisino/prism"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/joisino/prism","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joisino%2Fprism","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joisino%2Fprism/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joisino%2Fprism/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joisino%2Fprism/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/joisino","download_url":"https://codeload.github.com/joisino/prism/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joisino%2Fprism/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280943394,"owners_count":26417747,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-25T02:00:06.499Z","response_time":81,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","machine-learning","privacy","translation"],"created_at":"2024-11-27T17:19:25.406Z","updated_at":"2025-10-25T11:05:10.108Z","avatar_url":"https://github.com/joisino.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Making Translators Privacy-aware on the User’s Side (TMLR 2024)\n\n[![arXiv](https://img.shields.io/badge/arXiv-2312.04068-b31b1b.svg)](https://arxiv.org/abs/2312.04068)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./imgs/logo.png\" width=50%\u003e\n\u003c/p\u003e\n\nWe propose PRISM to enable users of machine translation systems to preserve the privacy of data on their own initiative.\n\nPaper: https://arxiv.org/abs/2312.04068\n\n## ✨ Summary\n\n\u003cimg src=\"./imgs/overview.png\"\u003e\n\n▲ **Overview of PRISM**: PRISM converts the input sentence into a privacy-less sentence and sends it to the machine translation system. 
PRISM then converts the translated sentence back into the original sentence.\n\n## 💿 Preparation\n\nInstall [Poetry](https://python-poetry.org/) and run the following commands:\n\n```bash\n$ poetry install\n$ poetry run bash prepare.sh\n```\n\nSet an OpenAI API key in `.env`.\n\n## 🧪 Evaluation\n\n```bash\n$ poetry run python eval.py --method prismstar --translator chatgpt\n$ poetry run python eval.py --method prismr --translator chatgpt\n$ poetry run python eval.py --method nodecode --translator chatgpt\n$ poetry run python eval.py --method pup --translator chatgpt\n```\nPlease refer to the help command for further options.\n\n```\n$ poetry run python eval.py -h\nusage: eval.py [-h] [--lang LANG] [--basedir BASEDIR] [--rates RATES] [--method {pup,prismr,prismstar,nodecode}] [--translator {chatgpt,t5,t5-gpu}]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --lang LANG\n  --basedir BASEDIR\n  --rates RATES\n  --method {pup,prismr,prismstar,nodecode}\n  --translator {chatgpt,t5,t5-gpu}\n```\n\n### Results\n\n\u003cimg src=\"./imgs/results.png\"\u003e\n\n▲ **Results.** PRISM* strikes an excellent balance between privacy and translation quality.\n\nPlease refer to the paper for more details.\n\n## ⛏️ How to Build a Dictionary by Yourself\n\nRun the following command to extract candidate words from the corpus. It uses `load_mctest()` to load the corpus; you can replace it with your own. In general, we recommend using the same corpus as the one used in the evaluation, or a similar one.\n\n```bash\n$ poetry run python extract_all_words.py\n```\n\nThen, run the following commands to build a dictionary. They build a dictionary based on the WMT14 dataset (i.e., a public news corpus).\n\n```bash\n$ poetry run python build_dict.py 1 -1 --target French\n$ poetry run python merge_cand_words.py cand_words_French_1000\n```\n\nBuilding the entire dictionary may take a long time. 
You can build each part separately (on separate machines) and merge them.\n\n```bash\n$ poetry run python build_dict.py 1 100 --target French\n$ poetry run python build_dict.py 100 200 --target French\n$ poetry run python build_dict.py 200 300 --target French\n...\n$ poetry run python merge_cand_words.py cand_words_French_1000\n```\n\n## 🖋️ Citation\n\n```\n@article{sato2024making,\n  author    = {Ryoma Sato},\n  title     = {Making Translators Privacy-aware on the User’s Side},\n  journal   = {Transactions on Machine Learning Research},\n  year      = {2024},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoisino%2Fprism","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjoisino%2Fprism","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoisino%2Fprism/lists"}