{"id":20298114,"url":"https://github.com/jwkirchenbauer/lm-watermarking","last_synced_at":"2025-05-07T20:34:23.183Z","repository":{"id":65545380,"uuid":"592876462","full_name":"jwkirchenbauer/lm-watermarking","owner":"jwkirchenbauer","description":null,"archived":false,"fork":false,"pushed_at":"2024-03-14T15:39:17.000Z","size":12215,"stargazers_count":418,"open_issues_count":0,"forks_count":58,"subscribers_count":18,"default_branch":"main","last_synced_at":"2024-03-14T17:07:59.202Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jwkirchenbauer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-01-24T18:17:28.000Z","updated_at":"2024-03-14T16:51:53.000Z","dependencies_parsed_at":"2023-02-14T00:31:16.075Z","dependency_job_id":"5711cbff-ea40-4001-a4f0-f46d539d31d5","html_url":"https://github.com/jwkirchenbauer/lm-watermarking","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwkirchenbauer%2Flm-watermarking","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwkirchenbauer%2Flm-watermarking/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwkirchenbauer%2Flm-watermarking/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jwkirchenbauer%2Flm-watermarking/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jwkirchenbauer","downloa
d_url":"https://codeload.github.com/jwkirchenbauer/lm-watermarking/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252953717,"owners_count":21830890,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T16:02:11.478Z","updated_at":"2025-05-07T20:34:23.160Z","avatar_url":"https://github.com/jwkirchenbauer.png","language":"Jupyter Notebook","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"# 💧 [A Watermark for Large Language Models](https://arxiv.org/abs/2301.10226) 🔍\n\n### [Demo](https://huggingface.co/spaces/tomg-group-umd/lm-watermarking) | [Paper](https://arxiv.org/abs/2301.10226)\n\nOfficial implementation of the watermarking and detection algorithms presented in the papers:\n\n\"A Watermark for Large Language Models\" by _John Kirchenbauer*, Jonas Geiping*, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein_  \n\n\"On the Reliability of Watermarks for Large Language Models\" by _John Kirchenbauer*, Jonas Geiping*, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, Tom Goldstein_\n\n### Updates:\n\n- **(6/7/23)** We're thrilled to announce the release of [\"On the Reliability of Watermarks for Large Language Models\"](https://arxiv.org/abs/2306.04634). Our new preprint documents a deep dive into the robustness properties of more advanced watermarks.\n\n- **(6/9/23)** Initial code release implementing the alternate watermark and detector variants in the new work. 
Files located in this subdirectory: [`watermark_reliability_release`](watermark_reliability_release).\n\n- **(9/23/23)** Update to the docs with recommendations on parameter settings. Extended implementation (recommended) available in `extended_watermark_processor.py`.\n\n- **(1/16/24)** [\"On the Reliability of Watermarks for Large Language Models\"](https://arxiv.org/abs/2306.04634) has been accepted for publication and will be presented at ICLR 2024 in Vienna, Austria!\n\n---\n\nImplementation is based on the \"logit processor\" abstraction provided by the [huggingface/transformers 🤗](https://github.com/huggingface/transformers) library.\n\nThe `WatermarkLogitsProcessor` is designed to be readily compatible with any model that supports the `generate` API.\nAny model that can be constructed using the `AutoModelForCausalLM` or `AutoModelForSeq2SeqLM` factories _should_ be compatible.\n\n### Repo contents\n\nThe core implementation is defined by the `WatermarkBase`, `WatermarkLogitsProcessor`, and `WatermarkDetector` classes in two files: `watermark_processor.py` (a minimal implementation) and `extended_watermark_processor.py` (the more full-featured implementation, recommended).\nThe `demo_watermark.py` script implements a gradio demo interface as well as a minimal working example in the `main` function using the minimal version.\n\nDetails about the parameters and the detection outputs are provided in the gradio app markdown blocks as well as the argparse definition.\n\nThe `homoglyphs.py` and `normalizers.py` modules implement algorithms used by the `WatermarkDetector`. `homoglyphs.py` (and its raw data in `homoglyph_data`) is an updated version of the homoglyph code from the deprecated package described here: https://github.com/life4/homoglyphs.\nThe `experiments` directory contains pipeline code that we used to run the original experiments in the paper. 
However, this code is stale/deprecated\nin favor of the implementation in `watermark_processor.py`.\n\n### Demo Usage\n\nAs a quickstart, the app can be launched with default args (or deployed to a [huggingface Space](https://huggingface.co/spaces)) using `app.py`\nwhich is just a thin wrapper around the demo script.\n```sh\npython app.py\ngradio app.py # for hot reloading\n# or\npython demo_watermark.py --model_name_or_path facebook/opt-6.7b\n```\n\n\n### How to Watermark - A short guide on watermark hyperparameters\nWhat watermark hyperparameters are optimal for your task or for a comparison to new watermarks? Below we give a brief overview of all important settings, along with best practices for future work. This guide represents our current understanding of optimal settings as of August 2023, and so is a bit more up to date than our ICML 2023 conference paper.\n\n**TL;DR**: As a baseline generation setting, we suggest default values of `gamma=0.25` and `delta=2.0`. Reduce delta if text quality is negatively impacted. For the context width, h, we recommend a moderate value, i.e. h=4, and as a default PRF we recommend `selfhash`, although `minhash` is also a reasonable choice. Reduce h if more robustness against edits is required. Note however that the choice of PRF only matters if h\u003e1. The recommended PRF and context width can be easily selected by instantiating the watermark processor and detector with `seeding_scheme=\"selfhash\"` (a shorthand for `seeding_scheme=\"ff-anchored_minhash_prf-4-True-15485863\"`, but do use a different base key if actually deploying). For detection, always run with `--ignore-repeated-ngrams=True`.\n\n1) **Logit bias delta**: The magnitude of delta determines the strength of the watermark. A sufficiently large value of delta recovers a \"hard\" watermark that encodes 1 bit of information at every token, but this is not an advisable setting, as it strongly affects model quality. 
A moderate delta in the range of [0.5, 2.0] is appropriate for normal use cases, but the strength of delta is relative to the entropy of the output distribution. Models that are overconfident, such as instruction-tuned models, may benefit from choosing a larger delta value. With non-infinite delta values, the watermark strength is directly proportional to the (spike) entropy of the text and exp(delta) (see Theorem 4.2 in our paper).\n\n2) **Context width h**: Context width is the length of the context which is taken into account when seeding the watermark at each location. The longer the context, the \"more random\" the red/green list partitions are, and the less detectable the watermark is. For private watermarks, this implies that the watermark is harder to discover via brute-force (with an exponential increase in hardness with increasing context width h).\nIn the limit of a very long context width, we approach the \"undetectable\" setting of https://eprint.iacr.org/2023/763. However, the longer the context width, the less \"nuclear\" the watermark is, and robustness to paraphrasing and other attacks decreases. In the limit of h=0, the watermark is independent of local context and, as such, it is minimally random, but maximally robust against edits (see https://arxiv.org/abs/2306.17439).\n\n3) **Ignoring repeated ngrams**: The watermark is only pseudo-random based on the local context. Whenever local context repeats, this constitutes a violation of the assumption that the PRNG numbers used to seed the green/red partition operation are drawn iid. (See Sec.4. in our paper for details). For this reason, p-values for text with repeated n-grams (n-gram here meaning context + chosen token) will be misleading. As such, detection should be run with `--ignore-repeated-ngrams` set to `True`. 
An additional, detailed analysis of this effect can be found in http://arxiv.org/abs/2308.00113.\n\n4) **Choice of pseudo-random-function** (PRF): This choice is only relevant if context width h\u003e1 and determines the robustness of the hash of the context against edits. In our experiments we find \"min\"-hash PRFs to be the most performant in striking a balance between maximizing robustness and minimizing impact on text quality. In comparison to a PRF that depends on the entire context, this PRF only depends on a single, randomly chosen token from the context.\n\n5) **Self-Hashing**: It is possible to extend the context width of the watermark onto the current token. This effectively extends the context width by one \"for free\". The only downside is that this approach requires hashing all possible next tokens, and applying the logit bias only to tokens where their inclusion in the context would produce a hash that includes this token on the green list. This is slow in the way we implement it, because we use CUDA's pseudorandom number generator and a simple inner-loop implementation, but in principle it has negligible cost compared to generating new tokens if engineered for deployment. A generalized algorithm for self-hashing can be found as Alg.1 in http://arxiv.org/abs/2306.04634.\n\n6) **Gamma**: Gamma denotes the fraction of the vocabulary that will be in each green list. We find gamma=0.25 to work slightly better empirically, but this is a minor effect, and values of gamma between 0.25 and 0.75 will lead to a reasonable watermark. An intuitive argument can be made for why choosing a lower gamma value makes it easier to achieve a fraction of green tokens sufficiently higher than gamma to reject the null hypothesis.\n\n7) **Base Key**: Our watermark is salted with a small base key of 15485863 (the millionth prime). 
If you deploy this watermark, we do not advise re-using this key.\n\n### How to use the watermark in your own code.\n\nOur implementation can be added into any huggingface generation pipeline as an additional `LogitsProcessor`; only the classes `WatermarkLogitsProcessor` and `WatermarkDetector` from the `extended_watermark_processor.py` file are required.\n\nExample snippet to generate watermarked text:\n```python\n\n# assumes `model`, `tokenizer`, and `input_text` are already defined\nfrom transformers import LogitsProcessorList\nfrom extended_watermark_processor import WatermarkLogitsProcessor\n\nwatermark_processor = WatermarkLogitsProcessor(vocab=list(tokenizer.get_vocab().values()),\n                                               gamma=0.25,\n                                               delta=2.0,\n                                               seeding_scheme=\"selfhash\") #equivalent to `ff-anchored_minhash_prf-4-True-15485863`\n# Note:\n# You can turn off self-hashing by setting the seeding scheme to `minhash`.\n\ntokenized_input = tokenizer(input_text, return_tensors='pt').to(model.device)\n# note that if the model is on cuda, then the input is on cuda\n# and thus the watermarking rng is cuda-based.\n# This is a different generator than the cpu-based rng in pytorch!\n\noutput_tokens = model.generate(**tokenized_input,\n                               logits_processor=LogitsProcessorList([watermark_processor]))\n\n# if decoder only model, then we need to isolate the\n# newly generated tokens as only those are watermarked, the input/prompt is not\noutput_tokens = output_tokens[:,tokenized_input[\"input_ids\"].shape[-1]:]\n\noutput_text = tokenizer.batch_decode(output_tokens, skip_special_tokens=True)[0]\n```\n\nExample snippet to detect watermarked text:\n```python\n\nfrom extended_watermark_processor import WatermarkDetector\n\nwatermark_detector = WatermarkDetector(vocab=list(tokenizer.get_vocab().values()),\n                                        gamma=0.25, # should match original setting\n                                        seeding_scheme=\"selfhash\", # should 
match original setting\n                                        device=model.device, # must match the original rng device type\n                                        tokenizer=tokenizer,\n                                        z_threshold=4.0,\n                                        normalizers=[],\n                                        ignore_repeated_ngrams=True)\n\nscore_dict = watermark_detector.detect(output_text) # or any other text of interest to analyze\n```\n\nTo recover the main settings of the experiments in the original work (for historical reasons), use the seeding scheme `simple_1` and set `ignore_repeated_ngrams=False` at detection time.\n\n\n### Contributing\nSuggestions and PRs welcome 🙂\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjwkirchenbauer%2Flm-watermarking","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjwkirchenbauer%2Flm-watermarking","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjwkirchenbauer%2Flm-watermarking/lists"}