{"id":17361215,"url":"https://github.com/evan-lloyd/graphpatch","last_synced_at":"2025-02-26T12:31:32.869Z","repository":{"id":211169038,"uuid":"728355277","full_name":"evan-lloyd/graphpatch","owner":"evan-lloyd","description":"graphpatch is a library for activation patching on PyTorch neural network models.","archived":false,"fork":false,"pushed_at":"2024-05-28T23:05:02.000Z","size":2251,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-05-29T11:57:15.465Z","etag":null,"topics":["interpretability","large-language-models","mechanistic-interpretability","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/evan-lloyd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-06T19:05:25.000Z","updated_at":"2024-06-01T00:26:29.476Z","dependencies_parsed_at":"2024-01-01T23:32:20.542Z","dependency_job_id":"341a0c39-7313-474d-a783-1518290ad4f5","html_url":"https://github.com/evan-lloyd/graphpatch","commit_stats":null,"previous_names":["evan-lloyd/graphpatch"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evan-lloyd%2Fgraphpatch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evan-lloyd%2Fgraphpatch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/evan-lloyd%2Fgraphpatch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eva
n-lloyd%2Fgraphpatch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/evan-lloyd","download_url":"https://codeload.github.com/evan-lloyd/graphpatch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240852533,"owners_count":19868273,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["interpretability","large-language-models","mechanistic-interpretability","pytorch"],"created_at":"2024-10-15T19:31:57.702Z","updated_at":"2025-02-26T12:31:32.815Z","avatar_url":"https://github.com/evan-lloyd.png","language":"Python","funding_links":[],"categories":["Mechanistic interpretability libraries"],"sub_categories":[],"readme":"# graphpatch 0.2.3\n\nDocumentation is hosted on [Read the Docs](https://graphpatch.readthedocs.io/en/stable).\n\n## Overview\n\n`graphpatch` is a library for [activation patching](https://graphpatch.readthedocs.io/en/stable/what_is_activation_patching.html#what-is-activation-patching) (often\nalso referred to as “ablation”) on [PyTorch](https://pytorch.org/docs/stable/index.html) neural network models. 
You use\nit by first wrapping your model in a [`PatchableGraph`](https://graphpatch.readthedocs.io/en/stable/patchable_graph.html#graphpatch.PatchableGraph) and then running operations in a context\ncreated by [`PatchableGraph.patch()`](https://graphpatch.readthedocs.io/en/stable/patchable_graph.html#graphpatch.PatchableGraph.patch):\n\n```python\npg = PatchableGraph(model, **inputs, use_cache=False)\n# Applies patches to the multiplication result within the activation function of the\n# MLP in the 18th transformer layer. ProbePatch records the last observed value at the\n# given node, while ZeroPatch zeroes out the value seen by downstream computations.\nwith pg.patch({\"transformer.h_17.mlp.act.mul_3\": [probe := ProbePatch(), ZeroPatch()]}):\n   output = pg(**inputs)\n# Patches are applied in order. probe.activation holds the value prior\n# to ZeroPatch zeroing it out.\nprint(probe.activation)\n```\n\nIn contrast to [other approaches](#related-work), `graphpatch` can patch (or record) any\nintermediate tensor value without manual modification of the underlying model’s code. 
See [Working with graphpatch](https://graphpatch.readthedocs.io/en/stable/working_with_graphpatch.html#working-with-graphpatch) for\nsome tips on how to use the generated graphs.\n\nNote that `graphpatch` activation patches are compatible with [autograd](https://pytorch.org/docs/stable/autograd.html)!\nThis means that, for example, you can perform optimizations over the `value` parameter to\n[`AddPatch`](https://graphpatch.readthedocs.io/en/stable/patch.html#graphpatch.patch.AddPatch):\n\n```python\ndelta = torch.zeros(size, requires_grad=True, device=\"cuda\")\noptimizer = torch.optim.Adam([delta], lr=0.5)\nfor _ in range(num_steps):\n   # Reset gradients so they do not accumulate across steps.\n   optimizer.zero_grad()\n   with graph.patch({node_name: AddPatch(value=delta)}):\n      logits = graph(**prompt_inputs)\n   loss = my_loss_function(logits)\n   loss.backward()\n   optimizer.step()\n```\n\nFor a practical usage example, see the [demo](https://github.com/evan-lloyd/graphpatch/tree/main/demos/ROME) of using `graphpatch` to replicate [ROME](https://rome.baulab.info/).\n\n## Prerequisites\n\nThe only mandatory requirements are `torch\u003e=2` and `numpy\u003e=1.17`. Version 2+ of `torch` is required\nbecause `graphpatch` leverages [`torch.compile()`](https://pytorch.org/docs/stable/generated/torch.compile.html#torch.compile), which was introduced in `2.0.0`, to extract computational graphs from models.\nCUDA support is not required. `numpy` is required for full `compile()` support.\n\nPython 3.8–3.12 are supported. Note that `torch` versions prior to `2.1.0` do not support compilation\non Python 3.11, and versions prior to `2.4.0` do not support compilation on Python 3.12;\nyou will get an exception when trying to use `graphpatch` with such a configuration. 
No version of\n`torch` yet supports compilation on Python 3.13.\n\n## Installation\n\n`graphpatch` is available on PyPI, and can be installed via `pip`:\n\n```default\npip install graphpatch\n```\n\nNote that you will likely want to do this in an environment that already has `torch`, since `pip` may not resolve\n`torch` to a CUDA-enabled version by default. You don’t need to do anything special to make `graphpatch` compatible\nwith `transformers`, `accelerate`, and `bitsandbytes`; their presence is detected at run-time. However, for convenience,\nyou can install `graphpatch` with the “transformers” extra, which will install known compatible versions of these libraries along\nwith some of their optional dependencies that are otherwise mildly inconvenient to set up:\n\n```default\npip install \"graphpatch[transformers]\"\n```\n\n## Model compatibility\n\nFor full functionality, `graphpatch` depends on being able to call [`torch.compile()`](https://pytorch.org/docs/stable/generated/torch.compile.html#torch.compile) on your\nmodel. `torch.compile()` currently supports only a subset of possible Python operations; for example, it doesn’t support\ncontext managers. `graphpatch` implements some workarounds for situations that a native\n`compile()` can’t handle, but this coverage isn’t complete. To deal with this, `graphpatch`\nhas a graceful fallback that should be no worse a user experience than using module hooks.\nIn that case, you will only be able to patch an uncompilable submodule’s inputs, outputs,\nparameters, and buffers. See [Notes on compilation](https://graphpatch.readthedocs.io/en/stable/notes_on_compilation.html#notes-on-compilation) for more discussion.\n\n## `transformers` integration\n\n`graphpatch` is theoretically compatible with any model in Hugging Face’s [transformers](https://huggingface.co/docs/transformers/main/en/index)\nlibrary, but note that there may be edge cases in specific model code that it can’t yet handle. For\nexample, it is not (yet!) 
compatible with the key-value caching implementation, so if you want full\ncompilation of such models you should pass `use_cache=False` as part of the example inputs.\n\n`graphpatch` is compatible with models loaded via [accelerate](https://huggingface.co/docs/accelerate/main/en/index) and with 8-bit parameters\nquantized by [bitsandbytes](https://pypi.org/project/bitsandbytes/). This means that you can run `graphpatch` on\nmultiple GPUs and/or with quantized inference very easily on models provided by `transformers`:\n\n```python\nimport torch\nfrom graphpatch import PatchableGraph\nfrom transformers import BitsAndBytesConfig, LlamaForCausalLM\n\nmodel = LlamaForCausalLM.from_pretrained(\n   model_path,\n   device_map=\"auto\",\n   quantization_config=BitsAndBytesConfig(load_in_8bit=True),\n   torch_dtype=torch.float16,\n)\npg = PatchableGraph(model, **example_inputs, use_cache=False)\n```\n\nFor `transformers` models supporting the [`GenerationMixin`](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin) protocol, you will\nalso be able to use convenience functions like [`generate()`](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.generate) in\ncombination with activation patching:\n\n```python\n# Prevent Llama from outputting \"Paris\"\nwith pg.patch({\"lm_head.output\": ZeroPatch(slice=(slice(None), slice(None), 3681))}):\n   output_tokens = pg.generate(**inputs, max_length=20, use_cache=False)\n```\n\n### Version compatibility\n\n`graphpatch` should be compatible with all versions of optional libraries matching the minimum\nversion requirements, but this is a highly ambitious claim to make for a Python library. If you end\nup with errors that seem related to `graphpatch`’s integration with these libraries, you might try\nchanging their versions to those listed below. This list was automatically generated as part of the\n`graphpatch` release process. 
It reflects the versions used while testing `graphpatch 0.2.3`:\n\n```default\naccelerate==1.0.0\nbitsandbytes==0.44.1\nnumpy==1.24.4 (Python 3.8)\nnumpy==2.0.2 (Python 3.9)\nnumpy==2.1.1 (later Python versions)\nsentencepiece==0.2.0\ntransformer-lens==2.4.1\ntransformers==4.45.2\n```\n\n\u003ca id=\"related-work\"\u003e\u003c/a\u003e\n\n## Alternatives\n\n[`Module hooks`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_forward_hook) are built in to `torch` and can be used for activation\npatching. You can even add them to existing models without modifying their code. However, this will only give you\naccess to module inputs and outputs; accessing or patching intermediate values still requires a manual rewrite.\n\n[TransformerLens](https://transformerlensorg.github.io/TransformerLens/index.html) provides the\n[`HookPoint`](https://transformerlensorg.github.io/TransformerLens/generated/code/transformer_lens.hook_points.html#transformer_lens.hook_points.HookPoint) class, which can record and patch intermediate\nactivations. However, this requires manually rewriting your model’s code to wrap the values you want to make\npatchable.\n\n[TorchLens](https://github.com/johnmarktaylor91/torchlens) records and outputs visualizations for every intermediate\nactivation. 
However, it is currently unable to perform any activation patching.\n\n[nnsight](https://github.com/ndif-team/nnsight) offers a nice activation patching API, but is limited to\nmodule inputs and outputs.\n\n[pyvene](https://github.com/stanfordnlp/pyvene) offers fine-grained control over activation patches (for example, down to\na specific attention head), and a description language/serialization format to allow specification of reproducible\nexperiments.\n\n## Documentation index\n\n* [API](https://graphpatch.readthedocs.io/en/stable/api.html)\n  * [ExtractionOptions](https://graphpatch.readthedocs.io/en/stable/extraction_options.html)\n  * [Patch](https://graphpatch.readthedocs.io/en/stable/patch.html)\n  * [PatchableGraph](https://graphpatch.readthedocs.io/en/stable/patchable_graph.html)\n* [Data structures](https://graphpatch.readthedocs.io/en/stable/data_structures.html)\n  * [CompiledGraphModule](https://graphpatch.readthedocs.io/en/stable/compiled_graph_module.html)\n  * [MultiplyInvokedModule](https://graphpatch.readthedocs.io/en/stable/multiply_invoked_module.html)\n  * [NodePath](https://graphpatch.readthedocs.io/en/stable/node_path.html)\n  * [OpaqueGraphModule](https://graphpatch.readthedocs.io/en/stable/opaque_graph_module.html)\n* [Notes on compilation](https://graphpatch.readthedocs.io/en/stable/notes_on_compilation.html)\n* [What is activation patching?](https://graphpatch.readthedocs.io/en/stable/what_is_activation_patching.html)\n* [Working with `graphpatch`](https://graphpatch.readthedocs.io/en/stable/working_with_graphpatch.html)\n\n* [Full index](https://graphpatch.readthedocs.io/en/stable/genindex.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevan-lloyd%2Fgraphpatch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevan-lloyd%2Fgraphpatch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevan-lloyd%2Fgraphpatch/lists"}