{"id":19428878,"url":"https://github.com/eleutherai/concept-erasure","last_synced_at":"2025-04-04T09:07:06.804Z","repository":{"id":162441221,"uuid":"636982911","full_name":"EleutherAI/concept-erasure","owner":"EleutherAI","description":"Erasing concepts from neural representations with provable guarantees","archived":false,"fork":false,"pushed_at":"2025-01-27T01:06:54.000Z","size":174,"stargazers_count":226,"open_issues_count":4,"forks_count":15,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-03-28T08:05:05.633Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EleutherAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-06T06:36:23.000Z","updated_at":"2025-03-12T16:53:29.000Z","dependencies_parsed_at":"2024-01-25T06:24:45.391Z","dependency_job_id":"4a4e25df-2014-48d2-a293-6b1e927d1c14","html_url":"https://github.com/EleutherAI/concept-erasure","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fconcept-erasure","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fconcept-erasure/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fconcept-erasure/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EleutherAI%2Fconcept-erasure/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EleutherAI","download_url":"https://codeload.github.com/EleutherAI/concept-erasure/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247149500,"owners_count":20891954,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T14:17:04.277Z","updated_at":"2025-04-04T09:07:06.784Z","avatar_url":"https://github.com/EleutherAI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Least-Squares Concept Erasure (LEACE)\nConcept erasure aims to remove specified features from a representation. It can be used to improve fairness (e.g. preventing a classifier from using gender or race) and interpretability (e.g. removing a concept to observe changes in model behavior). This is the repo for **LEAst-squares Concept Erasure (LEACE)**, a closed-form method which provably prevents all linear classifiers from detecting a concept while inflicting the least possible damage to the representation. You can check out the paper [here](https://arxiv.org/abs/2306.03819).\n\n# Installation\n\nWe require Python 3.10 or later. You can install the package from PyPI:\n\n```bash\npip install concept-erasure\n```\n\n# Usage\n\nThe two main classes in this repo are `LeaceFitter` and `LeaceEraser`.\n\n- `LeaceFitter` keeps track of the covariance and cross-covariance statistics needed to compute the LEACE erasure function. These statistics can be updated in an incremental fashion with `LeaceFitter.update()`. The erasure function is lazily computed when the `.eraser` property is accessed. This class uses O(_d\u003csup\u003e2\u003c/sup\u003e_) memory, where _d_ is the dimensionality of the representation, so you may want to discard it after computing the erasure function.\n- `LeaceEraser` is a compact representation of the LEACE erasure function, using only O(_dk_) memory, where _k_ is the number of classes in the concept you're trying to erase (or equivalently, the _dimensionality_ of the concept if it's not categorical).\n\n## Batch usage\n\nIn most cases, you probably have a batch of feature vectors `X` and concept labels `Z` and want to erase the concept from `X`. The easiest way to do this is by using the `LeaceEraser.fit()` convenience method:\n\n```python\nimport torch\nfrom sklearn.datasets import make_classification\nfrom sklearn.linear_model import LogisticRegression\n\nfrom concept_erasure import LeaceEraser\n\nn, d, k = 2048, 128, 2\n\nX, Y = make_classification(\n    n_samples=n,\n    n_features=d,\n    n_classes=k,\n    random_state=42,\n)\nX_t = torch.from_numpy(X)\nY_t = torch.from_numpy(Y)\n\n# Logistic regression does learn something before concept erasure\nreal_lr = LogisticRegression(max_iter=1000).fit(X, Y)\nbeta = torch.from_numpy(real_lr.coef_)\nassert beta.norm(p=torch.inf) \u003e 0.1\n\neraser = LeaceEraser.fit(X_t, Y_t)\nX_ = eraser(X_t)\n\n# But learns nothing after\nnull_lr = LogisticRegression(max_iter=1000, tol=0.0).fit(X_.numpy(), Y)\nbeta = torch.from_numpy(null_lr.coef_)\nassert beta.norm(p=torch.inf) \u003c 1e-4\n```\n\n## Streaming usage\nIf you have a **stream** of data, you can use `LeaceFitter.update()` to update the statistics. This is useful if you have a large dataset and want to avoid storing it all in memory.\n\n```python\nfrom concept_erasure import LeaceFitter\nfrom sklearn.datasets import make_classification\nimport torch\n\nn, d, k = 2048, 128, 2\n\nX, Y = make_classification(\n    n_samples=n,\n    n_features=d,\n    n_classes=k,\n    random_state=42,\n)\nX_t = torch.from_numpy(X)\nY_t = torch.from_numpy(Y)\n\nfitter = LeaceFitter(d, 1, dtype=X_t.dtype)\n\n# Compute cross-covariance matrix using batched updates\nfor x, y in zip(X_t.chunk(2), Y_t.chunk(2)):\n    fitter.update(x, y)\n\n# Erase the concept from the data\nx_ = fitter.eraser(X_t[0])\n```\n\n# Paper replication\n\nScripts used to generate the part-of-speech tags for the concept scrubbing experiments can be found in [this repo](https://github.com/EleutherAI/tagged-pile). We plan to upload the tagged datasets to the HuggingFace Hub shortly.\n\n## Concept scrubbing\n\nThe concept scrubbing code is a bit messy right now, and will probably be refactored soon. We found it necessary to write bespoke implementations for different HuggingFace model families. So far we've implemented LLaMA and GPT-NeoX. These can be found in the `concept_erasure.scrubbing` submodule.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feleutherai%2Fconcept-erasure","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feleutherai%2Fconcept-erasure","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feleutherai%2Fconcept-erasure/lists"}