# _Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models_
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ml-research/localizing_memorization_in_diffusion_models)
  <center>
  <img src="images/concept.jpg" alt="Concept"  height=230>
  </center>

> **Abstract:**
> *Diffusion models (DMs) produce very detailed and high-quality images. Their power results from extensive training on large amounts of data, usually scraped from the internet without proper attribution or consent from content creators. Unfortunately, this practice raises privacy and intellectual property concerns, as DMs can memorize and later reproduce their potentially sensitive or copyrighted training images at inference time. Prior efforts prevent this issue by either changing the input to the diffusion process, thereby preventing the DM from generating memorized samples during inference, or removing the memorized data from training altogether.
> While those are viable solutions when the DM is developed and deployed in a secure and constantly monitored environment, they hold the risk of adversaries circumventing the safeguards and are not effective when the DM itself is publicly released. To solve the problem, we introduce NeMo, the first method to localize memorization of individual data samples down to the level of neurons in DMs' cross-attention layers. Through our experiments, we make the intriguing finding that in many cases, single neurons are responsible for memorizing particular training samples. By deactivating these memorization neurons, we can avoid the replication of training data at inference time, increase the diversity in the generated outputs, and mitigate the leakage of private and copyrighted data. In this way, our NeMo contributes to a more responsible deployment of DMs.*  
[Paper (arXiv)](https://arxiv.org/abs/2406.02366)  
[Paper Page](https://ml-research.github.io/localizing_memorization_in_diffusion_models/)  

# Setup

The easiest way to run the code is inside a Docker container. To build the Docker image, run the following command:
```bash
docker build -t nemo .
```

To create and start a Docker container, run the following command from the project's root:
```bash
docker run --rm --shm-size 16G --name my_container --gpus '"device=all"' -v $(pwd):/workspace -it nemo bash
```
By default, the container can access all GPUs. To restrict it to specific GPUs, replace `'"device=all"'` with, e.g., `'"device=0,1,2"'`. Detach from the container using `Ctrl+P` followed by `Ctrl+Q`.

# Localizing Memorization
The following steps describe how to apply NeMo to detect memorization neurons in Stable Diffusion. Each script provides multiple options; run a script with `-h` to list them. Default values correspond to the settings used in the main paper. Steps 1 and 2 can be skipped, since we already provide the required statistics and thresholds.

## 1. Calculate Activation Statistics (Optional)
To identify neurons that memorize specific samples, we first calculate activation statistics on unmemorized samples. Use the following script:
```bash
python 1_compute_activations_statistics.py
```
Pre-computed activation statistics for Stable Diffusion v1-4 and 50,000 LAION prompts are provided at `statistics/statistics_additional_laion_prompts_v1_4.pt`.

## 2. Calculate SSIM Thresholds (Optional)
In addition to the activation statistics on unmemorized prompts, the neuron detection algorithm requires an SSIM threshold. First, compute the pairwise SSIM between images generated with different seeds for the same unmemorized prompts:
```bash
python 2_compute_pairwise_ssim.py
```

Then calculate the threshold manually: load the resulting file with PyTorch and compute the mean and standard deviation of the SSIM scores. In the paper, the threshold is set to $0.428$, which is the mean SSIM score plus one standard deviation. This value is also the default in the detection step below.

## 3. Detect Memorization Neurons
To identify memorization neurons, run the following script. It automatically executes both the initial neuron selection and the refinement step:

```bash
python 3_detect_memorized_neurons.py
```

## 4. Image Generation
To calculate the metrics, first generate the original images (without blocking any neurons) and then generate images with the identified neurons blocked:

```bash
python 4_generate_images.py --original_images -o=generated_images_unblocked
python 4_generate_images.py --refined_neurons -o=generated_images_blocked
```

  <center>
  <img src="images/examples.png" alt="Examples"  height=160>
  </center>

# Evaluation Metrics

After generating the images, compute the metrics by running the scripts in the [metrics](metrics) directory. For all metrics, provide the path to the CSV result file containing the detected neurons.
To split the results into verbatim-memorized (VM) and template-memorized (TM) prompts, also provide the path to the original prompt file with `-p=prompts/memorized_laion_prompts.csv`.

For the SSCD-based metrics, download the SSCD model via `wget https://dl.fbaipublicfiles.com/sscd-copy-detection/sscd_disc_mixup.torchscript.pt` and place it in the project's root folder.

## Memorization
The memorization metrics measure the degree of memorization still present in the generated images. For each memorized prompt, generate images with the memorization neurons activated and deactivated, then quantify memorization as the cosine similarity between the SSCD embeddings of each image pair. Additionally, measure memorization with respect to the original training images. To do so, first download the original images from the URLs provided in the [prompt file](prompts/memorized_laion_prompts.csv). Name the downloaded images with a numeric prefix like `0001_first_image.jpg` so that the script can match each generated image to its original. Higher SSCD scores indicate a higher degree of memorization. Run the following scripts to compute the memorization metrics:

```bash
python metrics/compute_sscd_gen.py -p=prompts/memorized_laion_prompts.csv -f=generated_images_blocked -r=generated_images_unblocked
python metrics/compute_sscd_orig.py -p=prompts/memorized_laion_prompts.csv -f=generated_images_blocked -r=original_images
```

## Diversity
The diversity metric assesses the variety of images generated for the same memorized prompt with different seeds. Deactivating memorization neurons increases the diversity of the generated images.
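As an illustration only, the sketch below assumes that diversity is measured as the mean pairwise cosine similarity between SSCD embeddings of images generated for the same prompt with different seeds. Under this assumption, lower similarity means more diverse images. The helper names `cosine_similarity` and `mean_pairwise_similarity` are hypothetical, and this is not the code in `metrics/compute_diversity.py`. Embedding extraction with the SSCD TorchScript model is omitted, so toy vectors stand in for real features.

```python
import itertools
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(embeddings: Sequence[Sequence[float]]) -> float:
    """Hypothetical diversity score: mean cosine similarity over all unordered
    pairs of embeddings of images generated for one prompt with different seeds.
    Lower values indicate more diverse images."""
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings")
    pairs = list(itertools.combinations(embeddings, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy 2-D vectors standing in for SSCD embeddings of three seeds (assumption).
identical = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
varied = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(mean_pairwise_similarity(identical))  # 1.0: no diversity
print(mean_pairwise_similarity(varied))  # about -0.333: more diverse
```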
To compute the diversity metric, run the following script:
```bash
python metrics/compute_diversity.py -p=prompts/memorized_laion_prompts.csv -f=generated_images_blocked
```

## Quality
To assess the overall image quality of a DM with activated/deactivated neurons, compute the Fréchet Inception Distance (FID), CLIP-FID, and Kernel Inception Distance (KID) on COCO prompts using the [clean-fid](https://github.com/GaParmar/clean-fid) implementation. This implementation requires two folders: one with the original images and one with the generated images.

Additionally, compute CLIP scores between the generated images and their input prompts to verify that the images remain aligned with the prompts. Run the following script to compute the prompt alignment:

```bash
python metrics/compute_prompt_alignment.py -p=prompts/memorized_laion_prompts.csv -f=generated_images_blocked
```

# Citation
If you build upon our work, please don't forget to cite us.
```
@inproceedings{hintersdorf2024nemo,
    title={Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models},
    author={Dominik Hintersdorf and Lukas Struppek and Kristian Kersting and Adam Dziedzic and Franziska Boenisch},
    booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
    year={2024}
}
```