{"id":28362119,"url":"https://github.com/rutujaingole/jailbreaking-deep-models","last_synced_at":"2026-05-14T20:05:37.985Z","repository":{"id":293918255,"uuid":"985501027","full_name":"rutujaingole/Jailbreaking-Deep-Models","owner":"rutujaingole","description":"This repository contains the codebase for Jailbreaking Deep Models, which investigates the vulnerability of deep convolutional neural networks to adversarial attacks. The project systematically implements and analyzes Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and localized patch-based attacks on the pretrained","archived":false,"fork":false,"pushed_at":"2025-05-17T23:06:10.000Z","size":13733,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-20T11:44:02.963Z","etag":null,"topics":["adversarial-attacks","deep-learning","densenet121","fgsm-attack","imagenet-classifier","jailbreak","machine-learning","numpy","patch-based-attack","pgd-adversarial-attacks","torch"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rutujaingole.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-17T22:34:50.000Z","updated_at":"2025-05-17T23:10:44.000Z","dependencies_parsed_at":"2025-05-17T23:35:48.400Z","dependency_job_id":null,"html_url":"https://github.com/rutujaingole/Jailbreaking-Deep-Models","commit_stats":null,"previous_names":["rutujaingole/jailbreaking-deep-models"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rutujaingole/Jailbreaking-Deep-Models","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rutujaingole%2FJailbreaking-Deep-Models","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rutujaingole%2FJailbreaking-Deep-Models/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rutujaingole%2FJailbreaking-Deep-Models/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rutujaingole%2FJailbreaking-Deep-Models/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rutujaingole","download_url":"https://codeload.github.com/rutujaingole/Jailbreaking-Deep-Models/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rutujaingole%2FJailbreaking-Deep-Models/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33041234,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:3
1:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adversarial-attacks","deep-learning","densenet121","fgsm-attack","imagenet-classifier","jailbreak","machine-learning","numpy","patch-based-attack","pgd-adversarial-attacks","torch"],"created_at":"2025-05-28T14:10:03.202Z","updated_at":"2026-05-14T20:05:37.980Z","avatar_url":"https://github.com/rutujaingole.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Jailbreaking Deep Models: Adversarial Attacks on ImageNet-Classifiers\n\nThis repository contains the codebase for Deep Learning Project 3 (Spring 2025), which investigates the vulnerability of deep convolutional neural networks to adversarial attacks. The project systematically implements and analyzes Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and localized patch-based attacks on the pretrained ResNet-34 model using a subset of the ImageNet-1K dataset.\n\n## Project Overview\n\nModern neural networks, despite achieving high accuracy on standard benchmarks, remain susceptible to carefully crafted perturbations that are often imperceptible to humans. 
This project demonstrates how such attacks can significantly degrade model performance, even reducing ResNet-34's top-1 accuracy from 76% to 0% under PGD, while the perturbed images appear visually unchanged.\n\nThree attack strategies are explored:\n- **FGSM**: A fast, single-step attack used as a baseline\n- **PGD**: An iterative, stronger version of FGSM\n- **Patch-based PGD**: A constrained version where only a 32×32 region is perturbed\n\nIn addition, the transferability of these attacks is evaluated on DenseNet-121 to assess cross-model generalization.\n\n## Directory Structure\n```\n.\n├── code/\n│ └── Project3_DL.ipynb # Main attack and evaluation notebook\n├── figures/\n│ ├── accuracy_barplot.png # Accuracy results under each attack\n│ ├── adv_visuals_all_attacks.png # Original vs adversarial comparison with diff maps\n│ ├── Original and FGSM\n│ └── PGD and Patch\n└── data/\n  └── TestDataSet.zip # Provided test dataset (or instructions to download)\n\n```\n\n## Main Libraries\n```\ntorch\ntorchvision\nmatplotlib\nnumpy\n\n```\n\n## Key Results\n\n| Attack Type | Top-1 Accuracy (ResNet-34) | Top-5 Accuracy (ResNet-34) |\n|-------------|-----------------------------|-----------------------------|\n| Original    | 0.7600                      | 0.9420                      |\n| FGSM        | 0.1040                      | 0.2460                      |\n| PGD         | 0.0000                      | 0.0340                      |\n| Patch       | 0.3700                      | 0.6940                      |\n\nThe attacks transferred to DenseNet-121 most effectively under FGSM, with reduced impact for PGD and patch-based attacks.\n\n## How to Run\n\nAll experiments are implemented using PyTorch. To reproduce results:\n1. Open the `Project3_DL.ipynb` notebook in Colab or Jupyter.\n2. Upload `TestDataSet.zip` to the working directory.\n3. 
Run the notebook sequentially to generate adversarial examples, evaluate model accuracy, and visualize outputs.\n\n## Notes\n\n- The dataset is not included due to size constraints; please upload `TestDataSet.zip` manually if running on Colab.\n- All images are normalized using ImageNet mean and standard deviation.\n- Visualizations include pixel-wise difference maps (amplified ×10) to highlight imperceptible perturbations.\n\n## Author\nRutuja Ingole\nNet ID: rdi4221\n\nThis project was completed as part of the Deep Learning course at NYU, Spring 2025.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frutujaingole%2Fjailbreaking-deep-models","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frutujaingole%2Fjailbreaking-deep-models","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frutujaingole%2Fjailbreaking-deep-models/lists"}