{"id":15679223,"url":"https://github.com/orgoro/white-2-black","last_synced_at":"2025-05-07T09:27:04.703Z","repository":{"id":102370950,"uuid":"137789681","full_name":"orgoro/white-2-black","owner":"orgoro","description":"The official code to reproduce results from the NAACL 2019 paper: White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks","archived":false,"fork":false,"pushed_at":"2019-06-04T13:08:36.000Z","size":18453,"stargazers_count":12,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"orphan","last_synced_at":"2025-03-31T08:39:10.797Z","etag":null,"topics":["adversarial-attacks","adversarial-networks","nlp","toxic-comment-classification","toxicity"],"latest_commit_sha":null,"homepage":"https://naacl2019.org/program/accepted/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/orgoro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-18T18:16:34.000Z","updated_at":"2024-03-27T11:40:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"6d87f00d-c884-4e7e-9089-6228e909f349","html_url":"https://github.com/orgoro/white-2-black","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orgoro%2Fwhite-2-black","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orgoro%2Fwhite-2-black/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orgoro%2Fwhite-2-black/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/orgoro%2Fwhite-2-black/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/orgoro","download_url":"https://codeload.github.com/orgoro/white-2-black/tar.gz/refs/heads/orphan","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252849960,"owners_count":21813898,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adversarial-attacks","adversarial-networks","nlp","toxic-comment-classification","toxicity"],"created_at":"2024-10-03T16:27:08.740Z","updated_at":"2025-05-07T09:27:04.680Z","avatar_url":"https://github.com/orgoro.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.com/orgoro/white-2-black.svg?branch=orphan)](https://travis-ci.com/orgoro/white-2-black)\n\n# white2black\n\n## INTRODUCTION\nThe official code to reproduce the results of the NAACL 2019 paper:\n*White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks*\n\nThe code is divided into sub-packages:\n##### 1. [./Agents](./toxic_fool/agents) - _learned adversarial attack generators_\n##### 2. [./Attacks](./toxic_fool/attacks) - _optimization-based attacks such as HotFlip_\n##### 3. [./Toxicity Classifier](./toxic_fool/toxicity_classifier) - _a classifier that labels sentences as toxic or non-toxic_\n##### 4. [./Data](./toxic_fool/data) - _data handling_\n##### 5. 
[./Resources](./toxic_fool/resources) - _resources for other categories_\n\n## ALGORITHM\nAs shown in the figure below, we train a classifier to label sentences as toxic or non-toxic.\nWe attack this model with a white-box algorithm called HotFlip and distill the knowledge into a second model, `DistFlip`.\n`DistFlip` can then generate attacks in a black-box manner.\nThese attacks generalize well to the [Google Perspective](https://www.perspectiveapi.com/) algorithm (tested in January 2019).\n![algorithm](/doc/algorithm.png)\n\n## DATA\nWe used the data from this [Kaggle challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge)\nby Jigsaw.\n\nFor data flip using HotFlip+, you can download\nthe [data from Google Drive](https://drive.google.com/file/d/15zSclVYjFYtM1YXUxZbFUpmWS1MgHTx3/view?usp=sharing)\nand unzip it into: `./toxic_fool/resources/data`\n\n\n## RESULTS\nThe number of flips needed to change the label of a sentence using the original white-box algorithm and ours (green):\n![survival rate](doc/survival_rate.png)\n\nSome example sentences:\n![examples](doc/examples.png)\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forgoro%2Fwhite-2-black","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Forgoro%2Fwhite-2-black","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forgoro%2Fwhite-2-black/lists"}