{"id":18881627,"url":"https://github.com/ecrows/cgtext-detection-adv","last_synced_at":"2026-02-21T04:30:20.437Z","repository":{"id":124593373,"uuid":"421626108","full_name":"ecrows/cgtext-detection-adv","owner":"ecrows","description":"Adversarial examples against detection of computer-generated text.","archived":false,"fork":false,"pushed_at":"2022-07-30T15:12:51.000Z","size":56,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-31T03:27:53.843Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ecrows.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-27T00:31:03.000Z","updated_at":"2024-03-07T15:17:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"9a0fe45f-ab99-4344-83e3-362163d1b11a","html_url":"https://github.com/ecrows/cgtext-detection-adv","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ecrows%2Fcgtext-detection-adv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ecrows%2Fcgtext-detection-adv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ecrows%2Fcgtext-detection-adv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ecrows%2Fcgtext-detection-adv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ecrows"
,"download_url":"https://codeload.github.com/ecrows/cgtext-detection-adv/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239850446,"owners_count":19707348,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T06:50:32.308Z","updated_at":"2026-02-21T04:30:20.408Z","avatar_url":"https://github.com/ecrows.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Adversarial Text Attacks Against Detection of Computer-Generated Text\n\nCode for loading computer-generated text datasets, training text classification models on these datasets, and evaluating adversarial text attacks against them.\n\nResults published in the paper \"Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers\".\n\n## Install Requirements\n\n`pip install -r requirements.txt`\n\n\n## Data Dependencies\n\nThese experiments rely on several data sources and machine learning models to operate.  You must download these datasets and retrieve these models prior to running the code.\n\nYou'll need to run the following code from an environment where you have access to CUDA 11.0.  
For example, a server running Jupyter Notebook with CUDA 11.\n\nDownload the required data into the \"data\" directory.\n\n### GPT-2\n\n```\ngit clone git@github.com:openai/gpt-2-output-dataset.git\ncd gpt-2-output-dataset\npython download_dataset.py\n```\n\n### GPT-3\n\nDownload the file \"175b_samples.jsonl\" from the repo https://github.com/openai/gpt-3 and save it as \"gpt3_175b_samples.jsonl\".\n\n### Generate Datasets\n\nRun the notebook \"Construct_Datasets.ipynb\".\n\nThis will convert the raw GPT-2 and GPT-3 test datasets into a format that is compatible with the Grover detection model and place them under \"classification_data\".  These same output datasets will be used for evaluating the statistical SVM models.\n\n### Phrasal Feature Datasets\n\nYou'll need the following data files:\n\n- idioms.txt\n- cliche500.txt\n- archaisms.txt\n\n### Download spaCy model\n\n`python -m spacy download en`\n\n### Stanza\n\nInstall Stanza from its dev branch, because coreference resolution is broken otherwise.\n\n`pip install git+https://github.com/stanfordnlp/stanza.git@dev`\n\n### MAUVE\n\n`pip install mauve-text`\n\n## Running the experiments\n\nRun through the notebooks in order.  Intermediate data files can be used to avoid re-running sections (use data loading commands as appropriate).\n\n### Paper Citation (to be published in IJCNN 2022)\n\n```\n@article{crothers2022adversarial,\n  title={Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers},\n  author={Crothers, Evan and Japkowicz, Nathalie and Viktor, Herna and Branco, Paula},\n  journal={arXiv preprint arXiv:2203.07983},\n  year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fecrows%2Fcgtext-detection-adv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fecrows%2Fcgtext-detection-adv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fecrows%2Fcgtext-detection-adv/lists"}