{"id":23335857,"url":"https://github.com/SproutNan/AI-Safety_SCAV","last_synced_at":"2025-08-23T04:32:54.750Z","repository":{"id":259916254,"uuid":"865230249","full_name":"SproutNan/AI-Safety_SCAV","owner":"SproutNan","description":"This is the code repository for \"Uncovering Safety Risks of Large Language Models through Concept Activation Vector\"","archived":false,"fork":false,"pushed_at":"2024-11-14T09:30:34.000Z","size":63,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-14T10:23:05.640Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SproutNan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-30T07:43:19.000Z","updated_at":"2024-11-14T09:30:37.000Z","dependencies_parsed_at":"2024-10-28T18:04:27.999Z","dependency_job_id":null,"html_url":"https://github.com/SproutNan/AI-Safety_SCAV","commit_stats":null,"previous_names":["sproutnan/ai-safety_scav"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SproutNan%2FAI-Safety_SCAV","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SproutNan%2FAI-Safety_SCAV/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SproutNan%2FAI-Safety_SCAV/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SproutNan%2FAI-Safety_SCAV/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/SproutNan","download_url":"https://codeload.github.com/SproutNan/AI-Safety_SCAV/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230665554,"owners_count":18261516,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-21T02:01:32.635Z","updated_at":"2024-12-21T02:01:36.477Z","avatar_url":"https://github.com/SproutNan.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"\u003cdiv align=\"center\"\u003e\n\n# AI-Safety SCAV\n\n\u003cp\u003e\n    \u003ca href=\"https://arxiv.org/abs/2404.12038\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/cs.CL-2404.12038-b31b1b?logo=arxiv\u0026logoColor=red\" alt=\"arXiv\"/\u003e\n    \u003c/a\u003e\n    \u003ca href=\"mailto:xitingwang@ruc.edu.cn\" target=\"_blank\"\u003e\n        \u003cimg alt=\"email\" src=\"https://img.shields.io/badge/📮 enquiry-blue\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n\u003c/div\u003e\n\nThis is the code for our NeurIPS 2024 paper *\u003cstrong\u003eUncovering Safety Risks of Large Language Models through Concept Activation Vector\u003c/strong\u003e*.\n\n## News\n\n- [2024-11-17] The code for visualization of embedding-level attack is released.\n- [2024-10-28] The code for prompt-level attack is released.\n- [2024-09-30] The code for embedding-level attack is released.\n- [2024-04-18] The paper is available on arXiv.\n\n## Citation\n\nIf you find this work helpful, please consider citing our 
paper:\n\n```bibtex\n@inproceedings{Xu2024uncovering,\n  title  = {Uncovering Safety Risks of Large Language Models through Concept Activation Vector},\n  author = {Zhihao Xu and Ruixuan Huang and Changyu Chen and Xiting Wang},\n  year   = {2024},\n  url    = {https://openreview.net/forum?id=Uymv9ThB50}\n}\n```\n\n## Disclaimer\n\nThis project could be used to attack LLMs and is intended for academic research only. Use for illegal purposes is prohibited. The authors have shared the vulnerabilities with OpenAI and Microsoft.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSproutNan%2FAI-Safety_SCAV","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSproutNan%2FAI-Safety_SCAV","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSproutNan%2FAI-Safety_SCAV/lists"}