{"id":18253294,"url":"https://github.com/shaoyijia/cmg","last_synced_at":"2025-04-04T16:33:07.590Z","repository":{"id":40441417,"uuid":"507534906","full_name":"shaoyijia/CMG","owner":"shaoyijia","description":"Code for ECML-PKDD 2022 Paper --- CMG: A Class-Mixed Generation Approach to Out-of-Distribution Detection","archived":false,"fork":false,"pushed_at":"2022-10-12T00:17:31.000Z","size":342,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-20T16:15:19.361Z","etag":null,"topics":["anomaly-detection","data-generation","novelty-detection","out-of-distribution-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shaoyijia.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-26T09:49:27.000Z","updated_at":"2024-07-09T14:34:20.000Z","dependencies_parsed_at":"2023-01-19T21:19:21.839Z","dependency_job_id":null,"html_url":"https://github.com/shaoyijia/CMG","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaoyijia%2FCMG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaoyijia%2FCMG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaoyijia%2FCMG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shaoyijia%2FCMG/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shaoyijia","download_url":"https://codeload.github.com/shaoyijia/CMG/tar.gz/refs/heads/
main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247209795,"owners_count":20901858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anomaly-detection","data-generation","novelty-detection","out-of-distribution-detection"],"created_at":"2024-11-05T10:05:52.960Z","updated_at":"2025-04-04T16:33:07.270Z","avatar_url":"https://github.com/shaoyijia.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CMG: A Class-Mixed Generation Approach to Out-of-Distribution Detection\n\nThis repository contains the code for our ECML-PKDD 2022 paper [CMG: A Class-Mixed Generation Approach to Out-of-Distribution\nDetection](https://2022.ecmlpkdd.org/wp-content/uploads/2022/09/sub_531.pdf) by Mengyu Wang*, Yijia Shao*, Haowei Lin, Wenpeng Hu, and Bing Liu.\n\n## Overview\n\nWe propose CMG (*Class-Mixed Generation*) for efficient out-of-distribution (OOD) detection. CMG uses a conditional variational autoencoder (CVAE) to generate\npseudo-OOD training samples from class-mixed information in the latent space. The pseudo-OOD data are then used to\nfine-tune a classifier in a two-stage manner. With two different loss functions, we obtain two versions of the CMG\nsystem: CMG-softmax (CMG-s) and CMG-energy (CMG-e). 
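As a rough illustration of the class-mixed idea (not the code in this repository), the sketch below samples latent codes from the standard normal prior and pairs each one with a condition vector that mixes two distinct IND classes before decoding. Several parts are assumptions: the equal 0.5/0.5 mixing of two randomly chosen classes, the hypothetical helper names `class_mixed_conditions` and `generate_pseudo_ood`, and the toy linear decoder that stands in for the trained CVAE decoder. See the paper for the exact mixing scheme.

```python
import numpy as np


def class_mixed_conditions(num_samples, num_classes, rng):
    # Hypothetical mixing rule (an assumption, not taken from this repo):
    # each condition vector puts weight 0.5 on two distinct, randomly
    # chosen IND classes instead of a single one-hot label.
    conds = np.zeros((num_samples, num_classes))
    for i in range(num_samples):
        a, b = rng.choice(num_classes, size=2, replace=False)
        conds[i, a] = 0.5
        conds[i, b] = 0.5
    return conds


def generate_pseudo_ood(decoder, num_samples, latent_dim, num_classes, seed=0):
    # Sample latent codes from the CVAE prior N(0, I), attach the
    # class-mixed conditions, and decode them into pseudo-OOD samples.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((num_samples, latent_dim))
    cond = class_mixed_conditions(num_samples, num_classes, rng)
    return decoder(np.concatenate([z, cond], axis=1))


# Toy stand-in for a trained CVAE decoder (hypothetical): a fixed random
# linear map followed by a sigmoid, producing flattened 28x28 images.
LATENT_DIM, NUM_CLASSES = 8, 10
W = np.random.default_rng(1).standard_normal((LATENT_DIM + NUM_CLASSES, 28 * 28))


def toy_decoder(h):
    return 1.0 / (1.0 + np.exp(-h @ W))


fake = generate_pseudo_ood(toy_decoder, 4, LATENT_DIM, NUM_CLASSES)
print(fake.shape)  # (4, 784)
```

In the actual pipeline, the decoder would be the CVAE trained in Stage 1. The generated samples would then serve as the pseudo-OOD data for the Stage 2 fine-tuning.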
The figure below illustrates our CMG system.\n\n\u003cimg src=\"figure/model.png\" alt=\"model\" width=\"600\" /\u003e\n\n## Requirements\n\n### Environments\n\nThe required packages are as follows:\n\n- Python 3.5\n- torch 1.2\n- torchvision 0.4\n- CUDA 10.0\n- scikit-learn 0.22\n\n### Datasets\n\nFor Setting 1 - OOD Detection on the Same Dataset, this repository supports MNIST and CIFAR10, both of which can be\ndownloaded directly through `torchvision`.\n\nFor Setting 2 - OOD Detection on Different Datasets, this repository supports CIFAR10 as in-distribution (IND) data and SVHN / LSUN /\ntinyImageNet / LSUN-FIX / ImageNet-FIX / CIFAR-100 as OOD data. SVHN and CIFAR-100 can be downloaded directly\nthrough `torchvision`. The remaining datasets were preprocessed by the authors of CSI and can be downloaded\n[here](https://github.com/alinlab/CSI).\n\nPlease download the datasets to [./datasets](./datasets) and rename the files accordingly. See [./datasets/datasetsHelper.py](./datasets/datasetsHelper.py) and our paper for more details.\n\n## Training CMG\n\n### CMG Stage 1\n\nCMG Stage 1 consists of training the IND classifier and the CVAE.\n\nThe code runs on a single GPU by default. You can select the GPU with the `--device` argument on the command line.\n\nFor more details, please see [utils.py](./utils.py).\n\n#### Train the IND classifier\n\nTo train the IND classifier on MNIST for Setting 1, run this command:\n\n```\npython -m train \\\n    --task 'same_dataset_mnist' \\\n    --partition {'partition1'/'partition2'/'partition3'/'partition4'/'partition5'} \\\n    --command 'train_classifier' \\\n    --device {the available GPU in your cluster, e.g., cuda:0} \\\n    --params-dict-name {checkpoint name, e.g., './ckpt/main_model_partition1.pkl'}\n```\n\nTo train the IND classifier on CIFAR10 for Setting 1, run this command:\n\n```\npython -m train \\\n    --task 'same_dataset_cifar10' \\\n    --partition {'partition1'/'partition2'/'partition3'/'partition4'/'partition5'} \\\n    --command 'train_classifier' \\\n    --device {the available GPU in your 
cluster, e.g., cuda:0} \\\n    --params-dict-name {checkpoint name, e.g., './ckpt/main_model_partition1.pkl'}\n```\n\nTo train the IND classifier on CIFAR10 for Setting 2, run this command:\n\n```\npython -m train \\\n    --task 'different_dataset' \\\n    --command 'train_classifier' \\\n    --device {the available GPU in your cluster, e.g., cuda:0} \\\n    --params-dict-name {checkpoint name, e.g., './ckpt/main_model_different_dataset.pkl'}\n```\n\n#### Train the CVAE\n\nTo train the CVAE on MNIST for Setting 1, run this command:\n\n```\npython -m train \\\n    --task 'same_dataset_mnist' \\\n    --partition {'partition1'/'partition2'/'partition3'/'partition4'/'partition5'} \\\n    --command 'train_cvae' \\\n    --device {the available GPU in your cluster, e.g., cuda:0} \\\n    --params-dict-name {checkpoint name, e.g., './ckpt/cvae_partition1.pkl'}\n```\n\nTo train the CVAE on CIFAR10 for Setting 1, run this command:\n\n```\npython -m train \\\n    --task 'same_dataset_cifar10' \\\n    --partition {'partition1'/'partition2'/'partition3'/'partition4'/'partition5'} \\\n    --command 'train_cvae' \\\n    --device {the available GPU in your cluster, e.g., cuda:0} \\\n    --params-dict-name {checkpoint name, e.g., './ckpt/cvae_partition1.pkl'}\n```\n\nTo train the CVAE on CIFAR10 for Setting 2, run this command:\n\n```\npython -m train \\\n    --task 'different_dataset' \\\n    --command 'train_cvae' \\\n    --device {the available GPU in your cluster, e.g., cuda:0} \\\n    --params-dict-name {checkpoint name, e.g., './ckpt/cvae_different_dataset.pkl'}\n```\n\n### CMG Stage 2 and Evaluation\n\nTo perform CMG Stage 2 and obtain the final results on MNIST for Setting 1, run this command:\n\n```\npython -m eval \\\n    --task 'same_dataset_mnist' \\\n    --partition {'partition1'/'partition2'/'partition3'/'partition4'/'partition5'} \\\n    --device {the available GPU in your cluster, e.g., cuda:0} \\\n    --params-dict-name {main model checkpoint name} \\\n    --params-dict-name2 {cvae checkpoint 
name} \\\n    --mode {'CMG-energy'/'CMG-softmax'}\n```\n\nTo perform CMG Stage 2 and obtain the final results on CIFAR10 for Setting 1, run this command:\n\n```\npython -m eval \\\n    --task 'same_dataset_cifar10' \\\n    --partition {'partition1'/'partition2'/'partition3'/'partition4'/'partition5'} \\\n    --device {the available GPU in your cluster, e.g., cuda:0} \\\n    --params-dict-name {main model checkpoint name} \\\n    --params-dict-name2 {cvae checkpoint name} \\\n    --mode {'CMG-energy'/'CMG-softmax'}\n```\n\nTo perform CMG Stage 2 and obtain the final results on CIFAR10 for Setting 2, run this command:\n\n```\npython -m eval \\\n    --task 'different_dataset' \\\n    --ood-dataset {'SVHN'/'LSUN'/'LSUN-FIX'/'tinyImageNet'/'ImageNet-FIX'/'CIFAR100'} \\\n    --device {the available GPU in your cluster, e.g., cuda:0} \\\n    --params-dict-name {main model checkpoint name} \\\n    --params-dict-name2 {cvae checkpoint name} \\\n    --mode {'CMG-energy'/'CMG-softmax'}\n```\n\n## Apply CMG to CSI\n\nCMG is a training paradigm orthogonal to existing OOD detection models, so it can be combined with existing systems to further\nimprove their performance (please see our paper for details).\n\nWe also provide code to reproduce our new state-of-the-art (SOTA) results, obtained by applying CMG to [CSI](https://github.com/alinlab/CSI).\nSee [./CSI+CMG](./CSI+CMG).\n\n## Acknowledgements\n\nWe thank the authors of [CSI](https://github.com/alinlab/CSI) for providing download links for their processed data. Our \"CSI+CMG\"\ncode is also based on their implementation.\n\n## Citation\n\nPlease cite our paper if you use this code or parts of it:\n\n```\n@article{wangcmg,\n  title={CMG: A Class-Mixed Generation Approach to Out-of-Distribution Detection},\n  author={Wang, Mengyu and Shao, Yijia and Lin, Haowei and Hu, Wenpeng and Liu, 
Bing},\n  year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshaoyijia%2Fcmg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshaoyijia%2Fcmg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshaoyijia%2Fcmg/lists"}