{"id":13709102,"url":"https://github.com/google-research/mixmatch","last_synced_at":"2025-09-28T22:31:22.559Z","repository":{"id":35575698,"uuid":"186900783","full_name":"google-research/mixmatch","owner":"google-research","description":null,"archived":true,"fork":false,"pushed_at":"2023-03-24T22:14:44.000Z","size":925,"stargazers_count":1134,"open_issues_count":2,"forks_count":162,"subscribers_count":25,"default_branch":"master","last_synced_at":"2024-09-27T03:40:55.326Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-05-15T20:49:28.000Z","updated_at":"2024-09-23T12:22:17.000Z","dependencies_parsed_at":"2022-07-27T22:09:38.679Z","dependency_job_id":"86d584dc-880b-4ab8-8f78-a08666b82eb2","html_url":"https://github.com/google-research/mixmatch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fmixmatch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fmixmatch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fmixmatch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fmixmatch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-research","download_url":"https://codeload.github.com/google-research/mixmatch/tar.gz/refs/heads/
master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":234569704,"owners_count":18854133,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T23:00:35.919Z","updated_at":"2025-09-28T22:31:22.187Z","avatar_url":"https://github.com/google-research.png","language":"Python","funding_links":[],"categories":["Self-Supervised Learning","Python","其他_机器视觉"],"sub_categories":["Semi-Supervised Learning","网络服务_其他"],"readme":"# MixMatch - A Holistic Approach to Semi-Supervised Learning\n\nCode for the paper: \"[MixMatch - A Holistic Approach to Semi-Supervised Learning](https://arxiv.org/abs/1905.02249)\" by David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver and Colin Raffel.\n\nThis is not an officially supported Google product.\n\n## Setup\n\n**Important**: `ML_DATA` is a shell environment variable that should point to the location where the datasets are installed. See the *Install datasets* section for more details.\n\n### Install dependencies\n\n```bash\nsudo apt install python3-dev python3-virtualenv python3-tk imagemagick\nvirtualenv -p python3 --system-site-packages env3\n. 
env3/bin/activate\npip install -r requirements.txt\n```\n\n### Install datasets\n\n```bash\nexport ML_DATA=\"path to where you want the datasets saved\"\n# Download datasets\nCUDA_VISIBLE_DEVICES= ./scripts/create_datasets.py\ncp $ML_DATA/svhn-test.tfrecord $ML_DATA/svhn_noextra-test.tfrecord\n\n# Create semi-supervised subsets\nfor seed in 1 2 3 4 5; do\n    for size in 250 500 1000 2000 4000; do\n        CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=$size $ML_DATA/SSL/svhn $ML_DATA/svhn-train.tfrecord $ML_DATA/svhn-extra.tfrecord \u0026\n        CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=$size $ML_DATA/SSL/svhn_noextra $ML_DATA/svhn-train.tfrecord \u0026\n        CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=$size $ML_DATA/SSL/cifar10 $ML_DATA/cifar10-train.tfrecord \u0026\n    done\n    CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=10000 $ML_DATA/SSL/cifar100 $ML_DATA/cifar100-train.tfrecord \u0026\n    CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=1000 $ML_DATA/SSL/stl10 $ML_DATA/stl10-train.tfrecord $ML_DATA/stl10-unlabeled.tfrecord \u0026\n    wait\ndone\nCUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=1 --size=5000 $ML_DATA/SSL/stl10 $ML_DATA/stl10-train.tfrecord $ML_DATA/stl10-unlabeled.tfrecord\n```\n\n#### Install privacy datasets\n\n```bash\nCUDA_VISIBLE_DEVICES= ./privacy/scripts/create_datasets.py\n\nfor size in 27 38 77 156 355 671 867; do\nCUDA_VISIBLE_DEVICES= ./privacy/scripts/create_split.py --size=$size $ML_DATA/SSL/svhn500 $ML_DATA/svhn500-train.tfrecord \u0026\ndone; wait\n\nfor size in 96 185 353 719 1415 2631 3523; do\nCUDA_VISIBLE_DEVICES= ./privacy/scripts/create_split.py --size=$size $ML_DATA/SSL/svhn300 $ML_DATA/svhn300-train.tfrecord \u0026\ndone; wait\n\nfor size in 56 81 109 138 266 525 1059 2171 4029 5371; do\nCUDA_VISIBLE_DEVICES= ./privacy/scripts/create_split.py --size=$size $ML_DATA/SSL/svhn200 
$ML_DATA/svhn200-train.tfrecord \u0026\ndone; wait\n\nfor size in 145 286 558 1082 2172 4078 5488; do\nCUDA_VISIBLE_DEVICES= ./privacy/scripts/create_split.py --size=$size $ML_DATA/SSL/svhn200s150 $ML_DATA/svhn200s150-train.tfrecord \u0026\ndone; wait\n```\n\n\n## Running\n\n### Setup\n\nAll commands must be run from the project root. The following environment variables must be defined:\n```bash\nexport ML_DATA=\"path to where you want the datasets saved\"\nexport PYTHONPATH=$PYTHONPATH:.\n```\n\n### Example\n\nFor example, training a mixmatch with 32 filters on cifar10 shuffled with `seed=3`, 250 labeled samples and 5000\nvalidation samples:\n```bash\nCUDA_VISIBLE_DEVICES=0 python mixmatch.py --filters=32 --dataset=cifar10.3@250-5000 --w_match=75 --beta=0.75\n```\n\nAvailable labeled sizes are 250, 500, 1000, 2000, 4000.\nFor validation, available sizes are 1, 5000 (and 500 for STL10).\nPossible shuffling seeds are 1, 2, 3, 4, 5 and 0 for no shuffling (0 is not used in practice since data must be\nshuffled for gradient descent to work properly).\n\n### Valid dataset names\n```bash\nfor dataset in cifar10 svhn svhn_noextra; do\nfor seed in 1 2 3 4 5; do\nfor valid in 1 5000; do\nfor size in 250 500 1000 2000 4000; do\n    echo \"${dataset}.${seed}@${size}-${valid}\"\ndone; done; done; done\n\nfor seed in 1 2 3 4 5; do\nfor valid in 1 5000; do\n    echo \"cifar100.${seed}@10000-${valid}\"\ndone; done\n\nfor seed in 1 2 3 4 5; do\nfor valid in 1 500; do\n    echo \"stl10.${seed}@1000-${valid}\"\ndone; done\necho \"stl10.1@5000-1\"\n```\n\n\n## Monitoring training progress\n\nYou can point tensorboard to the training folder (by default it is `--train_dir=./experiments`) to monitor the training\nprocess:\n\n```bash\ntensorboard.sh --port 6007 --logdir experiments/\n```\n\n## Checkpoint accuracy\n\nWe compute the median accuracy of the last 20 checkpoints in the paper; this is done with the following script:\n\n```bash\n# Following the previous example in which we 
trained cifar10.3@250-5000, extracting accuracy:\n./scripts/extract_accuracy.py experiments/compare/cifar10.3@250-5000/MixMatch_archresnet_batch64_beta0.75_ema0.999_filters32_lr0.002_nclass10_repeat4_scales3_w_match75.0_wd0.02\n# The command above will create a stats/accuracy.json file in the model folder.\n# The format is JSON so you can either see its content as a text file or process it to your liking.\n```\n\n## Reproducing tables from the paper\n\nCheck the contents of the `runs/*.sh` files; they give you the commands (and the hyper-parameters) to reproduce the results from the paper.\n\n## Citing this work\n\n```\n@article{berthelot2019mixmatch,\n  title={MixMatch: A Holistic Approach to Semi-Supervised Learning},\n  author={Berthelot, David and Carlini, Nicholas and Goodfellow, Ian and Papernot, Nicolas and Oliver, Avital and Raffel, Colin},\n  journal={arXiv preprint arXiv:1905.02249},\n  year={2019}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fmixmatch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-research%2Fmixmatch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fmixmatch/lists"}