{"id":13709125,"url":"https://github.com/google-research/remixmatch","last_synced_at":"2025-05-06T15:32:19.529Z","repository":{"id":39739704,"uuid":"233144972","full_name":"google-research/remixmatch","owner":"google-research","description":null,"archived":true,"fork":false,"pushed_at":"2022-11-21T21:56:03.000Z","size":70,"stargazers_count":130,"open_issues_count":0,"forks_count":21,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-11-13T19:39:40.118Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-01-10T23:05:32.000Z","updated_at":"2024-10-10T00:34:16.000Z","dependencies_parsed_at":"2022-08-29T02:40:17.686Z","dependency_job_id":null,"html_url":"https://github.com/google-research/remixmatch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fremixmatch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fremixmatch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fremixmatch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fremixmatch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-research","download_url":"https://codeload.github.com/google-research/remixmatch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"
https://github.com","kind":"github","repositories_count":252713016,"owners_count":21792410,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T23:00:36.122Z","updated_at":"2025-05-06T15:32:15.287Z","avatar_url":"https://github.com/google-research.png","language":"Python","funding_links":[],"categories":["Self-Supervised Learning","其他_机器视觉"],"sub_categories":["Semi-Supervised Learning","网络服务_其他"],"readme":"# ReMixMatch\n\nCode for the paper: \"[ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring](https://arxiv.org/abs/1911.09785)\" by David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, and Colin Raffel.\n\n\nThis is not an officially supported Google product.\n\n## Setup\n\n**Important**: `ML_DATA` is a shell environment variable that should point to the location where the datasets are installed. See the *Install datasets* section for more details.\n\n### Install dependencies\n\n```bash\nsudo apt install python3-dev python3-virtualenv python3-tk imagemagick\nvirtualenv -p python3 --system-site-packages env3\n. 
env3/bin/activate\npip install -r requirements.txt\n```\n\n### Install datasets\n\n```bash\nexport ML_DATA=\"path to where you want the datasets saved\"\n# Download datasets\nCUDA_VISIBLE_DEVICES= ./scripts/create_datasets.py\ncp $ML_DATA/svhn-test.tfrecord $ML_DATA/svhn_noextra-test.tfrecord\n\n# Create unlabeled datasets\nCUDA_VISIBLE_DEVICES= scripts/create_unlabeled.py $ML_DATA/SSL2/svhn $ML_DATA/svhn-train.tfrecord $ML_DATA/svhn-extra.tfrecord \u0026\nCUDA_VISIBLE_DEVICES= scripts/create_unlabeled.py $ML_DATA/SSL2/svhn_noextra $ML_DATA/svhn-train.tfrecord \u0026\nCUDA_VISIBLE_DEVICES= scripts/create_unlabeled.py $ML_DATA/SSL2/cifar10 $ML_DATA/cifar10-train.tfrecord \u0026\nCUDA_VISIBLE_DEVICES= scripts/create_unlabeled.py $ML_DATA/SSL2/cifar100 $ML_DATA/cifar100-train.tfrecord \u0026\nCUDA_VISIBLE_DEVICES= scripts/create_unlabeled.py $ML_DATA/SSL2/stl10 $ML_DATA/stl10-train.tfrecord $ML_DATA/stl10-unlabeled.tfrecord \u0026\nwait\n\n# Create semi-supervised subsets\nfor seed in 0 1 2 3 4 5; do\n    for size in 40 250 1000 4000; do\n        CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=$size $ML_DATA/SSL2/svhn $ML_DATA/svhn-train.tfrecord $ML_DATA/svhn-extra.tfrecord \u0026\n        CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=$size $ML_DATA/SSL2/svhn_noextra $ML_DATA/svhn-train.tfrecord \u0026\n        CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=$size $ML_DATA/SSL2/cifar10 $ML_DATA/cifar10-train.tfrecord \u0026\n    done\n    CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=10000 $ML_DATA/SSL2/cifar100 $ML_DATA/cifar100-train.tfrecord \u0026\n    CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=2500 $ML_DATA/SSL2/cifar100 $ML_DATA/cifar100-train.tfrecord \u0026\n    CUDA_VISIBLE_DEVICES= scripts/create_split.py --seed=$seed --size=1000 $ML_DATA/SSL2/stl10 $ML_DATA/stl10-train.tfrecord $ML_DATA/stl10-unlabeled.tfrecord \u0026\n    wait\ndone\nCUDA_VISIBLE_DEVICES= 
scripts/create_split.py --seed=1 --size=5000 $ML_DATA/SSL2/stl10 $ML_DATA/stl10-train.tfrecord $ML_DATA/stl10-unlabeled.tfrecord\n```\n\n## Running\n\n### Setup\n\nAll commands must be run from the project root. The following environment variables must be defined:\n```bash\nexport ML_DATA=\"path to where you want the datasets saved\"\nexport PYTHONPATH=$PYTHONPATH:.\n```\n\n### Example\n\nFor example, training a ReMixMatch model with 32 filters and 4 augmentations on cifar10 shuffled with `seed=3`, 250 labeled samples and 5000\nvalidation samples:\n```bash\nCUDA_VISIBLE_DEVICES=0 python cta/cta_remixmatch.py --filters=32 --K=4 --dataset=cifar10.3@250-5000 --w_match=1.5 --beta=0.75 --train_dir ./experiments/remixmatch\n```\n\nAvailable labeled sizes are 40, 100, 250, 1000, 4000.\nFor validation, available sizes are 1, 5000.\nPossible shuffling seeds are 1, 2, 3, 4, 5 and 0 for no shuffling (0 is not used in practice since the data must be\nshuffled for gradient descent to work properly).\n\n\n#### Multi-GPU training\nJust pass more GPUs and remixmatch automatically scales to them. Here we assign GPUs 4-7 to the program:\n```bash\nCUDA_VISIBLE_DEVICES=4,5,6,7 python cta/cta_remixmatch.py --filters=32 --K=4 --dataset=cifar10.3@250-5000 --w_match=1.5 --beta=0.75 --train_dir ./experiments/remixmatch\n```\n\n### Valid dataset names\n```bash\nfor dataset in cifar10 svhn svhn_noextra; do\nfor seed in 0 1 2 3 4 5; do\nfor valid in 1 5000; do\nfor size in 40 250 1000 4000; do\n    echo \"${dataset}.${seed}@${size}-${valid}\"\ndone; done; done; done\n\nfor seed in 0 1 2 3 4 5; do\nfor valid in 1 5000; do\n    echo \"cifar100.${seed}@10000-${valid}\"\ndone; done\n\nfor seed in 1 2 3 4 5; do\nfor valid in 1 5000; do\n    echo \"stl10.${seed}@1000-${valid}\"\ndone; done\necho \"stl10.1@5000-1\"\n```\n\n\n## Monitoring training progress\n\nYou can point tensorboard to the training folder (by default it is `--train_dir=./experiments`) to monitor the 
training\nprocess:\n\n```bash\ntensorboard.sh --port 6007 --logdir experiments\n```\n\n## Checkpoint accuracy\n\nIn the paper, we report the median accuracy of the last 20 checkpoints. The following command computes it:\n\n```bash\n# Following the previous example in which we trained cifar10.3@250-5000, extracting accuracy:\n./scripts/extract_accuracy.py experiments/cifar10.d.d.d.3\\@250-5000/CTAugment_depth2_th0.80_decay0.990/CTAReMixMatch_K4_archresnet_batch64_beta0.75_filters32_lr0.002_nclass10_redux1st_repeat4_scales3_use_dmTrue_use_xeTrue_w_kl0.5_w_match1.5_w_rot0.5_warmup_kimg1024_wd0.02/\n# The command above will create a stats/accuracy.json file in the model folder.\n# The format is JSON, so you can either view its content as a text file or process it to your liking.\n```\n\n## Reproducing tables from the paper\n\nCheck the contents of the `runs/*.sh` files; they contain the commands (and the hyper-parameters) to reproduce the results from the paper.\n\n## Citing this work\n\n```\n@article{berthelot2019remixmatch,\n    title={ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring},\n    author={David Berthelot and Nicholas Carlini and Ekin D. Cubuk and Alex Kurakin and Kihyuk Sohn and Han Zhang and Colin Raffel},\n    journal={arXiv preprint arXiv:1911.09785},\n    year={2019},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fremixmatch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-research%2Fremixmatch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fremixmatch/lists"}