{"id":20161911,"url":"https://github.com/ailab-cvc/groupmixformer","last_synced_at":"2026-03-04T11:02:35.084Z","repository":{"id":208910360,"uuid":"720949494","full_name":"AILab-CVC/GroupMixFormer","owner":"AILab-CVC","description":"GroupMixAttention and GroupMixFormer","archived":false,"fork":false,"pushed_at":"2023-12-13T05:07:44.000Z","size":326,"stargazers_count":115,"open_issues_count":3,"forks_count":12,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-10T00:28:15.871Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AILab-CVC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-20T03:13:54.000Z","updated_at":"2025-01-22T06:55:47.000Z","dependencies_parsed_at":"2024-11-14T03:00:42.562Z","dependency_job_id":null,"html_url":"https://github.com/AILab-CVC/GroupMixFormer","commit_stats":null,"previous_names":["ailab-cvc/groupmixformer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AILab-CVC/GroupMixFormer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FGroupMixFormer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FGroupMixFormer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FGroupMixFormer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FGroupMixFormer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/AILab-CVC","download_url":"https://codeload.github.com/AILab-CVC/GroupMixFormer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FGroupMixFormer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30078415,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T08:01:56.766Z","status":"ssl_error","status_checked_at":"2026-03-04T08:00:42.919Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T00:21:50.776Z","updated_at":"2026-03-04T11:02:35.061Z","avatar_url":"https://github.com/AILab-CVC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n\u003cdiv align=\"center\"\u003e\n\n### **[GroupMixFormer: Advancing Vision Transformers with Group-Mix Attention](xxx)**\n\n\n\n[Chongjian Ge](https://chongjiange.github.io/),\n[Xiaohan Ding](https://dingxiaohan.xyz),\n[Zhan Tong](https://scholar.google.com/citations?user=6FsgWBMAAAAJ\u0026hl=zh-CN),\n[Li Yuan](https://yuanli2333.github.io),\n[Jiangliu Wang](https://laura-wang.github.io),\n[Yibing Song](https://ybsong00.github.io),\n[Ping Luo](http://luoping.me/)\n\u003cbr\u003e\n\n\n\n\n\u003c/div\u003e\n\nOfficial PyTorch implementation of **GroupMixFormer** for the [paper](xx):\n\n\n \u003cimg 
src=\"./pics/teaser.png\" alt=\"Image Description\"\u003e\n\n\n\n## 🐱 Abstract\n\u003cb\u003eTL;DR: \u003c/b\u003e \n\u003cp style=\"text-align: justify;\"\u003e\nWe introduce GroupMixFormer, which employs Group-Mix Attention (GMA) as an advanced substitute for conventional self-attention. GMA is designed to concurrently capture correlations between tokens as well as between different groups of tokens, accommodating diverse group sizes.\n\u003c/p\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eFull abstract\u003c/b\u003e\u003c/summary\u003e\n\u003cp style=\"text-align: justify;\"\u003e\nVision Transformers (ViTs) have been shown to enhance visual recognition through modeling long-range dependencies with multi-head self-attention (MHSA), which is typically formulated as Query-Key-Value computation. However, the attention map generated from the Query and Key only captures token-to-token correlations at one single granularity. In this paper, we argue that self-attention should have a more comprehensive mechanism to capture correlations among tokens and groups (i.e., multiple adjacent tokens) for higher representational capacity. Thereby, we propose Group-Mix Attention (GMA) as an advanced replacement for traditional self-attention, which can simultaneously capture token-to-token, token-to-group, and group-to-group correlations with various group sizes. To this end, GMA splits the Query, Key, and Value into segments uniformly and performs different group aggregations to generate group proxies. The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value. Based on GMA, we introduce a powerful backbone, namely GroupMixFormer, which achieves state-of-the-art performance in image classification, object detection, and semantic segmentation with fewer parameters than existing models. 
For instance, GroupMixFormer-L (with 70.3M parameters and 384^2 input) attains 86.2% Top-1 accuracy on ImageNet-1K without external data, while GroupMixFormer-B (with 45.8M parameters) attains 51.2% mIoU on ADE20K.\n\u003c/p\u003e\n\u003c/details\u003e\n\n\n## 🚩 **Updates**\n\n### New features\n\n- ✅ Oct. 18, 2023. Release the training code.\n- ✅ Oct. 18, 2023. Release the inference code.\n- ✅ Oct. 18, 2023. Release the pretrained models for classification.\n\n### Catalog\n- [x] ImageNet-1K Training Code  \n- [ ] Downstream Transfer (Detection, Segmentation) Code\n\n\n## ⚙️ Usage\n### 1 - Installation\n\n- Create a new conda virtual environment:\n```\nconda create -n groupmixformer python=3.8 -y\nconda activate groupmixformer\n```\n\n- Install [PyTorch](https://pytorch.org/)\u003e=1.8.0 and [torchvision](https://pytorch.org/vision/stable/index.html)\u003e=0.9.0 following the official instructions. For example:\n```\npip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html\n```\n\n- Clone this repo and install the required packages:\n```\ngit clone https://github.com/AILab-CVC/GroupMixFormer.git\npip install timm==0.4.12 tensorboardX six tensorboard ipdb yacs tqdm fvcore\n```\n\n- The results in the paper were produced with `torch==1.8.0+cu111 torchvision==0.9.0+cu111 timm==0.4.12`.\n\n- Other dependencies: [mmdetection](https://github.com/open-mmlab/mmdetection) and [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) are optional and only needed for downstream transfer.\n\n### 2 - Data Preparation\nDownload and extract the ImageNet train and val images from http://image-net.org/.\nThe directory structure is:\n\n```\n│path/to/imagenet/\n├──train/\n│  ├── n01440764\n│  │   ├── n01440764_10026.JPEG\n│  │   ├── n01440764_10027.JPEG\n│  │   ├── ......\n│  ├── ......\n├──val/\n│  ├── n01440764\n│  │   ├── ILSVRC2012_val_00000293.JPEG\n│  │   ├── ILSVRC2012_val_00002138.JPEG\n│  │   ├── ......\n│  ├── ......\n```\n\n### 3 - Training 
Scripts\nTo train GroupMixFormer-Small on ImageNet-1K on a single node with 8 GPUs for 300 epochs, please run:\n```\npython3 -m torch.distributed.launch --nproc_per_node 8 --nnodes 1 --use_env train.py \\\n  --data-path \u003cYour data path\u003e \\\n  --batch-size 64 \\\n  --output \u003cYour target output path\u003e \\\n  --cfg ./configs/groupmixformer_small.yaml \\\n  --model-type groupmixformer \\\n  --model-file groupmixformer.py \\\n  --tag groupmixformer_small\n```\n\nAlternatively, run the following script:\n```\nbash launch_scripts/run_train.sh\n```\n\nFor multi-node training, please refer to [multi_machine_start.py](multi_machine_start.py).\n\n### 4 - Inference Scripts\nTo evaluate GroupMixFormer-Small on ImageNet-1K on a single node, please specify the path to the pretrained weights and run:\n```\nCUDA_VISIBLE_DEVICES=1 OMP_NUM_THREADS=1 python3 -m torch.distributed.launch --nproc_per_node 1 --nnodes 1 --use_env test.py \\\n  --data-path \u003cYour data path\u003e \\\n  --batch-size 64 \\\n  --output \u003cYour target output path\u003e \\\n  --cfg ./configs/groupmixformer_small.yaml \\\n  --model-type groupmixformer \\\n  --model-file groupmixformer.py \\\n  --tag groupmixformer_small\n```\n\nAlternatively, run the following script:\n```\nbash launch_scripts/run_eval.sh\n```\n\nThis should give:\n```\n* Acc@1 83.400 Acc@5 96.464\n```\n\n\n## ⏬ Model Zoo\n\nWe provide GroupMixFormer models pretrained on ImageNet 2012. 
You can download the corresponding pretrained weights and move them to the `./pretrained` folder.\n\n| name | resolution | acc@1 | #params | FLOPs | model - configs |\n|:---:|:---:|:---:|:---:|:---:|:---:|\n| GroupMixFormer-M | 224x224 | 79.6 | 5.7M | 1.4G | [model](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/rhettgee_connect_hku_hk/EuH7I7RGUSVHhD46RTECpesBqJVyACRzmBDwBXYWRxcDtg?e=Qvexbk) - [configs](configs/groupmixformer_miny.yaml) |\n| GroupMixFormer-T | 224x224 | 82.6 | 11.0M | 3.7G | [model](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/rhettgee_connect_hku_hk/EnRAzY3LalhGmqx91sEIqrUBLoa5ISS9kOw1ujNcOWSrzA?e=vkCUTZ) - [configs](configs/groupmixformer_tiny.yaml) |\n| GroupMixFormer-S | 224x224 | 83.4 | 22.4M | 5.2G | [model](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/rhettgee_connect_hku_hk/Em7lUESSPaFPowQotlsUi1sBjA9uVldOUi2mbqdF40Uktw?e=ExCTeU) - [configs](configs/groupmixformer_small.yaml) |\n| GroupMixFormer-B | 224x224 | 84.7 | 45.8M | 17.6G | [model](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/rhettgee_connect_hku_hk/Env1DBxPFZpMifAaVhHpKYgB3O4urE34o4b9_g4Jr-JfQQ?e=C6ed1c) - [configs](configs/groupmixformer_base.yaml) |\n| GroupMixFormer-L | 224x224 | 85.0 | 70.3M | 36.1G | [model](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/rhettgee_connect_hku_hk/EuH7I7RGUSVHhD46RTECpesBqJVyACRzmBDwBXYWRxcDtg?e=Qvexbk) - [configs](configs/groupmixformer_large.yaml) |\n\n\n\n## 🤗 Acknowledgement\nThis repository is built using the [timm](https://github.com/rwightman/pytorch-image-models) library and the [DeiT](https://github.com/facebookresearch/deit) and [Swin](https://github.com/microsoft/Swin-Transformer) repositories.\n\n## 🗜️ License\nThis project is released under the MIT license. 
Please see the [LICENSE](LICENSE) file for more information.\n\n## 📖 Citation\nIf you find this repository helpful, please consider citing:\n```\n@Article{xxx\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Fgroupmixformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Failab-cvc%2Fgroupmixformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Fgroupmixformer/lists"}