{"id":13690018,"url":"https://github.com/stanfordnlp/mac-network","last_synced_at":"2025-05-13T15:39:11.674Z","repository":{"id":44840129,"uuid":"129476685","full_name":"stanfordnlp/mac-network","owner":"stanfordnlp","description":"Implementation for the paper \"Compositional Attention Networks for Machine Reasoning\" (Hudson and Manning, ICLR 2018)","archived":false,"fork":false,"pushed_at":"2021-07-10T11:35:03.000Z","size":210,"stargazers_count":501,"open_issues_count":16,"forks_count":119,"subscribers_count":30,"default_branch":"master","last_synced_at":"2025-03-29T16:05:41.518Z","etag":null,"topics":["attention","clevr","compositional-attention-networks","machine-reasoning","question-answering","tensorflow","vqa"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stanfordnlp.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-04-14T03:14:11.000Z","updated_at":"2025-03-20T09:56:14.000Z","dependencies_parsed_at":"2022-08-12T11:40:14.235Z","dependency_job_id":null,"html_url":"https://github.com/stanfordnlp/mac-network","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanfordnlp%2Fmac-network","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanfordnlp%2Fmac-network/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanfordnlp%2Fmac-network/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stanfordnlp%2Fmac-network/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stanfordnlp","download_url":"https://codeload.github.com/stanfordnlp/mac-network/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247369952,"owners_count":20927928,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","clevr","compositional-attention-networks","machine-reasoning","question-answering","tensorflow","vqa"],"created_at":"2024-08-02T16:00:41.843Z","updated_at":"2025-04-05T17:05:50.519Z","avatar_url":"https://github.com/stanfordnlp.png","language":"Python","funding_links":[],"categories":["Papers"],"sub_categories":["Natural Language for Visual Reasoning"],"readme":"# Compositional Attention Networks for Real-World Reasoning \n\u003cp align=\"center\"\u003e\n  \u003cb\u003e\u003ca href=\"https://cs.stanford.edu/~dorarad\"\u003eDrew A. Hudson\u003c/a\u003e \u0026 \u003ca href=\"https://nlp.stanford.edu/~manning/\"\u003eChristopher D. Manning\u003c/a\u003e\u003c/b\u003e\u003c/span\u003e\n\u003c/p\u003e\n\n***Please note: We have updated the [GQA challenge](https://visualreasoning.net/challenge.html) deadline to be May 15. Best of Luck! :)***\n\nThis is the implementation of [Compositional Attention Networks for Machine Reasoning](https://arxiv.org/pdf/1803.03067.pdf) (ICLR 2018) on two visual reasoning datasets: [CLEVR dataset](http://cs.stanford.edu/people/jcjohns/clevr/) and the ***New*** [***GQA dataset***](https://visualreasoning.net) ([CVPR 2019](https://visualreasoning.net/gqaPaper.pdf)). 
We propose a fully differentiable model that learns to perform multi-step reasoning.\nSee our [website](https://cs.stanford.edu/people/dorarad/mac/) and [blog post](https://cs.stanford.edu/people/dorarad/mac/blog.html) for more information about the model!\n\nIn particular, the implementation includes the MAC cell at [`mac_cell.py`](mac_cell.py). The code supports the standard cell as presented in the paper as well as additional extensions and variants. Run `python main.py -h` or see [`config.py`](config.py) for the complete list of options.\n\nThe adaptation of MAC and several baselines for the GQA dataset are located in the **GQA** branch.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://cs.stanford.edu/people/dorarad/mac/imgs/cell.png\" style=\"float:left\" width=\"260px\"\u003e\n  \u003cimg src=\"https://cs.stanford.edu/people/dorarad/mac/imgs/visual.png\" style=\"float:left\" width=\"310px\"\u003e\n  \u003cimg src=\"https://cs.stanford.edu/people/dorarad/visual3.png\" style=\"float:left\" width=\"280px\"\u003e\n\u003c/div\u003e\n\n## Bibtex\nFor MAC:\n```bibtex\n@inproceedings{hudson2018compositional,\n  title={Compositional Attention Networks for Machine Reasoning},\n  author={Hudson, Drew A and Manning, Christopher D},\n  journal={International Conference on Learning Representations (ICLR)},\n  year={2018}\n}\n```\n\nFor the GQA dataset:\n```bibtex\n@article{hudson2018gqa,\n  title={GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering},\n  author={Hudson, Drew A and Manning, Christopher D},\n  journal={Conference on Computer Vision and Pattern Recognition (CVPR)},\n  year={2019}\n}\n```\n\n## Requirements\n- TensorFlow (originally developed with 1.3, but should work with later versions as well).\n- We performed our experiments on a Maxwell Titan X GPU. 
We assume 12GB of GPU memory.\n- See [`requirements.txt`](requirements.txt) for the required Python packages and run `pip install -r requirements.txt` to install them.\n\n## Pre-processing\nBefore training the model, we first have to download the CLEVR dataset and extract features for the images:\n\n### Dataset\nTo download and unpack the data, run the following commands:\n```bash\nwget https://dl.fbaipublicfiles.com/clevr/CLEVR_v1.0.zip\nunzip CLEVR_v1.0.zip\nmv CLEVR_v1.0 CLEVR_v1\nmkdir CLEVR_v1/data\nmv CLEVR_v1/questions/* CLEVR_v1/data/\n```\nThe final command moves the dataset questions into the `data` directory, where we will put all the data files we use during training.\n\n### Feature extraction\nExtract ResNet-101 features for the CLEVR train, val, and test images with the following commands:\n\n```bash\npython extract_features.py --input_image_dir CLEVR_v1/images/train --output_h5_file CLEVR_v1/data/train.h5 --batch_size 32\npython extract_features.py --input_image_dir CLEVR_v1/images/val --output_h5_file CLEVR_v1/data/val.h5 --batch_size 32\npython extract_features.py --input_image_dir CLEVR_v1/images/test --output_h5_file CLEVR_v1/data/test.h5 --batch_size 32\n```\n\n## Training \nTo train the model, run the following command:\n```bash\npython main.py --expName \"clevrExperiment\" --train --testedNum 10000 --epochs 25 --netLength 4 @configs/args.txt\n```\n\nFirst, the program preprocesses the CLEVR questions. It tokenizes them and maps them to integers to prepare them for the network. It then stores a JSON file with that information, along with word-to-integer dictionaries, in the `./CLEVR_v1/data` directory.\n\nThen, the program trains the model. Weights are saved by default to `./weights/{expName}` and statistics about the training are collected in `./results/{expName}`, where `expName` is the name we choose to give to the current experiment. 
\n\n### Notes\n- The number of examples used for training and evaluation can be set by `--trainedNum` and `--testedNum` respectively.\n- You can use the `-r` flag to restore a previously trained model and continue training it. \n- We recommend varying the number of MAC cells used in the network through the `--netLength` option to explore different lengths of reasoning processes.\n- Good lengths for CLEVR are in the range of 4-16 (using more cells tends to converge faster and achieve slightly higher accuracy, while a lower number of cells usually results in more easily interpretable attention maps). \n\n### Model variants\nWe have explored several variants of our model. We provide a few examples in `configs/args2.txt` through `configs/args4.txt`. For instance, you can run the first by: \n```bash\npython main.py --expName \"experiment1\" --train --testedNum 10000 --epochs 40 --netLength 6 @configs/args2.txt\n```\n- [`args2`](configs/args2.txt) uses a non-recurrent variant of the control unit that converges faster.\n- [`args3`](configs/args3.txt) incorporates self-attention into the write unit.\n- [`args4`](configs/args4.txt) adds control-based gating over the memory.\n\nSee [`config.py`](config.py) for further available options (note that some of them are still experimental).\n\n## Evaluation\nTo evaluate the trained model and get predictions and attention maps, run the following: \n```bash\npython main.py --expName \"clevrExperiment\" --finalTest --testedNum 10000 --netLength 16 -r --getPreds --getAtt @configs/args.txt\n```\nThe command will restore the model we have trained, and evaluate it on the validation set. 
JSON files with predictions and the attention distributions produced by running the model are saved by default to `./preds/{expName}`.\n\n- If you are interested in getting attention maps (`--getAtt`), we advise limiting the number of examples evaluated to 5,000-20,000 to avoid large prediction files.\n\n## Visualization\nAfter we evaluate the model with the command above, we can visualize the generated attention maps by running:\n```bash\npython visualization.py --expName \"clevrExperiment\" --tier val \n```\n(The tier can be set to `train` or `test` as well.) The script supports filtering the visualized questions in various ways. See [`visualization.py`](visualization.py) for further details.\n\nTo get more interpretable visualizations, it is highly recommended to reduce the number of cells to 4-8 (`--netLength`). Using more cells allows the network to learn more effective ways to approach the task, but these tend to be less interpretable than those of shorter networks (with fewer cells).  \n\nOptionally, to make the image attention maps look a little bit nicer, you can do the following (using [imagemagick](https://www.imagemagick.org)):\n```bash\nfor x in preds/clevrExperiment/*Img*.png; do magick convert \"$x\" -brightness-contrast 20x35 \"$x\"; done;\n```\n\nThank you for your interest in our model! Please contact me at dorarad@cs.stanford.edu for any questions, comments, or suggestions! :-)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstanfordnlp%2Fmac-network","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstanfordnlp%2Fmac-network","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstanfordnlp%2Fmac-network/lists"}