{"id":13934825,"url":"https://github.com/davidmascharka/tbd-nets","last_synced_at":"2025-04-06T16:15:42.512Z","repository":{"id":30535084,"uuid":"125099930","full_name":"davidmascharka/tbd-nets","owner":"davidmascharka","description":"PyTorch implementation of \"Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning\"","archived":false,"fork":false,"pushed_at":"2021-12-07T20:02:33.000Z","size":22909,"stargazers_count":348,"open_issues_count":3,"forks_count":74,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-03-30T15:09:44.898Z","etag":null,"topics":["deep-learning","machine-learning","neural-networks","pytorch","visual-question-answering","visualization","vqa"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/1803.05268","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davidmascharka.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-13T18:54:02.000Z","updated_at":"2025-03-08T08:18:25.000Z","dependencies_parsed_at":"2022-09-25T04:40:12.173Z","dependency_job_id":null,"html_url":"https://github.com/davidmascharka/tbd-nets","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidmascharka%2Ftbd-nets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidmascharka%2Ftbd-nets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidmascharka%2Ftbd-nets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dav
idmascharka%2Ftbd-nets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davidmascharka","download_url":"https://codeload.github.com/davidmascharka/tbd-nets/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247509238,"owners_count":20950232,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","machine-learning","neural-networks","pytorch","visual-question-answering","visualization","vqa"],"created_at":"2024-08-07T23:01:15.648Z","updated_at":"2025-04-06T16:15:42.488Z","avatar_url":"https://github.com/davidmascharka.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook","Paper implementations｜论文实现","Paper implementations"],"sub_categories":["Other libraries｜其他库:","Other libraries:"],"readme":"# Transparency-by-Design networks (TbD-nets)\n[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/davidmascharka/tbd-nets/binder?filepath=full-vqa-example.ipynb)\n![Python version support](https://img.shields.io/badge/python-3.5%20%203.6-blue.svg)\n![PyTorch version support](https://img.shields.io/badge/pytorch-0.2%200.3%200.4-red.svg)\n\nThis repository contains code for replicating the experiments and visualizations from the paper\n\n[Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual\nReasoning](https://arxiv.org/abs/1803.05268)\n\u003e[David Mascharka](https://davidmascharka.com), [Philip Tran](https://github.com/ptran516), [Ryan Soklaski](https://github.com/rsokl), Arjun Majumdar 
\u003cbr\u003e\n\nThe paper describes Transparency-by-Design networks (TbD-nets), which are built around a visual\nattention mechanism. This repository contains the model architecture put forward in the paper and\ncode that will allow you to\n\n- [Produce the visualizations from the paper](#recreating-our-visualizations)\n- [Ask a natural-language question about an image you provide](#full-vqa)\n- [Train a model from scratch on the CLEVR dataset](#training-a-model)\n- [Predict answers on the CLEVR test set](#testing-a-model)\n\nA visualization of the output produced by our TbD-net model can be seen below.\n\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"img/tbd-net.png\" width=\"80%\"\u003e\n\u003c/div\u003e\n\nIf you find this code useful in your research, please cite\n\n``` tex\n@InProceedings{Mascharka_2018_CVPR,\nauthor = {Mascharka, David and Tran, Philip and Soklaski, Ryan and Majumdar, Arjun},\ntitle = {Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning},\nbooktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},\nmonth = {June},\nyear = {2018}\n} \n```\n\n# Full VQA\nTo ask a natural-language question and provide an image to the model and get an answer and reasoning\nchain back, see the [full VQA example notebook](full-vqa-example.ipynb). This will define all the\nmachinery you need to perform the full VQA task and will allow you to download the necessary models.\n[Try it with Binder!](https://mybinder.org/v2/gh/davidmascharka/tbd-nets/binder?filepath=full-vqa-example.ipynb)\n\n# Recreating Our Visualizations\nYou can [use Binder to use our model without any setup!](https://mybinder.org/v2/gh/davidmascharka/tbd-nets/binder?filepath=visualize-output.ipynb)\n\nTo reproduce our work on your local machine, you'll need to clone this repository and [set up\nPyTorch](#pytorch). 
We also recommend using [CUDA and cuDNN](#cudacudnn) if you have a GPU\navailable.\n\nYou can then open up the [visualize-output notebook](visualize-output.ipynb). That will walk you\nthrough running our model and generate all the figures we use in our paper. It will also download\none of our pretrained models. From there, you can play around with the images we provide without\nhaving to download any outside data or models. If you would like to experiment with our other\nmodels, see the [downloading models section](#downloading-models).\n\n# Training a Model\nTo train a model from scratch, there are a few requirements to take care of. We assume you have\nalready [set up PyTorch](#pytorch) and [CUDA/cuDNN](#cudacudnn) if you plan on using a GPU (which is\nhighly recommended).\n\n### 1. Getting data\nThe CLEVR dataset is available at [its project page](http://cs.stanford.edu/people/jcjohns/clevr/).\nThe first step for training is to download that data.\n\nYou will also need to extract features and process the question files to produce programs before\ntraining a model. The [instructions\nhere](https://github.com/facebookresearch/clevr-iep/blob/master/TRAINING.md#preprocessing-clevr)\nprovide a method for this. We recommend cloning that repository and following those instructions.\n\nNOTE: to extract 28x28 features, you will need to add the `--model_stage 2` option to the\n`extract_features.py` command. Following the conventions on that page, the command you want is:\n\n``` shell\npython scripts/extract_features.py \\\n    --input_image_dir data/CLEVR_v1.0/images/train \\\n    --output_h5_file data/train_features.h5 \\\n    --model_stage 2\n```\n\nIf you want to train on the 14x14 feature maps, you can follow the clevr-iep instructions exactly.\n\nAfter you have finished the above, you will have several HDF5 files containing the image features\nand questions, and a vocabulary file. 
While we do provide a `DataLoader` that will work with the\nHDF5 files, we personally find NumPy npy files more robust and generally more pleasant to work with,\nso we default to using those.\n\n##### a. Converting HDF5 to npy\nNote that this step is completely optional. The [h5_to_np script](utils/h5_to_np.py) will produce\nnpy files from your HDF5 files.\n\nNote that the largest NumPy data file (train_features.npy) is 53 gigabytes for the 14x14 feature\nmaps or 105 gigabytes for the 28x28 feature maps, meaning you will need a substantial amount of RAM\navailable on your machine to create these files. *If you do not have enough memory available, use\nthe HDF5 data loader instead of trying to convert these files.*\n\nTo convert your HDF5 files to npy files, invoke one of the following from the repository root,\ndepending on whether you want to convert images to NumPy format as well:\n\n``` shell\n# convert questions, features, and images\npython utils/h5_to_np.py -q /path/to/questions.h5 -f /path/to/features.h5 -i /path/to/images.h5 -d /path/to/destination/\n# convert questions and features only\npython utils/h5_to_np.py -q /path/to/questions.h5 -f /path/to/features.h5 -d /path/to/destination/\n```\n\n### 2. Training the model\nThe [train-model notebook](train-model.ipynb) will then walk through the training process. Our\nrecommended setup (which is shown in the notebook) is to symlink your data files into the `data`\nfolder. This can be done via:\n\n``` shell\nln -s /path/to/the/data/train_questions.npy data/training/\nln -s /path/to/the/data/train_image_idxs.npy data/training/\n# etc\n```\n\nfor data in npy format, or via:\n\n``` shell\nln -s /path/to/the/data/train_features.h5 data/training/\nln -s /path/to/the/data/train_questions.h5 data/training/\n# likewise for validation\n```\n\nfor data in HDF5 format.\n\nIf you prefer a different directory structure, update the data loader paths in the notebook. 
The\nnotebook will walk through training a model from this point.\n\n# Testing a Model\nNote that the testing data does not provide ground truth programs, so we will need to generate\nprograms from the questions for testing. We do not focus on this component of the network in our\nwork, so we reuse the program generator from [Johnson *et\nal.*](https://github.com/facebookresearch/clevr-iep) We have repackaged the sequence-to-sequence\nmodel they use for this, removing unnecessary functionality and updating the code to run on PyTorch\nversions later than 0.1. We provide a model checkpoint that we trained ourselves, so you don't need\nto download and use their model. The [test-eval notebook](test-eval.ipynb) will walk through the \nprocess to produce a file containing the predicted test answers.\n\n# Notes\n### Downloading Models\nTo download models, you can use the `download_pretrained_models.py` script, or download them\nprogrammatically as we do in the visualize output and full VQA notebooks.\n\nThere are several pretrained models available. If you would like to play with a specific model from\nthe table of results in the paper, you certainly can. *However*, we only provide extracted features\nfor the model trained on 28x28 feature maps, so if you want to use the 14x14 feature maps you'll\nneed to extract those features yourself. See [the getting data section](#1-getting-data) for details\non that. The download options for the script are:\n\n``` shell\npython utils/download_pretrained_models.py -m original\npython utils/download_pretrained_models.py -m reg\npython utils/download_pretrained_models.py -m hres\npython utils/download_pretrained_models.py -m all\n```\n\nThe default is `hres` which downloads only the models trained with higher-resolution 28x28 feature\nmaps and the regularization factor (see paper text for details). This results in cleaner looking\nattention masks, state-of-the-art performance, and is recommended. 
If you want to replicate the\nother results in the table, `original` will download only the models trained without regularization on\n14x14 feature maps, `reg` will download the models trained with regularization on 14x14 feature\nmaps, and `all` will download everything.\n\n### Python\nWe only recommend running the code with Python 3, having done all our development using Python\n3.6. While the code may be coerced into running in Python 2, *we will not support Python 2*, so please do not open\nissues that are related to Python 2 support.\n\n### PyTorch\nOur development was done using PyTorch v0.1.12, v0.2.0, and v0.3.0, and the code has been tested with v0.4. As such, our code\nshould run even on PyTorch versions earlier than 0.2 without modifications. However, we do recommend running on PyTorch \n0.2.0 or later. For setting up PyTorch, see the [official installation instructions](https://github.com/pytorch/pytorch#installation). The specific PyTorch commit that the original model from our paper was developed against is\n[here](https://github.com/pytorch/pytorch/tree/d9b89a352c4ceeff24878f4f5321e16f059e98c3).\n\nTo use PyTorch \u003c0.4, clone the repository and check out `tags/torch0.3`. For PyTorch 0.4 and above, `master` will run.\n\n### CUDA/cuDNN\nOur code is tested under CUDA 8 and CUDA 9 with cuDNN 5 and cuDNN 7, respectively. For setting up\nCUDA, see the [NVIDIA documentation](https://developer.nvidia.com/cuda-toolkit). We recommend using\ncuDNN, which is also available [from NVIDIA](https://developer.nvidia.com/cudnn).\n\n### Operating Systems\nOur development was done on CentOS 6 and Ubuntu 16.04. The code has also been tested under Arch\nLinux.\n\n### Setting up a conda environment\nIf you use `conda`, you can set up a development environment from the `environment.yml`\nconfiguration. This is the environment that Binder uses to give a live notebook for the\nvisualizations. 
To create an environment using this, run\n\n``` shell\nconda env create -f environment.yml\n```\n\nThe environment can then be activated with `source activate tbd-env`.\n\n# Copyright\nDISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited.\n\nThis material is based upon work supported by the Assistant Secretary of Defense for Research and\nEngineering under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions,\nfindings, conclusions or recommendations expressed in this material are those of the author(s) and\ndo not necessarily reflect the views of the Assistant Secretary of Defense for Research and\nEngineering.\n\n© 2017 Massachusetts Institute of Technology.\n\nMIT Proprietary, Subject to FAR52.227-11 Patent Rights - Ownership by the contractor (May 2014)\n\nThe software/firmware is provided to you on an As-Is basis\n\nDelivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or\n7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are\ndefined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than\nas specifically authorized by the U.S. Government may violate any copyrights that exist in this\nwork.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidmascharka%2Ftbd-nets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidmascharka%2Ftbd-nets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidmascharka%2Ftbd-nets/lists"}