{"id":18801029,"url":"https://github.com/mandliya/pythia_updated","last_synced_at":"2026-01-04T08:30:14.338Z","repository":{"id":79099950,"uuid":"195294089","full_name":"mandliya/pythia_updated","owner":"mandliya","description":null,"archived":false,"fork":false,"pushed_at":"2023-05-22T22:16:38.000Z","size":6705,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-29T19:46:23.739Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mandliya.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-04T19:46:56.000Z","updated_at":"2023-08-17T04:44:38.000Z","dependencies_parsed_at":"2024-11-07T22:35:19.917Z","dependency_job_id":null,"html_url":"https://github.com/mandliya/pythia_updated","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mandliya%2Fpythia_updated","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mandliya%2Fpythia_updated/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mandliya%2Fpythia_updated/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mandliya%2Fpythia_updated/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mandliya","download_url":"https://codeload.github.com/mandliya/pythia_updated
/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239734538,"owners_count":19688256,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T22:21:44.185Z","updated_at":"2026-01-04T08:30:14.293Z","avatar_url":"https://github.com/mandliya.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pythia\n\n[![Documentation Status](https://readthedocs.org/projects/learnpythia/badge/?version=latest)](https://learnpythia.readthedocs.io/en/latest/?badge=latest) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Z9fsh10rFtgWe4uy8nvU4mQmqdokdIRR)[![CircleCI](https://circleci.com/gh/facebookresearch/pythia.svg?style=svg)](https://circleci.com/gh/facebookresearch/pythia)\n\n\nPythia is a modular framework for vision and language multimodal research. 
Built on top\nof PyTorch, it features:\n\n- **Model Zoo**: Reference implementations for state-of-the-art vision and language models, including\n[LoRRA](https://arxiv.org/abs/1904.08920) (SoTA on VQA and TextVQA),\n[Pythia](https://arxiv.org/abs/1807.09956) model (VQA 2018 challenge winner) and [BAN](https://arxiv.org/abs/1805.07932).\n- **Multi-Tasking**: Support for multi-tasking, which allows training on multiple datasets together.\n- **Datasets**: Includes built-in support for various datasets, including VQA, VizWiz, TextVQA and VisualDialog.\n- **Modules**: Provides implementations of many commonly used layers in the vision and language domain.\n- **Distributed**: Support for distributed training based on DataParallel as well as DistributedDataParallel.\n- **Unopinionated**: Unopinionated about the dataset and model implementations built on top of it.\n- **Customization**: Custom losses, metrics, scheduling, optimizers, tensorboard; suits all your custom needs.\n\nYou can use Pythia to **_bootstrap_** your next vision and language multimodal research project.\n\nPythia can also act as a **starter codebase** for challenges around vision and\nlanguage datasets (TextVQA challenge, VQA challenge).\n\n![Pythia Examples](https://i.imgur.com/BP8sYnk.jpg)\n\n## Documentation\n\nLearn more about Pythia [here](https://learnpythia.readthedocs.io/en/latest/).\n\n## Demo\n\nTry the demo for the Pythia model on [Colab](https://colab.research.google.com/drive/1Z9fsh10rFtgWe4uy8nvU4mQmqdokdIRR).\n\n## Getting Started\n\nFirst, install the repo using:\n\n```\ngit clone https://github.com/facebookresearch/pythia ~/pythia\n\n# Optionally, create and activate your own conda environment before this step\ncd ~/pythia\npython setup.py develop\n```\n\nNow, Pythia should be ready to use. 
Follow the steps in the specific sections below to start training\nyour own models using Pythia.\n\n\n## Data\n\nThe default configuration assumes that all of the data is present in the `data` folder inside the `pythia` folder.\n\nDepending on which dataset you are planning to use, download the feature and imdb (image database) data for that dataset using\nthe links in the table below (_right click -\u003e copy link address_). The feature data was extracted with Detectron and is used in the\nreference models. The example below shows sample commands to run once you have\nthe feature (feature_link) and imdb (imdb_link) data links.\n\n```\ncd ~/pythia\nmkdir -p data \u0026\u0026 cd data\nwget http://dl.fbaipublicfiles.com/pythia/data/vocab.tar.gz\n\n# The following command should result in a 'vocabs' folder in your data dir\ntar xf vocab.tar.gz\n\n# Download detectron weights\nwget http://dl.fbaipublicfiles.com/pythia/data/detectron_weights.tar.gz\ntar xf detectron_weights.tar.gz\n\n# Now download the features required, feature link is taken from the table below\n# These two commands below can take time\nwget feature_link\n\n# [features].tar.gz is the file you just downloaded, replace that with your file's name\ntar xf [features].tar.gz\n\n# Make imdb folder and download required imdb\nmkdir -p imdb \u0026\u0026 cd imdb\nwget imdb_link\n\n# [imdb].tar.gz is the file you just downloaded, replace that with your file's name\ntar xf [imdb].tar.gz\n```\n\n| Dataset      | Key | Task | ImDB Link                                                                         | Features Link  | Features checksum                                                                 |\n|--------------|-----|-----|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------------|---------|\n| TextVQA      | textvqa | vqa | [TextVQA 0.5 
ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz) | [OpenImages](https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz) | `b22e80997b2580edaf08d7e3a896e324` | \n| VQA 2.0      | vqa2 | vqa | [VQA 2.0 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz)                 | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz)              | `ab7947b04f3063c774b87dfbf4d0e981` |\n| VizWiz       | vizwiz | vqa | [VizWiz ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz)           | [VizWiz](https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz)          | `9a28d6a9892dda8519d03fba52fb899f` |\n| VisualDialog | visdial | dialog | Coming soon!                                                                      | Coming soon!                                                                    | Coming soon! | \n\nAfter downloading the features, verify the download by checking the md5sum using:\n\n```bash\necho \"\u003cchecksum\u003e  \u003cdataset_name\u003e.tar.gz\" | md5sum -c -\n```\n\n\n## Training\n\nOnce the data is downloaded and in place, we just need to select a model, an appropriate task and dataset, and the related config file. Default configurations can be found inside the `configs` folder in the repository's root folder. Configs are organized per model in the format `[task]/[dataset_key]/[model_key].yml`, where `dataset_key` can be retrieved from the table above. For example, for the `pythia` model, the configuration for the VQA 2.0 dataset can be found at `configs/vqa/vqa2/pythia.yml`. 
The following table shows the keys and the datasets\nsupported by the models in Pythia's model zoo.\n\n| Model  | Key | Supported Datasets    | Pretrained Models | Notes                                                     |\n|--------|-----------|-----------------------|-------------------|-----------------------------------------------------------|\n| Pythia | pythia    | vqa2, vizwiz, textvqa | [vqa2 train+val](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth), [vqa2 train only](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth), [vizwiz](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vizwiz/pythia_pretrained_vqa2.pth)  | VizWiz model has been pretrained on VQAv2 and transferred |\n| LoRRA  | lorra     | vqa2, vizwiz, textvqa       | [textvqa](https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth)      |                               |\n| BAN    | ban       | vqa2, vizwiz, textvqa | Coming soon!      | Support is preliminary and hasn't been tested thoroughly. |\n\n\nTo run `LoRRA` on `TextVQA`, run the following command from the root directory of your pythia clone:\n\n```\ncd ~/pythia\npython tools/run.py --tasks vqa --datasets textvqa --model lorra --config configs/vqa/textvqa/lorra.yml\n```\n\n## Pretrained Models\n\nWe are including some of the pretrained models as described in the table above.\nFor example, 
to run inference using LoRRA for TextVQA for EvalAI, use the following commands:\n\n```\n# Download the model first\ncd ~/pythia/data\nmkdir -p models \u0026\u0026 cd models;\n# Get link from the table above and extract if needed\nwget https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth\n\ncd ../..\n# Replace tasks, datasets and model with corresponding key for other pretrained models\npython tools/run.py --tasks vqa --datasets textvqa --model lorra --config configs/vqa/textvqa/lorra.yml \\\n--run_type inference --evalai_inference 1 --resume_file data/models/lorra_best.pth\n```\n\nThe table below shows inference metrics for various pretrained models:\n\n| Model  | Dataset          | Metric                     | Notes                         |\n|--------|------------------|----------------------------|-------------------------------|\n| Pythia | vqa2 (train+val) | test-dev accuracy - 68.31% | Can be easily pushed to 69.2% |\n| Pythia | vqa2 (train)     | test-dev accuracy - 66.70%  |  |\n| Pythia | vizwiz (train)     | test-dev accuracy - 54.22%  |    Pretrained on VQA2 and transferred to VizWiz                           |\n| LoRRA  | textvqa (train)  | val accuracy - 27.4%       |                               |\n\n**Note** that, for simplicity, our currently released model **does not** incorporate extensive data augmentations (e.g. visual genome, visual dialogue) during training, which were used in our challenge winner entries for VQA and VizWiz 2018. As a result, there can be a performance gap relative to models reported and released previously. 
If you are looking to reproduce those results, please check out the [v0.1](https://github.com/facebookresearch/pythia/releases/tag/v0.1) release.\n\n## Documentation\n\nDocumentation on how to navigate Pythia and make changes will be available soon.\n\n## Citation\n\nIf you use Pythia in your work, please cite:\n\n```\n@inproceedings{singh2019TowardsVM,\n  title={Towards VQA Models That Can Read},\n  author={Singh, Amanpreet and Natarajan, Vivek and Shah, Meet and Jiang, Yu and Chen, Xinlei and Batra, Dhruv and Parikh, Devi and Rohrbach, Marcus},\n  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},\n  year={2019}\n}\n```\n\nand\n\n```\n@inproceedings{singh2018pythia,\n  title={Pythia-a platform for vision \\\u0026 language research},\n  author={Singh, Amanpreet and Natarajan, Vivek and Jiang, Yu and Chen, Xinlei and Shah, Meet and Rohrbach, Marcus and Batra, Dhruv and Parikh, Devi},\n  booktitle={SysML Workshop, NeurIPS},\n  volume={2018},\n  year={2018}\n}\n```\n\n## Troubleshooting/FAQs\n\n1. If `setup.py` causes any issues, please install fastText directly from source first and\nthen run `python setup.py develop`. To install fastText, run the following commands:\n\n```\ngit clone https://github.com/facebookresearch/fastText.git\ncd fastText\npip install -e .\n```\n\n## License\n\nPythia is licensed under the BSD license, available in the [LICENSE](LICENSE) file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmandliya%2Fpythia_updated","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmandliya%2Fpythia_updated","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmandliya%2Fpythia_updated/lists"}