{"id":19867403,"url":"https://github.com/qinzzz/multimodal-alignment-framework","last_synced_at":"2025-10-23T21:22:38.912Z","repository":{"id":130866335,"uuid":"247822085","full_name":"qinzzz/Multimodal-Alignment-Framework","owner":"qinzzz","description":"Implementation for MAF: Multimodal Alignment Framework","archived":false,"fork":false,"pushed_at":"2020-11-25T17:26:37.000Z","size":298,"stargazers_count":46,"open_issues_count":4,"forks_count":9,"subscribers_count":0,"default_branch":"public","last_synced_at":"2025-04-06T23:14:07.812Z","etag":null,"topics":["localization","python","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qinzzz.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-16T21:42:38.000Z","updated_at":"2025-03-22T11:43:29.000Z","dependencies_parsed_at":"2023-04-09T07:00:40.397Z","dependency_job_id":null,"html_url":"https://github.com/qinzzz/Multimodal-Alignment-Framework","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qinzzz%2FMultimodal-Alignment-Framework","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qinzzz%2FMultimodal-Alignment-Framework/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qinzzz%2FMultimodal-Alignment-Framework/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qinzzz%2FMultimodal-Alignment-Framework/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/qinzzz","download_url":"https://codeload.github.com/qinzzz/Multimodal-Alignment-Framework/tar.gz/refs/heads/public","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251998231,"owners_count":21677952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["localization","python","pytorch"],"created_at":"2024-11-12T15:29:14.108Z","updated_at":"2025-10-23T21:22:38.801Z","avatar_url":"https://github.com/qinzzz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multimodal Alignment Framework\n\nImplementation of MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding.\n\nSome of our code is based on [ban-vqa](https://github.com/jnhwkim/ban-vqa). Thanks!\n\n**TODO**\nProvide a Faster R-CNN feature extraction script.\n\n\n## Prerequisites\n- python 3.7\n- pytorch 1.4.0 \n\n\n## Data\n\n### Flickr30k Entities\nWe use the Flickr30k dataset to train and validate our model.\n\nThe raw dataset can be found at [Flickr30k Entities Annotations](https://github.com/BryanPlummer/flickr30k_entities/blob/master/annotations.zip).\n\nRun\n`\n sh tools/prepare_data.sh\n`\nto download and process the Flickr30k annotations, images, and GloVe word embeddings.\n\n\n### Object proposals\n\n#### Download object proposals:\n\nWe use an off-the-shelf [faster-rcnn](https://github.com/jwyang/faster-rcnn.pytorch) pretrained on Visual Genome \nto generate object proposals and labels. 
\nWe use [Bottom-Up Attention](https://github.com/airsplay/py-bottom-up-attention) for visual features.\n\nAs [Issue#1](https://github.com/qinzzz/Multimodal-Alignment-Framework/issues/1#issue-727382153) pointed out, there is some inconsistency\nbetween features generated using our script (faster-rcnn) and Bottom-Up Attention.\nWe therefore upload our generated features.\n\nDownload [train_features_compress.hdf5](https://drive.google.com/file/d/1ABnF0SZMf6pOAC89LJXbXZLMW1X86O96/view?usp=sharing) (6GB), [val features_compress.hdf5](https://drive.google.com/file/d/1iK-yz6PHwRuAciRW1vGkg9Bkj-aBE8yJ/view?usp=sharing), and [test features_compress.hdf5](https://drive.google.com/file/d/1pjntkbr20l2MiUBVQLVV6rQNWpXQymFs/view?usp=sharing) to `data/flickr30k`.\n\nAlternative links for train_feature.hdf5 (20GB, same features): [Google Drive](https://drive.google.com/file/d/1zxghit_mDyIKhZRemN6EDCZ3xMR4xPu5/view?usp=sharing); [Baidu Drive](https://pan.baidu.com/s/1cyiKNYZzpja-5brcn9QD1A), code: n1yd.\n\nDownload [train_detection_dict.json](https://drive.google.com/file/d/1_S-zyKF7F8SIEht6V66Sqbsz9TBqzY-P/view?usp=sharing), [val_detection_dict.json](https://drive.google.com/file/d/1KmyG0mghwydkb7pEwxDjItwZvNi_DRA4/view?usp=sharing), and [test_detection_dict.json](https://drive.google.com/file/d/1-r4u45EyxY7uaIk6VxCZxCiBxaOlaTC2/view?usp=sharing) to `data/`.\n\n#### Generate object proposals by yourself (TODO)\n\n~~run ` sh tools/prepare_detection.sh ` to clone faster-rcnn code and download pre-trained models.~~\n\n~~run ` sh tools/run_faster_rcnn.sh ` to run faster-rcnn detection on flickr30k dataset and generate features.~~\n\n*You may have to customize your environment in order to run faster-rcnn successfully. 
\nSee [prerequisites](https://github.com/jwyang/faster-rcnn.pytorch#prerequisites).*\n\n\n## Training\n\n`\npython main.py [args]\n`\n\nIn our experiments, we get ~61% accuracy using the default setting.\n\n\n## Evaluating\n\nOur trained model can be downloaded from [Google Drive](https://drive.google.com/file/d/1hVLDcsks2MuDJWpl2QB1H8DBCUefKCRY/view?usp=sharing).\n\n`\npython test.py --file \u003csaved model\u003e\n`\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqinzzz%2Fmultimodal-alignment-framework","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqinzzz%2Fmultimodal-alignment-framework","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqinzzz%2Fmultimodal-alignment-framework/lists"}