{"id":13547688,"url":"https://github.com/facebookresearch/mmf","last_synced_at":"2025-05-14T22:05:40.683Z","repository":{"id":37491583,"uuid":"138831170","full_name":"facebookresearch/mmf","owner":"facebookresearch","description":"A modular framework for vision \u0026 language multimodal research from Facebook AI Research (FAIR)","archived":false,"fork":false,"pushed_at":"2025-04-24T02:53:53.000Z","size":18263,"stargazers_count":5563,"open_issues_count":149,"forks_count":939,"subscribers_count":110,"default_branch":"main","last_synced_at":"2025-05-14T22:05:14.395Z","etag":null,"topics":["captioning","deep-learning","dialog","hateful-memes","multi-tasking","multimodal","pretrained-models","pytorch","textvqa","vqa"],"latest_commit_sha":null,"homepage":"https://mmf.sh/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-06-27T04:52:40.000Z","updated_at":"2025-05-14T20:12:27.000Z","dependencies_parsed_at":"2025-05-07T08:46:28.915Z","dependency_job_id":null,"html_url":"https://github.com/facebookresearch/mmf","commit_stats":null,"previous_names":["facebookresearch/pythia"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fmmf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fmmf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fmmf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fmmf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/mmf/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254235687,"owners_count":22036962,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["captioning","deep-learning","dialog","hateful-memes","multi-tasking","multimodal","pretrained-models","pytorch","textvqa","vqa"],"created_at":"2024-08-01T12:00:59.664Z","updated_at":"2025-05-14T22:05:40.518Z","avatar_url":"https://github.com/facebookresearch.png","language":"Python","readme":"\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://mmf.sh/img/logo.svg\" width=\"50%\"/\u003e\n\u003c/div\u003e\n\n#\n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://mmf.sh/docs\"\u003e\n  \u003cimg alt=\"Documentation Status\" src=\"https://readthedocs.org/projects/mmf/badge/?version=latest\"/\u003e\n  \u003c/a\u003e\n  \u003ca 
href=\"https://circleci.com/gh/facebookresearch/mmf\"\u003e\n  \u003cimg alt=\"CircleCI\" src=\"https://circleci.com/gh/facebookresearch/mmf.svg?style=svg\"/\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n---\n\nMMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-the-art vision and language models and has powered multiple research projects at Facebook AI Research. See full list of project inside or built on MMF [here](https://mmf.sh/docs/notes/projects).\n\nMMF is powered by PyTorch, allows distributed training and is un-opinionated, scalable and fast. Use MMF to **_bootstrap_** for your next vision and language multimodal research project by following the [installation instructions](https://mmf.sh/docs/). Take a look at list of MMF features [here](https://mmf.sh/docs/getting_started/features).\n\nMMF also acts as **starter codebase** for challenges around vision and\nlanguage datasets (The Hateful Memes, TextVQA, TextCaps and VQA challenges). MMF was formerly known as Pythia. The next video shows an overview of how datasets and models work inside MMF. Checkout MMF's [video overview](https://mmf.sh/docs/getting_started/video_overview).\n\n\n## Installation\n\nFollow installation instructions in the [documentation](https://mmf.sh/docs/).\n\n## Documentation\n\nLearn more about MMF [here](https://mmf.sh/docs).\n\n## Citation\n\nIf you use MMF in your work or use any models published in MMF, please cite:\n\n```bibtex\n@misc{singh2020mmf,\n  author =       {Singh, Amanpreet and Goswami, Vedanuj and Natarajan, Vivek and Jiang, Yu and Chen, Xinlei and Shah, Meet and\n                 Rohrbach, Marcus and Batra, Dhruv and Parikh, Devi},\n  title =        {MMF: A multimodal framework for vision and language research},\n  howpublished = {\\url{https://github.com/facebookresearch/mmf}},\n  year =         {2020}\n}\n```\n\n## License\n\nMMF is licensed under BSD license available in [LICENSE](LICENSE) file\n","funding_links":[],"categories":["New Large-Scale Datasets","Python","图像数据与CV","Deep Learning Framework","Frameworks"],"sub_categories":["Libraries","High-Level DL APIs"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Fmmf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2Fmmf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Fmmf/lists"}