{"id":21346305,"url":"https://github.com/muennighoff/vilio","last_synced_at":"2025-09-07T11:35:52.239Z","repository":{"id":104657268,"uuid":"307948873","full_name":"Muennighoff/vilio","owner":"Muennighoff","description":"🥶Vilio: State-of-the-art VL models in PyTorch \u0026 PaddlePaddle","archived":false,"fork":false,"pushed_at":"2023-06-08T08:35:51.000Z","size":10887,"stargazers_count":90,"open_issues_count":6,"forks_count":28,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-12T17:43:10.086Z","etag":null,"topics":["ernie-vil","hateful-memes","lxmert","oscar","transformers","uniter","vision-and-language","vision-transformer","visualbert"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2012.07788","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Muennighoff.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-10-28T08:17:32.000Z","updated_at":"2025-07-06T07:36:50.000Z","dependencies_parsed_at":"2023-11-28T20:15:10.253Z","dependency_job_id":null,"html_url":"https://github.com/Muennighoff/vilio","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Muennighoff/vilio","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muennighoff%2Fvilio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muennighoff%2Fvilio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muennighoff%2Fvilio/releases","manifests_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/Muennighoff%2Fvilio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Muennighoff","download_url":"https://codeload.github.com/Muennighoff/vilio/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muennighoff%2Fvilio/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274030314,"owners_count":25210489,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ernie-vil","hateful-memes","lxmert","oscar","transformers","uniter","vision-and-language","vision-transformer","visualbert"],"created_at":"2024-11-22T02:05:57.496Z","updated_at":"2025-09-07T11:35:52.227Z","avatar_url":"https://github.com/Muennighoff.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003ch1 align=\"center\"\u003e 🥶VILIO🥶 \u003c/h1\u003e \n    \u003cbr\u003e\n\u003cp\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://circleci.com/gh/huggingface/transformers\"\u003e\n        \u003cimg alt=\"Build\" src=\"https://img.shields.io/circleci/build/github/huggingface/transformers/master\"\u003e\n    \u003c/a\u003e\n    \u003ca 
href=\"https://github.com/huggingface/transformers/releases\"\u003e\n        \u003cimg alt=\"GitHub release\" src=\"https://img.shields.io/github/release/huggingface/transformers.svg\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://huggingface.co/transformers/index.html\"\u003e\n        \u003cimg alt=\"Transformers Documentation\" src=\"https://img.shields.io/website/http/huggingface.co/transformers/index.html.svg?down_color=red\u0026down_message=offline\u0026up_message=online\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md\"\u003e\n        \u003cimg alt=\"Contributor Covenant\" src=\"https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n\u003ch3 align=\"center\"\u003e\n\u003cp\u003e State-of-the-art Visio-Linguistic Models 🥶\n\u003c/h3\u003e\n\n## Updates\n\n### 06/2021 - Hateful Memes CSV Files\n\n- The CSV files that were used for the scores in the \u003ca href=\"https://arxiv.org/abs/2012.07788\"\u003evilio paper\u003c/a\u003e are now available \u003ca href=\"https://www.kaggle.com/muennighoff/vilioresults\"\u003ehere\u003c/a\u003e\n\n### 06/2021 - Inference on any meme\n\n- Thanks to the initiative by \u003ca href=\"https://github.com/katrinc\"\u003ekatrinc\u003c/a\u003e, here are two notebooks for using Vilio to perform pure inference on any meme you want :)\n- Just adapt the example input dataset / input model to use a different meme / pretrained model🥶\n- GPU: https://www.kaggle.com/muennighoff/vilioexample-nb\n- CPU: https://www.kaggle.com/muennighoff/vilioexample-nb-cpu\n\n\n## Ordering\n\nVilio aims to replicate the organization of huggingface's transformer repo at:\nhttps://github.com/huggingface/transformers\n\n- /bash\nShell files to reproduce hateful memes results\n\n- /data\nBy default, directory for loading in data \u0026 saving checkpoints\n\n- /ernie-vil\nErnie-vil sub-repository written in 
PaddlePaddle\n\n- /fts_lmdb\nScripts for handling .lmdb extracted features\n\n- /fts_tsv\nScripts for handling .tsv extracted features\n\n- /notebooks\nJupyter Notebooks for demonstration \u0026 reproducibility\n\n- /py-bottom-up-attention\nSub-repository for tsv feature extraction forked \u0026 adapted from [here](https://github.com/airsplay/py-bottom-up-attention)\n\n- src/vilio\nAll implemented models (also see below for a quick overview of models)\n\n- /utils\nPandas \u0026 ensembling scripts for data handling\n\n- entry.py files\nScripts used to access the models and apply model-specific data preparation\n\n- pretrain.py files\nSame purpose as entry files, but for pre-training; Point of entry for pre-training\n\n- hm.py\nTraining code for the hateful memes challenge; Main point of entry\n\n- param.py\nArgs for running hm.py\n\n\n## Usage\n\nFollow SCORE_REPRO.md for reproducing performance on the Hateful Memes Task. \u003cbr\u003e\nFollow GETTING_STARTED.md for using the framework for your own task. \u003cbr\u003e\nSee the paper at: https://arxiv.org/abs/2012.07788\n\n## Architectures\n\n🥶 Vilio currently provides the following architectures with the outlined language transformers:\n\n1. **[E - ERNIE-VIL](https://arxiv.org/abs/2006.16934)** [ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph](https://arxiv.org/abs/2006.16934)\n    - [ERNIE: Enhanced Language Representation with Informative Entities](https://arxiv.org/abs/1905.07129)\n1. **[D - DeVLBERT](https://arxiv.org/abs/2008.06884)** [DeVLBert: Learning Deconfounded Visio-Linguistic Representations](https://arxiv.org/abs/2008.06884)\n    - [BERT: Bidirectional Transformers](https://arxiv.org/abs/1810.04805)\n1. **[O - OSCAR](https://arxiv.org/abs/2004.06165)** [Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks](https://arxiv.org/abs/2004.06165)\n    - [BERT: Bidirectional Transformers](https://arxiv.org/abs/1810.04805)\n1. 
**[U - UNITER](https://arxiv.org/abs/1909.11740)** [UNITER: UNiversal Image-TExt Representation Learning](https://arxiv.org/abs/1909.11740)\n    - [BERT: Bidirectional Transformers](https://arxiv.org/abs/1810.04805)\n    - [RoBERTa: Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)\n1. **[V - VisualBERT](https://arxiv.org/abs/1908.03557)** [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/abs/1908.03557)\n    - [ALBERT: A Lite BERT](https://arxiv.org/abs/1909.11942)\n    - [BERT: Bidirectional Transformers](https://arxiv.org/abs/1810.04805)\n    - [RoBERTa: Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)\n1. **[X - LXMERT](https://arxiv.org/abs/1908.07490)** [LXMERT: Learning Cross-Modality Encoder Representations from Transformers](https://arxiv.org/abs/1908.07490)\n    - [ALBERT: A Lite BERT](https://arxiv.org/abs/1909.11942)\n    - [BERT: Bidirectional Transformers](https://arxiv.org/abs/1810.04805)\n    - [RoBERTa: Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)\n\n\n## To-do's\n\n- [ ] Clean up import statements, python paths \u0026 find a better way to integrate transformers (Right now all import statements only work if in main folder)\n- [ ] Enable loading and running models just via import statements (and not having to clone the repo)\n- [ ] Find a way to better include ERNIE-VIL in this repo (PaddlePaddle to Torch?)\n- [ ] Move tokenization in entry files to model-specific tokenization similar to transformers\n\n\n## Attributions\n\nThe code heavily borrows from the following repositories, thanks for their great work:\n- https://github.com/huggingface/transformers\n- https://github.com/facebookresearch/mmf\n- https://github.com/airsplay/lxmert\n\n## Citation\n\n```bibtex\n@article{muennighoff2020vilio,\n  title={Vilio: State-of-the-art visio-linguistic models applied to hateful memes},\n  author={Muennighoff, Niklas},\n  
journal={arXiv preprint arXiv:2012.07788},\n  year={2020}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuennighoff%2Fvilio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmuennighoff%2Fvilio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuennighoff%2Fvilio/lists"}