{"id":20809817,"url":"https://github.com/georgian-io/multimodal-toolkit","last_synced_at":"2025-04-04T12:09:27.225Z","repository":{"id":37093178,"uuid":"289075382","full_name":"georgian-io/Multimodal-Toolkit","owner":"georgian-io","description":"Multimodal model for text and tabular data with HuggingFace transformers as building block for text data","archived":false,"fork":false,"pushed_at":"2024-04-30T14:29:42.000Z","size":75478,"stargazers_count":555,"open_issues_count":7,"forks_count":83,"subscribers_count":25,"default_branch":"master","last_synced_at":"2024-05-14T00:15:55.294Z","etag":null,"topics":["huggingface-transformers","multimodal-learning","natural-language-processing","tabular-data","transformer"],"latest_commit_sha":null,"homepage":"https://multimodal-toolkit.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/georgian-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-20T18:05:34.000Z","updated_at":"2024-06-21T07:10:26.828Z","dependencies_parsed_at":"2024-06-21T07:25:15.944Z","dependency_job_id":null,"html_url":"https://github.com/georgian-io/Multimodal-Toolkit","commit_stats":null,"previous_names":["georgianpartners/multimodal-toolkit"],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/georgian-io%2FMultimodal-Toolkit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/georgian-io%2FMultimodal-Toolkit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/georgian-io%2FMultimodal-Toolkit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/georgian-io%2FMultimodal-Toolkit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/georgian-io","download_url":"https://codeload.github.com/georgian-io/Multimodal-Toolkit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247174423,"owners_count":20896078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["huggingface-transformers","multimodal-learning","natural-language-processing","tabular-data","transformer"],"created_at":"2024-11-17T20:17:45.918Z","updated_at":"2025-04-04T12:09:27.205Z","avatar_url":"https://github.com/georgian-io.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multimodal Transformers | Transformers with Tabular Data\n\n--------------------------------------------------------------------------------\n**[Documentation](https://multimodal-toolkit.readthedocs.io/en/latest/index.html)** | **[Colab Notebook](https://multimodal-toolkit.readthedocs.io/en/latest/notes/colab_example.html)** | **[Blog Post](https://medium.com/georgian-impact-blog/how-to-incorporate-tabular-data-with-huggingface-transformers-b70ac45fcfb4)**\n\nA toolkit for incorporating multimodal data on top of text data for classification\nand regression tasks. 
It uses HuggingFace transformers as the base model for text features.\nThe toolkit adds a combining module that takes the outputs of the transformer in addition to categorical and numerical features\nto produce rich multimodal features for downstream classification/regression layers.\nGiven a pretrained transformer, the parameters of the combining module and transformer are trained\non the supervised task. For a brief literature review, check out the accompanying [blog post](https://medium.com/georgian-impact-blog/how-to-incorporate-tabular-data-with-huggingface-transformers-b70ac45fcfb4) on Georgian's Impact Blog.\n\n![](https://drive.google.com/uc?export=view\u0026id=1kyExPDQNkg49NRYgcw2wk8xg4QtQ6Ppt)\n\n\n\n## Installation\nThe code was developed in Python 3.7 with PyTorch and Transformers 4.26.1.\nThe multimodal-specific code is in the `multimodal_transformers` folder.\n```\npip install multimodal-transformers\n```\n\n## Supported Transformers\nThe following Hugging Face transformer models are supported for use with tabular data. 
See the documentation [here](https://multimodal-toolkit.readthedocs.io/en/latest/modules/model.html#module-multimodal_transformers.model.tabular_transformers).\n* [BERT](https://huggingface.co/transformers/v3.1.0/model_doc/bert.html) from Devlin et al.: [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) (NAACL 2019)\n* [ALBERT](https://huggingface.co/transformers/v3.1.0/model_doc/albert.html) from Lan et al.: [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942) (ICLR 2020)\n* [DistilBERT](https://huggingface.co/transformers/v3.1.0/model_doc/distilbert.html) from Sanh et al.: [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) (NeurIPS 2019)\n* [RoBERTa](https://huggingface.co/transformers/v3.1.0/model_doc/roberta.html) from Liu et al.: [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)\n* [XLM](https://huggingface.co/transformers/v3.1.0/model_doc/xlm.html) from Lample et al.: [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) (NeurIPS 2019)\n* [XLNet](https://huggingface.co/transformers/v3.1.0/model_doc/xlnet.html) from Yang et al.: [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) (NeurIPS 2019)\n* [XLM-RoBERTa](https://huggingface.co/transformers/v3.1.0/model_doc/xlmroberta.html) from Conneau et al.: [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) (ACL 2020)\n\n## Included Datasets\nThis repository also includes three Kaggle datasets that contain text data and\nrich tabular features:\n* [Women's Clothing E-Commerce Reviews](https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews) for Recommendation Prediction (Classification)\n* [Melbourne Airbnb Open 
Data](https://www.kaggle.com/tylerx/melbourne-airbnb-open-data) for Price Prediction (Regression)\n* [PetFinder.my Adoption Prediction](https://www.kaggle.com/c/petfinder-adoption-prediction) for Pet Adoption Speed Prediction (Multiclass Classification)\n\n\n## Working Examples\nTo quickly see these models in action on one of the above datasets with a preset configuration, run:\n```\n$ python main.py ./datasets/Melbourne_Airbnb_Open_Data/train_config.json\n```\n\nOr, if you prefer command-line arguments, run:\n```\n$ python main.py \\\n    --output_dir=./logs/test \\\n    --task=classification \\\n    --combine_feat_method=individual_mlps_on_cat_and_numerical_feats_then_concat \\\n    --do_train \\\n    --model_name_or_path=distilbert-base-uncased \\\n    --data_path=./datasets/Womens_Clothing_E-Commerce_Reviews \\\n    --column_info_path=./datasets/Womens_Clothing_E-Commerce_Reviews/column_info.json\n```\n`main.py` expects a `json` file detailing which columns in a dataset contain text,\ncategorical, or numerical input features. It also expects a path to the folder where\nthe data is stored as `train.csv` and `test.csv` (and, if provided, `val.csv`). For more details on the arguments, see\n`multimodal_exp_args.py`.\n\n### Notebook Introduction\nTo see the modules come together in a notebook: \\\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/georgianpartners/Multimodal-Toolkit/blob/master/notebooks/text_w_tabular_classification.ipynb)\n\n## Included Methods\n| Combine Feat Method | Description | Requires both cat and num features |\n|:--------------|:-------------------|:-------|\n| text_only | Uses just the text columns as processed by a HuggingFace transformer before final classifier layer(s). 
Essentially equivalent to HuggingFace's `ForSequenceClassification` models | False |\n| concat | Concatenate transformer output, numerical feats, and categorical feats all at once before final classifier layer(s) | False |\n| mlp_on_categorical_then_concat | MLP on categorical feats then concat transformer output, numerical feats, and processed categorical feats before final classifier layer(s) | False (Requires cat feats) |\n| individual_mlps_on_cat_and_numerical_feats_then_concat | Separate MLPs on categorical feats and numerical feats then concatenation of transformer output, processed numerical feats, and processed categorical feats before final classifier layer(s). | False |\n| mlp_on_concatenated_cat_and_numerical_feats_then_concat | MLP on concatenated categorical and numerical feats then concatenated with transformer output before final classifier layer(s) | True |\n| attention_on_cat_and_numerical_feats | Attention based summation of transformer outputs, numerical feats, and categorical feats queried by transformer outputs before final classifier layer(s). | False |\n| gating_on_cat_and_num_feats_then_sum | Gated summation of transformer outputs, numerical feats, and categorical feats before final classifier layer(s). Inspired by [Integrating Multimodal Information in Large Pretrained Transformers](https://www.aclweb.org/anthology/2020.acl-main.214.pdf) which performs the mechanism for each token. | False |\n| weighted_feature_sum_on_transformer_cat_and_numerical_feats | Learnable weighted feature-wise sum of transformer outputs, numerical feats and categorical feats for each feature dimension before final classifier layer(s) | False |\n\n### Simple baseline model\nIn practice, simply tokenizing the categorical and numerical features as they are and concatenating them to\nthe text columns as extra text sentences is a strong baseline. 
To do that here, specify all the categorical and numerical\ncolumns as text columns and set `combine_feat_method` to `text_only`. For example, for each of the included sample datasets in `./datasets`,\nin `train_config.json` change `combine_feat_method` to `text_only` and `column_info_path` to `./datasets/{dataset}/column_info_all_text.json`.\n\nIn the experiments below, this baseline corresponds to the Combine Feat Method `unimodal`.\n\n## Results\nThe following tables show the results on the included datasets' respective test sets, obtained by running `main.py`.\nUnspecified parameters are left at their defaults.\n\n### Review Prediction\nSpecific training parameters can be seen in `datasets/Womens_Clothing_E-Commerce_Reviews/train_config.json`.\n\nThere are **2** text columns, **3** categorical columns, and **3** numerical columns.\n\nModel | Combine Feat Method | F1 | PR AUC\n--------|-------------|---------|-------\nBert Base Uncased | text_only | 0.957 | 0.992\nBert Base Uncased | unimodal | **0.968** | **0.995**\nBert Base Uncased | concat | 0.958 | 0.992\nBert Base Uncased | individual_mlps_on_cat_and_numerical_feats_then_concat | 0.959 | 0.992\nBert Base Uncased | attention_on_cat_and_numerical_feats | 0.959 | 0.992\nBert Base Uncased | gating_on_cat_and_num_feats_then_sum | 0.961 | 0.994\nBert Base Uncased | weighted_feature_sum_on_transformer_cat_and_numerical_feats | 0.962 | 0.994\n\n\n### Pricing Prediction\nSpecific training parameters can be seen in `datasets/Melbourne_Airbnb_Open_Data/train_config.json`.\n\nThere are **3** text columns, **74** categorical columns, and **15** numerical columns.\n\nModel | Combine Feat Method | MAE | RMSE |\n--------|-------------|---------|------- |\nBert Base Multilingual Uncased | text_only | 82.74 | 254.0 |\nBert Base Multilingual Uncased | unimodal | 79.34 | 245.2 |\nBert Base Uncased | concat | **65.68** | 239.3 |\nBert Base Multilingual Uncased | individual_mlps_on_cat_and_numerical_feats_then_concat | 66.73 | 
**237.3** |\nBert Base Multilingual Uncased | attention_on_cat_and_numerical_feats | 74.72 | 246.3 |\nBert Base Multilingual Uncased | gating_on_cat_and_num_feats_then_sum | 66.64 | 237.8 |\nBert Base Multilingual Uncased | weighted_feature_sum_on_transformer_cat_and_numerical_feats | 71.19 | 245.2 |\n\n\n### Pet Adoption Prediction\nSpecific training parameters can be seen in `datasets/PetFindermy_Adoption_Prediction`.\n\nThere are **2** text columns, **14** categorical columns, and **5** numerical columns.\n\nModel | Combine Feat Method | F1_macro | F1_micro |\n--------|-------------|---------|------- |\nBert Base Multilingual Uncased | text_only | 0.088 | 0.281 |\nBert Base Multilingual Uncased | unimodal | 0.089 | 0.283 |\nBert Base Uncased | concat | 0.199 | 0.362 |\nBert Base Multilingual Uncased | individual_mlps_on_cat_and_numerical_feats_then_concat | 0.244 | 0.352 |\nBert Base Multilingual Uncased | attention_on_cat_and_numerical_feats | 0.254 | 0.375 |\nBert Base Multilingual Uncased | gating_on_cat_and_num_feats_then_sum | **0.275** | 0.375 |\nBert Base Multilingual Uncased | weighted_feature_sum_on_transformer_cat_and_numerical_feats | 0.266 | **0.380** |\n\n## Citation\nWe now have a [paper](https://www.aclweb.org/anthology/2021.maiworkshop-1.10/) you can cite for the Multimodal-Toolkit.\n```bibtex\n@inproceedings{gu-budhkar-2021-package,\n    title = \"A Package for Learning on Tabular and Text Data with Transformers\",\n    author = \"Gu, Ken  and\n      Budhkar, Akshay\",\n    booktitle = \"Proceedings of the Third Workshop on Multimodal Artificial Intelligence\",\n    month = jun,\n    year = \"2021\",\n    address = \"Mexico City, Mexico\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://www.aclweb.org/anthology/2021.maiworkshop-1.10\",\n    doi = \"10.18653/v1/2021.maiworkshop-1.10\",\n    pages = 
\"69--73\",\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeorgian-io%2Fmultimodal-toolkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgeorgian-io%2Fmultimodal-toolkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeorgian-io%2Fmultimodal-toolkit/lists"}