{"id":22066362,"url":"https://github.com/jaketae/ensemble-transformers","last_synced_at":"2026-04-09T10:03:40.017Z","repository":{"id":49169252,"uuid":"474882583","full_name":"jaketae/ensemble-transformers","owner":"jaketae","description":"Ensembling Hugging Face transformers made easy","archived":false,"fork":false,"pushed_at":"2022-12-24T05:58:45.000Z","size":43,"stargazers_count":62,"open_issues_count":2,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-13T01:55:21.701Z","etag":null,"topics":["deep-learning","machine-learning","natural-language-processing","nlp","python","pytorch","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jaketae.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-28T06:58:15.000Z","updated_at":"2025-04-09T05:27:33.000Z","dependencies_parsed_at":"2023-01-30T20:16:00.956Z","dependency_job_id":null,"html_url":"https://github.com/jaketae/ensemble-transformers","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaketae%2Fensemble-transformers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaketae%2Fensemble-transformers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaketae%2Fensemble-transformers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaketae%2Fensemble-transformers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jaketae","download_
url":"https://codeload.github.com/jaketae/ensemble-transformers/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253856639,"owners_count":21974577,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","machine-learning","natural-language-processing","nlp","python","pytorch","transformer"],"created_at":"2024-11-30T19:27:47.838Z","updated_at":"2026-04-09T10:03:39.954Z","avatar_url":"https://github.com/jaketae.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ensemble Transformers\n\nEnsembling Hugging Face Transformers made easy!\n\n## Why Ensemble Transformers?\n\nEnsembling is a simple yet powerful way of combining predictions from different models to increase performance. Since multiple models are used to derive a prediction, ensembling offers a way of decreasing variance and increasing robustness. 
Ensemble Transformers provides an intuitive interface for ensembling pretrained models available in Hugging Face [`transformers`](https://huggingface.co/docs/transformers/index).\n\n## Installation\n\nEnsemble Transformers is available on [PyPI](https://pypi.org/project/ensemble-transformers/) and can easily be installed with the `pip` package manager.\n\n```\npip install -U pip wheel\npip install ensemble-transformers\n```\n\nTo try out the latest features, clone this repository and install from source.\n\n```\ngit clone https://github.com/jaketae/ensemble-transformers.git\ncd ensemble-transformers\npip install -e .\n```\n\n## Quickstart\n\nImport an ensemble model class according to your use case, specify the list of backbone models to use, and run training or inference right away.\n\n```python\n\u003e\u003e\u003e from ensemble_transformers import EnsembleModelForSequenceClassification\n\u003e\u003e\u003e ensemble = EnsembleModelForSequenceClassification.from_multiple_pretrained(\n    \"bert-base-uncased\", \"distilroberta-base\", \"xlnet-base-cased\"\n)\n\u003e\u003e\u003e batch = [\"This is a test sentence\", \"This is another test sentence.\"]\n\u003e\u003e\u003e output = ensemble(batch)\n\u003e\u003e\u003e output\nEnsembleModelOutput(\n        logits: [tensor([[ 0.2430, -0.0581],\n        [ 0.2145, -0.0541]], grad_fn=\u003cAddmmBackward0\u003e), tensor([[-0.0094, -0.0117],\n        [-0.0118, -0.0046]], grad_fn=\u003cAddmmBackward0\u003e), tensor([[-0.0962, -1.1581],\n        [-0.2195, -0.7422]], grad_fn=\u003cAddmmBackward0\u003e)],\n)\n\u003e\u003e\u003e stacked_output = ensemble(batch, mean_pool=True)\n\u003e\u003e\u003e stacked_output\nEnsembleModelOutput(\n        logits: tensor([[ 0.0458, -0.4093],\n        [-0.0056, -0.2670]], grad_fn=\u003cSumBackward1\u003e),\n)\n```\n\n## Usage\n\n### Ensembling with Configuration\n\nTo declare an ensemble, first create a configuration object specifying the Hugging Face transformers auto class, as well as the list 
of models to use to create the ensemble. \n\n```python\nfrom ensemble_transformers import EnsembleConfig, EnsembleModelForSequenceClassification\n\nconfig = EnsembleConfig(\n    \"AutoModelForSequenceClassification\", \n    model_names=[\"bert-base-uncased\", \"distilroberta-base\", \"xlnet-base-cased\"]\n)\n```\n\nThe ensemble model can then be declared via \n\n```python\nensemble = EnsembleModelForSequenceClassification(config)\n```\n\n### Ensembling with `from_multiple_pretrained`\n\nA more convenient way of declaring an ensemble is via `from_multiple_pretrained`, a method similar to `from_pretrained` in Hugging Face transformers. For instance, to perform text classification, we can use the `EnsembleModelForSequenceClassification` class.\n\n```python\nfrom ensemble_transformers import EnsembleModelForSequenceClassification\n\nensemble = EnsembleModelForSequenceClassification.from_multiple_pretrained(\n    \"bert-base-uncased\", \"distilroberta-base\", \"xlnet-base-cased\"\n)\n```\n\nUnlike Hugging Face transformers, which requires users to explicitly declare and initialize a preprocessor (e.g. `tokenizer`, `feature_extractor`, or `processor`) separate from the model, Ensemble Transformers automatically detects the preprocessor class and holds it within the `EnsembleModelForX` class as an internal attribute. 
Therefore, you do not have to declare a preprocessor yourself; Ensemble Transformers will do it for you.\n\nIn the example below, we see that the `ensemble` object holds three tokenizers, one for each model.\n\n```python\n\u003e\u003e\u003e len(ensemble.preprocessors)\n3\n\u003e\u003e\u003e ensemble.preprocessors\n[PreTrainedTokenizerFast(name_or_path='bert-base-uncased', vocab_size=30522, model_max_len=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}), PreTrainedTokenizerFast(name_or_path='distilroberta-base', vocab_size=50265, model_max_len=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '\u003cs\u003e', 'eos_token': '\u003c/s\u003e', 'unk_token': '\u003cunk\u003e', 'sep_token': '\u003c/s\u003e', 'pad_token': '\u003cpad\u003e', 'cls_token': '\u003cs\u003e', 'mask_token': AddedToken(\"\u003cmask\u003e\", rstrip=False, lstrip=True, single_word=False, normalized=False)}), PreTrainedTokenizerFast(name_or_path='xlnet-base-cased', vocab_size=32000, model_max_len=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '\u003cs\u003e', 'eos_token': '\u003c/s\u003e', 'unk_token': '\u003cunk\u003e', 'sep_token': '\u003csep\u003e', 'pad_token': '\u003cpad\u003e', 'cls_token': '\u003ccls\u003e', 'mask_token': AddedToken(\"\u003cmask\u003e\", rstrip=False, lstrip=True, single_word=False, normalized=False), 'additional_special_tokens': ['\u003ceop\u003e', '\u003ceod\u003e']})]\n```\n\n### Heterogeneous Modality\n\nFor the majority of use cases, it does not make sense to ensemble models from different modalities, e.g., a language model and an image model. 
As mentioned, Ensemble Transformers will auto-detect the modality of each model and prevent unintended mixing of models.\n\n```python\n\u003e\u003e\u003e from ensemble_transformers import EnsembleConfig\n\u003e\u003e\u003e config = EnsembleConfig(\"AutoModelForSequenceClassification\", model_names=[\"bert-base-uncased\", \"google/vit-base-patch16-224-in21k\"])\nTraceback (most recent call last):\n  File \"\u003cstdin\u003e\", line 1, in \u003cmodule\u003e\n  File \"/Users/jaketae/Documents/Dev/github/ensemble-transformers/ensemble_transformers/config.py\", line 37, in __init__\n    raise ValueError(\"Cannot ensemble models of different modalities.\")\nValueError: Cannot ensemble models of different modalities.\n```\n\n### Loading Across Devices\n\nBecause ensembling involves multiple models, it is often impossible to load all models onto a single device. To alleviate memory requirements, Ensemble Transformers offers a way of distributing models across different devices. For instance, say you have access to multiple GPU cards and want to load each model onto different GPUs. This can easily be achieved by the following line.\n\n```python\nensemble.to_multiple(\n    [\"cuda:0\", \"cuda:1\", \"cuda:2\"]\n)\n```\n\nThe familiar `to(device)` method is also supported, and it loads all models onto the same device.\n\n```python\nensemble.to(\"cuda\")\n```\n\n### Forward Propagation\n\nTo run forward propagation, simply pass a batch of raw input to the ensemble. 
In the case of language models, this is just a batch of text.\n\n```python\n\u003e\u003e\u003e batch = [\"This is a test sentence\", \"This is another test sentence.\"]\n\u003e\u003e\u003e output = ensemble(batch)\n\u003e\u003e\u003e output\nEnsembleModelOutput(\n        logits: [tensor([[ 0.2430, -0.0581],\n        [ 0.2145, -0.0541]], grad_fn=\u003cAddmmBackward0\u003e), tensor([[-0.0094, -0.0117],\n        [-0.0118, -0.0046]], grad_fn=\u003cAddmmBackward0\u003e), tensor([[-0.0962, -1.1581],\n        [-0.2195, -0.7422]], grad_fn=\u003cAddmmBackward0\u003e)]\n)\n\u003e\u003e\u003e output.outputs\n[SequenceClassifierOutput(loss=None, logits=tensor([[ 0.1681, -0.3470],\n        [ 0.1573, -0.1571]], grad_fn=\u003cAddmmBackward0\u003e), hidden_states=None, attentions=None), SequenceClassifierOutput(loss=None, logits=tensor([[ 0.1388, -0.0711],\n        [ 0.1429, -0.0841]], grad_fn=\u003cAddmmBackward0\u003e), hidden_states=None, attentions=None), XLNetForSequenceClassificationOutput(loss=None, logits=tensor([[0.5506, 0.1506],\n        [0.4308, 0.1397]], grad_fn=\u003cAddmmBackward0\u003e), mems=(tensor([[[ 0.0344,  0.0202,  0.0261,  ..., -0.0175, -0.0343,  0.0252],\n         [-0.0281, -0.0198, -0.0387,  ..., -0.0420, -0.0160, -0.0253]],\n       ...,\n        [[ 0.2468, -0.4007, -1.0839,  ..., -0.2943, -0.3944,  0.0605],\n         [ 0.1970,  0.2106, -0.1448,  ..., -0.6331, -0.0655,  0.7427]]])), hidden_states=None, attentions=None)]\n```\n\nBy default, the ensemble returns an `EnsembleModelOutput` instance, which contains the outputs from every model. The raw outputs from each model are accessible via the `.outputs` field. The `EnsembleModelOutput` class also scans each raw output and collects the keys they have in common. 
In the example above, all model outputs contained a `.logits` field, which is why it appears as a field in the `output` instance.\n\nWe can also stack or mean-pool the output of each model by toggling `mean_pool=True` in the forward call.\n\n```python\n\u003e\u003e\u003e stacked_output = ensemble(batch, mean_pool=True)\n\u003e\u003e\u003e stacked_output\nEnsembleModelOutput(\n        logits: tensor([[ 0.0458, -0.4093],\n        [-0.0056, -0.2670]], grad_fn=\u003cSumBackward1\u003e),\n)\n```\n\nIf the models are spread across different devices, the result is collected in `main_device`, which defaults to the CPU.\n\n### Preprocessor Arguments\n\nPreprocessors accept a number of optional arguments. For instance, for simple batching, `padding=True` is used. Moreover, PyTorch models require `return_tensors=\"pt\"`. Ensemble Transformers already ships with minimal, sensible defaults so that it works out-of-the-box. However, for more custom behavior, you can modify the `preprocessor_kwargs` argument. The example below demonstrates how to use TensorFlow language models without padding.\n\n```python\nensemble(batch, preprocessor_kwargs={\"return_tensors\": \"tf\", \"padding\": False})\n```\n\n## Contributing\n\nThis repository is under active development. Any and all issues and pull requests are welcome. If you would prefer, feel free to reach out to me at jaesungtae@gmail.com.\n\n## License\n\nReleased under the [MIT License](LICENSE).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaketae%2Fensemble-transformers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjaketae%2Fensemble-transformers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaketae%2Fensemble-transformers/lists"}