{"id":25743359,"url":"https://github.com/ndoll1998/appliedtransformers","last_synced_at":"2026-04-07T19:31:39.541Z","repository":{"id":53974689,"uuid":"289576990","full_name":"ndoll1998/AppliedTransformers","owner":"ndoll1998","description":"State-Of-The-Art Transformer Models","archived":false,"fork":false,"pushed_at":"2021-05-23T15:37:56.000Z","size":466,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-01-01T22:01:49.737Z","etag":null,"topics":["applied-machine-learning","bert","language-model","natural-language-processing","natural-language-understanding","nlp","pytorch","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ndoll1998.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-08-22T22:30:51.000Z","updated_at":"2023-06-08T07:50:08.000Z","dependencies_parsed_at":"2022-08-13T05:21:14.326Z","dependency_job_id":null,"html_url":"https://github.com/ndoll1998/AppliedTransformers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ndoll1998/AppliedTransformers","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ndoll1998%2FAppliedTransformers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ndoll1998%2FAppliedTransformers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ndoll1998%2FAppliedTransformers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ndoll1998%2FAppliedTransformers/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ndoll1998","download_url":"https://codeload.github.com/ndoll1998/AppliedTransformers/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ndoll1998%2FAppliedTransformers/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31526665,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"ssl_error","status_checked_at":"2026-04-07T16:28:06.951Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["applied-machine-learning","bert","language-model","natural-language-processing","natural-language-understanding","nlp","pytorch","transformers"],"created_at":"2025-02-26T10:19:33.068Z","updated_at":"2026-04-07T19:31:39.517Z","avatar_url":"https://github.com/ndoll1998.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Applied Transformers\r\n\r\n| Branch  | Status |\r\n| ------- | :----: |\r\n| Master  | ![Unit Tests](https://github.com/ndoll1998/AppliedTransformers/workflows/Unit%20Tests%20Master/badge.svg)  |\r\n| Develop | ![Unit Tests](https://github.com/ndoll1998/AppliedTransformers/workflows/Unit%20Tests%20Develop/badge.svg) |\r\n\r\nApplied Transformers is a project collecting state-of-the-art 
transformer models to tackle typical natural language processing (NLP) tasks. It provides PyTorch implementations of the models as well as a number of datasets for several NLP tasks. The beauty of Applied Transformers is that one can easily switch between different encoders (e.g. BERT, ALBERT, KnowBERT) independent of the task or classifier.\r\n\r\n## Tasks\r\n\r\nThe project currently holds the following tasks:\r\n\r\n- [AspectBasedSentimentAnalysis](applied/tasks/absa)\r\n- [AspectOpinionExtraction](applied/tasks/aoex)\r\n- [NamedEntityClassification](applied/tasks/nec)\r\n- [RelationExtraction](applied/tasks/relex)\r\n\r\n## How to use\r\n\r\nOur main goal is to provide a very simple yet scalable interface for both training SOTA transformer models and running inference with them on several NLP tasks. The following is a simple example of how to train a `BERT` model for aspect-based sentiment analysis (from `examples/train_abse.py`).\r\n\r\n```python\r\nimport torch\r\nfrom applied import encoders\r\nfrom applied import optimizers\r\nfrom applied.tasks import absa\r\nimport matplotlib.pyplot as plt\r\n\r\n# create encoder\r\nencoder = encoders.BERT.from_pretrained(\"bert-base-uncased\")\r\nencoder.init_tokenizer_from_pretrained(\"bert-base-uncased\")\r\n# create model and optimizer\r\nmodel = absa.models.SentencePairClassifier(encoder=encoder, \r\n    num_labels=absa.datasets.SemEval2014Task4.num_labels())\r\noptim = optimizers.AdamW(model.parameters(only_head=True), lr=1e-5, weight_decay=0.01)\r\n# create dataset and prepare it for the model\r\ndataset = absa.datasets.SemEval2014Task4(\r\n    data_base_dir='../data', seq_length=128, batch_size=2)\r\n# create trainer instance and train model\r\ntrainer = absa.Trainer(\r\n    model=model, \r\n    dataset=dataset,\r\n    optimizer=optim\r\n).train(epochs=2)\r\n# save metrics and model\r\ntrainer.metrics.save_table(\"../results/ABSA-Bert/metrics.table\")\r\ntorch.save(model.state_dict(), 
\"../results/ABSA-Bert/model.bin\")\r\n# plot metrics\r\nfig = trainer.metrics.plot()\r\nplt.show()\r\n```\r\n\r\n## Development\r\nThe overall architecture of this project aims to minimize the effort needed to implement new ideas. This includes new models, datasets and also whole new tasks. The following section describes the environment of a single task.\r\n\r\n### Task Directory\r\n\r\nIn general, a task is a directory with the following structure:\r\n```\r\n+-- taskA\r\n|   +-- models\r\n|   |   +-- __init__.py\r\n|   |   +-- base.py\r\n|   |   +-- ...\r\n|   +-- datasets\r\n|   |   +-- __init__.py\r\n|   |   +-- base.py\r\n|   |   +-- ...\r\n|   +-- __init__.py\r\n|   +-- trainer.py\r\n```\r\nHere is a quick overview of the initial files and their purpose:\r\n - `models/base.py`\r\n    ```python\r\n    \"\"\" \r\n    Create the base model type. Usually there is nothing really to define here. \r\n    The base class is just for typechecking. \r\n    \"\"\"\r\n    from applied.core.model import Model\r\n    class BaseModel(Model): pass\r\n    ```\r\n - `datasets/base.py`\r\n    ```python\r\n    \"\"\" Create the base dataset type and the dataset item type. \"\"\"\r\n    from applied.core.dataset import Dataset, DatasetItem\r\n    from dataclasses import dataclass\r\n    @dataclass(frozen=True)\r\n    class MyDatasetItem(DatasetItem):\r\n        \"\"\" Define all the features that one single dataset item should contain. \"\"\"\r\n        text:str\r\n        labels:tuple\r\n        ...\r\n    class BaseDataset(Dataset):\r\n        \"\"\" Again this class is just for typechecking. So usually nothing really to do here. \"\"\"\r\n\r\n    ```\r\n - `trainer.py`\r\n    ```python\r\n    \"\"\" \r\n    This file defines the trainer of the task. In most cases the trainer \r\n    only differs from the standard trainer by the metrics that are to be tracked. 
\r\n    \"\"\"\r\n    from applied.core.trainer import Trainer as BaseTrainer\r\n    from applied.core.metrics import MetricCollection, Losses, MicroF1Score, MacroF1Score\r\n    # import model and dataset\r\n    from .models.base import BaseModel\r\n    from .datasets.base import BaseDataset\r\n\r\n    class Trainer(BaseTrainer):\r\n        # model and dataset type\r\n        BASE_MODEL_TYPE = BaseModel\r\n        BASE_DATASET_TYPE = BaseDataset\r\n        # metrics type\r\n        # change this as needed\r\n        METRIC_TYPE = MetricCollection[Losses, ...]\r\n    ```\r\n\r\n### Custom Models and Datasets\r\nWith the task directory set up, one can add new models. Just add a file to the models folder and implement the custom model. Typically the model class is of the following form:\r\n\r\n```python\r\nimport torch\r\nimport torch.nn.functional as F\r\nfrom dataclasses import dataclass\r\nfrom typing import Tuple\r\nfrom .base import BaseModel\r\nfrom ..datasets.base import MyDatasetItem\r\nfrom applied.core.model import Encoder, InputFeatures\r\n\r\n@dataclass\r\nclass CustomInputFeatures(InputFeatures):\r\n    \"\"\" specify additional input features for the custom model\r\n        The features should be of a torch tensor type\r\n        They will be passed to the forward function in the \r\n        same order as they are defined here\r\n    \"\"\"\r\n    # Note that you do not need to set this to a tensor by hand \r\n    # but the values will be collected and converted to the \r\n    # specified tensor type in preprocessing\r\n    additional_input:torch.Tensor\r\n    ...\r\n\r\nclass CustomModel(BaseModel):\r\n\r\n    def __init__(self, encoder:Encoder):\r\n        BaseModel.__init__(self, encoder=encoder)\r\n        # initialize your model here\r\n        # create all submodules, etc.\r\n\r\n    def build_features_from_item(self, item:MyDatasetItem) -\u003e Tuple[FeaturePair]:\r\n        \"\"\" build all feature-pairs from the provided dataset item. \r\n            A FeaturePair instance has to specify the text and labels. 
\r\n            Additionally you can provide the tokens. \r\n            Note that you can provide labels in any form. So this is NOT a \r\n            restriction to only sentence-level or token-level tasks.\r\n        \"\"\"\r\n        return (\r\n            CustomInputFeatures(text=textA, labels=labelsA),\r\n            CustomInputFeatures(text=textB, labels=labelsB, tokens=tokensB),\r\n        )\r\n\r\n    def build_target_tensors(self, features:Tuple[CustomInputFeatures]) -\u003e Tuple[torch.LongTensor]:\r\n        \"\"\" build all target tensors from the given features. \r\n            Note that even in the case of only one target tensor, \r\n            you still have to return a tuple of tensors.\r\n            Also make sure that the first dimension of each label tensor \r\n            corresponds to the examples, i.e. it should have the size of len(features).\r\n        \"\"\"\r\n        labels = [f.labels for f in features]\r\n        return (torch.LongTensor(labels),)\r\n\r\n    def forward(self, \r\n        # encoder input arguments\r\n        input_ids, \r\n        attention_mask=None, \r\n        token_type_ids=None,\r\n        # additional input arguments\r\n        additional_input=None,\r\n        ...\r\n    ):\r\n        \"\"\" the forward pass for the given model \"\"\"\r\n        # pass through encoder\r\n        last_hidden_state, pooled_output = self.encoder.forward(\r\n            input_ids=input_ids,\r\n            attention_mask=attention_mask,\r\n            token_type_ids=token_type_ids\r\n        )[:2]\r\n        # apply model\r\n        # ...\r\n\r\n        # return logits\r\n        return logits\r\n\r\n    def loss(self, logitsA, logitsB, labelsA, labelsB):\r\n        \"\"\" Compute loss \"\"\"\r\n        return F.cross_entropy(logitsA, labelsA)\r\n```\r\n\r\nImplementing a custom dataset is just as easy:\r\n\r\n```python\r\nfrom .base import BaseDataset, MyDatasetItem\r\n# use this file path class to support auto-download from 
url\r\nfrom applied.common.path import FilePath\r\n\r\nclass CustomDataset(BaseDataset):\r\n    def yield_train_items(self) -\u003e iter:\r\n        # load all training data and yield the dataset items one by one\r\n        base_data_dir = self.data_base_dir\r\n        yield MyDatasetItem(...)\r\n    def yield_eval_items(self) -\u003e iter:\r\n        # load all evaluation data and yield the dataset items one by one\r\n        base_data_dir = self.data_base_dir\r\n        yield MyDatasetItem(...)\r\n```\r\n## TODOs\r\n - implement more encoders\r\n - make trainer iterable\r\n - add JustInTime Dataset (JitDataset) that only loads the data currently needed\r\n - is there any benefit of having a task instance? (core.task)\r\n   - yes, at least for housekeeping this seems helpful\r\n   - provide functions like show_datasets/show_models etc.\r\n - PyPI\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fndoll1998%2Fappliedtransformers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fndoll1998%2Fappliedtransformers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fndoll1998%2Fappliedtransformers/lists"}