{"id":24432273,"url":"https://github.com/bayer-group/xtars-naacl2022","last_synced_at":"2026-03-06T19:02:08.124Z","repository":{"id":89208670,"uuid":"480287148","full_name":"Bayer-Group/xtars-naacl2022","owner":"Bayer-Group","description":"Zero/few-shot learning for classification with very large label sets and long-tailed distribution of labels in data points","archived":false,"fork":false,"pushed_at":"2025-05-02T14:41:31.000Z","size":21,"stargazers_count":6,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-01-11T19:59:33.352Z","etag":null,"topics":["bayer-not-classified","bayer-reg-none","beat-not-applicable","few-shot-learning","large-scale-classification","natural-language-processing","text-classification","zero-shot-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Bayer-Group.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-11T08:14:33.000Z","updated_at":"2023-10-21T02:01:10.000Z","dependencies_parsed_at":"2023-03-30T13:17:47.912Z","dependency_job_id":null,"html_url":"https://github.com/Bayer-Group/xtars-naacl2022","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Bayer-Group/xtars-naacl2022","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bayer-Group%2Fxtars-naacl2022","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bayer-Group%2Fxtars-naacl2022/tags","releases_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bayer-Group%2Fxtars-naacl2022/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bayer-Group%2Fxtars-naacl2022/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Bayer-Group","download_url":"https://codeload.github.com/Bayer-Group/xtars-naacl2022/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bayer-Group%2Fxtars-naacl2022/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30192368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T18:54:55.862Z","status":"ssl_error","status_checked_at":"2026-03-06T18:53:04.013Z","response_time":250,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayer-not-classified","bayer-reg-none","beat-not-applicable","few-shot-learning","large-scale-classification","natural-language-processing","text-classification","zero-shot-learning"],"created_at":"2025-01-20T15:36:42.266Z","updated_at":"2026-03-06T19:02:08.064Z","avatar_url":"https://github.com/Bayer-Group.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# XTARS: zero/few-shot learning for large-scale text classification\n\n\u003cp align=\"center\"\u003e\n\u003ca 
href=\"https://github.com/psf/black\"\u003e\u003cimg alt=\"Code style: black\" src=\"https://img.shields.io/badge/code%20style-black-000000.svg\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nThis repository contains the code of the following paper:\n\n    @inproceedings{ziletti-etal-2022-medical,\n    title = \"{M}edical Coding with Biomedical Transformer Ensembles and Zero/Few-shot Learning\",\n    author = \"Ziletti, Angelo  and\n    Akbik, Alan  and\n    Berns, Christoph  and\n    Herold, Thomas  and\n    Legler, Marion  and\n    Viell, Martina\",\n    booktitle = \"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track\",\n    month = jul,\n    year = \"2022\",\n    address = \"Hybrid: Seattle, Washington + Online\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2022.naacl-industry.21\",\n    doi = \"10.18653/v1/2022.naacl-industry.21\",\n    pages = \"176--187\",\n    abstract = \"Medical coding (MC) is an essential pre-requisite for reliable data retrieval and reporting. Given a free-text \\textit{reported term} (RT) such as {``}pain of right thigh to the knee{''}, the task is to identify the matching \\textit{lowest-level term} (LLT) {--}in this case {``}unilateral leg pain{''}{--} from a very large and continuously growing repository of standardized medical terms. However, automating this task is challenging due to a large number of LLT codes (as of writing over $80\\,000$), limited availability of training data for long tail/emerging classes, and the general high accuracy demands of the medical domain.With this paper, we introduce the MC task, discuss its challenges, and present a novel approach called xTARS that combines traditional BERT-based classification with a recent zero/few-shot learning approach (TARS). 
We present extensive experiments that show that our combined approach outperforms strong baselines, especially in the few-shot regime. The approach is developed and deployed at Bayer, live since November 2021. As we believe our approach potentially promising beyond MC, and to ensure reproducibility, we release the code to the research community.\",\n    }\n\nIn this paper, we present a novel approach called **XTARS** that combines traditional BERT-based classification with a recent zero/few-shot learning approach (TARS, by [Halder et al. (2020)](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf)).   \n**XTARS** is suitable for classification tasks with very large label sets and long-tailed label distributions.\n\n\n## Installation\n\nWe recommend creating a Python 3.8 virtual environment (for instance, with conda: https://docs.anaconda.com/anaconda/install/linux/) and then installing the latest version from the master branch on GitHub:\n```\ngit clone \u003cGITHUB-URL\u003e\ncd xtars\npython setup.py install\n```\n\n## Quick start\nThe `XTARSClassifier` in this repository can be used in the same way as the `TARSClassifier` in [Flair](https://github.com/flairNLP/flair).\n\nDocumentation on the usage of the `TARSClassifier` in Flair can be found [here](https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_10_TRAINING_ZERO_SHOT_MODEL.md).\n\nTo use XTARS instead of TARS, simply substitute `TARSClassifier` with `XTARSClassifier` at training time.    \nAn example of training is presented in `main_train.py`. 
\n\nDuring prediction, a saved `XTARSClassifier` can be loaded in exactly the same way as the `TARSClassifier`.\nWe refer you to the [Flair](https://github.com/flairNLP/flair) documentation for more details.\n\n---------------\n\n## Example code\n\nScripts for fine-tuning (`main_train.py`) and for making predictions (`main_predict.py`) are provided for your convenience.\n\n### Data for Fine Tuning\n\nSample data are provided in the `/sample_data/` folder.\nIf you use your own data, it must be formatted like the sample data.    \nAs prescribed by [Flair](https://github.com/flairNLP/flair), three files are needed to create the corpus (see [here](https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_6_CORPUS.md)):\n```\ntrain.csv\ndev.csv\ntest.csv\n```\n\n### Fine Tuning\n\nUse the `main_train.py` script to fine-tune a model on the sample data provided.\n\n```\npython main_train.py\n```\n\n### Predictions\n\nAfter you have trained a model, you can use the `main_predict.py` script to obtain predictions for the test set.\n\n```\npython main_predict.py\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbayer-group%2Fxtars-naacl2022","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbayer-group%2Fxtars-naacl2022","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbayer-group%2Fxtars-naacl2022/lists"}