{"id":19168398,"url":"https://github.com/bloomberg/dataless-model-merging","last_synced_at":"2025-08-19T06:34:02.017Z","repository":{"id":152268180,"uuid":"607303280","full_name":"bloomberg/dataless-model-merging","owner":"bloomberg","description":"Code release for Dataless Knowledge Fusion by Merging Weights of Language Models (https://openreview.net/forum?id=FCnohuR6AnM)","archived":false,"fork":false,"pushed_at":"2023-07-25T21:17:22.000Z","size":117,"stargazers_count":89,"open_issues_count":9,"forks_count":6,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-07-27T05:31:20.595Z","etag":null,"topics":["language-model","model-merging"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bloomberg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-27T18:12:26.000Z","updated_at":"2025-06-03T07:47:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"fcec37bc-d876-490d-b725-8103093425ad","html_url":"https://github.com/bloomberg/dataless-model-merging","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bloomberg/dataless-model-merging","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bloomberg%2Fdataless-model-merging","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bloomberg%2Fdataless-model-merging/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bloomberg%2Fdataless-model-merging/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bloomberg%2Fdataless-model-merging/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bloomberg","download_url":"https://codeload.github.com/bloomberg/dataless-model-merging/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bloomberg%2Fdataless-model-merging/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271113408,"owners_count":24701609,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-19T02:00:09.176Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["language-model","model-merging"],"created_at":"2024-11-09T09:42:30.971Z","updated_at":"2025-08-19T06:34:01.995Z","avatar_url":"https://github.com/bloomberg.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Dataless Knowledge Fusion by Merging Weights of Language Models\n\nThis repository contains the experimental code to reproduce the results in [Dataless Knowledge Fusion by Merging Weights of Language Models](https://openreview.net/forum?id=FCnohuR6AnM), a paper to be published during the [Eleventh International Conference on Learning Representations (ICLR 2023)](https://iclr.cc/), to be held May 1-5, 2023 in Kigali, Rwanda.\n\n```\n@inproceedings{\n    jin2023dataless,\n    
title={Dataless Knowledge Fusion by Merging Weights of Language Models},\n    author={Xisen Jin and Xiang Ren and Daniel Preotiuc-Pietro and Pengxiang Cheng},\n    booktitle={The Eleventh International Conference on Learning Representations},\n    year={2023},\n    url={https://openreview.net/forum?id=FCnohuR6AnM}\n}\n```\n\n## Requirements\nWe used PyTorch 1.13.1. See [requirements.txt](requirements.txt) for other requirements.\n\n## Quick Demo\nIf you are just interested in the Regression Mean (RegMean) algorithm, please check [regmean_demo.ipynb](regmean_demo.ipynb).\n\nThis is a standalone Jupyter notebook that merges two Hugging Face transformer models fine-tuned on GLUE. This file does not import files under `src/`.\n\n## Reproducing Results\n### Preparing Emotion Classification Datasets\nPlease download the unified emotion dataset in this [repo](https://github.com/sarnthil/unify-emotion-datasets). The files should be placed under `PROJECT_ROOT/resources/emotion_splits` in the following structure.\n\n```\n.\n├── crowdflower\n│   ├── dev.jsonl\n│   ├── full.jsonl\n│   ├── test.jsonl\n│   └── train.jsonl\n├── dailydialog\n│   ├── dev.jsonl\n│   ├── full.jsonl\n│   ├── test.jsonl\n│   └── train.jsonl\n├── electoraltweets\n│   ├── dev.jsonl\n│   ├── full.jsonl\n│   ├── test.jsonl\n│   └── train.jsonl\n├── emobank\n│   ├── dev.jsonl\n│   ├── full.jsonl\n│   ├── test.jsonl\n│   └── train.jsonl\n...\n```\n\n### Preparing NER Datasets\nPlease prepare CoNLL2003, OntoNotes, and Twitter NER datasets and place them under `PROJECT_ROOT/resources/ner`.\n```\n.\n├── conll2003\n│   ├── dev.conll\n│   ├── test.conll\n│   └── train.conll\n├── ontonotes\n│   ├── onto.development.bc.ner\n│   ├── onto.development.bn.ner\n│   ├── onto.development.mz.ner\n│   ├── onto.development.nw.ner\n│   ├── onto.development.tc.ner\n│   ├── onto.development.wb.ner\n│   ├── onto.test.bc.ner\n│   ├── onto.test.bn.ner\n│   ├── onto.test.mz.ner\n│   ├── onto.test.nw.ner\n│   ├── 
onto.test.tc.ner\n│   ├── onto.test.wb.ner\n│   ├── onto.train.bc.ner\n│   ├── onto.train.bn.ner\n│   ├── onto.train.mz.ner\n│   ├── onto.train.nw.ner\n│   ├── onto.train.tc.ner\n│   └── onto.train.wb.ner\n└── twitter\n    ├── annotated.twitter-ner-20-21-tweet-dev-withcleaned.json\n    ├── annotated.twitter-ner-20-21-tweet-test-withcleaned.json\n    └── annotated.twitter-ner-20-21-tweet-train-withcleaned.json\n```\n\nHere, CoNLL and OntoNotes datasets contain entries in the CoNLL format.\n\n```\nCRICKET\tO\tConll\n-\tO\tConll\nLEICESTERSHIRE\tB-ORG\tConll\nTAKE\tO\tConll\nOVER\tO\tConll\nAT\tO\tConll\nTOP\tO\tConll\nAFTER\tO\tConll\nINNINGS\tO\tConll\nVICTORY\tO\tConll\n.\tO\tConll\n\nLONDON\tB-LOC\tConll\n1996-08-30\tO\tConll\n...\n```\n\nTwitter NER contains 1 JSON dict per line.\n\n```\n{\"text\": \"Spectacular skies over #Clonmel tonight http://t.co/OxclQkuyTp /via @niallodonovan #lastdayofautumn\", \"id\": \"539106999980797952\", \"entities\": [{\"startCharOffset\": 24, \"endOffset\": 31, \"endCharOffset\": 31, \"surface\": \"Clonmel\", \"startOffset\": 24, \"type\": \"LOC\"}, {\"startCharOffset\": 69, \"endOffset\": 82, \"endCharOffset\": 82, \"surface\": \"niallodonovan\", \"startOffset\": 69, \"type\": \"PER\"}], \"labels\": [\"O\", \"O\", \"O\", \"O\", \"B-LOC\", \"O\", \"O\", \"O\", \"O\", \"B-PER\", \"O\", \"O\"], \"tokens\": [\"Spectacular\", \"skies\", \"over\", \"#\", \"Clonmel\", \"tonight\", \"http://t.co/OxclQkuyTp\", \"/\", \"via\", \"@niallodonovan\", \"#\", \"lastdayofautumn\"], \"domain\": \"TWT\"}\n```\n\n### Preparing GLUE datasets\nGLUE datasets will be downloaded and loaded with Hugging Face's `datasets` library.\n\n### Preparing Pretrained LMs\nPlease download pretrained models (e.g., RoBERTa-base) from the Hugging Face models repository and place them under `PROJECT_ROOT/resources` (e.g., `PROJECT_ROOT/resources/roberta-base`).\n\n### Usage\n- `--config_files`: See under `src/configs`. 
The training module (`src.run_experiments`) requires three config files defining default arguments (`src/defaults.yaml`), data config (under `src/configs/datasets`), and exp config (under `src/configs/exps`).\n\n- `--filter_model`: Useful when merging only a subset of the individual models specified in the data config, e.g., `--filter_model model0 model1` will perform pairwise merging of model0 and model1 (see the definitions of aliases such as model0 and model1 in the data config).\n\n- `--templates`: Config files may contain templates like `{seed}`. The values of templates should be specified on the command line (e.g., `--templates seed=1`).\n\nIndividual models (before merging) will be trained and stored under `local_zoo_dir` specified in the config. If none of the individual models in the zoo match the given model type and `zoo_filter` arguments in the config, then the program will automatically train new individual models and store them under `local_zoo_dir`. If individual models are found in `local_zoo_dir`, they will be loaded without re-training.\n\nExample: *RegMean, Emotion, Same Head Init, Merging Model0 (dailydialogue) and Model1 (crowdflower)*\n\n```\nHF_DATASETS_OFFLINE=1 CUDA_VISIBLE_DEVICES=0 python -m src.run_experiments --config src/configs/defaults.yaml src/configs/datasets/emotion.yaml src/configs/exps/roberta-base/roberta-base-emotion.yaml --templates seed=1 --filter_model model0 model1\n```\n\n### Scripts\n#### Pairwise Merging\nMerging two emotion classification models trained on different datasets (domains).\n- Emotion, RoBERTa-base: `scripts/roberta/pairwise_emotion.py`\n- Emotion, T5-base: `scripts/t5/pairwise_emotion.py`\n- Emotion, RoBERTa-base: `scripts/t5/pairwise_emotion.py`\n\nMerging two models trained on different GLUE tasks. 
Task-specific classification heads are not merged.\n- GLUE, DistilBERT-base: `scripts/distilbert/pairwise_glue_difftask.py`\n- GLUE, RoBERTa-base: `scripts/roberta/pairwise_glue_difftask.py`\n\nMerging two models trained on two non-IID partitions of the same GLUE task.\n- GLUE, DistilBERT-base: `scripts/distilbert/pairwise_glue_subset.py`\n- GLUE, RoBERTa-base: `scripts/roberta/pairwise_glue_subset.py`\n\n#### Greedy Merging\nGreedily merging multiple (two to all) models in the order of OOD performance of individual models.\n- Emotion, RoBERTa-base: `scripts/roberta/incremental_emotion.py`\n- Emotion, T5-base: `scripts/t5/incremental_emotion.py`\n- Emotion, DeBERTa-large: `scripts/deberta/incrementale_emotion.py`\n- NER, RoBERTa-base: `scripts/roberta/incremental_ner.py`\n- NER, DeBERTa-large: `scripts/deberta/incremental_ner.py`\n\nPlease note these scripts run inference on both in-domain and out-of-domain test sets.\n\nEach script above will run Simple, Fisher, and RegMean averaging. As comparators, they also run Multi-Task Learning (MTL), model ensembling, and the individual models (before merging). You can comment out lines inside these scripts to run only part of each one.\n\n## License\n\nThis project is licensed under the Apache 2.0 License. 
See the [LICENSE](LICENSE) file for details.\n\n## Code of Conduct\n\nThis project has adopted a [Code of Conduct](https://github.com/bloomberg/.github/blob/master/CODE_OF_CONDUCT.md).\nIf you have any concerns about the Code, or behavior which you have experienced in the project, please\ncontact us at opensource@bloomberg.net.\n\n## Security Vulnerability Reporting\n\nIf you believe you have identified a security vulnerability in this project, please send an email to the project\nteam at opensource@bloomberg.net detailing the suspected issue and any methods you've found to reproduce it.\n\nPlease do NOT open an issue in the GitHub repository, as we'd prefer to keep vulnerability reports private until\nwe've had an opportunity to review and address them.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbloomberg%2Fdataless-model-merging","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbloomberg%2Fdataless-model-merging","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbloomberg%2Fdataless-model-merging/lists"}