{"id":26945479,"url":"https://github.com/UIC-Liu-Lab/ContinualLM","last_synced_at":"2025-04-02T19:02:30.602Z","repository":{"id":169431024,"uuid":"600885350","full_name":"UIC-Liu-Lab/ContinualLM","owner":"UIC-Liu-Lab","description":"An Extensible Continual Learning Framework Focused on Language Models (LMs)","archived":false,"fork":false,"pushed_at":"2024-01-28T21:26:40.000Z","size":713,"stargazers_count":215,"open_issues_count":3,"forks_count":15,"subscribers_count":10,"default_branch":"main","last_synced_at":"2024-05-15T09:44:33.090Z","etag":null,"topics":["catastrophic-forgetting","continual-learning","domain-adaptive-pretraining","knowledge-transfer","language-model","language-modeling","natural-language-processing","transfer-learning","transformer-architecture"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UIC-Liu-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-02-12T21:53:12.000Z","updated_at":"2024-05-14T08:21:00.000Z","dependencies_parsed_at":"2024-01-27T05:29:00.816Z","dependency_job_id":"995189a8-dd9f-465e-b396-fac4b57af347","html_url":"https://github.com/UIC-Liu-Lab/ContinualLM","commit_stats":null,"previous_names":["uic-liu-lab/continuallm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UIC-Liu-Lab%2FContinualLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UIC-Liu-Lab%2FContinualLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UIC-Liu-Lab%2FContinualLM
/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UIC-Liu-Lab%2FContinualLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UIC-Liu-Lab","download_url":"https://codeload.github.com/UIC-Liu-Lab/ContinualLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246875874,"owners_count":20848039,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["catastrophic-forgetting","continual-learning","domain-adaptive-pretraining","knowledge-transfer","language-model","language-modeling","natural-language-processing","transfer-learning","transformer-architecture"],"created_at":"2025-04-02T19:01:53.071Z","updated_at":"2025-04-02T19:02:30.579Z","avatar_url":"https://github.com/UIC-Liu-Lab.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\n  \n# ContinualLM    \n \u003cp align=\"center\"\u003e    \n    \u003cbr\u003e    \n    \u003ca href=\"https://github.com/UIC-Liu-Lab//ContinualLM\"\u003e    \n        \u003cimg src=\"https://github.com/UIC-Liu-Lab/ContinualLM/blob/main/docs/icon.png\" width=\"200\"/\u003e    \n    \u003c/a\u003e       \n        \u003cfigcaption\u003eImagine an LM that not only effortlessly acquires new knowledge but also retains its mastery of skills, all while successfully transferring knowledge. Is it even possible?\u003c/figcaption\u003e  \n    \u003cbr\u003e    \n\u003cp\u003e\n\n## News  \n🔥 We have added [checkpoints](#checkpoints-in-huggingface) in Hugging Face for easier reproduction!  
\n🔥 We have added [continual_pretrain.ipynb](https://github.com/UIC-Liu-Lab/ContinualLM/blob/main/continual_pretrain.ipynb) as a **self-contained example** of the soft-masking scenario. It runs well without GPUs!  \n🔥 Soft-masking can also work in **conventional continual fine-tuning**. Check out our latest [EMNLP23](https://arxiv.org/abs/2310.09436) paper!  \n🔥 Wondering whether you can adapt a **black-box LLM** without worrying about the update of its parameters? Check out our latest paper on retrieval-augmented generation (RAG) [here](https://arxiv.org/abs/2401.06954)!\n\n## Quick Links  \n  \n - [Introduction](#introduction)  \n - [Simple Example](#simple-example)\n - [Dataset](#dataset)  \n - [Architecture](#architecture)  \n - [Installation](#installation)  \n - [Domain-adaptive Pre-training](#domain-adaptive-pre-training)  \n - [End-task Fine-tuning](#end-task-fine-tuning)  \n - [Checkpoints in Huggingface](#checkpoints-in-huggingface)  \n - [Reference](#reference)  \n - [Contact](#contact)  \n \n## Introduction    \n In 2021, we introduced [PyContinual](https://github.com/ZixuanKe/PyContinual), a straightforward and flexible framework for continual learning. Our research has benefited significantly from this framework. Today, we are excited to share **ContinualLM**, an extensible continual learning framework focused on language models (LMs), designed to sustain the benefits of continual learning (CL) in this field.    \n    \nContinual learning for LMs is distinct from traditional CL because:    \n - Each task is treated as a **domain-specific corpus** (at present, our primary focus is on domain-adaptive pre-training, which is also known as pre-finetuning or post-training).  \n - The evaluation process involves **fine-tuning** on the corresponding end-task.    \n    \nOur repository includes PyTorch implementations of a collection of state-of-the-art (SoTA) methods, all using the same training and evaluation pipeline. 
This repository is committed to advancing the field of continual learning for LMs. The methods included are:    \n    \n    \n* From our group:\n   * **DAS**: [Continual Learning of Language Models](https://arxiv.org/abs/2210.05549), ICLR 2023    \n   * **CPT**: [Continual Training of Language Models for Few-Shot Learning](https://arxiv.org/abs/2210.05549), EMNLP 2022    \n   * **DGA**: [Adapting a Language Model While Preserving its General Knowledge](https://arxiv.org/abs/2301.08986), EMNLP 2022    \n   * **CTR**: [Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning](https://proceedings.neurips.cc/paper/2021/hash/bcd0049c35799cdf57d06eaf2eb3cff6-Abstract.html), NeurIPS 2021  \n   * **CLASSIC**: [CLASSIC: Continual and Contrastive Learning of Aspect Sentiment Classification Tasks](https://aclanthology.org/2021.emnlp-main.550/), EMNLP 2021    \n   * **B-CL**: [Adapting BERT for Continual Learning of a Sequence of Aspect Sentiment Classification Tasks](https://www.aclweb.org/anthology/2021.naacl-main.378.pdf), NAACL 2021   \n   \n* From other groups **(more to come)**:\n  * **DEMIX**: [Demix layers: Disentangling domains for modular language modeling](https://aclanthology.org/2022.naacl-main.407), Gururangan et al., NAACL 2022  \n  * **EWC**: [Overcoming catastrophic forgetting in neural networks](https://arxiv.org/abs/1612.00796), Kirkpatrick et al., PNAS 2017    \n  * **DER++**: [Dark experience for general continual learning: a strong, simple baseline](https://arxiv.org/abs/2004.07211), Buzzega et al., NeurIPS 2020    \n  * **HAT**: [Overcoming catastrophic forgetting with hard attention to the task](http://proceedings.mlr.press/v80/serra18a.html), Serrà et al., ICML 2018    \n       \n* Widely employed baselines for continual learning:\n  * **NCL**: Naive continual learning: continual domain-adaptive pre-training of a sequence of domains, without any specific attention paid to the issues of forgetting or transfer.\n  * **ONE**: 
Individually conducting domain-adaptive pre-training for each domain.\n  * **Adapter-ONE**: Adds an adapter to the Transformer for each domain  \n  * **Prompt-ONE**: Adds a prompt to the Transformer for each domain  \n  * **KD**: Naive knowledge distillation  \n\n## Simple Example\nWe have added ``continual_pretrain.ipynb`` as a self-contained example of the soft-masking scenario. It runs well without GPUs!\n\n## Dataset  \n  \nWhen it comes to the continual learning of language models (LMs), finding appropriate datasets is crucial. The datasets we provide adhere to the following principles:  \n  \n*  **Domain-specific:** The domain corpus must be specific enough to enhance end-task performance.  \n*  **End-task available**: We favor assessing the trained language models through the end-task rather than relying on perplexity, since the former represents a more dependable evaluation approach.  \n\nWe release our dataset comprising **6** distinct domains, each accompanied by its corresponding end-task. The dataset can be found [here](https://drive.google.com/file/d/1_fAu9dPHUpFyAbAN1aBByib3tEVRlZpS/view?usp=sharing). 
Below are some statistics for each domain:\n\n| Domain Corpus    | Size   | End-task    | Task                                     | #Training | #Testing | #Classes |\n|------------------|--------|-------------|------------------------------------------|-----------|----------|----------|\n| Yelp Restaurant  | 758MB  | Restaurant  | Aspect Sentiment Classification (ASC)    | 3,452     | 1,120    | 3        |\n| Amazon Phone     | 724MB  | Phone       | Aspect Sentiment Classification (ASC)    | 239       | 553      | 2        |\n| Amazon Camera    | 319MB  | Camera      | Aspect Sentiment Classification (ASC)    | 230       | 626      | 2        |\n| ACL Papers       | 867MB  | ACL         | Citation Intent Classification           | 1,520     | 421      | 6        |\n| AI Papers        | 507MB  | AI          | Relation Classification                  | 2,260     | 2,388    | 7        |\n| PubMed Papers    | 989MB  | PubMed      | Chemical-protein Interaction Prediction  | 2,667     | 7,398    | 13       |\n\n\n  \n## Architecture  \nThe architecture of ContinualLM largely follows that of [PyContinual](https://github.com/ZixuanKe/PyContinual), [CPT](https://github.com/UIC-Liu-Lab/CPT) and [DGA](https://github.com/UIC-Liu-Lab/DGA).\n\n## Installation\n\n```conda create --name continuallm --file requirements.txt```\n\n:warning: Our model is based on `transformers==4.17.0` and `adapter-transformers==3.0.1`. We recommend using these specific versions, as using other versions may result in unexpected bugs.\n  \n## Domain-adaptive Pre-training  \nThis is where continual learning happens. We will learn a sequence of domains.   
\n  \n```bash\nmax_samples=640000\nfor idrandom in 0\ndo\n  for pt_task in 0 1 2 3 4 5\n  do\n    python -m torch.distributed.launch --nproc_per_node 4 --use_env posttrain.py \\\n      --per_device_train_batch_size 62 \\\n      --fp16 \\\n      --max_seq_length 164 \\\n      --max_samples ${max_samples} \\\n      --idrandom ${idrandom} \\\n      --ntasks 6 \\\n      --pt_task ${pt_task} \\\n      --baseline 'das'\n  done\ndone\n```  \n* `--idrandom`: choose the task sequence. See `./sequences` for more details.  \n* `--baseline`: see the introduction for available baseline models (see ```choices``` in ```config.py```).  \n  \n## End-task Fine-tuning  \nAfter continual learning of the LM, we can evaluate its performance by running end-task fine-tuning on each task **individually**.  \n```bash\nmax_samples=640000\nseed=(2021 111 222 333 444 555 666 777 888 999)\nfor round in 0; do\n  for idrandom in 0; do\n    for pt_task in 0 1 2 3 4 5; do\n      # Fine-tune on end-tasks 0..pt_task\n      for ft_task in $(seq 0 ${pt_task}); do\n        python finetune.py \\\n          --max_seq_length 164 \\\n          --pt_task ${pt_task} \\\n          --ft_task ${ft_task} \\\n          --idrandom ${idrandom} \\\n          --ntasks 6 \\\n          --max_samples ${max_samples} \\\n          --seed ${seed[$round]} \\\n          --baseline 'das'\n      done\n    done\n  done\ndone\n```  \n  \n  \n## Checkpoints in Huggingface  \n\nFor those who are interested solely in the resulting model or want to continue pre-training the model with their own data, we have good news! We offer checkpoints through Hugging Face.\n\nYou can easily import our continually post-trained model with Hugging Face's `transformers`!\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n# Import our model. 
The package will take care of downloading the models automatically\ntokenizer = AutoTokenizer.from_pretrained(\"UIC-Liu-Lab/DAS-Rest2Cam\")\nmodel = AutoModelForSequenceClassification.from_pretrained(\"UIC-Liu-Lab/DAS-Rest2Cam\", trust_remote_code=True)\n\n# Tokenize input texts\ntexts = [\n    \"There's a kid on a skateboard.\",\n    \"A kid is skateboarding.\",\n    \"A kid is inside the house.\"\n]\ninputs = tokenizer(texts, padding=True, truncation=True, return_tensors=\"pt\")\n\n# Get the model output!\nres = model(**inputs)\n\n```\n\nIf you encounter any problems loading the models directly through Hugging Face's API, you can also download the models manually from the [repo](https://huggingface.co/UIC-Liu-Lab/DAS-Rest2Cam/tree/main) and use `model = AutoModel.from_pretrained({PATH TO THE DOWNLOADED MODEL})`.\n\n⚠ The continual pre-training sequence is the **first sequence** at ``./sequences/posttrain`` (from **Restaurant to Camera**). You can use the downloaded weights to fine-tune the corresponding end-task.\n\n⚠ If you are interested in the importance files, please refer to ``before_distill0`` and ``after_mlm{domain_id}``. ``before`` signifies the importance computed before pre-training, which is done only once before the first domain for general pre-trained knowledge. ``after`` indicates the importance computed after the pre-training of domain_id.\n  \n## Reference  \nWe highly appreciate your starring and citing this work. Your attention and recognition are greatly valued.  
\n  \n  \n```bibtex  \n  \n@inproceedings{ke2022dgs,  \n title={Continual Learning of Language Models}, author={Ke, Zixuan and Shao, Yijia and Lin, Haowei and Konishi, Tatsuya and Kim, Gyuhak and Liu, Bing}, booktitle={International Conference on Learning Representations (ICLR)}, year={2023}}  \n  \n@inproceedings{ke2022dga,  \n title={Adapting a Language Model While Preserving its General Knowledge}, author={Ke, Zixuan and Shao, Yijia and Lin, Haowei and Xu, Hu and Shu, Lei and Liu, Bing}, booktitle={Empirical Methods in Natural Language Processing (EMNLP)}, year={2022}}  \n  \n@inproceedings{ke2022continual,  \n title={Continual Training of Language Models for Few-Shot Learning}, author={Ke, Zixuan and Lin, Haowei and Shao, Yijia and Xu, Hu and Shu, Lei and Liu, Bing}, booktitle={Empirical Methods in Natural Language Processing (EMNLP)}, year={2022}}  \n```  \n  \n  \n## Contact\n\nIf you have any questions regarding the code, please feel free to send an email to [Zixuan Ke](https://vincent950129.github.io/), [Yijia Shao](https://shaoyijia.github.io/), or [Haowei Lin](https://linhaowei1.github.io/). Alternatively, you may open an issue. 
We would like to express our gratitude to [Bing Liu](https://www.cs.uic.edu/~liub/), [Hu Xu](https://howardhsu.github.io/), and [Lei Shu](https://leishu02.github.io/) for their valuable comments and opinions.\n\n[comment]: \u003c\u003e (With gratitude, we want to acknowledge that the creation of this repository would not have been possible without the invaluable contributions of [Zixuan Ke]\u0026#40;https://vincent950129.github.io/\u0026#41;, [Yijia Shao]\u0026#40;https://shaoyijia.github.io/\u0026#41;, [Haowei Lin]\u0026#40;https://linhaowei1.github.io/\u0026#41;, [Hu Xu]\u0026#40;https://howardhsu.github.io/\u0026#41;, [Lei Shu]\u0026#40;https://leishu02.github.io/\u0026#41;, and [Bing Liu]\u0026#40;https://www.cs.uic.edu/~liub/\u0026#41;.)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FUIC-Liu-Lab%2FContinualLM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FUIC-Liu-Lab%2FContinualLM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FUIC-Liu-Lab%2FContinualLM/lists"}