{"id":18752747,"url":"https://github.com/tlkh/t2t-tuner","last_synced_at":"2025-04-13T00:31:27.992Z","repository":{"id":46830069,"uuid":"408288132","full_name":"tlkh/t2t-tuner","owner":"tlkh","description":"Convenient Text-to-Text Training for Transformers","archived":false,"fork":false,"pushed_at":"2021-12-10T11:36:57.000Z","size":17654,"stargazers_count":19,"open_issues_count":2,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-26T18:52:31.936Z","etag":null,"topics":["gpt","huggingface","language-model","nlp","pytorch","t5","transformers"],"latest_commit_sha":null,"homepage":"https://tlkh.github.io/t2t-tuner/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tlkh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-20T02:27:30.000Z","updated_at":"2024-01-04T17:01:33.000Z","dependencies_parsed_at":"2022-09-26T21:31:23.812Z","dependency_job_id":null,"html_url":"https://github.com/tlkh/t2t-tuner","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlkh%2Ft2t-tuner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlkh%2Ft2t-tuner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlkh%2Ft2t-tuner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlkh%2Ft2t-tuner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tlkh","download_url":"https://codeload.github.com/tlkh/t2t-tuner/tar.gz/refs/heads/main","host":{"name":"GitHub"
,"url":"https://github.com","kind":"github","repositories_count":248650590,"owners_count":21139670,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gpt","huggingface","language-model","nlp","pytorch","t5","transformers"],"created_at":"2024-11-07T17:22:25.153Z","updated_at":"2025-04-13T00:31:22.982Z","avatar_url":"https://github.com/tlkh.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# t2t-tuner\n\nConvenient Text-to-Text Training for Transformers\n\n```shell\npip install t2t-tuner\n```\n\nRequires PyTorch: either follow [PyTorch installation instructions](https://pytorch.org/get-started/locally/) or [use a PyTorch container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch).\n\n## Features\n\n* Easy training for text-to-text (and text generation) tasks\n* Training methods/features:\n  * Supervised fine-tuning\n  * Gradient checkpointing\n  * Model parallelism\n  * Soft prompt tuning ([based on this paper](https://arxiv.org/abs/2104.08691))\n  * Freeze encoder/decoder/embeddings\n  * Move embeddings to CPU\n  * Print model summary\n  * [DeepSpeed](https://github.com/microsoft/DeepSpeed)\n\n\nBased on the wonderful [HuggingFace Transformers](https://github.com/huggingface/transformers) library. Tested on T5 and GPT type of models. 
In theory, it should work with other models that support [AutoModelForSeq2SeqLM](https://huggingface.co/transformers/model_doc/auto.html#automodelforseq2seqlm) or [AutoModelForCausalLM](https://huggingface.co/transformers/model_doc/auto.html#automodelforcausallm) as well.\n\nThe Trainer in this library is a higher-level interface based on HuggingFace's [run_translation.py](https://github.com/huggingface/transformers/tree/master/examples/pytorch/translation) script for text-to-text generation tasks. I decided I wanted a more convenient interface for training and inference, along with access to things like gradient checkpointing and model parallelism to fit larger models - these are already in the HuggingFace library but not exposed in the script. I also added some features that I wanted (prompt tuning, model summary), integrated it with autoregressive LM training and wrapped it as a single library that can be pip installed. \n\n## Examples\n\n### Training Models\n\n```python\nimport t2t\n\ntrainer_arguments = t2t.TrainerArguments(model_name_or_path=\"t5-small\",\n                                         train_file=YOUR_DATASET)\n\ntrainer = t2t.Trainer(arguments=trainer_arguments)\n\n# train without validation\ntrainer.train(valid=False)\n```\n\nFor more concrete examples, check out the notebooks linked below:\n\n* [Simple example](examples/tldr.ipynb)\n* [Simple example on Colab](https://colab.research.google.com/drive/1_BsldxfPl6lVh2dB9VLOvARRxfswfIzL?usp=sharing)\n* [Soft Prompt Tuning](examples/soft_prompt_tuning.ipynb)\n* [Gradient checkpointing](examples/gradient_checkpointing.ipynb)\n* [Model parallelism](examples/model_parallel.ipynb)\n\n### Data Format\n\n**Seq2Seq Training**\n\n```json\n{\"translation\": {\"s\": \"TEXT\", \"t\": \"LABEL\"}}\n```\n\n* The data format is json-lines, following the original HuggingFace script. 
Each example is one line.\n* Define the source and target IDs in `TrainingArguments.source_id` and `TrainingArguments.target_id` (defaults to `s` and `t`).\n* Include the prefix in the data file, or define the prefix to prepend to the text in `TrainingArguments.prefix`.\n* [Example notebook for data preprocessing from CSV file](sample_data/make_seq2seq_dataset.ipynb)\n\n**Autoregressive LM Training**\n\n* Any text file will work\n\n## Training Large Models\n\nThis section will outline how to train large language models (\u003e 1 bil parameters) on relatively simple setups.\n\nSome notes for the configurations reported below:\n\n* GradCheckpoint: Gradient checkpointing to reduce VRAM usage, but increase computation (set `TrainerArguments.gradient_checkpointing`).\n* FreezeEmbeds: Freeze (do not train) embedding layer to reduce VRAM usage and computation (set `trainer.freeze(embeddings=True)`).\n* Adafactor uses less VRAM than Adam, but is slightly slower and can converge slightly differently.\n* You can use gradient accumulation (`TrainingArguments.gradient_accumulation_steps`) to make up to a larger batch size if needed. 
The batch sizes reported are **without** gradient accumulation.\n* Moving embeddings to CPU seems to have almost no impact on either VRAM usage or performance, so it is not used.\n\n### GPT Models\n\nSome GPT configurations that were tested to be able to train on a single RTX 3090 (24GB) card (without DeepSpeed):\n\n| Model | Params | Precision | Optimizer | InputLen | BatchSize | Other |\n| ----- | ------ | --------- | --------- | --------- | --------- | ----- |\n| [gpt2](https://huggingface.co/gpt2-xl) | 1.5b | FP16 | Adafactor | 128 | 4 | None |\n| [gpt2](https://huggingface.co/gpt2-xl) | 1.5b | FP16 | Adafactor | 512 | 1 | None |\n| [gpt2](https://huggingface.co/gpt2-xl) | 1.5b | FP16 | Adafactor | 1024 | 4 | GradCheckpoint |\n| [gpt-neo](https://huggingface.co/EleutherAI/gpt-neo-1.3B) | 1.3b | FP16 | Adafactor | 1024 | 1 | None |\n| [gpt-neo](https://huggingface.co/EleutherAI/gpt-neo-1.3B) | 1.3b | FP16 | Adafactor | 2048 | 4 | GradCheckpoint |\n| [gpt-neo](https://huggingface.co/EleutherAI/gpt-neo-2.7B) | 2.7b | FP16 | Adafactor | 2048 | 4 | GradCheckpoint,FreezeEmbeds |\n\n### T5 Models\n\nSome T5 configurations that were tested to be able to train on a single RTX 3090 (24GB) card (without DeepSpeed):\n\n| Model | Params | Precision | Optimizer | Seq2SeqLen | BatchSize | Other |\n| ----- | ------ | --------- | --------- | --------- | --------- | ----- |\n| [t5](https://huggingface.co/t5-3b) | 3b | FP32 | Adafactor | 128-\u003e128 | 1 | FreezeEmbeds |\n| [t5](https://huggingface.co/t5-3b) | 3b | FP32 | Adafactor | 128-\u003e128 | 1 | GradCheckpoint |\n| [t5](https://huggingface.co/t5-3b) | 3b | FP32 | Adafactor | 128-\u003e128 | 128 | GradCheckpoint,FreezeEmbeds |\n| [t5](https://huggingface.co/t5-3b) | 3b | FP32 | Adafactor | 512-\u003e512 | 32 | GradCheckpoint,FreezeEmbeds |\n\n**Model Parallelism for T5-11b models**\n\nUsing this library, you can also fine-tune the [t5-11b checkpoints](https://huggingface.co/models?search=11b) quite easily (single node) with 
the following settings (without DeepSpeed):\n\n* Suggested checkpoint: [t5-11b](https://huggingface.co/t5-11b)\n* Batch size 1, with gradient accumulation to make up whatever batch size you need.\n* Batch size of 8 is possible with gradient checkpointing, but doesn't improve the speed.\n* Model parallel across multiple GPUs:\n  * At least ~90 GB of VRAM\n  * Examples: 8x 16GB or 4x 32GB GPU (V100), or 2x 48GB (RTX8000/A6000)\n* FP32 (no need for mixed precision/FP16)\n  * FP16 would actually be better, but the pretrained T5 checkpoints don't play well with FP16.\n  * On Ampere cards (RTX30XX, A100, A6000), TF32 is used, which is faster than FP32 and doesn't suffer from the same issues as FP16.\n  * Likely reason: the existing activations are too large ([github issue tracking](https://github.com/huggingface/transformers/pull/10956#issuecomment-813162960), [some more info](https://discuss.huggingface.co/t/mixed-precision-for-bfloat16-pretrained-models/5315))\n\n![Model parallel T5-11b](images/model_parallel.jpg)\n\nNote that depending on your system, the loading time for the checkpoint (46GB) can be very long. You'll need ample CPU RAM (at least ~90GB) to load it successfully. \n\n## ONNX RT\n\nONNX RT works with some models (not T5, yet) and can provide a small boost in speed.\n\nInstall ORT, then set `TrainingArguments.torch_ort=True`\n\n```shell\npip install torch-ort -f https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_stable_torch190.cu111.html\n\npython -m torch_ort.configure\n```\n\n## Development\n\n**Building Package**\n\n```shell\npython3 -m pip install --upgrade build twine\npython3 -m build\npython3 -m twine upload dist/*\n```\n\n## Disclaimers\n\nThis library was developed as a personal project for my own use. Please feel free to fork or use it for your own purposes as well. I will not take responsibility for any mishaps that occur as a result of this library's usage. 
\n\nNote for 3090 FE cards: if your fans hit 100%, it means your VRAM temperatures are high (\u003e100 deg C). Training for long hours at these temperatures should in theory be fine, but if you want peace of mind (like me), you can lower the power limit and incur a minor impact on training speed. As long as your fans never hit 100%, your VRAM temperatures should be fine. For example, to lower the power limit to 300W (from 350W):\n\n```shell\nsudo nvidia-smi -pl 300\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftlkh%2Ft2t-tuner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftlkh%2Ft2t-tuner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftlkh%2Ft2t-tuner/lists"}