{"id":16813778,"url":"https://github.com/albertoimpl/llm-private-fine-tuning","last_synced_at":"2025-08-09T16:13:13.268Z","repository":{"id":189502842,"uuid":"680797563","full_name":"Albertoimpl/llm-private-fine-tuning","owner":"Albertoimpl","description":"Privately fine-tune an LLM using Parameter Efficient Fine-Tuning with private data.","archived":false,"fork":false,"pushed_at":"2023-10-30T12:46:03.000Z","size":2642,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-23T21:28:52.394Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Albertoimpl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-20T12:45:16.000Z","updated_at":"2023-11-01T13:38:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"8a097cde-7372-4773-8cc4-9a33418edc95","html_url":"https://github.com/Albertoimpl/llm-private-fine-tuning","commit_stats":null,"previous_names":["albertoimpl/llm-private-fine-tuning"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Albertoimpl%2Fllm-private-fine-tuning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Albertoimpl%2Fllm-private-fine-tuning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Albertoimpl%2Fllm-private-fine-tuning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/Albertoimpl%2Fllm-private-fine-tuning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Albertoimpl","download_url":"https://codeload.github.com/Albertoimpl/llm-private-fine-tuning/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244029596,"owners_count":20386443,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T10:27:48.412Z","updated_at":"2025-03-17T11:42:50.285Z","avatar_url":"https://github.com/Albertoimpl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Private Fine-Tuning\n\nPrivately fine-tune an LLM using Parameter Efficient Fine-Tuning with private data in any Kubernetes cluster and in\nKubeFlow.\n\n```mermaid\ngraph TD;\n    dataset-relocation-server--\u003edataset-pvc[(dataset-pvc)];\n    model-relocation-server--\u003ebase-model-pvc[(base-model-pvc)];\n    tokenizer-relocation-server--\u003ebase-tokenizer-pvc[(base-tokenizer-pvc)];\n    dataset-pvc--\u003emodel-peft-server;\n    base-model-pvc--\u003emodel-peft-server;\n    base-tokenizer-pvc--\u003emodel-peft-server;\n    \n    model-peft-server--\u003efine-tuned-model-pvc[(fine-tuned-model-pvc)];\n\n    model-reference-relocation-server--\u003ereference-model-pvc[(reference-model-pvc)];\n    reference-model-pvc--\u003emodel-evaluation-server;\n\n    tensorboard(((tensorboard)))--\u003efine-tuned-model-pvc;\n    fine-tuned-model-pvc--\u003emodel-evaluation-server;\n    fine-tuned-model-pvc--\u003einference-server;\n    
base-tokenizer-pvc--\u003einference-server;\n```\n\n## Running locally with KinD\n\nAlthough this project is not a complete end-to-end Machine Learning solution, all the main steps can be run locally\nwith KinD, including model relocation, evaluation, fine-tuning, and inference.\n\n### Downloading the models and tokenizers\n\nTo download the base and reference models and tokenizers, a script is provided that automatically downloads `gpt2`\nand `gpt2-large`.\n\nTo install the required script dependencies:\n\n```bash\npip install -r scripts/requirements.txt\n```\n\nTo run the script to download the models and tokenizers:\n\n```bash\npython ./scripts/download-models.py\n```\n\n### Creating the local cluster\n\nOnce the models and tokenizers are downloaded, we can create our local KinD cluster:\n\n```bash\nkind create cluster\n```\n\n```bash\nCreating cluster \"kind\" ...\n ✓ Ensuring node image (kindest/node:v1.27.3) 🖼\n ✓ Preparing nodes 📦\n ✓ Writing configuration 📜\n ✓ Starting control-plane 🕹️\n ✓ Installing CNI 🔌\n ✓ Installing StorageClass 💾\nSet kubectl context to \"kind-kind\"\nYou can now use your cluster with:\n\nkubectl cluster-info --context kind-kind\n```\n\n### Installing\n\nWith Skaffold, everything can be built and run with one command for local iteration. The first run will take a while\nbecause all the images have to be built.\n\n```bash\nskaffold run --port-forward=true\n```\n\n\u003e The evaluation server takes over 80 minutes on a CPU, but since the inference server is created in parallel, we can use\n\u003e it while the evaluation runs.\n\n\nAfter running the `skaffold` command, the output should look something like:\n\n```bash\nWaiting for deployments to stabilize...\n - deployment/model-inference-server is ready.\nDeployments stabilized in 4.135 seconds\nPort forwarding service/model-inference-service in namespace default, remote port 5000 -\u003e http://127.0.0.1:5001\n```\n\nThe inference server is then ready to receive requests:\n\n```bash\n curl -XPOST 
127.0.0.1:5001/prompt -d '{\"input_prompt\":\"What is the best hotel in Seville?\"}' -H 'Content-Type: application/json'\n```\n\n## Running in KubeFlow\n\nKubeFlow automates the creation and synchronization of Kubernetes objects and adds utilities to help us with\nexperiments and visualization in TensorBoard.\n\n### Building the pipeline\n\nThe pipeline can be built [using the notebook](kubeflow/GeneratePipeline.ipynb) or by running [pipeline.py](kubeflow/pipeline.py).\n\n![Built pipeline](./docs/pipeline.png)\n\n### Visualising data in TensorBoard\n\nSelect the `fine-tuned-model-pvc` PVC and use the following mount path: `model/gpt2/logs/`.\n\n\u003e The mount path will vary depending on the chosen model.\n\n\u003e Since some environments can't easily create Persistent Volumes with `ReadWriteMany`, we have to wait for completion or\n\u003e delete the board once we have analyzed it.\n\n![Example output in TensorBoard](./docs/tensorboard.png)\n\n## [Pushing components to registries](docs/registries.md)\n\nAt the moment, when running locally with KinD, the images are loaded from a local registry, and when using KubeFlow,\nthe images are downloaded from ghcr.io.\nIf you would like to push your own images, [this guide](docs/registries.md) should cover everything.\n\n## Parameter-Efficient Fine-Tuning\n\nParameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models to\nvarious downstream applications without fine-tuning all the model's parameters.\n\nEnhancing a Large Language Model with knowledge and relations from private data can be challenging, and the outcomes will\nvary depending on the technique used. 
Retraining a whole model is not only a very costly operation, but it can also lead\nto [Catastrophic Forgetting](https://arxiv.org/abs/2308.08747), making the model behave worse than before training.\n\nRecent State-of-the-Art PEFT techniques achieve performance comparable to that of full fine-tuning.\nFor this example, we have chosen [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685),\nwhich freezes the pre-trained weights and adds small trainable low-rank matrices on top. That way the number of trainable\nweights is much smaller, the fine-tuning can be\ndone on a single GPU, and it is less prone to catastrophic forgetting.\nThe LoRA rank chosen for this experiment was 16, but it should have been treated as a hyperparameter since, according to\nthe paper, a rank of 4 or 8 should perform well for our use case.\n\n## Model evaluation\n\nThe evaluation of Large Language Models is still in an early stage of development and an active area of research.\n\nThe evaluation performed in `model-evaluation-server` measures the model's capabilities in order to understand whether we are\nimproving and by how much.\n\nFor this reason, we are not only using our base and fine-tuned models but also a reference model with larger\ncapabilities than our base one.\n\n### Recall-Oriented Understudy for Gisting Evaluation\n\n[ROUGE](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/was2004.pdf) is a set of metrics used for\nevaluating automatic summarization and machine translation.\nThese metrics help determine the quality of a summary by comparing it to reference summaries created by humans.\n\n### General Language Understanding Evaluation with Corpus of Linguistic Acceptability\n\n[GLUE](https://arxiv.org/pdf/1905.00537.pdf) is a collection of resources for training, evaluating, and analyzing\nnatural language understanding systems.\nWe have used the Corpus of Linguistic Acceptability (CoLA), which consists of English acceptability judgments drawn\nfrom books and 
journal articles on linguistic theory,\nto help us determine whether the output of the LLM is a grammatically correct English sentence.\n\n### Perplexity\n\nThe [Perplexity score](https://en.wikipedia.org/wiki/Perplexity) is one of the most common metrics for evaluating LLMs.\nIt measures how surprised the model is when it sees new data. The lower the perplexity, the better the model\npredicts that data.\nThere is also an\ninteresting [correlation between word error rate and perplexity](https://www.sciencedirect.com/science/article/abs/pii/S0167639301000413?via%3Dihub).\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falbertoimpl%2Fllm-private-fine-tuning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falbertoimpl%2Fllm-private-fine-tuning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falbertoimpl%2Fllm-private-fine-tuning/lists"}