{"id":21701421,"url":"https://github.com/aman-17/medisoap","last_synced_at":"2026-05-15T23:41:04.341Z","repository":{"id":243282111,"uuid":"811973031","full_name":"aman-17/MediSOAP","owner":"aman-17","description":"FineTuning LLMs on conversational medical dataset.","archived":false,"fork":false,"pushed_at":"2024-07-03T02:49:17.000Z","size":41786,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-17T21:41:04.416Z","etag":null,"topics":["fine-tuning","generative-ai","llama","llama-2","llm-training","lora","medical","peft","peft-fine-tuning-llm","qlora","summarization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aman-17.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-07T17:18:38.000Z","updated_at":"2025-02-14T04:43:53.000Z","dependencies_parsed_at":"2024-08-03T09:03:53.801Z","dependency_job_id":null,"html_url":"https://github.com/aman-17/MediSOAP","commit_stats":{"total_commits":16,"total_committers":1,"mean_commits":16.0,"dds":0.0,"last_synced_commit":"b6a123a367fb3cb962cf4c8ae8a1f02993436d2a"},"previous_names":["aman-17/medisoap"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman-17%2FMediSOAP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman-17%2FMediSOAP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman-17%2FMediSOAP/release
s","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman-17%2FMediSOAP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aman-17","download_url":"https://codeload.github.com/aman-17/MediSOAP/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244645483,"owners_count":20486986,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fine-tuning","generative-ai","llama","llama-2","llm-training","lora","medical","peft","peft-fine-tuning-llm","qlora","summarization"],"created_at":"2024-11-25T20:19:52.076Z","updated_at":"2026-05-15T23:41:04.316Z","avatar_url":"https://github.com/aman-17.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MediSOAP: Enhanced Clinical Note Generation with Fine-Tuned Llama2\n\n## Project Overview\n\nThis project fine-tunes the pre-trained Llama2-7B model using LoRA and QLoRA. The goal is to generate structured SOAP (Subjective, Objective, Assessment, Plan) notes from patient-doctor conversations. The training dataset consists of transcribed medical dialogues that follow the SOAP note format.\n\n## Table of Contents\n1. [Introduction](#introduction)\n2. [Prerequisites](#prerequisites)\n3. [Installation](#installation)\n4. [Dataset](#dataset)\n5. [Fine-Tuning Process](#fine-tuning-process)\n6. [Evaluation](#evaluation)\n7. [Usage](#usage)\n8. [Results](#results)\n9. [Contributing](#contributing)\n10. 
[License](#license)\n\n## Introduction\nSOAP notes are one of several common formats that healthcare providers use to document notes in a patient's chart. This project automates the generation of SOAP notes from patient-doctor conversations using a fine-tuned Llama2-7B model. The model leverages Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) for efficient training.\n\n## Prerequisites\n- Python 3.11 or higher\n- PyTorch 1.10.0 or higher\n- CUDA 10.2 or higher (for GPU support)\n\n## Installation\n\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/aman-17/MediSOAP.git\n   cd MediSOAP\n   ```\n\n2. Create and activate a virtual environment:\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows, use `venv\\Scripts\\activate`\n   ```\n\n3. Install the required packages:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n## Dataset\n\nThe dataset used for this project is a collection of patient-doctor conversation transcripts formatted into SOAP notes. The dataset must be preprocessed into the required format before training.\n\nTo preprocess a custom dataset, match the format of `train.jsonl`, then:\n1. Place your raw data files in the `data/` directory.\n2. Run the preprocessing script:\n   ```bash\n   python data_preprocessing.py\n   ```\n\n## Fine-Tuning Process\n\nFine-tuning adapts the pre-trained Llama2-7B and phi2 models to this task using LoRA.\n\n### Steps:\n\n1. **Data Preparation**:\n   Ensure your preprocessed data is in the `data/` directory.\n\n2. 
**Training**:\n   Run the training script:\n   ```bash\n   python train_phi2.py\n   ```\n\n## Evaluation\n\nEvaluate the model's performance on a test dataset:\n```bash\npython evaluate.py --model-path path/to/fine-tuned-model --test-data path/to/test-data\n```\n\nMetrics such as BLEU, ROUGE, and accuracy can be used to assess the model's performance.\n\n## Usage\n\nTo generate SOAP notes from new patient-doctor conversations, use the inference script:\n```bash\npython generate.py --model-path path/to/fine-tuned-model --input path/to/conversation.txt\n```\n\nThe output will be a structured SOAP note based on the input conversation.\n\n## Results\n\nSummarize the results obtained from the model's performance on the test dataset, including key metrics and example outputs.\n\n## Contributing\n\nWe welcome contributions from the community. To contribute, please follow these steps:\n1. Fork the repository.\n2. Create a new branch (`git checkout -b feature-branch`).\n3. Make your changes.\n4. Commit your changes (`git commit -m 'Add new feature'`).\n5. Push to the branch (`git push origin feature-branch`).\n6. Create a new Pull Request.\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faman-17%2Fmedisoap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faman-17%2Fmedisoap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faman-17%2Fmedisoap/lists"}