{"id":15014083,"url":"https://github.com/muneeb1030/finetune-tiny-llama","last_synced_at":"2026-04-08T14:03:07.857Z","repository":{"id":248226846,"uuid":"824753150","full_name":"Muneeb1030/FineTune-Tiny-Llama","owner":"Muneeb1030","description":"Fine-tuning the Tiny Llama model to mimic my professor's writing style using the Llama Factory. The project involves data collection, preprocessing, preparation, fine-tuning, and evaluation.","archived":false,"fork":false,"pushed_at":"2024-07-17T09:39:25.000Z","size":399,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-31T00:35:00.589Z","etag":null,"topics":["data","data-preparation","data-preprocessing","finetuning","llama-factory","llm","pymupdf","selenium-python","spacy","tinyllama","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Muneeb1030.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-05T21:30:28.000Z","updated_at":"2025-11-14T17:34:56.000Z","dependencies_parsed_at":"2024-07-13T08:53:57.736Z","dependency_job_id":"c6231ff3-27c8-45e1-981a-d05ebda8aa43","html_url":"https://github.com/Muneeb1030/FineTune-Tiny-Llama","commit_stats":{"total_commits":3,"total_committers":2,"mean_commits":1.5,"dds":"0.33333333333333337","last_synced_commit":"34a920c95422846f861239e4ce0f8cf76af9d80a"},"previous_names":["muneeb1030/finetune-tiny-llama"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Muneeb1030/FineTu
ne-Tiny-Llama","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muneeb1030%2FFineTune-Tiny-Llama","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muneeb1030%2FFineTune-Tiny-Llama/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muneeb1030%2FFineTune-Tiny-Llama/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muneeb1030%2FFineTune-Tiny-Llama/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Muneeb1030","download_url":"https://codeload.github.com/Muneeb1030/FineTune-Tiny-Llama/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Muneeb1030%2FFineTune-Tiny-Llama/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31558389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T10:21:54.569Z","status":"ssl_error","status_checked_at":"2026-04-08T10:21:38.171Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-preparation","data-preprocessing","finetuning","llama-factory","llm","pymupdf","selenium-python","spacy","tinyllama","webscraping"],"created_at":"2024-09-24T19:45:10.145Z","updated_at":"2026-04-08T14:03:07.820Z","avatar_url":"https://github.com/Muneeb1030.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Fine-Tuning Tiny Llama using Llama Factory To Mimic Professor's Writing Style\n## Overview\nThis project aims to fine-tune the Tiny Llama model using the Llama Factory to mimic my professor's writing style. The process involves several phases, including data collection, preprocessing, preparation, model fine-tuning, and evaluation. The final goal is to create a model that can generate text in the style of my professor's academic writings.\n\n## Phases of the Project\n### Phase 1: Data Collection\nThe first step in this project was to collect data by scraping my professor's Google Scholar page. 
The objective was to gather a comprehensive set of research articles published by the professor.\n\n- **Tool Used:** Selenium\n- **Details:** Selenium was used to automate the process of accessing the Google Scholar page and downloading the available PDFs of the research articles.\n### Phase 2: Data Preprocessing\nAfter collecting the PDFs, the next step was to preprocess these documents to ensure they were in a usable format for training the model.\n\n- **Purpose:** Normalize the content while preserving the writing style.\n- **Tools Used:** PyMuPDF\n### Steps:\n- Remove page headers, footers, images, and tables along with their captions.\n- Convert the remaining content into paragraph format, as individual words and phrases are insufficient for capturing writing style.\n### Phase 3: Data Preparation\nThe preprocessed data needed to be formatted according to the requirements of the Llama Factory model training process.\n\n- **Initial Tools Tried:** spaCy, TF-IDF, BERT\n- **Tool That Worked:** OpenAI API\n### Process:\n- Use the OpenAI API to generate the required data format.\n- Ensure that the data is structured correctly for input into the Llama Factory model.\n### Phase 4: Model Fine-Tuning\nWith the data prepared, the next phase involved fine-tuning the Tiny Llama model.\n\n- **Environment:** Google Colab\n- **Tools Used:** Llama Factory\n### Steps:\n- Set up the Google Colab notebook and import necessary libraries.\n- Load the Llama Factory UI and integrate the dataset.\n- Define the prompt format and other configurations required by Llama Factory.\n- Run the fine-tuning process to train the Tiny Llama model on the professor's writing style.\n### Phase 5: Model Evaluation\nThe final phase focused on evaluating the performance of the fine-tuned model to ensure it accurately mimics the professor's writing style.\n\n### Process:\n- Generate sample texts using the fine-tuned model.\n- Compare the generated texts with the original writings to assess similarity in 
style and content.\n- Make any necessary adjustments and retrain if needed.\n## Getting Started\n### Prerequisites\n- Python 3.x\n- Selenium\n- PyMuPDF\n- OpenAI API\n### Installation\n- Clone the repository\n```bash\ngit clone https://github.com/Muneeb1030/FineTune-Tiny-Llama.git\n```\n- Install the necessary Python packages\n```bash\npip install selenium pymupdf openai\n```\n## Contributing\nContributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.\n\n## Contact\nFor any questions or suggestions, please contact me at muhammadmuneeburrehman.vercel.app\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuneeb1030%2Ffinetune-tiny-llama","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmuneeb1030%2Ffinetune-tiny-llama","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuneeb1030%2Ffinetune-tiny-llama/lists"}