Monitoring Llama index finetuning losses via Tensorboard
https://github.com/swamikannan/llamaindex-finetuning-with-tensorboard
- Host: GitHub
- URL: https://github.com/swamikannan/llamaindex-finetuning-with-tensorboard
- Owner: SwamiKannan
- License: mit
- Created: 2024-03-01T09:18:20.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-03-06T11:13:24.000Z (over 1 year ago)
- Last Synced: 2025-01-14T02:47:16.469Z (9 months ago)
- Language: Python
- Homepage:
- Size: 546 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Adding a Tensorboard report to LlamaIndex Finetuning
## Introduction
LlamaIndex is a great open-source data framework that connects custom data sources to large language models (LLMs) for document Q&A, data-augmented chatbots, and structured analytics.
However, beyond the standard data ingestion, vector store, and database integration tools, LlamaIndex also has a pretty good library for finetuning embeddings - `llamaindex.finetune()`. Under the hood, the finetune library calls sentence-transformers to do the actual training, and sentence-transformers has been developing its own reporting framework for quite some time. As a result, there is currently no official repo / library to monitor training on sentence-transformers, and therefore on llama-index.finetuning, using Tensorboard.

So I got a little impatient and built this repo to track finetuning using Tensorboard.
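For context, sentence-transformers' `fit()` accepts a `callback` argument that is invoked as `callback(score, epoch, steps)` after each evaluation, which is the natural hook for this kind of logging. Below is a rough, dependency-free sketch of that pattern - the function name and the dummy loop are illustrative only, not the repo's actual code; the real version would forward each point to a TensorBoard `SummaryWriter`:

```python
# Sketch: sentence-transformers' fit() calls callback(score, epoch, steps)
# after each evaluation. A plain list stands in for the SummaryWriter here
# so the sketch stays dependency-free.
logged = []

def tensorboard_callback(score, epoch, steps):
    # Real version: writer.add_scalar("eval/score", score, steps)
    logged.append((epoch, steps, score))

# Dummy loop standing in for the evaluations model.fit(...) would run
for epoch in range(2):
    for steps in (50, 100):
        score = 1.0 - 0.001 * (epoch * 100 + steps)  # placeholder metric
        tensorboard_callback(score, epoch, steps)

print(f"{len(logged)} evaluation points captured")
```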
#### Note 1:
Some of the code in this repo is based on Mogady's PR for Sentence Transformers. Huge credit to him for that submission.

#### Note 2:
I have already submitted a PR to LlamaIndex based on the changes in this repo. This, however, is a standalone repo with no changes to the main sentence-transformers / LlamaIndex libraries, although both packages do need to be installed.

## Requirements:
1. llama-index finetune library
```
pip install llama-index-finetuning
```
2. sentence-transformers
```
pip install sentence-transformers
```

## Usage:
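Before running anything, you can sanity-check that both packages are importable in your environment - a small helper snippet, not part of the repo:

```python
import importlib.util

# Check that the two required packages are importable in this environment
required = ("llama_index", "sentence_transformers")
status_report = {pkg: importlib.util.find_spec(pkg) is not None for pkg in required}
for pkg, installed in status_report.items():
    print(f"{pkg}: {'installed' if installed else 'missing'}")
```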
1. Get into the src directory
```
cd src
```
2. Run the code below, updating the path to the input JSON file, `model_output_path`, `model_id`, `epochs`, and the directory where the logs should be stored (`writer_path`)
```
from llama_index.core.evaluation import EmbeddingQAFinetuneDataset
from llama_index.finetuning import SentenceTransformersFinetuneEngine
from sftb import TBSTFE

train_dataset = EmbeddingQAFinetuneDataset.from_json(json_file)
finetuner = TBSTFE(
dataset=train_dataset,
model_id=model_id,
model_output_path=model_output_path,
epochs=epochs,
writer_path=w_path
)
finetuner.finetune()
finetuned_model = finetuner.get_finetuned_model()
finetuned_model.to_json()
```

## Outputs:
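Once finetuning is running (or complete), the losses can be inspected by pointing TensorBoard at the log directory you passed as `writer_path` - a CLI fragment with `<writer_path>` as a placeholder for your actual path:

```shell
# Point TensorBoard at the log directory passed as writer_path
tensorboard --logdir <writer_path>
# then open http://localhost:6006 in a browser
```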