# Adding a Tensorboard report to LlamaIndex Finetuning



## Introduction
LlamaIndex is an open-source data framework that connects custom data sources to large language models (LLMs) for document Q&A, data-augmented chatbots, and structured analytics.
Beyond its standard data ingestion, vector store, and database integration tools, LlamaIndex also ships a library for finetuning embeddings: `llama_index.finetuning`.

The finetuning library, however, calls sentence-transformers to do the actual training, and sentence-transformers has had a reporting framework in development for quite some time. As a result, there is currently no official repo / library for monitoring training in sentence-transformers, and hence in `llama_index.finetuning`, with Tensorboard.

So I got a little impatient and built this repo to track finetuning with Tensorboard.
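
For context, the mechanics of the tracking are simple: training losses are written as scalar values that Tensorboard can plot. A minimal sketch of the idea using torch's `SummaryWriter` (illustrative only, not the exact code in this repo):
```
from torch.utils.tensorboard import SummaryWriter

# Event files land in "logs"; point Tensorboard at this directory to view them
writer = SummaryWriter(log_dir="logs")

# Inside a training loop, each loss value is recorded against the global step
for global_step, loss_value in enumerate([0.9, 0.6, 0.4]):
    writer.add_scalar("train/loss", loss_value, global_step)

writer.close()
```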

#### Note 1:
Some of the code in this repo is based on Mogady's PR for Sentence Transformers. Huge credit to him for that submission.

#### Note 2:
I have already submitted a PR to LlamaIndex based on the changes in this repo. This repo, however, is standalone: it makes no changes to the main sentence-transformers / llama-index libraries, though both packages do need to be installed.

## Requirements:
1. llama-index finetuning library
```
pip install llama-index-finetuning
```
2. sentence-transformers
```
pip install sentence-transformers
```
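3. tensorboard (needed to write and view the logs; install it if it isn't already present in your environment)
```
pip install tensorboard
```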

## Usage:
1. Change into the `src` directory
```
cd src
```
2. Run the code below, updating the input JSON file, `model_id`, `model_output_path`, `epochs`, and the path where the logs should be stored (`writer_path`). If you don't already have an input dataset, see the sketch after the code.
```
from llama_index.core.evaluation import EmbeddingQAFinetuneDataset
from llama_index.finetuning import SentenceTransformersFinetuneEngine
from sftb import TBSTFE

# Update these values for your own run
json_file = "train_dataset.json"        # input QA dataset (JSON)
model_id = "BAAI/bge-small-en"          # base embedding model to finetune
model_output_path = "finetuned_model"   # where the finetuned model is saved
epochs = 2                              # number of training epochs
w_path = "logs"                         # directory for the Tensorboard event files

# Load the QA pairs and run finetuning with Tensorboard logging
train_dataset = EmbeddingQAFinetuneDataset.from_json(json_file)
finetuner = TBSTFE(
    dataset=train_dataset,
    model_id=model_id,
    model_output_path=model_output_path,
    epochs=epochs,
    writer_path=w_path,
)
finetuner.finetune()
finetuned_model = finetuner.get_finetuned_model()
finetuned_model.to_json()
```
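
If you don't already have an input dataset, LlamaIndex can generate one from your own documents. A minimal sketch, assuming your documents live in a `./data` folder and the `llama-index-llms-openai` package (plus an OpenAI API key) is available:
```
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.finetuning import generate_qa_embedding_pairs
from llama_index.llms.openai import OpenAI

# Chunk the documents into nodes and generate synthetic (question, context) pairs
docs = SimpleDirectoryReader("./data").load_data()
nodes = SentenceSplitter().get_nodes_from_documents(docs)
dataset = generate_qa_embedding_pairs(nodes, llm=OpenAI(model="gpt-3.5-turbo"))

# Save in the JSON format expected by EmbeddingQAFinetuneDataset.from_json()
dataset.save_json("train_dataset.json")
```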

## Outputs:
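Once `finetune()` is running, the Tensorboard event files are written to the directory passed as `writer_path`, and the finetuned model is saved to `model_output_path`. To watch the losses, point Tensorboard at the log directory and open the URL it prints (http://localhost:6006 by default):
```
tensorboard --logdir logs
```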