https://github.com/Mayo-Radiology-Informatics-Lab/conflare

This is the repository for the CONFLARE (CONformal LArge language model REtrieval) Python package.

Host: GitHub
URL: https://github.com/Mayo-Radiology-Informatics-Lab/conflare
Owner: Mayo-Radiology-Informatics-Lab
License: mit
Created: 2024-04-02T16:50:44.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-04-09T13:46:52.000Z (9 months ago)
Last Synced: 2024-04-14T10:51:07.265Z (8 months ago)
Language: Python
Size: 1.11 MB
Stars: 6
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# CONFLARE: CONFormal LArge language model REtrieval

This is the official repository for the [CONFLARE paper](https://arxiv.org/abs/2404.04287) and the related Python package, `conflare`.

## Installation

[![Downloads](https://static.pepy.tech/badge/conflare)](https://pepy.tech/project/conflare)

```bash
pip install conflare
```

Here are the 3 main tasks this package helps you with:

1. Loading the source documents (plus cleaning and chunking them)

2. Creating (or loading) a calibration set

3. Retrieval-augmented generation with conformal prediction
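As a rough illustration of the chunking part of task 1, here is a minimal sketch that splits text into overlapping word-based chunks. The function name `chunk_text`, the word-based splitting, and the default sizes are assumptions made for this example; the package's own loader may clean and split documents differently.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that share `overlap` words (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()  # splitting on whitespace also collapses repeated spaces
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighboring chunks, at the cost of some duplicated text.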

## How to use

First, install the `conflare` package with `pip`. Then use the following example as a starting point.

Example:

```python
import os

# 1. Set credentials and build the pipeline from the source documents.
os.environ['OPENAI_API_KEY'] = 'your openai secret key'
# To use HuggingFace models without an OpenAI key, see the Arguments section below.

from conflare import initialize_pipeline
from conflare.conformal.calibration import create_calibration_records
from conflare.augmented_retrieval.rag import ConformalRetrievalQA

document_dir = './data/documents'
docs, qa_pipeline, vector_db = initialize_pipeline(document_dir)

# 2. Create the calibration set.
calibration_records = create_calibration_records(
    docs,
    qa_pipeline=qa_pipeline,
    vector_db=vector_db,
    size=100,
    topic_of_interest="Deep Learning"
)

# 3. Run retrieval-augmented generation with conformal prediction.
conformal_rag = ConformalRetrievalQA(
    qa_pipeline=qa_pipeline,
    vector_db=vector_db,
    calibration_records=calibration_records,
    error_rate=0.10,
    verbose=True
)

response, retrieved_docs = conformal_rag(
    "How can a transformer model be used in detection of COVID?"
)
print(response)
```

```
>>>
Input Error Rate: 10.00%
Selected cosine distance threshold: 0.456
Number of retrieved documents: 2
A transformer model can be used in the detection of COVID-19 by analyzing medical images ...
```
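The output above reports a cosine distance threshold selected for the requested error rate. The sketch below shows one standard way such a cutoff can be derived: split conformal prediction with a finite-sample corrected quantile. It is an illustration under assumptions, not the code inside `conflare`. The function names `conformal_distance_threshold` and `retrieve` and the example distances are hypothetical, and the coverage guarantee assumes calibration and test questions are exchangeable.

```python
import math


def conformal_distance_threshold(calibration_distances, error_rate):
    """Return a distance cutoff that covers the true chunk with probability >= 1 - error_rate.

    calibration_distances: for each calibration question, the cosine distance
    between the question and the chunk that actually contains its answer.
    """
    n = len(calibration_distances)
    if n == 0 or not 0.0 < error_rate < 1.0:
        raise ValueError("need calibration data and 0 < error_rate < 1")
    rank = math.ceil((n + 1) * (1.0 - error_rate))  # finite-sample corrected rank
    if rank > n:
        return math.inf  # too few calibration points: every chunk must be kept
    return sorted(calibration_distances)[rank - 1]


def retrieve(chunk_distances, threshold):
    """Return the ids of all chunks whose distance to the query is within the cutoff."""
    return [cid for cid, dist in chunk_distances.items() if dist <= threshold]


calibration = [0.30, 0.10, 0.50, 0.20, 0.45, 0.35, 0.25]  # made-up distances
threshold = conformal_distance_threshold(calibration, error_rate=0.25)
print(threshold)  # 0.45
print(retrieve({"chunk_a": 0.20, "chunk_b": 0.45, "chunk_c": 0.60}, threshold))
# ['chunk_a', 'chunk_b']
```

With seven calibration distances and `error_rate=0.25`, the rank is `ceil(8 * 0.75) = 6`, so the cutoff is the sixth smallest calibration distance. A smaller error rate pushes the cutoff up and retrieves more chunks per question.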

If you have already run this script and saved the calibration records to disk, you can load them as shown below. We've provided example `.pkl` files of generated questions and calibration records in the `./data/calibration_set/` directory of this repo.

```python
from conflare.conformal.calibration import QuestionEvaluation

# `path_to_pickle` is the path to a previously saved `.pkl` file.
q_evaluation = QuestionEvaluation.from_pickle(path_to_pickle)
calibration_records = q_evaluation.get_calibration_records()
```
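Because building a calibration set is the expensive step, a compute-once, reload-later pattern can be convenient. The following is a generic sketch that uses only the standard library `pickle` module. The helper `load_or_create` is hypothetical and does not use `conflare`'s own file format or the `QuestionEvaluation` class.

```python
import pickle
from pathlib import Path


def load_or_create(path, create):
    """Return the object pickled at `path`; if the file is absent, build and save it first."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)  # only unpickle files from sources you trust
    obj = create()  # expensive step, executed only on the first run
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return obj
```

In practice, `create` could be a function that wraps the calibration step from the example above, so that later runs skip it.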

## Arguments

Here are some of the more important arguments used by the functions and classes in this package.

Most of them appear in the definition of the `initialize_pipeline` function. That definition also shows the sequence of functions it calls, which you can use individually in your own custom workflow if necessary.

`model`: the name of the model used for QA and retrieval. If set to a `gpt-*` model, the package uses OpenAI models and an OpenAI API key is required. It can also be set to a model name on HuggingFace, such as `mistralai/Mistral-7B-Instruct-v0.1`, to use HF models without a key.

`embedding_model`: the model from the `sentence-transformers` library used to create embeddings for text chunks and user questions.
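Embeddings matter here because retrieval compares the question embedding with each chunk embedding, and the sample output above reports a cosine distance threshold. As a minimal sketch of that measure, here is a plain-Python cosine distance. The function `cosine_distance` is written for this example only; whether the package computes the distance in exactly this way is an assumption.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Smaller values mean the chunk points in nearly the same direction as the question in embedding space, so a retrieval threshold keeps the chunks whose distance falls below it.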

## Citation

If you use this code in your research, please cite the following paper:

```
@article{conflare,
  title={CONFLARE: CONFormal LArge language model REtrieval},
  author={Pouria Rouzrokh and Shahriar Faghani and Cooper U. Gamble and Moein Shariatnia and Bradley J. Erickson},
  journal={arXiv preprint arXiv:2404.04287},
  year={2024},
  eprint={2404.04287},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```