# RedLite

[![PyPI version](https://badge.fury.io/py/redlite.svg)](https://badge.fury.io/py/redlite)
[![Documentation](https://img.shields.io/badge/documentation-latest-brightgreen)](https://innodatalabs.github.io/redlite/)
[![Test and Lint](https://github.com/innodatalabs/redlite/actions/workflows/test.yaml/badge.svg)](https://github.com/innodatalabs/redlite)
[![GitHub Pages](https://github.com/innodatalabs/redlite/actions/workflows/docs.yaml/badge.svg)](https://github.com/innodatalabs/redlite)

An opinionated toolset for testing Conversational Language Models.

## Documentation

Full documentation is available at <https://innodatalabs.github.io/redlite/>.

## Usage

1. Install required dependencies

```bash
pip install redlite[all]
```
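
After installation the `redlite` command should be on your PATH. As a quick sanity check, list its subcommands (assuming the CLI provides a standard `--help` flag):

```bash
# Should print the available subcommands, e.g. server, upload, server-freeze
redlite --help
```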

2. Generate several runs via Python scripting (see the [examples](https://github.com/innodatalabs/redlite/tree/master/samples) and the Python API section below)

3. Review and compare runs

```bash
redlite server --port 8000
```
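
Then point a browser at `http://localhost:8000` (adjust for the port you chose).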

4. Optionally, upload runs to Zeno (see the Zeno integration section below)

```bash
ZENO_API_KEY=zen_XXXX redlite upload
```

## Python API

```python
import os
from redlite import run, load_dataset
from redlite.model.openai_model import OpenAIModel
from redlite.metric import MatchMetric

# Model under test: an OpenAI chat model, authenticated via an environment variable.
model = OpenAIModel(api_key=os.environ["OPENAI_API_KEY"])

# Benchmark dataset, loaded from the HuggingFace Hub.
dataset = load_dataset("hf:innodatalabs/rt-gsm8k-gaia")

# Score by case- and punctuation-insensitive prefix match.
metric = MatchMetric(ignore_case=True, ignore_punct=True, strategy='prefix')

# Run the model on every dataset item, scoring each response with the metric.
run(model=model, dataset=dataset, metric=metric)
```

_Note: the code above uses an OpenAI model via their API.
You will need to register with OpenAI, obtain an API key, and set it in the environment as `OPENAI_API_KEY`._
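
For intuition, `MatchMetric` with `strategy='prefix'` accepts a response when the expected answer appears at the start of the model output after normalization. The standalone sketch below approximates that normalization and scoring; it is an illustration, not the library's actual implementation:

```python
import string

def normalize(text: str) -> str:
    # Approximate ignore_case / ignore_punct: lowercase and drop punctuation.
    lowered = text.lower()
    return lowered.translate(str.maketrans("", "", string.punctuation)).strip()

def prefix_match(expected: str, actual: str) -> float:
    # 1.0 if the normalized output starts with the normalized expected answer.
    return 1.0 if normalize(actual).startswith(normalize(expected)) else 0.0

print(prefix_match("42", "42. The answer is forty-two."))  # prints 1.0
```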

## Goals

* simple, easy-to-learn API (see the sketch after this list)
* lightweight
* only necessary dependencies
* framework-agnostic (PyTorch, TensorFlow, Keras, Flax, JAX)
* basic analytic tools included
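
In that spirit, plugging in a custom model should require little code. A minimal sketch, assuming a `NamedModel` wrapper that pairs a model name with a chat-completion function (this class is an assumption, not documented in this README; verify against the docs):

```python
from redlite import NamedModel  # assumed export

def parrot_engine(messages):
    # Echo the last user message back; handy for pipeline smoke tests.
    return "Parrot says: " + messages[-1]["content"]

model = NamedModel("parrot", parrot_engine)
```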

## Develop

```bash
python -m venv .venv
. .venv/bin/activate
pip install -e .[dev,all]
```

Make commands:

* test
* test-server
* lint
* wheel
* docs
* docs-server
* black
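
For example, to run the linter and the test suite locally:

```bash
make lint
make test
```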

## Zeno integration

Benchmarks can be uploaded to the Zeno interactive AI evaluation platform:

```bash
redlite upload --project my-cool-project
```

All tasks will be concatenated and uploaded as a single dataset, with extra fields:

* `task_id`
* `dataset`
* `metric`

All models will be uploaded. If a model was not tested on a specific task, a simulated zero-score dataframe is used instead.

Use `task_id` (or `dataset` as appropriate) to create task slices. Slices can be used to
navigate data or create charts.
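
To picture the upload shape, the concatenated dataset behaves like a single table carrying those extra columns, and slicing on `task_id` isolates one task. A hypothetical sketch (row values are illustrative, not the actual upload code):

```python
import pandas as pd

# Hypothetical rows of the concatenated dataset, with the extra fields.
df = pd.DataFrame([
    {"task_id": "task-1", "dataset": "hf:innodatalabs/rt-gsm8k-gaia", "metric": "match"},
    {"task_id": "task-2", "dataset": "hf:innodatalabs/rt-other-task", "metric": "match"},
])

# A slice on task_id, analogous to the task slices created in Zeno.
task_1 = df[df["task_id"] == "task-1"]
```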

## Serving as a static website

UI server data and code can be exported to a local directory that can then be served statically.

This is useful for publishing as a static website on cloud storage (S3, Google Cloud Storage).

```bash
redlite server-freeze /tmp/my-server
gsutil -m rsync -R /tmp/my-server gs://{your GS bucket}
```
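
The equivalent sync for S3 (assuming the AWS CLI is installed and configured):

```bash
aws s3 sync /tmp/my-server s3://{your S3 bucket}
```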

Note that you have to configure the cloud bucket so that the cloud provider serves it as a website; how to do this depends on the provider.
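
For example, on Google Cloud Storage the bucket's main page can be set with `gsutil` (assuming the exported site has an `index.html` entry point):

```bash
gsutil web set -m index.html gs://{your GS bucket}
```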

## TODO

- [x] deps cleanup (randomname!)
- [x] review/improve module structure
- [x] automate CI/CD
- [x] write docs
- [x] publish docs automatically (CI/CD)
- [x] web UI styling
- [ ] better test server
- [ ] tests
- [x] Integrate HF models
- [x] Integrate OpenAI models
- [x] Integrate Anthropic models
- [x] Integrate AWS Bedrock models
- [ ] Integrate vLLM models
- [x] Fix data format in HF datasets (innodatalabs/rt-* ones) to match standard
- [ ] more robust backend API (future-proof)
- [ ] better error handling for missing deps
- [ ] document which deps we need when
- [ ] export to CSV
- [x] Upload to Zeno