Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/innodatalabs/redlite
Opinionated tool for benchmarking Conversational Language Models
- Host: GitHub
- URL: https://github.com/innodatalabs/redlite
- Owner: innodatalabs
- License: mit
- Created: 2024-01-18T21:48:48.000Z (12 months ago)
- Default Branch: master
- Last Pushed: 2024-10-23T18:07:10.000Z (3 months ago)
- Last Synced: 2024-10-23T19:37:51.552Z (3 months ago)
- Topics: benchmark, innodata, llm, red-teaming, red-teaming-tools
- Language: Python
- Homepage:
- Size: 1.83 MB
- Stars: 0
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome_ai_agents - Redlite - Opinionated tool for benchmarking Conversational Language Models (Building / Tools)
README
# RedLite
[![PyPI version](https://badge.fury.io/py/redlite.svg)](https://badge.fury.io/py/redlite)
[![Documentation](https://img.shields.io/badge/documentation-latest-brightgreen)](https://innodatalabs.github.io/redlite/)
[![Test and Lint](https://github.com/innodatalabs/redlite/actions/workflows/test.yaml/badge.svg)](https://github.com/innodatalabs/redlite)
[![GitHub Pages](https://github.com/innodatalabs/redlite/actions/workflows/docs.yaml/badge.svg)](https://github.com/innodatalabs/redlite)

An opinionated toolset for testing Conversational Language Models.
## Documentation

Documentation is available at https://innodatalabs.github.io/redlite/.
## Usage
1. Install required dependencies
```bash
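# zsh users: quote the extras, e.g.  pip install 'redlite[all]'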
pip install redlite[all]
```
2. Generate several runs (using Python scripting, see [examples](https://github.com/innodatalabs/redlite/tree/master/samples) and below)
3. Review and compare runs
```bash
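# pass the port number to serve the UI on, e.g. --port 8080 (illustrative value)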
redlite server --port
```
4. Optionally, upload to Zeno
```bash
ZENO_API_KEY=zen_XXXX redlite upload
```

## Python API
```python
import os
from redlite import run, load_dataset
from redlite.model.openai_model import OpenAIModel
from redlite.metric import MatchMetric

# model under test (OpenAI, via its API)
model = OpenAIModel(api_key=os.environ["OPENAI_API_KEY"])
# benchmark dataset, loaded from the Hugging Face Hub
dataset = load_dataset("hf:innodatalabs/rt-gsm8k-gaia")
# scoring metric: case- and punctuation-insensitive prefix match
metric = MatchMetric(ignore_case=True, ignore_punct=True, strategy='prefix')

run(model=model, dataset=dataset, metric=metric)
```

_Note: the code above uses an OpenAI model via their API.
You will need to register with OpenAI and get an API access key, then set it in the environment as `OPENAI_API_KEY`._
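The "several runs" from step 2 of the Usage section can be generated by looping over datasets with the same calls. A minimal sketch, reusing only the functions shown above (the second dataset name is a placeholder, not a real dataset):

```python
import os

from redlite import run, load_dataset
from redlite.model.openai_model import OpenAIModel
from redlite.metric import MatchMetric

model = OpenAIModel(api_key=os.environ["OPENAI_API_KEY"])
metric = MatchMetric(ignore_case=True, ignore_punct=True, strategy="prefix")

# One run per dataset; the second name is a placeholder for whatever task you add.
for name in ["hf:innodatalabs/rt-gsm8k-gaia", "hf:your-org/your-dataset"]:
    run(model=model, dataset=load_dataset(name), metric=metric)
```

Each call records a separate run, which `redlite server` can then display for review and comparison.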
## Goals

* simple, easy-to-learn API
* lightweight
* only necessary dependencies
* framework-agnostic (PyTorch, Tensorflow, Keras, Flax, Jax)
* basic analytic tools included

## Develop
```bash
python -m venv .venv
. .venv/bin/activate
pip install -e .[dev,all]
```

Make commands:
* test
* test-server
* lint
* wheel
* docs
* docs-server
* black

## Zeno integration
Benchmarks can be uploaded to the Zeno interactive AI evaluation platform:
```bash
redlite upload --project my-cool-project
```

All tasks will be concatenated and uploaded as a single dataset, with extra fields:
* `task_id`
* `dataset`
* `metric`

All models will be uploaded. If a model was not tested on a specific task, a simulated zero-score dataframe is used instead.
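Purely for intuition, here is a hypothetical pandas sketch (made-up rows and an illustrative `score` column; only the field names above come from redlite) of what the concatenated upload looks like, and what filtering on `task_id` (the slices described next) selects:

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame([
    {"task_id": "task-a", "dataset": "hf:innodatalabs/rt-gsm8k-gaia", "metric": "match", "score": 1.0},
    {"task_id": "task-b", "dataset": "hf:your-org/your-dataset", "metric": "match", "score": 0.0},
])

# A slice on task_id is conceptually just a filter like this one.
print(df[df["task_id"] == "task-a"])
```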
Use `task_id` (or `dataset`, as appropriate) to create task slices. Slices can be used to navigate the data or create charts.

## Serving as a static website
UI server data and code can be exported to a local directory that can then be served statically.
This is useful for publishing results as a static website on cloud storage (S3, Google Cloud Storage).
```bash
redlite server-freeze /tmp/my-server
gsutil -m rsync -R /tmp/my-server gs://{your GS bucket}
```
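To spot-check the frozen output locally before or after syncing it, the directory can be served with nothing but the Python standard library (the path below matches the example above):

```python
# Serve the frozen redlite UI from /tmp/my-server at http://127.0.0.1:8080/
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = functools.partial(SimpleHTTPRequestHandler, directory="/tmp/my-server")
HTTPServer(("127.0.0.1", 8080), handler).serve_forever()
```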
Note that you have to configure the cloud bucket in a special way, so that the cloud provider serves it as a website. How to do this depends on the cloud provider.

## TODO
- [x] deps cleanup (randomname!)
- [x] review/improve module structure
- [x] automate CI/CD
- [x] write docs
- [x] publish docs automatically (CI/CD)
- [x] web UI styling
- [ ] better test server
- [ ] tests
- [x] Integrate HF models
- [x] Integrate OpenAI models
- [x] Integrate Anthropic models
- [x] Integrate AWS Bedrock models
- [ ] Integrate vLLM models
- [x] Fix data format in HF datasets (innodatalabs/rt-* ones) to match standard
- [ ] more robust backend API (future-proof)
- [ ] better error handling for missing deps
- [ ] document which deps we need when
- [ ] export to CSV
- [x] Upload to Zeno