https://github.com/servicenow/insight-bench

Host: GitHub
URL: https://github.com/servicenow/insight-bench
Owner: ServiceNow
License: mit
Created: 2024-06-11T19:55:37.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2025-08-05T18:49:37.000Z (11 months ago)
Last Synced: 2025-08-05T20:36:55.043Z (11 months ago)
Language: Jupyter Notebook
Size: 21.1 MB
Stars: 51
Watchers: 2
Forks: 13
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# Insight-Bench

![Banner](data/banner.jpg)

## Evaluating Data Analytics Agents Through Multi-Step Insight Generation

[[Paper]](https://arxiv.org/pdf/2407.06423) [[Website]](https://insightbench.github.io/) [[Dataset]](https://huggingface.co/datasets/ServiceNow/insight_bench)

Insight-Bench is a benchmark dataset for evaluating how well agents perform comprehensive, end-to-end data analysis across diverse use cases. It features carefully curated insights, an evaluation mechanism based on LLaMA-3-Eval or G-EVAL, and a data analytics agent, AgentPoirot.

## Data

All ground-truth notebooks are in `data/notebooks`.

An example notebook can be found here: `data/notebooks/flag-1.ipynb`
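To get a quick feel for a ground-truth notebook before running an agent, the sketch below counts its cells by type using only the standard library. It assumes the notebooks follow the standard Jupyter (nbformat 4) JSON layout with a top-level `cells` list. The helper names `summarize_notebook` and `load_notebook` and the in-memory `example_nb` are hypothetical, for illustration only, and are not part of `insightbench`.

```python
import json
from collections import Counter


def summarize_notebook(nb: dict) -> Counter:
    """Count cells by type in an nbformat-4 style notebook dict."""
    return Counter(cell.get("cell_type", "unknown") for cell in nb.get("cells", []))


def load_notebook(path: str) -> dict:
    """Read a .ipynb file (plain JSON) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Tiny in-memory stand-in for a real notebook (made-up content).
example_nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["print('hi')"]},
        {"cell_type": "code", "source": ["x = 1"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
}

counts = summarize_notebook(example_nb)
print(dict(counts))
```

From the repository root, the same helper could be applied to a real file with `summarize_notebook(load_notebook("data/notebooks/flag-1.ipynb"))`.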

## 1. Install the Python libraries

```
pip install --upgrade git+https://github.com/ServiceNow/insight-bench
```

## 2. Usage

Evaluate an agent on a single notebook:

```python
import os
from insightbench import benchmarks, agents

# Set OpenAI API Key
# os.environ["OPENAI_API_KEY"] = ""

# Get Dataset
dataset_dict = benchmarks.load_dataset_dict("data/notebooks/flag-1.json")

# Run an Agent
agent = agents.Agent(
    model_name="gpt-4o-mini",
    max_questions=2,
    branch_depth=1,
    n_retries=2,
    savedir="results/sample",
)
pred_insights, pred_summary = agent.get_insights(
    dataset_csv_path=dataset_dict["dataset_csv_path"], return_summary=True
)

# Evaluate
score_insights = benchmarks.evaluate_insights(
    pred_insights=pred_insights,
    gt_insights=dataset_dict["insights"],
    score_name="rouge1",
)
score_summary = benchmarks.evaluate_summary(
    pred=pred_summary, gt=dataset_dict["summary"], score_name="rouge1"
)

# Print Score
print("score_insights: ", score_insights)
print("score_summary: ", score_summary)
```
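For intuition about what a `rouge1` score measures, here is a minimal sketch of ROUGE-1 as a unigram-overlap F1 between a predicted and a reference string. This is an illustration under simplifying assumptions (lowercasing and whitespace tokenization, no stemming); the scorer that `insightbench` actually uses may tokenize and aggregate differently. The function name `rouge1_f1` and the sample sentences are made up for this example.

```python
from collections import Counter


def rouge1_f1(pred: str, ref: str) -> float:
    """Unigram-overlap F1 between a predicted and a reference string."""
    pred_tokens = pred.lower().split()
    ref_tokens = ref.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical predicted vs. ground-truth insight.
score = rouge1_f1("incidents rise in march", "incidents rise sharply in march")
print(round(score, 3))
```

Because the metric only counts shared words, two insights can score highly while disagreeing on meaning, which is one reason an LLM-based evaluator such as LLaMA-3-Eval or G-EVAL can be a useful complement.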

## 3. Evaluate Agent on Multiple Insights

```bash
python main.py --openai_api_key <openai_api_key> \
               --savedir_base <savedir_base>
```

## Citation

```bibtex
@article{sahu2024insightbench,
  title={InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation},
  author={Sahu, Gaurav and Puri, Abhay and Rodriguez, Juan and Abaskohi, Amirhossein and Chegini, Mohammad and Drouin, Alexandre and Taslakian, Perouz and Zantedeschi, Valentina and Lacoste, Alexandre and Vazquez, David and Chapados, Nicolas and Pal, Christopher and others},
  journal={arXiv preprint arXiv:2407.06423},
  year={2024}
}
```

## 🤝 Contributing

- Please check the outstanding issues and feel free to open a pull request.

- Please share any feedback, suggestions, or feature requests in the issues section.

- You are welcome to contribute to the codebase and add new datasets and flags.

### Thank you!
