https://github.com/rose-stl-lab/anomllm
- Host: GitHub
- URL: https://github.com/rose-stl-lab/anomllm
- Owner: Rose-STL-Lab
- License: mit
- Created: 2024-08-12T21:14:52.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-16T21:30:06.000Z (over 1 year ago)
- Last Synced: 2025-03-24T10:11:10.390Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 2.53 MB
- Stars: 16
- Watchers: 1
- Forks: 5
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
AnomLLM
Can LLMs Understand Time Series Anomalies?
## | Introduction
We challenge common assumptions about Large Language Models' capabilities in time series understanding. This repository contains the code for reproducing our results and for **benchmarking** the anomaly detection capabilities of your own large language models, provided they are compatible with the OpenAI API.
## | Citation
[[2410.05440] Can LLMs Understand Time Series Anomalies?](https://arxiv.org/abs/2410.05440)
```
@misc{zhou2024llmsunderstandtimeseries,
title={Can LLMs Understand Time Series Anomalies?},
author={Zihao Zhou and Rose Yu},
year={2024},
eprint={2410.05440},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.05440},
}
```
## | Installation
- Dependencies: `conda`
- Run `export PYTHONPATH=$PYTHONPATH:$(pwd)/src` first
- The Jupyter notebook working directory should be the root directory of the project.
```bash
conda env create --file environment.yml
conda activate anomllm
poetry install --no-root
# Or `poetry install --no-root --with dev` if you also need Jupyter, etc.
```
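If exporting `PYTHONPATH` in the shell is inconvenient (for example, when a notebook kernel was started from a different shell), a similar effect can be approximated from Python. This is a minimal sketch and not part of the repository: `add_src_to_path` is a hypothetical helper, and it assumes the project root contains the `src` directory.

```python
import sys
from pathlib import Path


def add_src_to_path(project_root: Path) -> Path:
    """Prepend <project_root>/src to sys.path if it is not already there."""
    src_dir = (project_root / "src").resolve()
    if str(src_dir) not in sys.path:
        sys.path.insert(0, str(src_dir))
    return src_dir


if __name__ == "__main__":
    # Assumes the current working directory is the project root.
    print(add_src_to_path(Path.cwd()))
```

Unlike the shell export, this only affects the current Python process, so it would need to run at the top of each notebook.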
## | Dataset Download
We recommend using [`s5cmd`](https://github.com/peak/s5cmd/tree/master) to download the dataset from the NRP S3 bucket.
```bash
s5cmd --no-sign-request --endpoint-url https://s3-west.nrp-nautilus.io cp "s3://anomllm/data/*" data/
```
Alternatively, download the dataset from [Google Drive](https://drive.google.com/file/d/19KNCiOm3UI_JXkzBAWOdqXwM0VH3xOwi/view?usp=sharing) or synthesize your own dataset using `synthesize.sh`. In either case, make sure the dataset is stored in the `data` directory.
## | API Configuration
Create a `credentials.yml` file in the root directory with the following content:
```yaml
gpt-4o:
  api_key:   # your API key
  base_url: "https://api.openai.com/v1"
gpt-4o-mini:
  api_key:   # your API key
  base_url: "https://api.openai.com/v1"
gemini-1.5-flash:
  api_key:   # your API key
internvlm-76b:
  api_key:   # your API key
  base_url:  # your endpoint URL, ending with v1
qwen:
  api_key:   # your API key
  base_url:  # your endpoint URL, ending with v1
```
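Before launching a run, it can help to sanity-check the credentials file. The sketch below is not the repository's loader. `load_credentials` and `validate_credentials` are hypothetical helpers, loading assumes the third-party PyYAML package is installed, and the rule that `base_url` is optional is inferred only from the `gemini-1.5-flash` entry above.

```python
from typing import Any, Dict, List


def load_credentials(path: str = "credentials.yml") -> Dict[str, Dict[str, Any]]:
    """Read the YAML file; requires the third-party PyYAML package."""
    import yaml  # imported lazily so validation below works without PyYAML

    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


def validate_credentials(creds: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for model, entry in creds.items():
        entry = entry or {}  # a model key with no fields parses as None
        if not entry.get("api_key"):
            problems.append(f"{model}: api_key is missing or empty")
        base_url = entry.get("base_url")
        # base_url is treated as optional (assumption, see lead-in).
        if base_url is not None and not str(base_url).rstrip("/").endswith("v1"):
            problems.append(f"{model}: base_url should end with v1")
    return problems
```

A typical use would be `problems = validate_credentials(load_credentials())`, stopping early if the list is not empty.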
## | Example Usage for Single Time Series
Check out the [example notebook](https://github.com/Rose-STL-Lab/AnomLLM/blob/dev/notebook/example.ipynb).
To run the example notebook, you only need the `gemini-1.5-flash` model in the `credentials.yml` file.
## | Batch Run using OpenAI Batch API
`python src/batch_api.py --data $datum --model $model --variant $variant`
See `test.sh` for comprehensive lists of models, variants, and datasets. The [Batch API](https://platform.openai.com/docs/guides/batch/overview) works only with OpenAI's proprietary models. It reduces the cost by 50%, but results are not returned in real time: your first run creates a request file, and subsequent runs check the status of the request and retrieve the results once they are ready.
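The submit-once, poll-later behavior can be illustrated with a small sketch. This is not the code in `src/batch_api.py`: `run_batch_step`, the JSON state file, and the `submit` and `fetch` callables are all hypothetical stand-ins for whatever the script does internally.

```python
import json
from pathlib import Path
from typing import Any, Callable, Optional


def run_batch_step(
    state_file: Path,
    submit: Callable[[], str],
    fetch: Callable[[str], Optional[Any]],
) -> Optional[Any]:
    """One invocation of a submit-then-poll workflow.

    First call: submit the batch and remember its id on disk.
    Later calls: ask for the results, which are None until the batch is done.
    """
    if not state_file.exists():
        batch_id = submit()
        state_file.write_text(json.dumps({"batch_id": batch_id}))
        return None
    batch_id = json.loads(state_file.read_text())["batch_id"]
    return fetch(batch_id)
```

Persisting the batch id is what lets repeated invocations of the same command resume instead of resubmitting (and paying for) the same requests.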
## | Online Run using OpenAI API
`python src/online_api.py --data $datum --model $model --variant $variant`
The online API works with all OpenAI-compatible model hosting services.
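To sweep several datasets, models, and variants, the command above can be generated programmatically. The sketch below only builds the argument lists. `build_commands` is a hypothetical helper, and the dataset and variant names in the usage comment are placeholders (the real ones are listed in `test.sh`).

```python
import itertools
import sys
from typing import Iterable, List


def build_commands(
    script: str,
    data: Iterable[str],
    models: Iterable[str],
    variants: Iterable[str],
) -> List[List[str]]:
    """Build one argv list per (data, model, variant) combination."""
    return [
        [sys.executable, script, "--data", d, "--model", m, "--variant", v]
        for d, m, v in itertools.product(data, models, variants)
    ]


# Usage (placeholder names; run from the project root):
#   import subprocess
#   for cmd in build_commands("src/online_api.py", ["my-dataset"],
#                             ["gpt-4o-mini"], ["my-variant"]):
#       subprocess.run(cmd, check=True)
```

Passing argument lists rather than a shell string avoids quoting issues if a dataset or variant name contains special characters.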
## | Evaluation
`python src/result_agg.py --data $datum`
The evaluation script aggregates the API results and computes the evaluation metrics for all models and variants.
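This README does not list which metrics `src/result_agg.py` reports. As an illustration of what an anomaly detection metric can look like, the sketch below computes point-wise precision, recall, and F1 from binary labels. It is an assumption-laden example, not the repository's evaluation code, and `pointwise_prf` is a hypothetical name.

```python
from typing import Sequence, Tuple


def pointwise_prf(truth: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 where 1 marks an anomalous time step."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    # Define each ratio as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```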
## | License
This project is licensed under the MIT License.
