https://github.com/y-scope/log-archival-bench

Host: GitHub
URL: https://github.com/y-scope/log-archival-bench
Owner: y-scope
License: apache-2.0
Created: 2025-07-21T21:34:12.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-09-22T06:14:20.000Z (9 months ago)
Last Synced: 2025-09-22T08:24:53.487Z (9 months ago)
Language: Python
Size: 37.1 KB
Stars: 4
Watchers: 2
Forks: 4
Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS

# Log Archival Bench How To

## Setup

Initialize and update submodules:

```shell
git submodule update --init --recursive
```

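Run the following commands to create the virtual environment, add the repository root to Python's import path (so the Python files in `src` can be imported), activate the venv, and install the dependencies:

```shell
# Create the virtual environment in ./venv
python3 -m venv venv
# Register the repository root on the venv's import path via a .pth file
echo "$(pwd)" > "$(find venv/lib -maxdepth 1 -mindepth 1 -type d)/site-packages/project_root.pth"
# Activate the venv in the current shell
. venv/bin/activate
# Install the Python dependencies
pip3 install -r requirements.txt
```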
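The `echo` line above relies on Python's `site` module, which reads every `*.pth` file in `site-packages` at startup and adds each listed existing directory to `sys.path`. A minimal Python sketch of the same step follows; the helper name `write_project_pth` is hypothetical and not part of the repository.

```python
import tempfile
from pathlib import Path


def write_project_pth(site_packages: Path, project_root: Path) -> Path:
    """Write project_root.pth so project_root is added to sys.path at startup."""
    pth_file = site_packages / "project_root.pth"
    # Each line of a .pth file names a directory that the site module adds to sys.path.
    pth_file.write_text(f"{project_root.resolve()}\n")
    return pth_file


if __name__ == "__main__":
    # Demonstrate with a throwaway directory instead of a real venv.
    with tempfile.TemporaryDirectory() as tmp:
        site_packages = Path(tmp) / "site-packages"
        site_packages.mkdir()
        print(write_project_pth(site_packages, Path.cwd()).read_text())
```

After activating the venv, `python3 -c "import sys; print(sys.path)"` should list the repository root if the `.pth` file was picked up.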

## Download Datasets

You can download all the datasets we use in the benchmark using the [download\_all.py](/scripts/download_all.py) script we provide.

The [download\_all.py](/scripts/download_all.py) script downloads every dataset into the correct directory with the expected name, concatenates multi-file datasets into a single file, and generates any modified versions of a dataset needed by tools like Presto + CLP.
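The concatenation step can be pictured with the sketch below. It is only an illustration: the function name `concatenate_files` and the sorted-by-path ordering are assumptions, and the README does not say how the actual script orders or names the parts.

```python
import shutil
import tempfile
from pathlib import Path


def concatenate_files(parts: list[Path], output: Path) -> None:
    """Write the bytes of each part, in sorted path order, into output."""
    with output.open("wb") as out:
        for part in sorted(parts):
            with part.open("rb") as src:
                # Stream the part so large log files are never fully loaded into memory.
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        # Placeholder file names; real dataset parts will differ.
        (tmp_dir / "part1.log").write_text("first\n")
        (tmp_dir / "part2.log").write_text("second\n")
        merged = tmp_dir / "merged.log"
        concatenate_files(list(tmp_dir.glob("part*.log")), merged)
        print(merged.read_text())
```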

## Run Everything

Follow the instructions above to set up your virtual environment.

Stay in the [Log Archival Bench](/) directory (the repository root) and run [scripts/benchall.py](/scripts/benchall.py). This script runs the tools, with the parameters listed in its `benchmarks` variable, across all datasets under [data/](/data).
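The README does not show the shape of the `benchmarks` variable, so the sketch below assumes one: a list of (tool name, parameters) pairs, with every entry directly under the data directory treated as a dataset. The tool names, parameters, and the helper `plan_runs` are all hypothetical; the sketch only illustrates the "every tool across every dataset" pattern.

```python
from itertools import product
from pathlib import Path

# Hypothetical shape for the "benchmarks" variable: (tool name, parameters) pairs.
# These names and parameters are placeholders, not the repository's real values.
benchmarks = [
    ("tool_a", {"threads": 1}),
    ("tool_b", {}),
]


def plan_runs(benchmarks, data_dir: Path):
    """Return one (tool, params, dataset_path) entry per tool/dataset combination."""
    # Assumption: each entry directly under data_dir is one dataset.
    datasets = sorted(data_dir.iterdir()) if data_dir.is_dir() else []
    return [(tool, params, ds) for (tool, params), ds in product(benchmarks, datasets)]


if __name__ == "__main__":
    for tool, params, dataset in plan_runs(benchmarks, Path("data")):
        print(tool, params, dataset)
```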

## Run One Tool

Execute `./assets/{tool name}/main.py {path to .log}` from the repository root to run ingestion and search for that tool on the given dataset.
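To drive a single tool from Python rather than the shell, the command template above can be filled in as follows. This is a sketch: `build_tool_command` and `run_tool` are hypothetical helpers, and the tool name and log path in the demo are placeholders.

```python
import subprocess
from pathlib import Path


def build_tool_command(tool_name: str, log_path: Path) -> list[str]:
    """Build the argv for ./assets/{tool name}/main.py {path to .log}."""
    return [f"./assets/{tool_name}/main.py", str(log_path)]


def run_tool(tool_name: str, log_path: Path) -> int:
    """Run the tool's main.py (from the repository root) and return its exit code."""
    return subprocess.run(build_tool_command(tool_name, log_path), check=False).returncode


if __name__ == "__main__":
    # Placeholder tool name and dataset path; only print the command here.
    print(build_tool_command("example_tool", Path("data/example/example.log")))
```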

## Contributing

Follow the steps below to develop and contribute to the project.

### Requirements

* [Task] 3.40.0 or higher

### Linting

Before submitting a pull request, ensure you've run the linting commands below, fixed all violations, and suppressed any benign warnings.

To run all linting checks:

```shell
task lint:check
```

To run all linting checks AND fix some violations:

```shell
task lint:fix
```

To see how to run a subset of linters for a specific file type:

```shell
task -a
```

Look for tasks under the `lint` namespace (identified by the `lint:` prefix).

[Task]: https://taskfile.dev
