https://github.com/tjake/osbench


Host: GitHub
URL: https://github.com/tjake/osbench
Owner: tjake
Created: 2025-09-10T16:25:54.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-09-15T13:31:35.000Z (11 months ago)
Last Synced: 2025-09-15T15:23:43.746Z (11 months ago)
Language: Python
Size: 13.7 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# OpenSearch Parquet Loader

This script loads data from Parquet files into an OpenSearch index. It sends documents in batches over multiple parallel connections to increase indexing throughput.

## Prerequisites

- Python 3.8+
- Access to an OpenSearch cluster

## Setup

1. **Clone the repository:**
```bash
git clone https://github.com/tjake/osbench.git
cd osbench
```

2. **Install dependencies:**
```bash
pip install -r requirements.txt
```

3. **Configure environment variables:**
Create a `.env` file in the project root and add your OpenSearch connection details:
```
OPENSEARCH_HOSTS='["http://localhost:9200"]'
OPENSEARCH_INDEX="my-index"
```
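
Because `OPENSEARCH_HOSTS` is written as a JSON array, the script presumably decodes it before connecting. The snippet below is a minimal sketch of that step, not the code in `osbench.py`: the function name `load_config` is hypothetical, and it assumes the `.env` values are already in the process environment with their outer quotes stripped (a dotenv-style loader or a shell `export` would do this).

```python
import json
import os


def load_config(env=os.environ):
    """Return (hosts, index) from the two variables described above.

    Hypothetical helper: the real script may read its settings differently.
    """
    # OPENSEARCH_HOSTS holds a JSON array, so it needs decoding, not splitting.
    hosts = json.loads(env["OPENSEARCH_HOSTS"])
    if not isinstance(hosts, list) or not all(isinstance(h, str) for h in hosts):
        raise ValueError("OPENSEARCH_HOSTS must be a JSON array of URL strings")
    return hosts, env["OPENSEARCH_INDEX"]


if __name__ == "__main__":
    # Example values taken from the .env sample above.
    sample = {
        "OPENSEARCH_HOSTS": '["http://localhost:9200"]',
        "OPENSEARCH_INDEX": "my-index",
    }
    print(load_config(sample))
```

Passing the environment as a parameter keeps the helper easy to test without touching real environment variables.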

## Usage

Run the script from the command line, providing the path to your Parquet file and the number of parallel workers:

```bash
python osbench.py /path/to/your/data.parquet --workers 4
```

- `/path/to/your/data.parquet`: The Parquet file to load.
- `--workers`: The number of parallel connections to use.
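
To illustrate how the command line and the worker count might fit together, here is a rough, standard-library-only sketch of batching rows and fanning them out to a thread pool. It is not taken from `osbench.py`: the names `parse_args`, `batched`, and `load_parallel`, the default worker count, and the batch size of 500 are all assumptions, and `send_batch` is a stand-in for whatever call actually posts a batch to OpenSearch.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def parse_args(argv=None):
    # Mirrors the documented command line: one positional path plus --workers.
    parser = argparse.ArgumentParser(description="Load a Parquet file into OpenSearch")
    parser.add_argument("parquet_path")
    parser.add_argument("--workers", type=int, default=4)  # default is an assumption
    return parser.parse_args(argv)


def batched(rows, size):
    # Yield successive lists of at most `size` rows.
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_parallel(rows, send_batch, workers, batch_size=500):
    # Each worker thread handles one batch at a time. send_batch is a
    # placeholder that must return the number of rows it sent.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batched(rows, batch_size)))


if __name__ == "__main__":
    args = parse_args(["/path/to/your/data.parquet", "--workers", "4"])
    # `len` stands in for a real sender, and range() for rows read from Parquet.
    sent = load_parallel(range(1200), send_batch=len, workers=args.workers)
    print(f"sent {sent} rows using {args.workers} workers")
```

Note that `Executor.map` submits every batch up front, so for very large files a real loader would likely cap the number of in-flight batches rather than queue them all at once.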
