https://github.com/tjake/osbench
Last synced: 9 months ago
- Host: GitHub
- URL: https://github.com/tjake/osbench
- Owner: tjake
- Created: 2025-09-10T16:25:54.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-09-15T13:31:35.000Z (9 months ago)
- Last Synced: 2025-09-15T15:23:43.746Z (9 months ago)
- Language: Python
- Size: 13.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# OpenSearch Parquet Loader
This script loads data from Parquet files into an OpenSearch index, using batch processing and multiple connections to maximize indexing speed.
## Prerequisites
- Python 3.8+
- Access to an OpenSearch cluster
## Setup
1. **Clone the repository:**
```bash
git clone https://github.com/tjake/osbench.git
cd osbench
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Configure environment variables:**
Create a `.env` file in the project root and add your OpenSearch connection details:
```
OPENSEARCH_HOSTS='["http://localhost:9200"]'
OPENSEARCH_INDEX="my-index"
```
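Note that `OPENSEARCH_HOSTS` holds a JSON list inside a quoted string, so it has to be decoded after the `.env` file is read. The following is a minimal sketch of that step, not the script's actual code: `parse_dotenv` and `load_settings` are hypothetical helper names, and the parser handles only simple `KEY=value` lines (a library such as python-dotenv may be what the script really uses).

```python
import json


def parse_dotenv(text):
    """Parse simple KEY=value lines, stripping one layer of matching quotes.

    Hypothetical helper for illustration; it does not handle multi-line
    values, escapes, or variable expansion.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_settings(values):
    """Return (hosts, index) from the parsed values."""
    # OPENSEARCH_HOSTS is a JSON array encoded as a string.
    hosts = json.loads(values["OPENSEARCH_HOSTS"])
    if not isinstance(hosts, list) or not hosts:
        raise ValueError("OPENSEARCH_HOSTS must be a non-empty JSON list")
    return hosts, values["OPENSEARCH_INDEX"]


EXAMPLE = """OPENSEARCH_HOSTS='["http://localhost:9200"]'
OPENSEARCH_INDEX="my-index"
"""

if __name__ == "__main__":
    print(load_settings(parse_dotenv(EXAMPLE)))
```

The single quotes around the JSON list matter: they keep the inner double quotes intact so the value remains valid JSON once the outer quotes are stripped.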
## Usage
Run the script from the command line, providing the path to your Parquet file and the number of parallel workers:
```bash
python osbench.py /path/to/your/data.parquet --workers 4
```
- `/path/to/your/data.parquet`: The Parquet file to load.
- `--workers`: The number of parallel connections to use.
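To illustrate how batching and parallel workers can fit together, here is a minimal sketch under stated assumptions. It is not taken from `osbench.py`: the names `batched`, `bulk_body`, and `load`, the `batch_size` parameter, and the `send` callable are all hypothetical. The only OpenSearch-specific detail is the newline-delimited body format of the `_bulk` endpoint, where each document is preceded by an action line.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(records, size):
    """Yield lists of at most `size` records from any iterable."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_body(index, batch):
    """Build a newline-delimited _bulk request body for one batch."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def load(records, index, send, workers=4, batch_size=500):
    """Send batches through `workers` threads and return the total count.

    `send` is a hypothetical callable that posts one bulk body and returns
    the number of documents it indexed. Executor.map submits every batch
    up front, so very large inputs would need a bounded queue instead.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda b: send(bulk_body(index, b)),
                          batched(records, batch_size))
        return sum(counts)


if __name__ == "__main__":
    # Stand-in sender that only counts documents instead of calling a cluster.
    fake_send = lambda body: body.count("\n") // 2
    docs = ({"id": i} for i in range(10))
    print(load(docs, "my-index", fake_send, workers=2, batch_size=3))
```

Threads suit this workload because each worker spends most of its time waiting on network I/O; in a real run, `send` would wrap an HTTP or client-library call to the cluster, and the rows would come from the Parquet file rather than a generator.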