https://github.com/hyparam/parquet-grep
Grep your parquet files
- Host: GitHub
- URL: https://github.com/hyparam/parquet-grep
- Owner: hyparam
- License: mit
- Created: 2025-11-20T00:27:25.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2025-12-17T00:19:56.000Z (6 months ago)
- Last Synced: 2026-01-17T21:11:27.823Z (5 months ago)
- Language: JavaScript
- Size: 52.7 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-parquet - parquet-grep - A CLI tool to search for strings in Parquet files. (Tools / Command-line)
README
# parquet-grep
A CLI tool for searching text within Apache Parquet files. Works like `grep` but for Parquet files, with support for recursive directory search and multiple output formats.
Built on top of [hyparquet](https://github.com/hyparam/hyparquet) for high-performance Parquet parsing.
## Installation
```bash
npm install -g parquet-grep
```
Or use directly with npx:
```bash
npx parquet-grep "search term" file.parquet
```
## Usage
```bash
parquet-grep [options] <query> [parquet-file]
```
### Options
- `-i` - Force case-insensitive search. By default, the search is case-insensitive when the query is all lowercase and case-sensitive when the query contains an uppercase letter
- `-v` - Invert match (show non-matching rows)
- `-m N` / `--limit N` - Limit matches per file (default: 5, 0 = unlimited). Shows "..." when the limit is exceeded
- `--offset N` - Skip the first N matches per file (default: 0). Useful with `--limit` for pagination
- `--table` - Output in markdown table format (default, grouped by file)
- `--jsonl` - Output as JSON lines (one match per line with filename, rowOffset, and value)
If no file is specified, recursively searches all `.parquet` files in the current directory, skipping `node_modules` and hidden directories.
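The recursive search described above can be pictured with a small Node.js sketch. This is an illustration of the stated rules only (collect `.parquet` files, skip `node_modules` and hidden directories), not parquet-grep's actual code; the helper name `findParquetFiles` is hypothetical.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Hypothetical helper, not part of parquet-grep: collect .parquet files
// under `dir`, skipping node_modules and hidden (dot-prefixed) directories.
function findParquetFiles(dir) {
  const found = []
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name)
    if (entry.isDirectory()) {
      if (entry.name === 'node_modules' || entry.name.startsWith('.')) continue
      found.push(...findParquetFiles(fullPath))
    } else if (entry.isFile() && entry.name.endsWith('.parquet')) {
      found.push(fullPath)
    }
  }
  return found
}
```

Calling `findParquetFiles(process.cwd())` would return the list of files such a search would visit under these assumptions.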
### Examples
**Search a single file:**
```bash
parquet-grep "Holland" bunnies.parquet
```
**Search recursively in current directory:**
```bash
parquet-grep "search term"
```
**Case-insensitive search:**
```bash
parquet-grep -i "HOLLAND" bunnies.parquet
```
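The default case handling from the Options section can be sketched in a few lines of JavaScript. This is a minimal illustration, assuming plain substring matching; the function name `matches` is hypothetical and the tool's real matching logic may differ.

```javascript
// Hypothetical helper illustrating the smart-case rule; not parquet-grep's code.
// Assumes plain substring matching.
function matches(text, query, forceInsensitive = false) {
  // Case-insensitive when -i is given or the query has no uppercase letters.
  const insensitive = forceInsensitive || query === query.toLowerCase()
  return insensitive
    ? text.toLowerCase().includes(query.toLowerCase())
    : text.includes(query)
}
```

Under this rule, the query `holland` would match `Holland Lop`, while `Holland` would not match `holland lop` unless `-i` is passed.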
**JSONL output:**
```bash
parquet-grep --jsonl "Holland" bunnies.parquet
```
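JSONL output is convenient to post-process in a script. The sketch below groups matches by file, assuming each line is a JSON object with the `filename`, `rowOffset`, and `value` fields named in the Options section. The exact shape of `value` is not documented here, so the code treats it as opaque; `groupByFile` is a hypothetical helper.

```javascript
// Hypothetical helper: parse JSONL text and group the matches by filename.
// Assumes each non-blank line is an object with filename, rowOffset, and value.
function groupByFile(jsonlText) {
  const groups = new Map()
  for (const line of jsonlText.split('\n')) {
    if (line.trim() === '') continue // skip blank lines
    const match = JSON.parse(line)
    if (!groups.has(match.filename)) groups.set(match.filename, [])
    groups.get(match.filename).push(match)
  }
  return groups
}
```

The input text could come from piping the command's output into a Node.js script and reading standard input.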
**Limit results:**
```bash
parquet-grep --limit 10 "search term" file.parquet # Show at most 10 matches per file
parquet-grep --limit 0 "search term" file.parquet # Unlimited matches
```
**Pagination with offset and limit:**
```bash
parquet-grep --offset 5 --limit 10 "search term" file.parquet # Skip the first 5 matches, then show the next 10 (matches 5-14, counting from 0)
parquet-grep --offset 0 --limit 5 "search term" file.parquet # Show first 5 matches
```
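The interaction of `--offset` and `--limit` can be modelled as a slice over the per-file match list. The following is a rough sketch of that semantics, assuming `0` means unlimited and that truncation is what triggers the "..." marker; `paginate` is a hypothetical name, not the tool's API.

```javascript
// Hypothetical model of --offset / --limit; not parquet-grep's actual code.
// The defaults mirror the documented ones: offset 0, limit 5.
function paginate(allMatches, offset = 0, limit = 5) {
  const remaining = allMatches.slice(offset) // skip the first `offset` matches
  if (limit === 0) return { shown: remaining, truncated: false } // 0 = unlimited
  return {
    shown: remaining.slice(0, limit),
    truncated: remaining.length > limit, // more matches exist past the limit
  }
}
```

With 20 matches numbered from 0, `paginate(matches, 5, 10)` keeps matches 5 through 14 and reports that more remain.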