https://github.com/duriantaco/jonq
A sqlish (a lil pythonic) and hopefully human readable style wrapper for JQ
- Host: GitHub
- URL: https://github.com/duriantaco/jonq
- Owner: duriantaco
- License: mit
- Created: 2025-03-09T01:05:19.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-06-22T07:08:13.000Z (4 months ago)
- Last Synced: 2025-08-16T00:59:51.927Z (about 2 months ago)
- Topics: jq, json, python
- Language: Python
- Homepage: https://jonq.readthedocs.io/en/latest/
- Size: 2.44 MB
- Stars: 40
- Watchers: 3
- Forks: 2
- Open Issues: 2
- Metadata Files:
  - Readme: README.md
  - Changelog: changelog
  - License: License
# jonq
### Human‑readable syntax for **jq**
---
## About
`jq` is unbeatable for JSON processing, but its syntax has a steep learning curve.
**jonq** wraps `jq` in a SQL-ish/pythonic layer you can read and remember.

**Who it's for**: jonq is designed for anyone who needs to work with JSON data. It's good for quick JSON exploration, lightweight ETL tasks, or validating config files in CI pipelines.
**jonq is NOT a database. It's NOT competing with DuckDB or Pandas. jonq is a command-line tool that makes jq accessible by wrapping it in SQL-like syntax.**
---

### What jonq IS for:
* Quick JSON exploration
* jq beginners
* Ad-hoc JSON tasks - No setup, just run a command
* Config file validation - Quick checks in scripts

### What jonq is NOT for:
1. Data analysis - Use Pandas, Polars, or DuckDB
2. Complex joins - Use DuckDB, PostgreSQL, or whatever DB you want
3. Business intelligence - Use proper BI tools

## Features at a glance
| Category | What you can do | Example |
|-------------------|-----------------|---------|
| **Selection** | Pick fields | `select name, age` |
| **Wildcard** | All fields | `select *` |
| **Filtering** | Python‑style ops: `and / or / between / contains` | `if age > 30 and city = 'NY'` |
| **Aggregations** | `sum avg min max count` | `select avg(price) as avg_price` |
| **Grouping** | `group by` + `having` | `… group by city having count > 2` |
| **Ordering** | `sort <field> [asc\|desc] [limit]` | `sort age desc 5` |
| **Nested arrays** | `from [].orders` or inline paths | `select products[].name …` |
| **Inline maths** | Real expressions | `sum(items.price) * 2 as double_total` |
| **CSV / stream** | `--format csv`, `--stream` | |

---
## Why Jonq?
### Jonq vs raw jq
| Task | Raw **jq** filter | **jonq** one‑liner |
|------|------------------|--------------------|
| Select specific fields | `jq '.[]|{name:.name,age:.age}'` | `jonq data.json "select name, age"` |
| Filter rows | `jq '.[]|select(.age > 30)|{name,age}'` | `… "select name, age if age > 30"` |
| Sort + limit | `jq 'sort_by(.age) | reverse | .[0:2]'` | `… "select name, age sort age desc 2"` |
| Deep filter | `jq '.[]|select(.profile.address.city=="NY")|{name,city:.profile.address.city}'` | `… "select name, profile.address.city if profile.address.city = 'NY'"` |
| Count items | `jq 'map(select(.age>25)) | length'` | `… "select count(*) as over_25 if age > 25"` |
| Group & count | `jq 'group_by(.city) | map({city:.[0].city,count:length})'` | `… "select city, count(*) as count group by city"` |
| Complex boolean | `jq '.[] | select(.age>25 and (.city=="NY" or .city=="Chicago"))'` | `… "select * if age > 25 and (city = 'NY' or city = 'Chicago')"` |
| Group & HAVING | `jq 'group_by(.city) | map(select(length>2)) | map({city:.[0].city,count:length})'` | `… "select city, count(*) group by city having count > 2"` |
| Aggregation expression | - | `… "select sum(price) * 1.07 as total_gst"` |
| Nested‑array aggregation | - | `… "select avg(products[].versions[].pricing.monthly) as avg_price"` |

**Take‑away:** a single `jonq` string replaces many pipes and brackets while still producing pure jq under the hood.
---
### Jonq vs DuckDB vs Pandas (JSON extension)
| Aspect | **jonq** | **DuckDB** | **Pandas** |
|--------|------------|----------|----------|
| Primary Use Case | Fast, lightweight JSON querying directly from the command line | Analytical SQL queries on large datasets, including JSON | General-purpose data manipulation and analysis in Python |
| Setup | No DB, streams any JSON | Requires DB file / extension | Requires a Python environment with pandas and its dependencies installed |
| Query language | Familiar SQL‑ish, no funky `json_extract`| SQL + JSON functions | Python code for data manipulation and analysis |
| Footprint | Minimal: requires only jq (a ~500 KB binary); no environment setup | ~ 140 MB binary | Larger: ~20 MB for pandas and its dependencies |
| Streaming | `--stream` processes line‑delimited JSON lazily | Must load into table | Can process large files using chunking, but not as memory-efficient as streaming |
| Memory Usage | Low; streams data to avoid loading full JSON into memory | In-memory database, but optimized for large data with columnar storage | Loads data into memory; can strain RAM with large datasets |
| jq ecosystem | Leverages **all** jq filters for post‑processing | No | No; lives in the Python data-science ecosystem (NumPy, Matplotlib, scikit-learn) instead |

---
### Why you’ll reach for Jonq
1. **Instant JSON Querying, No Setup Hassle**
You have a JSON file (`data.json`) and need to extract all records where `age > 30` in seconds.
* With `jonq`: run `jonq data.json "select * if age > 30"`. Done. No environment setup, no imports; just install jq and go.
* Pandas: Fire up Python, write a script (`import pandas as pd; df = pd.read_json('data.json'); df[df['age'] > 30]`), and run it. More steps.
* DuckDB: Set up a database, load the JSON (`SELECT * FROM read_json('data.json') WHERE age > 30`), and execute. Powerful, but overkill for a quick task.
2. **Command-Line Power**
Use Case: Chain commands in a pipeline, like `jonq data.json "select name, age" | grep "John"`.
`Jonq` thrives in shell scripts or CI/CD workflows. Pandas and DuckDB require scripting or a heavier integration layer.
3. **Lightweight and Efficient**
`jonq` uses jq's streaming mode (`--stream`) for large JSON files, processing data piece by piece instead of loading it all into memory.
Comparison: Pandas loads everything into a DataFrame (RAM-intensive), and while DuckDB is memory-efficient for analytics, it's still a full database engine, so there is noticeable overhead.
4. **SQL Simplicity for JSON**
Example: `jonq users.json "select name, email if status = 'active' sort name"`.
Advantage: If you know SQL, `jonq` feels natural for JSON; no need to wrestle with jq's tricky syntax.
5. **Speed for Ad-Hoc Tasks**
Test Case: Querying a 1 GB JSON file for specific fields.
* Jonq: Streams it in seconds with minimal memory use.
* Pandas: Might choke or require chunking hacks.
* DuckDB: Fast, but setup and SQL complexity add time.
## Installation
**Supported Platforms**: Jonq works on Linux, macOS, and Windows with WSL.
### Prerequisites
- Python 3.9+
- `jq` command line tool installed (https://stedolan.github.io/jq/download/)

### Setup
**From PyPI**
```bash
pip install jonq # latest stable
```

**From source**
```bash
git clone https://github.com/duriantaco/jonq.git
cd jonq && pip install -e .
```
**Verify Installation**: After installation, run `jonq --version` to ensure it's working correctly.

### Optional: Rust-powered extension
For users dealing with large or complex nested JSON structures, we recommend installing the optional `jonq_fast` Rust extension.
```bash
pip install jonq-fast
```

We explain more about this below.
### Quick Start
```bash
# Create a simple JSON file
echo '[{"name":"Alice","age":30},{"name":"Bob","age":25}]' > data.json

# Run a query
jonq data.json "select name, age if age > 25"
# Output: [{"name":"Alice","age":30}]
```
## Query Syntax
The query syntax follows a simplified format:
```bash
select <fields> [if <condition>] [group by <fields>] [sort <field> [asc|desc] [limit]]
```
where:

* `<fields>` - Comma-separated list of fields or aggregations to select
* `if <condition>` - Optional filtering condition
* `group by <fields>` - Optional grouping by one or more fields
* `sort <field>` - Optional field to sort by
* `asc|desc` - Optional sort direction (default: asc)
* `limit` - Optional integer to limit the number of results

## Example Simple JSON
You can also refer to `json_test_files` for the test JSONs and the `USAGE.md` guide. Anyway, let's start with `simple.json`.
Imagine a JSON file like the following:
```json
[
{ "id": 1, "name": "Alice", "age": 30, "city": "New York" },
{ "id": 2, "name": "Bob", "age": 25, "city": "Los Angeles" },
{ "id": 3, "name": "Charlie", "age": 35, "city": "Chicago" }
]
```

### To select all fields:
```bash
jonq path/to/simple.json "select *"
```

### Select specific fields:
```bash
jonq path/to/simple.json "select name, age"
```

### Filter with conditions:
```bash
jonq path/to/simple.json "select name, age if age > 30"
```

### Sorting:
```bash
jonq path/to/simple.json "select name, age sort age desc 2"
```

### Aggregation:
```bash
jonq path/to/simple.json "select sum(age) as total_age"
jonq path/to/simple.json "select avg(age) as average_age"
jonq path/to/simple.json "select count(age) as count"
```

Simple enough, I hope? Now let's move on to nested JSONs.
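To sanity-check those aggregations, here is the same arithmetic over the `simple.json` records in plain Python. This is only a sketch of what the queries compute, not jonq's implementation:

```python
import json

# the simple.json records from above
data = json.loads("""
[
  {"id": 1, "name": "Alice", "age": 30, "city": "New York"},
  {"id": 2, "name": "Bob", "age": 25, "city": "Los Angeles"},
  {"id": 3, "name": "Charlie", "age": 35, "city": "Chicago"}
]
""")

ages = [record["age"] for record in data]
total_age = sum(ages)                # sum(age)   -> 90
average_age = sum(ages) / len(ages)  # avg(age)   -> 30.0
count = len(ages)                    # count(age) -> 3
print(total_age, average_age, count)
```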
## Example with Nested JSON
Imagine a nested JSON like the one below:
```json
[
{
"id": 1,
"name": "Alice",
"profile": {
"age": 30,
"address": { "city": "New York", "zip": "10001" }
},
"orders": [
{ "order_id": 101, "item": "Laptop", "price": 1200 },
{ "order_id": 102, "item": "Phone", "price": 800 }
]
},
{ "id": 2, "name": "Bob", "profile": { "age": 25, "address": { "city": "Los Angeles", "zip": "90001" } }, "orders": [ { "order_id": 103, "item": "Tablet", "price": 500 } ] }
]
```

### Common patterns
```bash
# nested field access
jonq nested.json "select name, profile.age"
jonq nested.json "select name, profile.address.city"

# count array elements
jonq nested.json "select name, count(orders) as order_count"

# boolean logic (AND / OR / parentheses)
jonq nested.json "select name if profile.address.city = 'New York' or orders[0].price > 1000"
jonq nested.json "select name if (profile.age > 25 and profile.address.city = 'New York') or (profile.age < 26 and profile.address.city = 'Los Angeles')"
```

### Advanced Filtering with Complex Boolean Expressions
```bash
jonq nested.json "select name, profile.age if profile.address.city = 'New York' or orders[0].price > 1000"

# Find users who are both under 30 and from Los Angeles
jonq nested.json "select name, profile.age if profile.age < 30 and profile.address.city = 'Los Angeles'"

# Using parentheses for complex logic
jonq nested.json "select name, profile.age if (profile.age > 25 and profile.address.city = 'New York') or (profile.age < 26 and profile.address.city = 'Los Angeles')"
```

## Output Formats
### CSV Output
jonq can output results in CSV format using the `--format csv` or `-f csv` option:

```bash
jonq path/to/simple.json "select name, age" --format csv > output.csv
```

### Python code

Using `flatten_json` in your code:
```python
from jonq.csv_utils import flatten_json
import csv

data = {
"user": {
"name": "Alice",
"address": {"city": "New York"},
"orders": [
{"id": 1, "item": "Laptop", "price": 1200},
{"id": 2, "item": "Phone", "price": 800}
]
}
}

flattened = flatten_json(data, sep=".")
print(flattened)
```

## Optional: Rust-powered extension
For users dealing with large or complex nested JSON structures, we recommend installing the optional `jonq_fast` Rust extension for significantly improved performance.
### Using jonq_fast on the command line:
Once installed, you can use jonq_fast from the command line with the `--fast` or `-F` flag:
```bash
jonq data.json "select name, age" --format csv --fast > output.csv
```

This flag improves performance when converting to CSV format by using a faster JSON flattening implementation. The performance benefit is most noticeable with large or deeply nested JSON structures.
### Using jonq_fast in your code
You can also use jonq_fast directly in your Python code:

```python
import jonq_fast
import csv

# fake data
data = {
"user": {
"name": "Alice",
"address": {"city": "New York"},
"orders": [
{"id": 1, "item": "Laptop", "price": 1200},
{"id": 2, "item": "Phone", "price": 800}
]
}
}

flattened = jonq_fast.flatten(data, ".")
print(flattened)
# Output: {
# "user.name": "Alice",
# "user.address.city": "New York",
# "user.orders.0.id": 1,
# "user.orders.0.item": "Laptop",
# "user.orders.0.price": 1200,
# "user.orders.1.id": 2,
# "user.orders.1.item": "Phone",
# "user.orders.1.price": 800
# }

# write to your csv here
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(flattened.keys())
    writer.writerow(flattened.values())
```
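For intuition, the flattening behaviour behind those dotted keys can be sketched in a few lines of plain Python. This is only an illustration of the documented output format (dotted keys, list indices as path segments), not the actual `jonq_fast` implementation:

```python
def flatten(obj, sep=".", prefix=""):
    """Recursively flatten nested dicts/lists into a flat dict with dotted keys."""
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        items = [(str(i), v) for i, v in enumerate(obj)]
    else:
        return {prefix: obj}  # leaf value: keep it under the accumulated path
    out = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        out.update(flatten(value, sep, path))
    return out

data = {"user": {"name": "Alice", "orders": [{"id": 1}, {"id": 2}]}}
print(flatten(data))
# {'user.name': 'Alice', 'user.orders.0.id': 1, 'user.orders.1.id': 2}
```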
## Streaming Mode
For processing large JSON files efficiently, jonq supports streaming mode with the `--stream` or `-s` option:
```bash
jonq path/to/large.json "select name, age" --stream
```

**New**: Streaming now uses async processing to handle chunks concurrently, improving performance on large files. No changes to commands: same flags, same syntax, just faster.
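Conceptually, streaming boils down to parsing one record at a time instead of loading the whole document. A minimal sketch over line-delimited JSON with a generator (this illustrates the idea, not jonq's async internals):

```python
import io
import json

def stream_records(fp):
    """Yield one parsed record per line, never holding the whole file in memory."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)

# stand-in for a large line-delimited JSON file
ndjson = io.StringIO('{"name": "Alice", "age": 30}\n{"name": "Bob", "age": 25}\n')
names = [r["name"] for r in stream_records(ndjson) if r["age"] > 25]
print(names)  # ['Alice']
```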
## Troubleshooting
### Common Errors
#### Error: Command 'jq' not found

* Make sure jq is installed on your system
* Verify jq is in your PATH by running `jq --version`
* Install jq: https://stedolan.github.io/jq/download/

#### Error: Invalid JSON in file
* Check your JSON file for syntax errors
* Verify the file exists and is readable
* Use a JSON validator to check your file structure

#### Error: Syntax error in query
* Verify your query follows the correct syntax format
* Ensure field names match exactly what's in your JSON
* Check for missing quotes around string values in conditions

#### Error: No results returned
* Verify your condition isn't filtering out all records
* Check if your field names match the casing in the JSON
* For nested fields, ensure the dot notation path is correct

## Known Limitations
* Performance: For very large JSON files (100MB+), processing may be slow.
* Advanced jq Features: Some advanced jq features aren't exposed in the jonq syntax.
* Multiple File Joins: No support for joining data from multiple JSON files.
* Custom Functions: User-defined functions aren't supported in the current version.
* Date/Time Operations: Limited support for date/time parsing or manipulation.

## Go Tos:
**Pandas**: Go here for complex analysis (e.g., merging datasets, statistical ops, plotting). `jonq` won't crunch numbers or integrate with machine-learning libraries.

**DuckDB**: Pick this for big data analytics with joins, aggregates, or window functions across multiple files. `jonq` is simpler, **not** a database.
## Docs
Docs here: https://jonq.readthedocs.io/en/latest/
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.

## License
This project is licensed under the MIT License - see the LICENSE file for details.
### Misc.

- **jq**: This tool depends on the [jq command-line JSON processor](https://stedolan.github.io/jq/), which is licensed under the MIT License. jq is copyright (C) 2012 Stephen Dolan.
The jq tool itself is not included in this package - users need to install it separately.