# jonq

### Human‑readable syntax for **jq**

[![PyPI version](https://img.shields.io/pypi/v/jonq.svg)](https://pypi.org/project/jonq/)
[![Python Versions](https://img.shields.io/pypi/pyversions/jonq.svg)](https://pypi.org/project/jonq/)
[![CI tests](https://github.com/duriantaco/jonq/actions/workflows/tests.yml/badge.svg)](https://github.com/duriantaco/jonq/actions)
[![Documentation Status](https://readthedocs.org/projects/jonq/badge/?version=latest)](https://jonq.readthedocs.io)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

---

## About

`jq` is unbeatable for JSON processing, but its syntax takes real effort to learn.
**jonq** wraps `jq` in a SQL-ish, slightly pythonic layer you can read and remember.

**Who It's For**: Jonq is designed for anyone who needs to work with JSON data. It's good for quick JSON exploration, lightweight ETL tasks, or validating config files in CI pipelines.

**jonq is NOT a database. It's NOT competing with DuckDB or Pandas. jonq is a command-line tool that makes jq accessible by wrapping it in SQL-like syntax.**

---

### What jonq IS for:
* Quick JSON exploration
* jq beginners
* Ad-hoc JSON tasks - No setup, just run a command
* Config file validation - Quick checks in scripts

### What jonq is NOT for:
1. Data analysis - Use Pandas, Polars, or DuckDB
2. Complex joins - Use DuckDB, PostgreSQL, or whatever DB you want
3. Business intelligence - Use proper BI tools

## Features at a glance

| Category | What you can do | Example |
|-------------------|-----------------|---------|
| **Selection** | Pick fields | `select name, age` |
| **Wildcard** | All fields | `select *` |
| **Filtering** | Python‑style ops: `and / or / between / contains` | `if age > 30 and city = 'NY'` |
| **Aggregations** | `sum avg min max count` | `select avg(price) as avg_price` |
| **Grouping** | `group by` + `having` | `… group by city having count > 2` |
| **Ordering** | `sort [asc\|desc] ` | `sort age desc 5` |
| **Nested arrays** | `from [].orders` or inline paths | `select products[].name …` |
| **Inline maths** | Real expressions | `sum(items.price) * 2 as double_total` |
| **CSV / stream** | `--format csv`, `--stream` | `jonq data.json "select *" --format csv` |
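
Several of these clauses compose in a single query. A hedged sketch (the `data.json` file and its `city`/`age` fields are assumed):

```bash
# filter, group, apply a having clause, then order and limit
jonq data.json "select city, count(*) as count if age > 25 group by city having count > 2 sort count desc 5"
```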

---

## Why Jonq?

### Jonq vs raw jq

| Task | Raw **jq** filter | **jonq** one‑liner |
|------|------------------|--------------------|
| Select specific fields | `jq '.[]|{name:.name,age:.age}'` | `jonq data.json "select name, age"` |
| Filter rows | `jq '.[]|select(.age > 30)|{name,age}'` | `… "select name, age if age > 30"` |
| Sort + limit | `jq 'sort_by(.age) | reverse | .[0:2]'` | `… "select name, age sort age desc 2"` |
| Deep filter | `jq '.[]|select(.profile.address.city=="NY")|{name,city:.profile.address.city}'` | `… "select name, profile.address.city if profile.address.city = 'NY'"` |
| Count items | `jq 'map(select(.age>25)) | length'` | `… "select count(*) as over_25 if age > 25"` |
| Group & count | `jq 'group_by(.city) | map({city:.[0].city,count:length})'` | `… "select city, count(*) as count group by city"` |
| Complex boolean | `jq '.[] | select(.age>25 and (.city=="NY" or .city=="Chicago"))'` | `… "select * if age > 25 and (city = 'NY' or city = 'Chicago')"` |
| Group & HAVING | `jq 'group_by(.city) | map(select(length>2)) | map({city:.[0].city,count:length})'` | `… "select city, count(*) group by city having count > 2"` |
| Aggregation expression | - | `… "select sum(price) * 1.07 as total_gst"` |
| Nested‑array aggregation | - | `… "select avg(products[].versions[].pricing.monthly) as avg_price"` |

**Take‑away:** a single `jonq` string replaces many pipes and brackets while still producing pure jq under the hood.

---

### Jonq vs DuckDB vs Pandas (JSON extension)

| Aspect | **jonq** | **DuckDB** | **Pandas** |
|--------|------------|----------|----------|
| Primary Use Case | Fast, lightweight JSON querying directly from the command line | Analytical SQL queries on large datasets, including JSON | General-purpose data manipulation and analysis in Python |
| Setup | No DB, streams any JSON | Requires DB file / extension | Requires a Python environment with pandas and its dependencies installed |
| Query language | Familiar SQL‑ish, no funky `json_extract`| SQL + JSON functions | Python code for data manipulation and analysis |
| Footprint | Minimal: requires only jq (a ~500 KB binary); no environment setup | ~ 140 MB binary | Larger: ~20 MB for pandas and its dependencies |
| Streaming | `--stream` processes line‑delimited JSON lazily | Must load into table | Can process large files using chunking, but not as memory-efficient as streaming |
| Memory Usage | Low; streams data to avoid loading full JSON into memory | In-memory database, but optimized for large data with columnar storage | Loads data into memory; can strain RAM with large datasets |
| Ecosystem | Leverages **all** jq filters for post‑processing | Own SQL ecosystem; no jq integration | Python data science ecosystem; integrates with NumPy, Matplotlib, scikit-learn, etc |

---

### Why you’ll reach for Jonq

1. **Instant JSON Querying, No Setup Hassle**

You have a JSON file (data.json) and need to extract all records where age > 30 in seconds.

* With `jonq`: Run `jonq data.json "select * if age > 30"`. Done. No environment setup, no imports. Just install jq and go.

* Pandas: Fire up Python, write a script (`import pandas as pd; df = pd.read_json('data.json'); df[df['age'] > 30]`), and run it. More steps.

* DuckDB: Launch a DuckDB session and query the file (`SELECT * FROM read_json('data.json') WHERE age > 30`). Powerful, but overkill for a quick task.

2. **Command-Line Power**

Use Case: Chain commands in a pipeline, like `jonq data.json "select name, age" | grep "John"`.

`Jonq` thrives in shell scripts or CI/CD workflows. Pandas and DuckDB require scripting or a heavier integration layer.
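
A minimal shell sketch of that pattern (file name and grep pattern are illustrative):

```bash
# pull two fields, then filter the JSON output with ordinary shell tools
jonq data.json "select name, age if age > 25" | grep '"John'

# or emit CSV and post-process it with awk
jonq data.json "select name, age" --format csv | awk -F, '{print $1}'
```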

3. **Lightweight and Efficient**

`jonq` uses `jq`'s streaming mode (`--stream`) for large JSON files, processing data piece by piece instead of loading it all into memory.

Comparison: Pandas loads everything into a DataFrame (RAM-intensive), and while DuckDB is memory-efficient for analytics, it's still a full database engine, so it carries more overhead than a quick CLI query needs.

4. **SQL Simplicity for JSON**

Example: `jonq users.json "select name, email if status = 'active' sort name"`.

Advantage: If you know SQL, `jonq` feels natural for JSON. There's no need to climb jq's learning curve first.

5. **Speed for Ad-Hoc Tasks**

Test Case: Querying a 1 GB JSON file for specific fields.

* Jonq: Streams it in seconds with minimal memory use.

* Pandas: Might choke or require chunking hacks.

* DuckDB: Fast, but setup and SQL complexity add time.

## Installation

**Supported Platforms**: Jonq works on Linux, macOS, and Windows with WSL.

### Prerequisites

- Python 3.9+
- `jq` command line tool installed (https://stedolan.github.io/jq/download/)

### Setup

**From PyPI**
```bash
pip install jonq # latest stable
```

**From source**
```bash
git clone https://github.com/duriantaco/jonq.git
cd jonq && pip install -e .
```

**Verify Installation**: After installation, run `jonq --version` to ensure it's working correctly.
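
For example (output will vary with the installed versions):

```bash
jq --version     # the prerequisite must be on your PATH
jonq --version   # confirms jonq itself is installed
```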

### Optional: Rust-powered extension

For users dealing with large or complex nested JSON structures, we recommend installing the optional `jonq_fast` Rust extension.

```bash
pip install jonq-fast
```

More on this in the Rust-powered extension section below.

### Quick Start

```bash
# Create a simple JSON file
echo '[{"name":"Alice","age":30},{"name":"Bob","age":25}]' > data.json

# Run a query
jonq data.json "select name, age if age > 25"
# Output: [{"name":"Alice","age":30}]
```

## Query Syntax

The query syntax follows a simplified format:

```bash
select <fields> [if <condition>] [group by <fields>] [sort <field> [asc|desc] [limit]]
```
where:

* `<fields>` - Comma-separated list of fields to select, or aggregations
* `if <condition>` - Optional filtering condition
* `group by <fields>` - Optional grouping by one or more fields
* `sort <field>` - Optional field to sort by
* `asc|desc` - Optional sort direction (default: asc)
* `limit` - Optional integer to limit the number of results
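
Putting the clauses together (a sketch; the `city` and `age` fields are assumed):

```bash
# select <fields> ... if <condition> ... group by <fields> ... sort <field> desc <limit>
jonq data.json "select city, count(*) as n if age > 20 group by city sort n desc 3"
```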

## Example Simple JSON

You can also refer to the `json_test_files` directory for the test JSONs, and see the `USAGE.md` guide. Anyway, let's start with `simple.json`.

Imagine a JSON file like the following:

```json
[
{ "id": 1, "name": "Alice", "age": 30, "city": "New York" },
{ "id": 2, "name": "Bob", "age": 25, "city": "Los Angeles" },
{ "id": 3, "name": "Charlie", "age": 35, "city": "Chicago" }
]
```

### To select all fields:
```bash
jonq path/to/simple.json "select *"
```

### Select specific fields:
```bash
jonq path/to/simple.json "select name, age"
```

### Filter with conditions:
```bash
jonq path/to/simple.json "select name, age if age > 30"
```

### Sorting:
```bash
jonq path/to/simple.json "select name, age sort age desc 2"
```

### Aggregation:
```bash
jonq path/to/simple.json "select sum(age) as total_age"
jonq path/to/simple.json "select avg(age) as average_age"
jonq path/to/simple.json "select count(age) as count"
```

Simple enough, I hope? Now let's move on to nested JSONs.

## Example with Nested JSON

Imagine a nested JSON file like the one below:

```json
[
  {
    "id": 1,
    "name": "Alice",
    "profile": {
      "age": 30,
      "address": { "city": "New York", "zip": "10001" }
    },
    "orders": [
      { "order_id": 101, "item": "Laptop", "price": 1200 },
      { "order_id": 102, "item": "Phone", "price": 800 }
    ]
  },
  {
    "id": 2,
    "name": "Bob",
    "profile": {
      "age": 25,
      "address": { "city": "Los Angeles", "zip": "90001" }
    },
    "orders": [
      { "order_id": 103, "item": "Tablet", "price": 500 }
    ]
  }
]
```

### Common patterns
```bash
# nested field access
jonq nested.json "select name, profile.age"
jonq nested.json "select name, profile.address.city"

# count array elements
jonq nested.json "select name, count(orders) as order_count"

# boolean logic (AND / OR / parentheses)
jonq nested.json "select name if profile.address.city = 'New York' or orders[0].price > 1000"
jonq nested.json "select name if (profile.age > 25 and profile.address.city = 'New York') or (profile.age < 26 and profile.address.city = 'Los Angeles')"
```

### Advanced Filtering with Complex Boolean Expressions

```bash
# mixed condition on a nested field and an array element
jonq nested.json "select name, profile.age if profile.address.city = 'New York' or orders[0].price > 1000"

# find users who are both under 30 and from Los Angeles
jonq nested.json "select name, profile.age if profile.age < 30 and profile.address.city = 'Los Angeles'"

# using parentheses for complex logic
jonq nested.json "select name, profile.age if (profile.age > 25 and profile.address.city = 'New York') or (profile.age < 26 and profile.address.city = 'Los Angeles')"
```
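
Aggregations can also reach into arrays using the inline `[]` path form from the features table. A sketch against `nested.json`:

```bash
# average price across the orders arrays (illustrative)
jonq nested.json "select avg(orders[].price) as avg_order_price"
```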

## Output Formats

### CSV Output
jonq can output results in CSV format using the `--format csv` or `-f csv` option:

```bash
jonq path/to/simple.json "select name, age" --format csv > output.csv
```
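
Against `simple.json` from earlier, the CSV looks roughly like this (exact quoting may vary):

```csv
name,age
Alice,30
Bob,25
Charlie,35
```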

### Python code

Using `flatten_json` in your code:

```python
from jonq.csv_utils import flatten_json

data = {
    "user": {
        "name": "Alice",
        "address": {"city": "New York"},
        "orders": [
            {"id": 1, "item": "Laptop", "price": 1200},
            {"id": 2, "item": "Phone", "price": 800}
        ]
    }
}

# nested keys are joined with the separator, e.g. "user.address.city"
flattened = flatten_json(data, sep=".")

print(flattened)
```

## Optional: Rust-powered extension

For users dealing with large or complex nested JSON structures, we recommend installing the optional `jonq_fast` Rust extension for significantly improved performance.

### Using jonq_fast on the command line:

Once installed, you can use jonq_fast from the command line with the `--fast` or `-F` flag:

```bash
jonq data.json "select name, age" --format csv --fast > output.csv
```

This flag improves performance when converting to CSV format by using a faster JSON flattening implementation. The performance benefit is most noticeable with large or deeply nested JSON structures.
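
To check whether `--fast` pays off on your data, a quick comparison with the shell's `time` builtin (file name illustrative):

```bash
time jonq big.json "select name, age" --format csv > /dev/null
time jonq big.json "select name, age" --format csv --fast > /dev/null
```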

### Using jonq_fast in your code
You can also use jonq_fast directly in your Python code:

```python
import jonq_fast
import csv

# sample nested data
data = {
    "user": {
        "name": "Alice",
        "address": {"city": "New York"},
        "orders": [
            {"id": 1, "item": "Laptop", "price": 1200},
            {"id": 2, "item": "Phone", "price": 800}
        ]
    }
}

flattened = jonq_fast.flatten(data, ".")

print(flattened)
# Output: {
#   "user.name": "Alice",
#   "user.address.city": "New York",
#   "user.orders.0.id": 1,
#   "user.orders.0.item": "Laptop",
#   "user.orders.0.price": 1200,
#   "user.orders.1.id": 2,
#   "user.orders.1.item": "Phone",
#   "user.orders.1.price": 800
# }

# write the flattened record out as a single-row CSV
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(flattened.keys())
    writer.writerow(flattened.values())
```

## Streaming Mode

For processing large JSON files efficiently, jonq supports streaming mode with the `--stream` or `-s` option:

```bash
jonq path/to/large.json "select name, age" --stream
```

**New**: Streaming now uses async processing to handle chunks concurrently, which improves performance on large files. No changes to commands: same flags, same syntax, just faster.
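
Streaming composes with the other flags. A hedged sketch (assuming the flags combine as documented):

```bash
jonq path/to/large.json "select name, age if age > 30" --stream --format csv > out.csv
```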

## Troubleshooting
### Common Errors
#### Error: Command 'jq' not found

* Make sure jq is installed on your system
* Verify jq is in your PATH by running `jq --version`
* Install jq: https://stedolan.github.io/jq/download/

#### Error: Invalid JSON in file

* Check your JSON file for syntax errors
* Verify the file exists and is readable
* Use a JSON validator, or the quick `jq` check below, to confirm the file parses
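
A minimal validity check with jq itself (`jq empty` parses the input and prints nothing on success):

```bash
jq empty path/to/data.json && echo "valid JSON"
```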

#### Error: Syntax error in query

* Verify your query follows the correct syntax format
* Ensure field names match exactly what's in your JSON
* Check for missing quotes around string values in conditions

#### Error: No results returned

* Verify your condition isn't filtering out all records
* Check if your field names match the casing in the JSON
* For nested fields, ensure the dot notation path is correct

## Known Limitations

* Performance: For very large JSON files (100MB+), processing may be slow.
* Advanced jq Features: Some advanced jq features aren't exposed in the jonq syntax.
* Multiple File Joins: No support for joining data from multiple JSON files.
* Custom Functions: User-defined functions aren't supported in the current version.
* Date/Time Operations: Limited support for date/time parsing or manipulation.

## Go-Tos

* **Pandas**: Go here for complex analysis (e.g., merging datasets, statistical ops, plotting). `jonq` won't crunch numbers or integrate with machine-learning libraries.
* **DuckDB**: Pick this for big-data analytics with joins, aggregates, or window functions across multiple files. `jonq` is simpler, **not** a database.

## Docs

Docs here: https://jonq.readthedocs.io/en/latest/

## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.

## License
This project is licensed under the MIT License - see the LICENSE file for details.

### Misc.

- **jq**: This tool depends on the [jq command-line JSON processor](https://stedolan.github.io/jq/), which is licensed under the MIT License. jq is copyright (C) 2012 Stephen Dolan.

The jq tool itself is not included in this package - users need to install it separately.