https://github.com/rmoralespp/jsonl

A lightweight Python library for handling jsonlines files

# jsonl


A lightweight, dependency-free Python library for JSON Lines: read, write, compress, and stream with ease.


---

**jsonl** provides a simple, Pythonic API for working with [JSON Lines](https://jsonlines.org/) data.
It follows the conventions of Python's standard `json` module: if you know `json.dump` and `json.load`,
you already know how to use **jsonl**.

Fully compliant with the [jsonlines](https://jsonlines.org/) and [ndjson](https://github.com/ndjson/ndjson-spec)
specifications.
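
As a rough illustration of what the two specifications describe, the sketch below uses only the standard `json` module, not **jsonl** itself: each value is serialized on its own line and terminated by a newline, and each line is parsed independently. The `records` sample is made up for the example.

```python
import json

# Hypothetical sample data, used only for this illustration.
records = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]

# Encode: one JSON value per line, each line terminated by "\n".
text = "".join(json.dumps(record) + "\n" for record in records)

# Decode: parse every non-empty line on its own.
decoded = [json.loads(line) for line in text.splitlines() if line.strip()]

print(text, end="")
print(decoded == records)
```

Because every line is a complete JSON value, a reader can process a file one line at a time without loading it whole.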

## Features

| Feature                       | Description                                                                |
|-------------------------------|----------------------------------------------------------------------------|
| 🌎 **Familiar API**            | Interface similar to the standard `json` module (`dump`, `load`, `dumps`)  |
| ⚡ **Streaming by default**    | Read and write incrementally via iterators, keeping memory usage low       |
| 🗜️ **Built-in compression**    | Transparent support for `gzip`, `bzip2`, and `xz`                          |
| 📦 **Archive support**         | Read and write `ZIP` and `TAR` archives (`.tar.gz`, `.tar.bz2`, `.tar.xz`) |
| 📥 **Load from URLs**          | Pass a URL directly to `load()` or `load_archive()`                        |
| 🚀 **Pluggable serialization** | Swap in [`orjson`](https://github.com/ijl/orjson), or any JSON library     |
| 🔧 **Error tolerance**         | Optionally skip malformed lines instead of crashing                        |
| 🐍 **Zero dependencies**       | Uses only the Python standard library, nothing else                        |

## Installation

```bash
pip install py-jsonl
```

> Requires **Python 3.8+**. No external dependencies.

## Quick Start

### Write

```python
import jsonl

data = [
    {"name": "Gilbert", "wins": [["straight", "7♣"], ["one pair", "10♥"]]},
    {"name": "May", "wins": []},
]

jsonl.dump(data, "players.jsonl")
```

### Read

```python
import jsonl

for item in jsonl.load("players.jsonl"):
    print(item)
```

### Read from a URL

```python
import jsonl

for item in jsonl.load("https://example.com/data.jsonl"):
    print(item)
```

### Compressed files

The compression format is determined automatically: by file extension when writing,
and by [magic numbers](https://en.wikipedia.org/wiki/List_of_file_signatures) when reading
if the file extension is not recognized:

```python
import jsonl

data = [{"key": "value"}]

jsonl.dump(data, "file.jsonl.gz")   # gzip
jsonl.dump(data, "file.jsonl.bz2")  # bzip2
jsonl.dump(data, "file.jsonl.xz")   # xz

for item in jsonl.load("file.jsonl.gz"):
    print(item)
```

### Archives (ZIP / TAR)

```python
import jsonl

# Write multiple files into an archive
data = [
    ("users.jsonl", [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]),
    ("orders.jsonl", [{"id": 1, "total": 99.90}, {"id": 2, "total": 45.00}]),
]
jsonl.dump_archive("data.tar.gz", data)

# Read them back
for filename, items in jsonl.load_archive("data.tar.gz"):
    print(f"--- {filename} ---")
    for item in items:
        print(item)
```

### Multiple output files

```python
import jsonl

data = [
    ("file1.jsonl", [{"name": "Alice"}, {"name": "Bob"}]),
    ("file2.jsonl", [{"name": "Charlie"}]),
    ("file1.jsonl", [{"name": "Eve"}]),  # appended to file1.jsonl
]

jsonl.dump_fork(data)
```

## API Overview

### Reading

| Function | Description |
|--------------------------------------|---------------------------------------------------|
| `jsonl.load(source, **kw)` | Read from a file, URL, or file-like object |
| `jsonl.load_archive(file, **kw)` | Unpack JSON Lines files from a ZIP or TAR archive |
| `jsonl.loader(stream, broken, **kw)` | Low-level generator deserializing a line stream |

> [!TIP]
> All **read** functions accept `json_loads` and `**json_loads_kwargs` for custom deserialization.
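
Error tolerance can be pictured with a stdlib-only sketch. This is not the library's implementation, and `iter_valid_lines` is a hypothetical helper name introduced here; it only shows the general idea of skipping lines that fail to parse instead of raising, while still yielding items lazily.

```python
import io
import json


def iter_valid_lines(stream):
    """Yield parsed objects from a line stream, skipping unparsable lines.

    Hypothetical helper for illustration; not part of the jsonl API.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: skip it instead of crashing


# In-memory stream with one broken line in the middle.
stream = io.StringIO('{"id": 1}\n{not valid json}\n{"id": 2}\n')
items = list(iter_valid_lines(stream))
print(items)
```

How **jsonl** itself reports or counts skipped lines is not covered by this sketch; see the full docs for the behavior of the `broken` argument.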

### Writing

| Function | Description |
|----------------------------------------|----------------------------------------------------------|
| `jsonl.dump(iterable, file, **kw)` | Write objects to a JSON Lines file |
| `jsonl.dumps(iterable, **kw)` | Serialize to a JSON Lines string |
| `jsonl.dump_fork(paths, **kw)` | Write to multiple JSON Lines files at once |
| `jsonl.dump_archive(path, data, **kw)` | Pack multiple JSON Lines files into a ZIP or TAR archive |
| `jsonl.dumper(iterable, **kw)` | Low-level generator yielding formatted lines |

> [!TIP]
> All **write** functions accept `json_dumps` and `**json_dumps_kwargs` for custom serialization.

For complete parameter documentation, see the [full docs →](https://rmoralespp.github.io/jsonl/)

## Custom Serialization

Plug in any JSON-compatible serializer. For example, [`orjson`](https://github.com/ijl/orjson)
for high-performance encoding:

```python
import orjson # ensure orjson is installed: pip install orjson
import jsonl

data = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]

# Write with orjson (it returns bytes, so set text_mode=False)
jsonl.dump(data, "fast.jsonl", json_dumps=orjson.dumps, text_mode=False)

# Read with orjson
for item in jsonl.load("fast.jsonl", json_loads=orjson.loads):
    print(item)
```

Extra keyword arguments are forwarded to the underlying serializer:

```python
import jsonl

data = [{"name": "Alice", "score": 9.5}, {"name": "Bob", "score": 7.2}]

jsonl.dump(data, "compact.jsonl", separators=(",", ":")) # compact output
jsonl.dump(data, "sorted.jsonl", sort_keys=True) # deterministic keys
```
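
To see what these two options do to each line, the sketch below calls the standard `json.dumps` directly. It assumes the default serializer behaves like `json.dumps`, which is an assumption based on the description above rather than something checked against **jsonl** here; the sample records are invented for the example.

```python
import json

record = {"name": "Alice", "score": 9.5}

# Default separators put a space after "," and ":".
default_line = json.dumps(record)

# separators=(",", ":") drops that whitespace for a more compact line.
compact_line = json.dumps(record, separators=(",", ":"))

# sort_keys=True emits keys in sorted order, so output is deterministic.
sorted_line = json.dumps({"b": 1, "a": 2}, sort_keys=True)

print(default_line)
print(compact_line)
print(sorted_line)
```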

## Supported Formats

| Type | Extensions |
|-------------|------------------------------------------|
| Plain | `.jsonl` |
| Compressed | `.jsonl.gz`, `.jsonl.bz2`, `.jsonl.xz` |
| ZIP archive | `.zip` |
| TAR archive | `.tar`, `.tar.gz`, `.tar.bz2`, `.tar.xz` |

> When reading, if the file extension is not recognized, **jsonl** falls back to
> [magic-number detection](https://en.wikipedia.org/wiki/List_of_file_signatures)
> to identify the compression format automatically.
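
Magic-number detection can be sketched with the standard library alone. The code below is an illustration of the technique, not the detection code used by **jsonl**; `MAGIC_NUMBERS` and `sniff_compression` are hypothetical names. It compares the first bytes of a payload against the well-known signatures for gzip, bzip2, and xz.

```python
import bz2
import gzip
import lzma
from typing import Optional

# Leading byte signatures of the three compression formats.
MAGIC_NUMBERS = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
}


def sniff_compression(head: bytes) -> Optional[str]:
    """Return the format whose signature prefixes `head`, or None if no match."""
    for magic, name in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return name
    return None


payload = b'{"key": "value"}\n'
detected = [
    sniff_compression(gzip.compress(payload)),
    sniff_compression(bz2.compress(payload)),
    sniff_compression(lzma.compress(payload)),  # lzma defaults to the xz container
    sniff_compression(payload),                 # plain text matches no signature
]
print(detected)
```

In practice only the first few bytes of a file need to be read for this check, so it works on streams as well as on in-memory data.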

## Contributing

```bash
# Install dev dependencies
pip install --group=test --upgrade

# Run tests
python -m pytest tests/
python -m pytest tests/ --cov # run with coverage reporting

# Lint
pip install --group=lint --upgrade
ruff check .

# Docs
pip install --group=doc --upgrade

# zensical usage: https://zensical.org/docs/usage/
zensical build
zensical serve
```

## License

MIT. See [LICENSE](LICENSE) for details.