https://github.com/rmoralespp/jsonl
A lightweight Python library for handling jsonlines files
https://github.com/rmoralespp/jsonl
bzip2 deserialization files gzip json jsonl jsonlines ndjson python serialization tar utils xz zip
Last synced: 2 months ago
JSON representation
A lightweight Python library for handling jsonlines files
- Host: GitHub
- URL: https://github.com/rmoralespp/jsonl
- Owner: rmoralespp
- License: mit
- Created: 2024-08-13T10:24:51.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-09-16T14:30:11.000Z (8 months ago)
- Last Synced: 2025-09-19T08:47:13.570Z (8 months ago)
- Topics: bzip2, deserialization, files, gzip, json, jsonl, jsonlines, ndjson, python, serialization, tar, utils, xz, zip
- Language: Python
- Homepage:
- Size: 3.45 MB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
jsonl
A lightweight, dependency-free Python library for JSON Lines โ read, write, compress, and stream with ease.
Documentation
ยท
Changelog
ยท
Issues
---
**jsonl** provides a simple, Pythonic API for working with [JSON Lines](https://jsonlines.org/) data.
It follows the conventions of Python's standard `json` module โ if you know `json.dump` and `json.load`,
you already know how to use **jsonl**.
Fully compliant with the [jsonlines](https://jsonlines.org/) and [ndjson](https://github.com/ndjson/ndjson-spec)
specifications.
## Features
| Feature | Description |
|--------------------------------|----------------------------------------------------------------------------|
| ๐ **Familiar API** | Interface similar to the standard `json` module (`dump`, `load`, `dumps`) |
| โก **Streaming by default** | Read and write incrementally via iterators, keeping memory usage low |
| ๐๏ธ **Built-in compression** | Transparent support for `gzip`, `bzip2`, and `xz` |
| ๐ฆ **Archive support** | Read and write `ZIP` and `TAR` archives (`.tar.gz`, `.tar.bz2`, `.tar.xz`) |
| ๐ฅ **Load from URLs** | Pass a URL directly to `load()` or `load_archive()` |
| ๐ **Pluggable serialization** | Swap in [`orjson`](https://github.com/ijl/orjson), or any JSON library |
| ๐ง **Error tolerance** | Optionally skip malformed lines instead of crashing |
| ๐ **Zero dependencies** | Uses only the Python standard library โ nothing else |
## Installation
```bash
pip install py-jsonl
```
> Requires **Python 3.8+**. No external dependencies.
## Quick Start
### Write
```python
import jsonl
data = [
{"name": "Gilbert", "wins": [["straight", "7โฃ"], ["one pair", "10โฅ"]]},
{"name": "May", "wins": []},
]
jsonl.dump(data, "players.jsonl")
```
### Read
```python
import jsonl
for item in jsonl.load("players.jsonl"):
print(item)
```
### Read from a URL
```python
import jsonl
for item in jsonl.load("https://example.com/data.jsonl"):
print(item)
```
### Compressed files
The compression format is determined automatically โ by file extension when writing,
and by [magic numbers](https://en.wikipedia.org/wiki/List_of_file_signatures) when reading
if the file extension is not recognized:
```python
import jsonl
data = [{"key": "value"}]
jsonl.dump(data, "file.jsonl.gz") # gzip
jsonl.dump(data, "file.jsonl.bz2") # bzip2
jsonl.dump(data, "file.jsonl.xz") # xz
for item in jsonl.load("file.jsonl.gz"):
print(item)
```
### Archives (ZIP / TAR)
```python
import jsonl
# Write multiple files into an archive
data = [
("users.jsonl", [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]),
("orders.jsonl", [{"id": 1, "total": 99.90}, {"id": 2, "total": 45.00}]),
]
jsonl.dump_archive("data.tar.gz", data)
# Read them back
for filename, items in jsonl.load_archive("data.tar.gz"):
print(f"--- {filename} ---")
for item in items:
print(item)
```
### Multiple output files
```python
import jsonl
data = [
("file1.jsonl", [{"name": "Alice"}, {"name": "Bob"}]),
("file2.jsonl", [{"name": "Charlie"}]),
("file1.jsonl", [{"name": "Eve"}]), # appended to file1.jsonl
]
jsonl.dump_fork(data)
```
## API Overview
### Reading
| Function | Description |
|--------------------------------------|---------------------------------------------------|
| `jsonl.load(source, **kw)` | Read from a file, URL, or file-like object |
| `jsonl.load_archive(file, **kw)` | Unpack JSON Lines files from a ZIP or TAR archive |
| `jsonl.loader(stream, broken, **kw)` | Low-level generator deserializing a line stream |
> [!TIP]
> All **read** functions accept `json_loads` and `**json_loads_kwargs` for custom deserialization.
### Writing
| Function | Description |
|----------------------------------------|----------------------------------------------------------|
| `jsonl.dump(iterable, file, **kw)` | Write objects to a JSON Lines file |
| `jsonl.dumps(iterable, **kw)` | Serialize to a JSON Lines string |
| `jsonl.dump_fork(paths, **kw)` | Write to multiple JSON Lines files at once |
| `jsonl.dump_archive(path, data, **kw)` | Pack multiple JSON Lines files into a ZIP or TAR archive |
| `jsonl.dumper(iterable, **kw)` | Low-level generator yielding formatted lines |
> [!TIP]
> All **write** functions accept `json_dumps` and `**json_dumps_kwargs` for custom serialization.
For complete parameter documentation, see the [full docs โ](https://rmoralespp.github.io/jsonl/)
## Custom Serialization
Plug in any JSON-compatible serializer. For example, [`orjson`](https://github.com/ijl/orjson)
for high-performance encoding:
```python
import orjson # ensure orjson is installed: pip install orjson
import jsonl
data = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
# Write with orjson (returns bytes โ set text_mode=False)
jsonl.dump(data, "fast.jsonl", json_dumps=orjson.dumps, text_mode=False)
# Read with orjson
for item in jsonl.load("fast.jsonl", json_loads=orjson.loads):
print(item)
```
Extra keyword arguments are forwarded to the underlying serializer:
```python
import jsonl
data = [{"name": "Alice", "score": 9.5}, {"name": "Bob", "score": 7.2}]
jsonl.dump(data, "compact.jsonl", separators=(",", ":")) # compact output
jsonl.dump(data, "sorted.jsonl", sort_keys=True) # deterministic keys
```
## Supported Formats
| Type | Extensions |
|-------------|------------------------------------------|
| Plain | `.jsonl` |
| Compressed | `.jsonl.gz`, `.jsonl.bz2`, `.jsonl.xz` |
| ZIP archive | `.zip` |
| TAR archive | `.tar`, `.tar.gz`, `.tar.bz2`, `.tar.xz` |
> When reading, if the file extension is not recognized, **jsonl** falls back to
> [magic-number detection](https://en.wikipedia.org/wiki/List_of_file_signatures)
> to identify the compression format automatically.
## Contributing
```bash
# Install dev dependencies
pip install --group=test --upgrade
# Run tests
python -m pytest tests/
python -m pytest tests/ --cov # run with coverage reporting
# Lint
pip install --group=lint --upgrade
ruff check .
# Docs
pip install --group=doc --upgrade
# zensical usage: https://zensical.org/docs/usage/
zensical build
zensical serve
```
## License
MIT โ see [LICENSE](LICENSE) for details.