https://github.com/dkosarevsky/ajpegli
Fast JPEG-to-NumPy image loading powered by Google jpegli
- Host: GitHub
- URL: https://github.com/dkosarevsky/ajpegli
- Owner: dKosarevsky
- License: bsd-3-clause
- Created: 2026-05-18T12:45:35.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-18T21:33:41.000Z (about 1 month ago)
- Last Synced: 2026-05-18T21:34:53.225Z (about 1 month ago)
- Topics: albumentations, augmentation, computer-vision, image, image-classification, image-load, image-loader, image-loading, image-loading-library, image-processing, images, python, pytorch, tensorflow
- Language: Python
- Homepage: https://pypi.org/project/ajpegli
- Size: 136 KB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Roadmap: docs/roadmap.md
README
# ajpegli
Fast JPEG-to-NumPy image loading powered by Google jpegli.
`ajpegli` is a dependency-light JPEG loader for Python: pass a file path or
preloaded JPEG bytes, get a NumPy array. Decoding is powered by Google jpegli
and built for high-throughput data pipelines. The path API is `cv2.imread`-like,
but it is not a drop-in OpenCV replacement: color images are returned as RGB by
default. Pass `mode="BGR"` for OpenCV-style pipelines.
Current status: stable Python API. `imread()` and `imdecode()` are the primary
loading APIs, with `encode()` and `info()` available for production use in the
documented v1 scope. Benchmarks are published as measured regression baselines,
not as claims that `ajpegli` is faster than OpenCV or Pillow.
## Development
In an existing clone, initialize the submodules before building native wheels:
```bash
git submodule update --init --recursive
uv sync --extra dev
just check
just build
just bench-imread
```
`third_party/jpegli` is pinned as a submodule. The pinned commit is exposed at
runtime through `ajpegli.__jpegli_commit__` and `ajpegli.jpegli_commit()`.
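One use for the runtime commit accessor is stamping benchmark or experiment logs with the codec revision that produced them. The sketch below is only an illustration: `build_provenance` is a hypothetical helper, not part of ajpegli. It relies on the `__jpegli_commit__` attribute named above and, as an assumption, on a conventional `__version__` attribute, which it looks up defensively.

```python
import importlib
from typing import Any, Optional


def build_provenance(module: Optional[Any] = None) -> dict:
    """Collect codec provenance for logs (hypothetical helper, not ajpegli API)."""
    if module is None:
        try:
            module = importlib.import_module("ajpegli")
        except ImportError:
            # Report the missing package instead of failing the caller.
            return {"package": "ajpegli", "installed": False}
    return {
        "package": "ajpegli",
        "installed": True,
        # Documented above: the pinned jpegli commit is exposed at runtime.
        "jpegli_commit": getattr(module, "__jpegli_commit__", None),
        # Assumption: a conventional __version__ attribute may or may not exist.
        "version": getattr(module, "__version__", None),
    }
```

Calling `build_provenance()` with no argument imports `ajpegli` lazily, so the helper also works in environments where the package is absent.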
Release and publishing instructions live in [releasing.md](docs/releasing.md).
## Installation
Install from PyPI:
```bash
pip install ajpegli
```
With uv:
```bash
uv add ajpegli
```
## Quickstart
ajpegli ships prebuilt wheels for common Linux, macOS, and Windows CPython
builds. NumPy is the only runtime dependency.
```python
import ajpegli

# Decode from a file path; color images come back as uint8 RGB arrays.
image = ajpegli.imread("image.jpg")
assert image.dtype == "uint8"
assert image.ndim == 3

rgb = ajpegli.imread("image.jpg", mode="RGB")  # default
bgr = ajpegli.imread("image.jpg", mode="BGR")  # for OpenCV-style pipelines
gray = ajpegli.imread("image.jpg", mode="L")

# Decode from JPEG bytes that are already in memory.
with open("image.jpg", "rb") as file:
    data = file.read()
rgb_from_memory = ajpegli.imdecode(data, mode="RGB")
bgr_from_memory = ajpegli.imdecode(data, mode="BGR")

# Encode back to JPEG bytes and inspect the header without a full decode.
jpeg = ajpegli.encode(rgb_from_memory, quality=90, progressive=2)
header = ajpegli.info(jpeg)
assert header.width == rgb_from_memory.shape[1]
```
`imread()` reads the file in the native extension and returns a NumPy array.
`imdecode()` accepts JPEG `bytes` or another bytes-like object and decodes from
memory with the same mode options. `decode()` is kept as an equivalent alias.
The v1 decode API supports `uint8` RGB, BGR, grayscale, CMYK, and native output
modes. Both file I/O and jpegli decoding release the GIL, so threaded callers
and DataLoader workers do not serialize on the Python interpreter while the
native codec is running.
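Because the native work releases the GIL, a plain thread pool is one way to overlap decodes. The following is a minimal sketch, not part of ajpegli: `load_many` and its `loader` and `workers` parameters are hypothetical names. It assumes `ajpegli.imread` is the default loader and that a worker count of 8 suits the machine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def load_many(
    paths: Iterable[str],
    loader: Optional[Callable[[str], object]] = None,
    workers: int = 8,
) -> List[object]:
    """Load images concurrently and return them in input order."""
    if loader is None:
        # Imported lazily so the helper itself has no hard dependency.
        import ajpegli

        loader = ajpegli.imread
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order even when tasks finish out of order.
        return list(pool.map(loader, paths))
```

Passing a different `loader` (for example a lambda that calls `ajpegli.imread(path, mode="BGR")`) keeps the threading logic unchanged. Whether threads help in practice depends on image size, storage, and core count, so measure before committing to a worker count.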
## RAM / bytes decode
Use `imdecode()` when the benchmark or input pipeline has already loaded JPEG
bytes into memory:
```python
from pathlib import Path
import ajpegli
data = Path("image.jpg").read_bytes()
image = ajpegli.imdecode(data, mode="RGB")
```
`imdecode()` is the direct comparison point for `cv2.imdecode()`. It accepts
`bytes`, `bytearray`, `memoryview`, and contiguous NumPy `uint8` buffers without
making a Python-side copy before entering the native decoder.
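Since `memoryview` inputs are accepted, several JPEGs stored back to back in one buffer can be passed to the decoder as slices without copying each record first. The sketch below assumes a hypothetical shard layout described by `(offset, length)` pairs. `iter_records` and `decode_records` are illustrative names, and the `decode` argument stands in for `ajpegli.imdecode`.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_records(
    buffer: bytes,
    index: Sequence[Tuple[int, int]],
) -> Iterator[memoryview]:
    """Yield a zero-copy view for each (offset, length) record in buffer."""
    view = memoryview(buffer)
    for offset, length in index:
        if offset < 0 or length < 0 or offset + length > len(view):
            raise ValueError(f"record ({offset}, {length}) is out of bounds")
        # Slicing a memoryview references the original buffer; no bytes are copied.
        yield view[offset : offset + length]


def decode_records(
    buffer: bytes,
    index: Sequence[Tuple[int, int]],
    decode: Callable[[memoryview], object],
) -> List[object]:
    """Apply decode (for example ajpegli.imdecode) to every record."""
    return [decode(record) for record in iter_records(buffer, index)]
```

With ajpegli installed, `decode_records(shard, index, ajpegli.imdecode)` would decode every record using the default mode. The bounds check matters because an out-of-range memoryview slice is silently truncated rather than raising.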
NumPy is the only runtime dependency. OpenCV, Pillow, and PyTorch are optional
benchmark tools and are not required by `pip install ajpegli`.
## Encode and Info
`encode()` writes JPEG bytes from `uint8` NumPy arrays. The stable v1 encode
scope is grayscale (`HxW` or `HxWx1`) and RGB (`HxWx3`) input with explicit
alpha rejection unless `alpha="drop"` is passed. It supports quality,
distance/PSNR controls, progressive level, RGB subsampling, adaptive
quantization, and raw ICC/EXIF/XMP/comment marker writing. `info()` reads JPEG
headers without a full image decode and returns a `JpegInfo` carrying dimensions,
component count, mode, progressive flag, subsampling, density, and ICC/EXIF/XMP
presence.
Unsupported paths fail explicitly instead of silently changing data. `uint16`,
`float32`, `float16`, CMYK encode, XYB encode, and parsed EXIF metadata are
outside the v1 stable scope.
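To show why a header-only read is cheap, here is a pure-Python sketch that walks JPEG marker segments until it reaches the start-of-frame header. It is not how `info()` is implemented (that runs in the native extension). `read_frame_header` and `FrameHeader` are hypothetical names, and the sketch covers only a small subset of what `JpegInfo` reports.

```python
import struct
from typing import NamedTuple

# Start-of-frame markers carry the image dimensions. 0xC4 (DHT), 0xC8 (JPG)
# and 0xCC (DAC) sit in the same 0xC0-0xCF range but are not frame headers.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}
# Markers without a length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


class FrameHeader(NamedTuple):
    width: int
    height: int
    components: int
    progressive: bool


def read_frame_header(data: bytes) -> FrameHeader:
    """Scan marker segments and return the first start-of-frame header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in STANDALONE:
            pos += 2
            continue
        (length,) = struct.unpack_from(">H", data, pos + 2)
        if marker in SOF_MARKERS:
            if pos + 10 > len(data):
                raise ValueError("truncated start-of-frame segment")
            # Payload layout: precision (1), height (2), width (2), components (1).
            _precision, height, width, components = struct.unpack_from(
                ">BHHB", data, pos + 4
            )
            return FrameHeader(width, height, components, marker in PROGRESSIVE_SOF)
        if marker == 0xDA:  # start of scan: entropy-coded data follows
            break
        pos += 2 + length  # the length field counts itself but not the marker
    raise ValueError("no start-of-frame segment found")
```

The scan stops at the first frame header, so it never touches entropy-coded image data. That is the general reason header inspection costs far less than a decode.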
## Stability Contract
Starting with `1.0.0`, ajpegli follows SemVer for the documented Python API.
Function names, keyword names, default values, exception classes, and return
types documented in this README are stable across `1.x`. The private
`ajpegli._ajpegli` extension module is not public API.
The exact JPEG bitstream produced by `encode()` and benchmark throughput are
not part of the stability contract: both can change when the pinned jpegli
commit changes. Runtime dependencies stay limited to NumPy throughout `1.x`
unless a future major version changes that contract.
For local source builds, clone with submodules:
```bash
git clone --recursive https://github.com/dKosarevsky/ajpegli.git
cd ajpegli
uv sync --extra dev
just check
```
## Benchmarks
The benchmark script keeps comparison tools optional so `pip install ajpegli`
only needs NumPy at runtime. See [Benchmarks](docs/benchmarks.md),
[Benchmark Results](docs/benchmark-results.md),
[DataLoader Benchmarking](docs/dataloader.md), and
[DataLoader Results](docs/dataloader-results.md).
```bash
just bench-imread path/to/a.jpg 1000 8 RGB ajpegli,cv2,pillow
just bench-imread-dataloader path/to/a.jpg 1000 4 RGB 32
```
`benchmarks/bench_imread.py` reports JSON with sequential throughput, threaded
throughput, and optional PyTorch `DataLoader` throughput. Missing optional
comparison packages are reported as skipped entries instead of failing the run.
Use `--thread-workers` for threaded reader throughput and
`--dataloader-workers` for PyTorch `DataLoader` worker count. Use
`--source bytes` when benchmarking preloaded JPEG bytes from RAM instead of
path reads.
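For readers who want a feel for what sequential versus threaded throughput means before running the script, the sketch below times an arbitrary loader both ways. It is not `benchmarks/bench_imread.py`, and it does not reproduce that script's JSON schema. `throughput`, `report`, and the result keys are hypothetical names chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def throughput(
    loader: Callable[[object], object],
    items: Sequence[object],
    thread_workers: int = 0,
) -> float:
    """Return items processed per second; thread_workers=0 runs sequentially."""
    if not items:
        raise ValueError("items must not be empty")
    start = time.perf_counter()
    if thread_workers > 0:
        with ThreadPoolExecutor(max_workers=thread_workers) as pool:
            # Drain the iterator so every task finishes before timing stops.
            for _ in pool.map(loader, items):
                pass
    else:
        for item in items:
            loader(item)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast loaders.
    return len(items) / max(elapsed, 1e-9)


def report(
    loader: Callable[[object], object],
    items: Sequence[object],
    thread_workers: int = 8,
) -> Dict[str, float]:
    """Measure the same workload sequentially and with a thread pool."""
    return {
        "sequential_per_s": throughput(loader, items),
        "threaded_per_s": throughput(loader, items, thread_workers),
    }
```

A single timed pass like this is noisy. Warm-up runs, repeated trials, and a realistic image corpus are needed before reading anything into the numbers.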
The checked-in reports are intentionally honest: on the current vendored smoke
corpora, OpenCV and Pillow are still faster than `ajpegli`. Treat the reports as
regression baselines, and do not make project-level speed claims without
broader, dataset-specific measurements.
For local comparison runs, install only what you want to measure in that
environment:
```bash
uv pip install opencv-python-headless pillow
uv pip install torch # only for --include-dataloader
```