https://github.com/aazuspan/dask-progress-matrix
Visualize Dask computations by chunk
- Host: GitHub
- URL: https://github.com/aazuspan/dask-progress-matrix
- Owner: aazuspan
- License: mit
- Created: 2025-08-11T20:22:25.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-08-11T20:28:23.000Z (5 months ago)
- Last Synced: 2025-09-14T21:01:12.000Z (4 months ago)
- Topics: dask, visualization
- Language: Python
- Homepage:
- Size: 229 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
[PyPI](https://pypi.org/p/dask-progress-matrix)
[CI](https://github.com/aazuspan/dask-progress-matrix/actions/workflows/ci.yaml)
Visualize Dask computations by chunk.

## Install
```bash
pip install dask-progress-matrix
```
## Quick-start
### API
Use `ProgressMatrix` as a context manager to track Dask computations:
```python
import dask.array as da
from dask_progress_matrix import ProgressMatrix

# Every Dask computation inside the block is tracked and rendered chunk by chunk.
with ProgressMatrix(cmap="inferno"):
    da.random.random((2, 128, 256), chunks=(1, 16, 16)).compute()
```
### CLI
Track Dask computations in any Python file using the CLI. For example, using [uv](https://docs.astral.sh/uv/):
```bash
$ uvx dask-progress-matrix compute_something.py --cmap=inferno
```
## Features
* **Terminal or Jupyter** - Progress matrices can be displayed in both terminal environments and Jupyter notebooks.
* **Modes** - When a computation is complete, the `ProgressMatrix` displays a summary with either the completed index or the elapsed time for each chunk, depending on the `mode` parameter.
* **Data structures** - Computing any Dask-backed object, including Xarray objects, displays a progress matrix.
* **Dimensionality** - You can track the computation of any Dask array, regardless of dimensionality. For visualization, leading dimensions are collapsed onto the last two, so e.g. an array with a chunk grid of `(3, 16, 16)` (3 x 16 x 16 chunks) will be rendered as a 16 x 16 matrix where each cell tracks the progress of 3 different chunks.
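The collapsing rule described under Dimensionality can be sketched in plain Python. This is a minimal illustration under an assumption, not the library's code: each chunk is assigned to the matrix cell given by its block indices along the last two axes. `collapse_to_matrix` is a hypothetical helper; with a real Dask array you could pass `arr.numblocks`, the number of chunks along each dimension.

```python
from collections import Counter
from itertools import product


def collapse_to_matrix(numblocks):
    """Count how many chunks map onto each cell of a 2D progress matrix.

    Hypothetical helper: assumes leading dimensions are collapsed so that a
    chunk's cell is given by its block indices along the last two axes.
    """
    if len(numblocks) < 2:
        raise ValueError("this sketch only handles arrays with 2+ dimensions")
    rows, cols = numblocks[-2:]
    cells = Counter()
    # Visit every block index in the full chunk grid, e.g. (i, j, k) for 3D,
    # and count it against the (j, k) cell it would be drawn in.
    for block_index in product(*(range(n) for n in numblocks)):
        cells[block_index[-2:]] += 1
    return rows, cols, cells


rows, cols, cells = collapse_to_matrix((3, 16, 16))
print(rows, cols)     # 16 16
print(cells[(0, 0)])  # 3
```

For the quick-start array, shape `(2, 128, 256)` with chunks `(1, 16, 16)` gives `numblocks == (2, 8, 16)`, so under this assumption it renders as an 8 x 16 matrix with 2 chunks per cell.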
## Limitations
* **Distributed schedulers** - Support for distributed schedulers like `dask.distributed.Client` isn't currently implemented.
* **Character width** - Each chunk is rendered at least 2 characters wide, so arrays with very large numbers of chunks may render slowly or poorly.
* **Chunk shapes** - All chunks are represented by squares, regardless of their shape.
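To estimate ahead of time whether a chunk grid will fit on screen, a rough lower bound on the rendered width is the number of chunks along the last axis times the 2-character minimum. The sketch below assumes that width model and ignores any borders, labels, or padding the real renderer may add; `fits_terminal` is a hypothetical helper built on the standard library's `shutil.get_terminal_size`.

```python
import shutil

# From the Limitations above: each chunk is drawn at least 2 characters wide.
MIN_CELL_WIDTH = 2


def fits_terminal(numblocks, min_cell_width=MIN_CELL_WIDTH, columns=None):
    """Return (required_width, available_width, fits) for a chunk grid.

    Hypothetical helper: assumes only the last dimension's block count sets
    the rendered width, since leading dimensions are collapsed.
    """
    if columns is None:
        # Uses the current terminal width, or 80 columns if it can't be found.
        columns = shutil.get_terminal_size(fallback=(80, 24)).columns
    required = numblocks[-1] * min_cell_width
    return required, columns, required <= columns


print(fits_terminal((2, 8, 16), columns=80))     # (32, 80, True)
print(fits_terminal((1, 100, 500), columns=80))  # (1000, 80, False)
```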
## FAQ
### Why use a progress matrix?
This was mostly developed out of curiosity, but it has some practical debugging and tuning applications, like identifying chunks that are slow to compute.
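The two summary modes can be approximated from per-chunk timing records, which is also how a slow chunk would stand out. This sketch assumes a hypothetical event format of `(chunk_index, start, end)` and interprets "completed index" as the order in which chunks finished; it is not how `ProgressMatrix` records progress internally, and the timings are made up for illustration.

```python
def summarize(events):
    """Build per-chunk summaries from hypothetical (chunk, start, end) events.

    Returns two dicts keyed by chunk index: the 0-based order in which each
    chunk finished, and how long each chunk took to compute, in seconds.
    """
    by_finish = sorted(events, key=lambda e: e[2])
    completed_index = {chunk: i for i, (chunk, _, _) in enumerate(by_finish)}
    elapsed = {chunk: end - start for chunk, start, end in events}
    return completed_index, elapsed


# Hypothetical timings for a 2 x 2 chunk grid, in seconds.
events = [
    ((0, 0), 0.0, 0.5),
    ((0, 1), 0.0, 2.0),
    ((1, 0), 0.5, 0.9),
    ((1, 1), 0.9, 1.2),
]
completed_index, elapsed = summarize(events)
slowest = max(elapsed, key=elapsed.get)
print(completed_index)  # {(0, 0): 0, (1, 0): 1, (1, 1): 2, (0, 1): 3}
print(slowest)          # (0, 1)
```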
### Why does it take a long time to start doing anything?
The progress matrix only tracks terminal tasks that correspond directly to chunks in the computed output array. If your computation has a lot of intermediate tasks, you won't see any progress until those are completed.
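The effect of intermediate tasks can be seen in a toy task graph. In this hypothetical example, only the `out-*` keys correspond to output chunks, so progress stays at 0% until the shared `load-*` and `sum` tasks have run. Tasks are executed serially in topological order via the standard library's `graphlib.TopologicalSorter` (Python 3.9+), whereas Dask's real schedulers run tasks in parallel and in their own order. For a real Dask collection, the output chunk keys are available from its `__dask_keys__()` method.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks it depends on.
# "load-*" and "sum" are intermediate tasks; "out-0" and "out-1" are the
# terminal tasks that correspond to output chunks.
graph = {
    "load-0": set(),
    "load-1": set(),
    "sum": {"load-0", "load-1"},
    "out-0": {"sum"},
    "out-1": {"sum"},
}
output_keys = {"out-0", "out-1"}


def progress_over_time(graph, output_keys):
    """Yield (task, fraction of output chunks done) as tasks run in order."""
    done = 0
    for task in TopologicalSorter(graph).static_order():
        # Only terminal tasks move the progress matrix forward.
        if task in output_keys:
            done += 1
        yield task, done / len(output_keys)


for task, fraction in progress_over_time(graph, output_keys):
    print(f"{task:>6}: {fraction:.0%}")
```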