https://github.com/dynamical-inference/ituna
🐟 ituna – tune machine learning models for empirical identifiability and consistency
https://github.com/dynamical-inference/ituna
hyperparameter-optimization hyperparameter-tuning identifiability pytorch sklearn
Last synced: 3 months ago
JSON representation
🐟 ituna – tune machine learning models for empirical identifiability and consistency
- Host: GitHub
- URL: https://github.com/dynamical-inference/ituna
- Owner: dynamical-inference
- Created: 2026-02-06T03:30:29.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-02-06T19:29:36.000Z (4 months ago)
- Last Synced: 2026-02-06T23:30:12.389Z (4 months ago)
- Topics: hyperparameter-optimization, hyperparameter-tuning, identifiability, pytorch, sklearn
- Language: Python
- Homepage: https://dynamical-inference.github.io/ituna/
- Size: 1.79 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
Awesome Lists containing this project
README
# 🐟iTuna
[](https://github.com/dynamical-inference/ituna)
[](https://dynamical-inference.github.io/ituna/)
[](https://pypi.org/project/ituna/)
[](https://pypi.org/project/ituna/)
[](./LICENSE)
[](https://github.com/dynamical-inference/ituna/actions/workflows/build.yml)
**Tune machine learning models for empirical identifiability and consistency**
## Why 🐟iTuna?
Applying machine learning to scientific data analysis often suffers from an **identifiability gap**: many models along the data-to-analysis pipeline lack statistical guarantees about the uniqueness of their learned representations. This means that re-running the same algorithm can yield different embeddings, making downstream interpretation unreliable without manual verification.
Identifiable representation learning addresses this by ensuring models recover representations that are unique up to a known class of transformations (permutation, linear, affine, etc.). However, even theoretically identifiable models need **empirical validation** to confirm they behave consistently in practice.
🐟iTuna closes this gap by providing a lightweight, model-agnostic framework to:
1. Train multiple instances of a model with different random seeds
2. Align their embeddings under the appropriate indeterminacy class
3. Measure how consistent the learned representations are
Think of it as a **unit test for reproducibility** of learned embeddings.
## Features
- **sklearn-compatible**: Works with any transformer implementing `fit`, `transform`, and standard sklearn conventions
- **Built-in indeterminacy classes**:
- `Identity` - no transformation needed (model is already fully identifiable)
- `Permutation` - handles sign flips and component reordering (e.g., FastICA)
- `Linear` - linear transformation alignment (e.g., PCA)
- `Affine` - linear transformation with intercept (e.g., CEBRA)
- **Consistency scoring**: Quantifies how stable embeddings are across runs
- **Embedding alignment**: Returns aligned embeddings for downstream analysis
- **Flexible backends**: In-memory, disk caching, distributed execution, and DataJoint support
## Installation
```bash
pip install ituna
```
or alternative install from source
```bash
pip install git+https://github.com/dynamical-inference/ituna.git
```
Optional extras:
```bash
pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[datajoint]" # DataJoint backend for database-backed caching
pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[dev]" # Development dependencies (pytest, etc.)
```
## Quickstart
```python
import numpy as np
from sklearn.decomposition import FastICA
from ituna import ConsistencyEnsemble, metrics
# Generate sample data
X = np.random.randn(1000, 64)
# Create a consistency ensemble
ensemble = ConsistencyEnsemble(
estimator=FastICA(n_components=16, max_iter=500),
consistency_transform=metrics.PairwiseConsistency(
indeterminacy=metrics.Permutation(), # FastICA is identifiable up to permutation
symmetric=False,
include_diagonal=True,
),
random_states=5, # Train 5 instances with different seeds
)
# Fit and evaluate
ensemble.fit(X)
print("Consistency score:", ensemble.score(X))
# Get aligned embeddings
emb = ensemble.transform(X)
print("Embedding shape:", emb.shape)
```
## Documentation
Full documentation is available at **[dynamical-inference.github.io/ituna](https://dynamical-inference.github.io/ituna/)**.
- **Quickstart notebook**: [`docs/tutorials/quickstart.ipynb`](docs/tutorials/quickstart.ipynb) - minimal working example
- **Core concepts**: [`docs/tutorials/core.ipynb`](docs/tutorials/core.ipynb) - in-depth walkthrough
- **Backends**: [`docs/tutorials/backends.ipynb`](docs/tutorials/backends.ipynb) - caching and distributed execution
## Backends
🐟iTuna supports different backends for caching and distributed computation:
```python
from ituna import ConsistencyEnsemble, config, metrics
from sklearn.decomposition import FastICA
ensemble = ConsistencyEnsemble(
estimator=FastICA(n_components=16, max_iter=500),
consistency_transform=metrics.PairwiseConsistency(
indeterminacy=metrics.Permutation(),
),
random_states=10,
)
# Enable disk caching (avoids re-fitting identical models)
with config.config_context(DEFAULT_BACKEND="disk_cache"):
ensemble.fit(X)
# Distributed execution with multiple workers
with config.config_context(
DEFAULT_BACKEND="disk_cache_distributed",
BACKEND_KWARGS={"trigger_type": "auto", "num_workers": 4},
):
ensemble.fit(X)
```
### CLI Commands
For large-scale experiments, use the command-line tools:
```bash
# Local distributed backend
ituna-fit-distributed --sweep-name --cache-dir ./cache
# DataJoint backend
ituna-fit-distributed-datajoint --sweep-name --schema-name myschema
```
## Development
```bash
# Clone and install in development mode
git clone https://github.com/dynamical-inference/ituna.git
cd ituna
pip install -e .[dev]
# Run tests
pytest tests -v
# Setup pre-commit hooks
pre-commit install
```
For the full development guide — branching conventions, code style, building docs, and the release process — see [CONTRIBUTING.md](CONTRIBUTING.md).
## Citation
If you use 🐟iTuna in your research, please cite:
```bibtex
@software{ituna,
author = {Schmidt, Tobias and Schneider, Steffen},
title = {iTuna: Tune machine learning models for empirical identifiability and consistency},
url = {https://github.com/dynamical-inference/ituna},
version = {0.1.0},
}
```
## License
🐟iTuna is released under the [MIT License](./LICENSE). If you re-use parts of the iTuna code in your own package, please make sure to copy & paste the contents of the `LICENSE` file into a `NOTICE` in your repository.