https://github.com/dynamical-inference/ituna

🐟 ituna – tune machine learning models for empirical identifiability and consistency
https://github.com/dynamical-inference/ituna

hyperparameter-optimization hyperparameter-tuning identifiability pytorch sklearn

Last synced: 4 months ago
JSON representation

🐟 ituna – tune machine learning models for empirical identifiability and consistency

Host: GitHub
URL: https://github.com/dynamical-inference/ituna
Owner: dynamical-inference
Created: 2026-02-06T03:30:29.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-02-06T19:29:36.000Z (5 months ago)
Last Synced: 2026-02-06T23:30:12.389Z (5 months ago)
Topics: hyperparameter-optimization, hyperparameter-tuning, identifiability, pytorch, sklearn
Language: Python
Homepage: https://dynamical-inference.github.io/ituna/
Size: 1.79 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md

Awesome Lists containing this project

README

          # 🐟iTuna

[![iTuna](https://img.shields.io/badge/repo-🐟_iTuna-3C92ED?logo=github&logoColor=white)](https://github.com/dynamical-inference/ituna)

[![Documentation](https://img.shields.io/badge/docs-latest-blue)](https://dynamical-inference.github.io/ituna/)

[![PyPI version](https://img.shields.io/pypi/v/ituna.svg?cacheSeconds=3600)](https://pypi.org/project/ituna/)

[![Python versions](https://img.shields.io/pypi/pyversions/ituna.svg?cacheSeconds=3600)](https://pypi.org/project/ituna/)

[![License](https://img.shields.io/badge/License-MIT-blue.svg)](./LICENSE)

[![Build](https://github.com/dynamical-inference/ituna/actions/workflows/build.yml/badge.svg)](https://github.com/dynamical-inference/ituna/actions/workflows/build.yml)

**Tune machine learning models for empirical identifiability and consistency**

## Why 🐟iTuna?

Applying machine learning to scientific data analysis often suffers from an **identifiability gap**: many models along the data-to-analysis pipeline lack statistical guarantees about the uniqueness of their learned representations. This means that re-running the same algorithm can yield different embeddings, making downstream interpretation unreliable without manual verification.

Identifiable representation learning addresses this by ensuring models recover representations that are unique up to a known class of transformations (permutation, linear, affine, etc.). However, even theoretically identifiable models need **empirical validation** to confirm they behave consistently in practice.

🐟iTuna closes this gap by providing a lightweight, model-agnostic framework to:

1. Train multiple instances of a model with different random seeds

2. Align their embeddings under the appropriate indeterminacy class

3. Measure how consistent the learned representations are

Think of it as a **unit test for reproducibility** of learned embeddings.

## Features

- **sklearn-compatible**: Works with any transformer implementing `fit`, `transform`, and standard sklearn conventions

- **Built-in indeterminacy classes**:

  - `Identity` - no transformation needed (model is already fully identifiable)

  - `Permutation` - handles sign flips and component reordering (e.g., FastICA)

  - `Linear` - linear transformation alignment (e.g., PCA)

  - `Affine` - linear transformation with intercept (e.g., CEBRA)

- **Consistency scoring**: Quantifies how stable embeddings are across runs

- **Embedding alignment**: Returns aligned embeddings for downstream analysis

- **Flexible backends**: In-memory, disk caching, distributed execution, and DataJoint support

## Installation

```bash

pip install ituna

```

or alternative install from source

```bash

pip install git+https://github.com/dynamical-inference/ituna.git

```

Optional extras:

```bash

pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[datajoint]"  # DataJoint backend for database-backed caching

pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[dev]"        # Development dependencies (pytest, etc.)

```

## Quickstart

```python

import numpy as np

from sklearn.decomposition import FastICA

from ituna import ConsistencyEnsemble, metrics

# Generate sample data

X = np.random.randn(1000, 64)

# Create a consistency ensemble

ensemble = ConsistencyEnsemble(

    estimator=FastICA(n_components=16, max_iter=500),

    consistency_transform=metrics.PairwiseConsistency(

        indeterminacy=metrics.Permutation(),  # FastICA is identifiable up to permutation

        symmetric=False,

        include_diagonal=True,

    ),

    random_states=5,  # Train 5 instances with different seeds

)

# Fit and evaluate

ensemble.fit(X)

print("Consistency score:", ensemble.score(X))

# Get aligned embeddings

emb = ensemble.transform(X)

print("Embedding shape:", emb.shape)

```

## Documentation

Full documentation is available at **[dynamical-inference.github.io/ituna](https://dynamical-inference.github.io/ituna/)**.

- **Quickstart notebook**: [`docs/tutorials/quickstart.ipynb`](docs/tutorials/quickstart.ipynb) - minimal working example

- **Core concepts**: [`docs/tutorials/core.ipynb`](docs/tutorials/core.ipynb) - in-depth walkthrough

- **Backends**: [`docs/tutorials/backends.ipynb`](docs/tutorials/backends.ipynb) - caching and distributed execution

## Backends

🐟iTuna supports different backends for caching and distributed computation:

```python

from ituna import ConsistencyEnsemble, config, metrics

from sklearn.decomposition import FastICA

ensemble = ConsistencyEnsemble(

    estimator=FastICA(n_components=16, max_iter=500),

    consistency_transform=metrics.PairwiseConsistency(

        indeterminacy=metrics.Permutation(),

    ),

    random_states=10,

)

# Enable disk caching (avoids re-fitting identical models)

with config.config_context(DEFAULT_BACKEND="disk_cache"):

    ensemble.fit(X)

# Distributed execution with multiple workers

with config.config_context(

    DEFAULT_BACKEND="disk_cache_distributed",

    BACKEND_KWARGS={"trigger_type": "auto", "num_workers": 4},

):

    ensemble.fit(X)

```

### CLI Commands

For large-scale experiments, use the command-line tools:

```bash

# Local distributed backend

ituna-fit-distributed --sweep-name  --cache-dir ./cache

# DataJoint backend

ituna-fit-distributed-datajoint --sweep-name  --schema-name myschema

```

## Development

```bash

# Clone and install in development mode

git clone https://github.com/dynamical-inference/ituna.git

cd ituna

pip install -e .[dev]

# Run tests

pytest tests -v

# Setup pre-commit hooks

pre-commit install

```

For the full development guide — branching conventions, code style, building docs, and the release process — see [CONTRIBUTING.md](CONTRIBUTING.md).

## Citation

If you use 🐟iTuna in your research, please cite:

```bibtex

@software{ituna,

  author = {Schmidt, Tobias and Schneider, Steffen},

  title = {iTuna: Tune machine learning models for empirical identifiability and consistency},

  url = {https://github.com/dynamical-inference/ituna},

  version = {0.1.0},

}

```

## License

🐟iTuna is released under the [MIT License](./LICENSE). If you re-use parts of the iTuna code in your own package, please make sure to copy & paste the contents of the `LICENSE` file into a `NOTICE` in your repository.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/dynamical-inference/ituna

Awesome Lists containing this project

README