https://github.com/sunkusun9/ml-labs
Pipeline-centric ML framework for experiment, train, and inference workflows
https://github.com/sunkusun9/ml-labs
experimentation inference machine-learning pipeline sklearn training
Last synced: about 2 months ago
JSON representation
Pipeline-centric ML framework for experiment, train, and inference workflows
- Host: GitHub
- URL: https://github.com/sunkusun9/ml-labs
- Owner: sunkusun9
- License: other
- Created: 2026-02-12T01:20:22.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-03-30T15:58:35.000Z (2 months ago)
- Last Synced: 2026-03-30T17:40:50.351Z (2 months ago)
- Topics: experimentation, inference, machine-learning, pipeline, sklearn, training
- Language: Python
- Size: 2.91 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# ml-labs
A structured machine learning experimentation framework for building, managing, and evaluating ML pipelines with cross-validation, caching, and multi-framework support.
## Installation
```bash
pip install ml-labs
```
With optional dependencies:
```bash
pip install ml-labs[xgboost] # XGBoost support
pip install ml-labs[lightgbm] # LightGBM support
pip install ml-labs[catboost] # CatBoost support
pip install ml-labs[shap] # SHAP value analysis
pip install ml-labs[polars] # Polars DataFrame support
pip install ml-labs[tensorflow] # Neural network estimators (NNClassifier, NNRegressor)
pip install ml-labs[all] # All optional dependencies
```
## Key Features
- **Pipeline**: DAG-based node graph for defining ML workflows with stages (data transformation) and heads (model prediction)
- **Experimenter**: Experiment execution engine with LRU caching, state management, and error resilience
- **Trainer**: Cross-validation training pipeline with split management
- **Collectors**: Extensible data collection — metrics, stacking outputs, model attributes, SHAP values, raw outputs
- **Adapters**: Unified interface for scikit-learn, XGBoost, LightGBM, CatBoost, and Keras
- **Data Flexibility**: Support for pandas, polars, cuDF, and NumPy arrays
## Architecture Overview
```
Pipeline Define node graphs (stages + heads) with groups and edges
│
Experimenter Execute pipelines, manage cache and state
│
├── ExpObj Per-node build/experiment objects (StageObj, HeadObj)
├── Trainer Cross-validation training with split management
└── Collector Collect metrics, predictions, model attributes, SHAP values
```
**Node State Model:**
```
init ──→ built ──→ finalized
│
└──→ error ──→ (reset) ──→ init
```
## Quick Start
```python
from mllabs import Experimenter, Connector, MetricCollector
exp = Experimenter(data=df, path="exp/my_experiment")
p = exp.pipeline
p.set_grp("scale", role="stage", processor="StandardScaler")
p.set_grp("model", role="head", processor="LogisticRegression",
parent="scale", edges={"X": [(None, None)], "y": [(None, "target")]})
p.set_node("lr_default", grp="model")
p.set_node("lr_c01", grp="model", params={"C": 0.1})
mc = MetricCollector("accuracy", Connector(), output_var="prediction",
metric_func=lambda y, pred: (y == pred).mean())
exp.add_collector(mc)
exp.build(["lr_default", "lr_c01"])
exp.exp(["lr_default", "lr_c01"])
print(mc.get_metrics(["lr_default", "lr_c01"]))
```
## Documentation
Full documentation is available at **https://sunkusun9.github.io/ml-labs/**
- [Concepts](https://sunkusun9.github.io/ml-labs/concepts/architecture/) — Architecture, Pipeline, State model, Data flow
- [User Guide](https://sunkusun9.github.io/ml-labs/guide/pipeline-experimenter/) — Pipeline & Experimenter, Trainer & Collectors, Adapters, Processors, Neural Networks
- [Serving Guide](https://sunkusun9.github.io/ml-labs/serving/inferencer/) — Inferencer export and inference
- [API Reference](https://sunkusun9.github.io/ml-labs/reference/index/) — Full API reference
## Requirements
- Python >= 3.10
- pandas >= 1.5
- numpy >= 1.23
- scikit-learn >= 1.2
- cachetools >= 5.0
## License
[PolyForm Noncommercial 1.0.0](https://polyformproject.org/licenses/noncommercial/1.0.0) — free for non-commercial use.