https://github.com/nanobiostructuresrg/melite
MELITE is a tabular classification benchmarking toolkit for model selection, repeated stratified cross-validation, final model export, and artifact-based inference.
- Host: GitHub
- URL: https://github.com/nanobiostructuresrg/melite
- Owner: NanoBiostructuresRG
- License: other
- Created: 2025-05-12T21:57:30.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-05-26T06:06:46.000Z (4 days ago)
- Last Synced: 2026-05-26T06:26:17.467Z (4 days ago)
- Topics: benchmarking, random-forest-classifier, support-vector-classifier, xgboost-classifier
- Language: Python
- Homepage: https://nanobiostructuresrg.github.io/melite/
- Size: 271 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: COPYING
- Citation: CITATION.cff
# MELITE: Multi-model Evaluation and Learning for Inference-ready Tabular Experiments
**MELITE** is a pre-stable Python toolkit for tabular classification
benchmarking, model selection, repeated stratified cross-validation, final
model export, and artifact-based inference.
MELITE is tabular at the modeling level. The learning algorithms consume
numeric `X` and `y` arrays, so the feature matrix may come from PCA, UMAP,
fingerprints, descriptors, clinical variables, experimental measurements,
industrial features, or manually selected numeric features.
## Project Identity
```text
Project: MELITE
PyPI distribution: melite
Import package: melite
CLI: melite
Version: 0.2.0
License: LGPL-3.0-or-later
Status: alpha / pre-stable
```
## Documentation
The live documentation is published at:
https://nanobiostructuresrg.github.io/melite/
Key pages:
- [Installation](https://nanobiostructuresrg.github.io/melite/installation/)
- [Quick Start](https://nanobiostructuresrg.github.io/melite/quickstart/)
- [CLI Reference](https://nanobiostructuresrg.github.io/melite/cli/)
- [Configuration](https://nanobiostructuresrg.github.io/melite/configuration/)
- [API Reference](https://nanobiostructuresrg.github.io/melite/api/)
## Installation
Once the package is published on PyPI:
```bash
python -m pip install melite
```
For local development:
```bash
git clone https://github.com/NanoBiostructuresRG/melite.git
cd melite
python -m pip install -e .
```
For development and documentation tools:
```bash
python -m pip install -e ".[dev]"
python -m pip install -e ".[docs]"
```
## Quick Start
Run a fast smoke benchmark with the bundled synthetic example dataset:
```bash
melite run --smoke --config examples/example_config.toml
```
Export a selected model artifact:
```bash
melite export --row 0 --csv examples/output/results.csv --outdir examples/output/
```
Run artifact-based inference:
```python
import numpy as np
from melite import predict
X_new = np.load("examples/sample_PCA70.npz")["X"]
result = predict("examples/output/Model_SVC_sample_pca70.pkl", X_new)
print(result["predictions"])
print(result["probabilities"])
```
## Scope
| MELITE does | MELITE does not |
|-------------|-----------------|
| Accept prepared `X` and `y` arrays. | Generate fingerprints. |
| Benchmark SVC, Random Forest, and XGBoost classifiers. | Process SMILES. |
| Select the best row by F1-macro. | Generate PCA or UMAP reductions from raw data. |
| Export a final retrained `.pkl` model. | Act as a general AutoML framework. |
| Run artifact-based inference through `predict()`. | Promise a stable 1.0 API yet. |
| Handle any numeric tabular matrix. | Generate or validate domain-specific descriptors. |
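
For readers unfamiliar with the selection metric named in the table, the following is a minimal NumPy sketch of macro-averaged F1: the per-class F1 scores are averaged with equal weight, so minority classes count as much as majority ones. The helper name `f1_macro` and the toy labels are illustrative assumptions, not MELITE internals.

```python
import numpy as np

def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    # Every label seen in either vector gets its own F1 score.
    for label in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == label) & (y_true == label))
        fp = np.sum((y_pred == label) & (y_true != label))
        fn = np.sum((y_pred != label) & (y_true == label))
        # F1 = 2*TP / (2*TP + FP + FN)
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
# Class 0 scores 6/7 and class 1 scores 4/5; the macro score is their mean.
print(f1_macro(y_true, y_pred))
```
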
Datasets are registered as concrete tabular matrix candidates under
`[datasets.<dataset_id>]`. The `dataset_id` is user-defined and is used in
`results.csv`, figures, and exported model filenames.
```toml
[datasets.morgan_r2_2048]
path = "data/morgan_r2_2048.npz"
label_path = "raw/labels.npy"
family = "fingerprints"
method = "Morgan"
variant = "r2_2048"
[datasets.rdkit_descriptors]
path = "data/rdkit_descriptors.npz"
label_path = "raw/labels.npy"
family = "descriptors"
method = "RDKit"
[datasets.pca85]
path = "data/PCA85.npz"
label_path = "raw/labels.npy"
family = "dimensionality"
method = "PCA"
level = 85
```
Each registered dataset must define `path` and `label_path`. Optional metadata
fields are `family`, `method`, `variant`, `level`, and `description`; they are
reported for traceability and do not drive special-case model execution.
Registered datasets are loaded strictly: missing files, missing `X`, non-2D or
non-numeric `X`, length mismatches, and embedded `y` mismatches fail the run.
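
The strict-loading rules can be approximated outside MELITE with plain NumPy. This is a sketch under stated assumptions: the function `load_strict`, its exception types, and its messages are invented for illustration, and MELITE's own loader may report these failures differently.

```python
import tempfile
from pathlib import Path

import numpy as np

def load_strict(path, label_path):
    """Load X and y, raising on each failure mode (hypothetical helper)."""
    path, label_path = Path(path), Path(label_path)
    for p in (path, label_path):
        if not p.is_file():
            raise FileNotFoundError(f"missing file: {p}")
    y = np.load(label_path)
    with np.load(path) as archive:
        if "X" not in archive.files:
            raise KeyError(f"{path} has no 'X' array")
        X = archive["X"]
        y_embedded = archive["y"] if "y" in archive.files else None
    if X.ndim != 2:
        raise ValueError(f"X must be 2D, got {X.ndim}D")
    if not np.issubdtype(X.dtype, np.number):
        raise TypeError(f"X must be numeric, got dtype {X.dtype}")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} rows but y has {y.shape[0]} labels")
    if y_embedded is not None and not np.array_equal(y_embedded, y):
        raise ValueError("embedded y does not match the label file")
    return X, y

# Demo on throwaway files: a valid 3x2 matrix with three labels.
with tempfile.TemporaryDirectory() as tmp:
    labels = Path(tmp) / "labels.npy"
    good = Path(tmp) / "good.npz"
    np.save(labels, np.array([0, 1, 0]))
    np.savez(good, X=np.zeros((3, 2)))
    X, y = load_strict(good, labels)
    print(X.shape, y.shape)
```
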
Legacy `[benchmark].reduction_types` and `levels` configs are still accepted
and are normalized into equivalent dataset entries such as `PCA70` and `UMAP90`.
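
One plausible reading of this normalization is a cross product of reduction types and levels. The sketch below assumes exactly that; the function `normalize_legacy` is hypothetical, and it leaves out file paths because the README does not say how they are derived.

```python
def normalize_legacy(reduction_types, levels):
    """Expand legacy lists into one metadata entry per (type, level) pair.

    Assumption: every reduction type is paired with every level.
    """
    entries = {}
    for method in reduction_types:
        for level in levels:
            entries[f"{method}{level}"] = {"method": method, "level": level}
    return entries

entries = normalize_legacy(["PCA", "UMAP"], [70, 90])
print(list(entries))  # ['PCA70', 'PCA90', 'UMAP70', 'UMAP90']
```
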
## CLI
```bash
melite --help
melite run --help
melite export --help
melite --version
```
Common commands:
```bash
melite run
melite run --smoke
melite run --config my_config.toml
melite export --row 0
melite export --config my_config.toml --row 0
melite export --row 0 --force
```
## Public API
```python
from melite import Config
from melite import load_datasets
from melite import plot_cv_distributions
from melite import predict
from melite import __version__
```
Modules not listed above are importable directly but are not part of the public
contract and may change before 0.2.0.
## Input Format
```text
raw/labels.npy <- target vector y, shape (n_samples,)
data/morgan_r2_2048.npz <- required key: X, optional key: y
data/rdkit_descriptors.npz
data/PCA85.npz
data/UMAP90.npz
```
Each `.npz` file must contain an `X` array. If an embedded `y` array is present,
MELITE validates it against the configured `label_path`.
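
Files in this layout can be produced with `numpy.save` and `numpy.savez`. The following sketch writes a synthetic matrix into a temporary directory; the helper `write_inputs`, the dataset name `my_features`, and the random data are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np

def write_inputs(root, X, y, name):
    """Write labels to raw/labels.npy and features to data/<name>.npz."""
    root = Path(root)
    (root / "raw").mkdir(parents=True, exist_ok=True)
    (root / "data").mkdir(parents=True, exist_ok=True)
    label_path = root / "raw" / "labels.npy"
    data_path = root / "data" / f"{name}.npz"
    np.save(label_path, y)
    # Embedding y is optional; it allows a cross-check against the label file.
    np.savez(data_path, X=X, y=y)
    return data_path, label_path

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))     # numeric feature matrix, shape (n_samples, n_features)
y = rng.integers(0, 2, size=100)  # target vector, shape (n_samples,)

with tempfile.TemporaryDirectory() as tmp:
    data_path, label_path = write_inputs(tmp, X, y, "my_features")
    with np.load(data_path) as archive:
        print(archive["X"].shape, archive["y"].shape)  # (100, 8) (100,)
```

In a real project, `root` would be the working directory so that the paths line up with the `path` and `label_path` values in the config.
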
## Outputs
```text
output/
|-- results.txt
|-- results.csv
|-- Model_<model>_<dataset_id>.pkl
`-- figures/
    `-- _.png
```
Local inputs and generated artifacts such as `raw/`, `data/`, `output/`,
`.pkl`, and `.joblib` files are intentionally ignored by Git.
## Validation
The current `dev/v0.2.0` branch is expected to pass the following checks:
```bash
python -m pytest tests/ -v --basetemp=.review_pytest_tmp -o cache_dir=.review_pytest_cache
mkdocs build --strict
python -m build
python -m twine check dist/*
melite --help
melite run --help
melite export --help
melite --version
```
## Citation
If you use MELITE in your research, please cite it using the metadata in
[CITATION.cff](CITATION.cff).
```text
Contreras-Torres, F. F., & Murrieta, A. C. (2026). MELITE: Multi-model
Evaluation and Learning for Inference-ready Tabular Experiments (0.1.11).
Tecnologico de Monterrey. https://github.com/NanoBiostructuresRG/melite
```
## Authors
Developed by **Flavio F. Contreras-Torres**
Tecnologico de Monterrey
Co-author: **Ana C. Murrieta**
Tecnologico de Monterrey
## License
This project is licensed under the terms of the
[GNU Lesser General Public License v3.0 or later](LICENSE).
SPDX identifier: `LGPL-3.0-or-later`