https://github.com/cogent3/scinexus
composable apps for scientific programming
bioinformatics scientific-computing software-engineering software-factory
- Host: GitHub
- URL: https://github.com/cogent3/scinexus
- Owner: cogent3
- License: bsd-3-clause
- Created: 2026-03-31T21:55:03.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-22T00:37:01.000Z (about 2 months ago)
- Last Synced: 2026-04-22T02:44:47.458Z (about 2 months ago)
- Topics: bioinformatics, scientific-computing, software-engineering, software-factory
- Language: Python
- Homepage: https://scinexus.readthedocs.io
- Size: 556 KB
- Stars: 5
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: changelog.md
- License: LICENSE
README
*`scinexus` is a framework for rapid development of data processing applications. It enables interoperability between objects through defined data types, allowing development of scientific domain app ecosystems. Just as `attrs` and `dataclasses` use type hints to simplify data type definition, `scinexus` uses them to simplify writing best-practice scientific algorithms.*
Many scientific problems require repeating calculations across many files or database records. Such tasks suit data-level parallelism, but writing robust, maintainable code for them is often tedious and quickly becomes complex.
As the Unix philosophy articulates, writing algorithms that do one thing well and can be composed together through piping data of known type is a *Very Good Thing*™.
**`scinexus` encourages this design pattern and eliminates the boilerplate.** We leverage the Python type annotation system to govern the compatibility (composability) of different applications. This enables in-process composition of your applications with validation of the consistency of the pipeline and the consistency of the data being run through it.
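To illustrate the idea of type annotations governing composability, here is a minimal sketch that uses only the standard library. It is not the `scinexus` implementation: the helpers `output_type`, `first_input_type` and `can_compose` are hypothetical names, and the check is a simple equality test between one function's return annotation and the next function's first parameter annotation.

```python
from typing import get_type_hints


def output_type(func):
    """Return the annotated return type of func."""
    return get_type_hints(func)["return"]


def first_input_type(func):
    """Return the annotated type of func's first annotated parameter."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    # annotations keep definition order, so the first value
    # belongs to the first annotated parameter
    return next(iter(hints.values()))


def can_compose(first, second) -> bool:
    """True if the output of first matches the input of second (hypothetical rule)."""
    return output_type(first) == first_input_type(second)


def read_numbers(path: str) -> list[float]:
    with open(path) as f:
        return [float(line) for line in f]


def total(values: list[float]) -> float:
    return sum(values)


def shout(text: str) -> str:
    return text.upper()


# list[float] -> list[float]: these two steps fit together
compatible = can_compose(read_numbers, total)
# list[float] -> str: a mismatch that can be detected before any data is read
incompatible = can_compose(read_numbers, shout)
```

The point of the sketch is that a mismatch is found from the annotations alone, before any file is opened.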
**`scinexus` is designed for scientific reproducibility.** Scientific computations should record all conditions needed to reproduce an analysis. `scinexus` reduces the effort by intercepting all arguments (including defaults) used in app construction and logging the resulting app state.
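One way to capture every argument, including defaults the caller never typed, is `inspect.signature` from the standard library. The sketch below shows that general technique only; `capture_init_args` and `make_filter` are hypothetical names, and this is not necessarily how `scinexus` does it internally.

```python
import inspect


def capture_init_args(func, *args, **kwargs) -> dict:
    """Return every argument func would receive, including unstated defaults."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    # fill in any parameter the caller left at its default value
    bound.apply_defaults()
    return dict(bound.arguments)


def make_filter(min_length: int, strict: bool = False, label: str = "default"):
    """Hypothetical constructor whose settings we want on record."""


# the caller never mentions `strict`, but it still ends up in the record
recorded = capture_init_args(make_filter, 10, label="long")
```

Here `recorded` holds `min_length`, `strict` and `label`, so a log written from it describes the full configuration rather than only what the caller specified.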
## Examples
Developers can either inherit from a base class or use the `scinexus.define_app` decorator to make composable apps. The following examples show simple composition.
Loading files so missing data does not cause a crash
```python
from scinexus import define_app


@define_app(app_type="loader")
def read_json(path: str) -> dict:
    import json

    with open(path) as f:
        return json.load(f)


@define_app
def validate(data: dict, required_field: str) -> dict:
    if required_field not in data:
        # this becomes a NotCompleted sentinel object
        # your run doesn't crash!
        raise ValueError(f"missing {required_field!r} field")
    return data


app = read_json() + validate(required_field="name")
```
You can apply `app` to a single file path as `app(filepath)`, or operate in parallel (and show a progress bar) on a sequence of file paths as follows:
```python
results = list(app.as_completed(["some_file_path.json", "some_other_file_path.json"], parallel=True, show_progress=True))
```
A contrived numerical example
```python
from scinexus import define_app


@define_app
def normalise(values: list[float]) -> list[float]:
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


@define_app
def threshold(values: list[float]) -> list[bool]:
    return [v > 0.5 for v in values]


app = normalise() + threshold()
app([1.0, 5.0, 3.0, 9.0])
```
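To check the arithmetic of the example above by hand, the same two steps can be written as plain functions with no `scinexus` involved. This sketch only verifies the numbers; it says nothing about how the composed app is executed.

```python
def normalise(values: list[float]) -> list[float]:
    # rescale so the smallest value maps to 0 and the largest to 1
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def threshold(values: list[float]) -> list[bool]:
    # strictly greater than 0.5, so a value of exactly 0.5 is False
    return [v > 0.5 for v in values]


scaled = normalise([1.0, 5.0, 3.0, 9.0])  # [0.0, 0.5, 0.25, 1.0]
flags = threshold(scaled)  # [False, False, False, True]
```

Note that an input whose values are all equal makes `hi - lo` zero, so in plain Python `normalise` raises `ZeroDivisionError`.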
A configurable app
```python
from scinexus import define_app


@define_app(app_type="loader")
def load_csv(path: str) -> list[dict]:
    import csv

    with open(path) as f:
        return list(csv.DictReader(f))


@define_app
class summarise:
    def __init__(self, column: str) -> None:
        """column contains the values to produce summary stats for"""
        self.column = column

    def main(self, rows: list[dict]) -> dict[str, float]:
        vals = [float(r[self.column]) for r in rows]
        return {"mean": sum(vals) / len(vals), "min": min(vals), "max": max(vals)}


app = load_csv() + summarise(column="price")
```
## Features
- Type checking at composition time
- Durable computing -- failures recorded as `NotCompleted` records, not exceptions
- Data-level parallel execution with pluggable backends (stdlib, loky, MPI, or custom)
- Progress bars (`tqdm` or `rich`)
- Automated logging and citation tracking
- Checkpointing via data stores (directory, SQLite)
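The durable computing and data-level parallelism ideas in the list above can be sketched with the standard library alone. The `Failed` class and `durable` decorator below are hypothetical stand-ins, not the `scinexus` `NotCompleted` type or its API, and a thread pool is used purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Failed:
    """Hypothetical failure record: which step failed, and why."""

    source: str
    message: str


def durable(func):
    """Turn exceptions into Failed records and pass earlier failures through."""

    def wrapper(item):
        if isinstance(item, Failed):
            # an upstream step already failed, so skip this step
            return item
        try:
            return func(item)
        except Exception as err:
            return Failed(source=func.__name__, message=str(err))

    return wrapper


@durable
def parse(text: str) -> float:
    return float(text)


@durable
def invert(value: float) -> float:
    return 1 / value


def pipeline(text: str):
    return invert(parse(text))


# each input is independent, so the work can be mapped across workers
with ThreadPoolExecutor() as pool:
    results = list(pool.map(pipeline, ["4", "oops", "0"]))

completed = [r for r in results if not isinstance(r, Failed)]
failures = [r for r in results if isinstance(r, Failed)]
```

One bad input does not stop the run: `completed` holds the successful result, while `failures` records that `"oops"` failed in `parse` and `"0"` failed in `invert`.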
## Installation
```bash
pip install scinexus
```
## History
The app framework and utility functions in `scinexus` incubated inside [cogent3](https://github.com/cogent3/cogent3) from March 2019, accumulating over seven years of development, testing, and real-world use in computational genomics before being extracted into a standalone package. The design is mature and has underpinned analyses in published studies.
The extraction into `scinexus` makes the infrastructure available to any scientific Python project, free of the `cogent3` dependency. See the [changelog](changelog.md) for a detailed list of changes from the cogent3 app infrastructure.
We acknowledge here that many members of the `cogent3` community contributed to the code that now lives here, including [@rmcar17](https://github.com/rmcar17), [@Nick-Foto](https://github.com/Nick-Foto), [@KatherineCaley](https://github.com/KatherineCaley), [@fredjaya](https://github.com/fredjaya), and [@khiron](https://github.com/khiron).