https://github.com/sematic-ai/sematic
An open-source ML pipeline development platform
ai data-science machine-learning ml ml-ops ml-pipeline ml-pipelines mlops pipeline python python3
- Host: GitHub
- URL: https://github.com/sematic-ai/sematic
- Owner: sematic-ai
- License: other
- Created: 2022-04-19T22:16:33.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-10-16T16:20:10.000Z (4 months ago)
- Last Synced: 2024-10-18T22:33:39.425Z (4 months ago)
- Topics: ai, data-science, machine-learning, ml, ml-ops, ml-pipeline, ml-pipelines, mlops, pipeline, python, python3
- Language: Python
- Homepage:
- Size: 19.9 MB
- Stars: 972
- Watchers: 12
- Forks: 58
- Open Issues: 137
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: docs/code-of-conduct.md
Awesome Lists containing this project
- awesome-production-machine-learning - Sematic - Platform to build resource-intensive pipelines with simple Python. (Training Orchestration)
README

The open-source Continuous Machine Learning Platform
Build ML pipelines with only Python, run on your laptop, or in the cloud.


[Sematic](https://sematic.dev) is an open-source ML development platform. It
lets ML Engineers and Data Scientists write arbitrarily complex end-to-end
pipelines with simple Python and execute them on their local machine, in a cloud
VM, or on a Kubernetes cluster to leverage cloud resources.

Sematic is based on learnings gathered at top self-driving car companies. It
enables chaining data processing jobs (e.g. Apache Spark) with model training
(e.g. PyTorch, TensorFlow), or any other arbitrary Python business logic into
type-safe, traceable, reproducible end-to-end pipelines that can be monitored
and visualized in a modern web dashboard.

Read our [documentation](https://docs.sematic.dev) and join our [Discord
channel](https://discord.gg/4KZJ6kYVax).

## Why Sematic
- **Easy onboarding** – no deployment or infrastructure needed to get started,
simply install Sematic locally and start exploring.
- **Local-to-cloud parity** – run the same code on your local laptop and on your
Kubernetes cluster.
- **End-to-end traceability** – all pipeline artifacts are persisted, tracked,
and visualizable in a web dashboard.
- **Access heterogeneous compute** – customize required resources for each
pipeline step to optimize your performance and cloud footprint (CPUs, memory,
GPUs, Spark cluster, etc.)
- **Reproducibility** – rerun your pipelines from the UI with guaranteed
reproducibility of results

## Getting Started
To get started locally, simply install Sematic in your Python environment:
```shell
$ pip install sematic
```

Start the local web dashboard:
```shell
$ sematic start
```

Run an example pipeline:
```shell
$ sematic run examples/mnist/pytorch
```

Create a new boilerplate project:
```shell
$ sematic new my_new_project
```

Or from an existing example:
```shell
$ sematic new my_new_project --from examples/mnist/pytorch
```

Then run it with:
```shell
$ python3 -m my_new_project
```

To deploy Sematic to Kubernetes and leverage cloud resources, see our
[documentation](https://docs.sematic.dev).

## Features
- **Lightweight Python SDK** – define arbitrarily complex end-to-end pipelines
- **Pipeline nesting** – arbitrarily nest pipelines into larger pipelines
- **Dynamic graphs** – Python-defined graphs allow for iterations, conditional
branching, etc.
- **Lineage tracking** – all inputs and outputs of all steps are persisted and
tracked
- **Runtime type-checking** – fail early with run-time type checking
- **Web dashboard** – monitor, track, and visualize pipelines in a modern web UI
- **Artifact visualization** – visualize all inputs and outputs of all steps in
the web dashboard
- **Local execution** – run pipelines on your local machine without any
deployment necessary
- **Cloud orchestration** – run pipelines on Kubernetes to access GPUs and other
cloud resources
- **Heterogeneous compute resources** – run different steps on different
machines (e.g. CPUs, memory, GPUs, Spark)
- **Helm chart deployment** – install Sematic on your Kubernetes cluster
- **Pipeline reruns** – rerun pipelines from the UI from an arbitrary point in
the graph
- **Step caching** – cache expensive pipeline steps for faster iteration
- **Step retry** – recover from transient failures with step retries
- **Metadata and collaboration** – tags, source code visualization, docstrings,
notes, etc.
- **Numerous integrations** – see below

## Integrations
- **Apache Spark** – on-demand in-cluster Spark cluster
- **Ray** – on-demand in-cluster Ray resources
- **Snowflake** – easily query your data warehouse (other warehouses supported
too)
- **Plotly, Matplotlib** – visualize plot artifacts in the web dashboard
- **Pandas** – visualize dataframe artifacts in the dashboard
- **Grafana** – embed Grafana panels in the web dashboard
- **Bazel** – integrate with your Bazel build system
- **Helm chart** – deploy to Kubernetes with our Helm chart
- **Git** – track git information in the web dashboard

## Community and resources
Learn more about Sematic and get in touch with the following resources:
- [Sematic landing page](https://sematic.dev)
- [Documentation](https://docs.sematic.dev)
- [Discord channel](https://discord.gg/4KZJ6kYVax)
- [YouTube channel](https://www.youtube.com/@sematic-ai)
- [Our Blog](https://sematic.dev/blog)

## Contribute!
To contribute to Sematic, check out [open issues tagged "good first
issue"](https://github.com/sematic-ai/sematic/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22),
and get in touch with us on [Discord](https://discord.gg/4KZJ6kYVax).
You can find instructions on how to get your development environment set up
in our [developer docs](./developer-docs/README.md). If you'd like to add
an example, you may also find
[this guide](https://docs.sematic.dev/project/contributor-guide/contribute-example)
helpful.
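
To give a feel for the runtime type-checking idea listed under Features, here is a minimal sketch in plain Python. It does not use the Sematic SDK and is not how Sematic implements its checks. The `type_checked_step` decorator and the `add` and `pipeline` functions are hypothetical names chosen for illustration, and only plain class annotations such as `float` are handled.

```python
import functools
import inspect
from typing import get_type_hints


def type_checked_step(func):
    """Hypothetical decorator: check arguments and return value against annotations."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    def check(label, value, expected):
        # Only plain class annotations (e.g. float, str) are handled in this sketch.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{func.__name__}: {label} expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Fail before the step body runs if any argument has the wrong type.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            check(f"argument {name!r}", value, hints.get(name))
        result = func(*args, **kwargs)
        check("return value", result, hints.get("return"))
        return result

    return wrapper


@type_checked_step
def add(a: float, b: float) -> float:
    return a + b


@type_checked_step
def pipeline(a: float, b: float, c: float) -> float:
    # Steps compose like ordinary Python function calls.
    return add(add(a, b), c)
```

Calling `pipeline(1.0, 2.0, 3.0)` returns the sum, while `pipeline(1.0, 2.0, "3")` raises a `TypeError` at the pipeline boundary, before any addition is attempted.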