{"id":29656573,"url":"https://github.com/zincware/zntrack","last_synced_at":"2025-07-22T08:36:02.367Z","repository":{"id":37099871,"uuid":"372782163","full_name":"zincware/ZnTrack","owner":"zincware","description":"Create, visualize, run \u0026 benchmark DVC pipelines in Python \u0026 Jupyter notebooks.","archived":false,"fork":false,"pushed_at":"2025-07-21T19:41:16.000Z","size":10176,"stargazers_count":53,"open_issues_count":123,"forks_count":5,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-07-21T21:36:10.290Z","etag":null,"topics":["data-science","data-version-control","developer-tools","dvc","git","machine-learning","python","reproducibility"],"latest_commit_sha":null,"homepage":"https://zntrack.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zincware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"patreon":"zincwarecode"}},"created_at":"2021-06-01T09:58:03.000Z","updated_at":"2025-07-14T13:45:47.000Z","dependencies_parsed_at":"2023-09-22T01:52:13.756Z","dependency_job_id":"bc137333-c61c-4767-8fd4-5c097325abe3","html_url":"https://github.com/zincware/ZnTrack","commit_stats":{"total_commits":646,"total_committers":6,"mean_commits":"107.66666666666667","dds":"0.34829721362229105","last_synced_commit":"78b349a445f0ae1d8326bd95faa0317e50ff24da"},"previous_names":[],"tags_count":43,"template":false,"template_full_name":null,"purl":"pkg:github/zincware/ZnTrack","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/zincware%2FZnTrack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zincware%2FZnTrack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zincware%2FZnTrack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zincware%2FZnTrack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zincware","download_url":"https://codeload.github.com/zincware/ZnTrack/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zincware%2FZnTrack/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266456424,"owners_count":23931408,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","data-version-control","developer-tools","dvc","git","machine-learning","python","reproducibility"],"created_at":"2025-07-22T08:36:00.574Z","updated_at":"2025-07-22T08:36:02.348Z","avatar_url":"https://github.com/zincware.png","language":"Python","funding_links":["https://patreon.com/zincwarecode"],"categories":[],"sub_categories":[],"readme":"[![codecov](https://codecov.io/gh/zincware/ZnTrack/branch/main/graph/badge.svg?token=ZQ67FXN1IT)](https://codecov.io/gh/zincware/ZnTrack)\n![PyTest](https://github.com/zincware/ZnTrack/actions/workflows/test.yaml/badge.svg)\n[![Py
PI version](https://badge.fury.io/py/zntrack.svg)](https://badge.fury.io/py/zntrack)\n[![code-style](https://img.shields.io/badge/code%20style-black-black)](https://github.com/psf/black/)\n[![Documentation](https://readthedocs.org/projects/zntrack/badge/?version=latest)](https://zntrack.readthedocs.io/en/latest/?badge=latest)\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/zincware/ZnTrack/HEAD)\n[![DOI](https://img.shields.io/badge/arXiv-2401.10603-red)](https://arxiv.org/abs/2401.10603)\n[![ZnTrack](https://img.shields.io/badge/Powered%20by-ZnTrack-%23007CB0)](https://zntrack.readthedocs.io/en/latest/)\n[![zincware](https://img.shields.io/badge/Powered%20by-zincware-darkcyan)](https://github.com/zincware)\n[![Discord](https://img.shields.io/discord/1034511611802689557)](https://discord.gg/7ncfwhsnm4)\n\n![Logo](https://raw.githubusercontent.com/zincware/ZnTrack/main/docs/source/_static/logo_ZnTrack.png)\n\n# ZnTrack: Make Your Python Code Reproducible!\n\nZnTrack (`zɪŋk træk`) is a lightweight and easy-to-use Python package for\nconverting your existing Python code into reproducible workflows. By structuring\nyour code as a directed graph with well-defined inputs and outputs, ZnTrack\nensures reproducibility, scalability, and ease of collaboration.\n\n## Key Features\n\n- **Reproducible Workflows**: Convert Python scripts into reproducible workflows with minimal effort.\n- **Parameter, Output, and Metric Tracking**: Easily track parameters, outputs, and metrics in your Python code.\n- **Shareable and Collaborative**: Collaborate with your team through Git. 
Share your workflows and use parts in other projects or package them as Python packages.\n- **DVC Integration**: ZnTrack is built on top of [DVC](https://dvc.org) for version control and experiment management and seamlessly integrates into the [DVC](https://dvc.org) ecosystem.\n\n\n## Example: Molecular Dynamics Workflow\n\nLet’s take a workflow that constructs a periodic, atomistic system of Ethanol\nand runs a geometry optimization using\n[MACE-MP-0](https://arxiv.org/abs/2401.00096).\n\n### Original Workflow\n\n```python\nfrom ase.optimize import LBFGS\nfrom mace.calculators import mace_mp\nfrom rdkit2ase import pack, smiles2conformers\n\nmodel = mace_mp()\n\nframes = smiles2conformers(smiles=\"CCO\", numConfs=32)\nbox = pack(data=[frames], counts=[32], density=789)\n\nbox.calc = model\n\ndyn = LBFGS(box, trajectory=\"optim.traj\")\ndyn.run(fmax=0.5)\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eDependencies\u003c/summary\u003e\nFor this example to work, you will need:\n\u003cul\u003e\n  \u003cli\u003ehttps://github.com/ACEsuit/mace\u003c/li\u003e\n  \u003cli\u003ehttps://github.com/m3g/packmol\u003c/li\u003e\n  \u003cli\u003ehttps://github.com/zincware/rdkit2ase\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/details\u003e\n\n### Converted Workflow with ZnTrack\n\nTo make this workflow reproducible, we convert it into a **directed graph\nstructure** where each step is represented as a **Node**. Nodes define their\ninputs, outputs, and the computational logic to execute. Here's the graph\nstructure for our example:\n\n```mermaid\nflowchart LR\n\nSmiles2Conformers --\u003e Pack --\u003e StructureOptimization\nMACE_MP --\u003e StructureOptimization\n```\n\n#### Node Definitions\n\nIn ZnTrack, each **Node** is defined as a Python class. 
The class attributes\ndefine the **inputs** (parameters and dependencies) and **outputs**, while the\n`run` method contains the computational logic to be executed.\n\n\u003e [!NOTE]\n\u003e ZnTrack uses Python dataclasses under the hood, providing an automatic\n\u003e `__init__` method. Starting from Python 3.11, most IDEs should reliably\n\u003e provide type hints for ZnTrack Nodes.\n\n\u003e [!TIP]\n\u003e For files produced during the `run` method, ZnTrack provides a unique\n\u003e **Node Working Directory** (`zntrack.nwd`). Always use this directory to store\n\u003e files to ensure reproducibility and avoid conflicts.\n\n```python\nfrom dataclasses import dataclass\nfrom pathlib import Path\n\nimport ase.io\nfrom ase.optimize import LBFGS\nfrom mace.calculators import mace_mp\nfrom rdkit2ase import pack, smiles2conformers\n\nimport zntrack\n\n\nclass Smiles2Conformers(zntrack.Node):\n    smiles: str = zntrack.params()  # A required parameter\n    numConfs: int = zntrack.params(32)  # A parameter with a default value\n\n    frames_path: Path = zntrack.outs_path(zntrack.nwd / \"frames.xyz\")  # Output file path\n\n    def run(self) -\u003e None:\n        frames = smiles2conformers(smiles=self.smiles, numConfs=self.numConfs)\n        ase.io.write(self.frames_path, frames)\n\n    @property\n    def frames(self) -\u003e list[ase.Atoms]:\n        # Load the frames from the output file using the node's filesystem\n        with self.state.fs.open(self.frames_path, \"r\") as f:\n            return list(ase.io.iread(f, \":\", format=\"extxyz\"))\n\n\nclass Pack(zntrack.Node):\n    data: list[list[ase.Atoms]] = zntrack.deps()  # Input dependency (list of ASE Atoms)\n    counts: list[int] = zntrack.params()  # Parameter (list of counts)\n    density: float = zntrack.params()  # Parameter (density value)\n\n    frames_path: Path = zntrack.outs_path(zntrack.nwd / \"frames.xyz\")  # Output file path\n\n    def run(self) -\u003e None:\n        box = pack(data=self.data, 
counts=self.counts, density=self.density)\n        ase.io.write(self.frames_path, box)\n\n    @property\n    def frames(self) -\u003e list[ase.Atoms]:\n        # Load the packed structure from the output file\n        with self.state.fs.open(self.frames_path, \"r\") as f:\n            return list(ase.io.iread(f, \":\", format=\"extxyz\"))\n\n\n# We could hardcode the MACE_MP model into the StructureOptimization Node, but we\n# can also define it as a dependency. Since the model doesn't require a `run` method,\n# we define it as a `@dataclass`.\n\n\n@dataclass\nclass MACE_MP:\n    model: str = \"medium\"  # Default model type\n\n    def get_calculator(self, **kwargs):\n        return mace_mp(model=self.model)\n\n\nclass StructureOptimization(zntrack.Node):\n    model: MACE_MP = zntrack.deps()  # Dependency (MACE_MP model)\n    data: list[ase.Atoms] = zntrack.deps()  # Dependency (list of ASE Atoms)\n    data_id: int = zntrack.params()  # Parameter (index of the structure to optimize)\n    fmax: float = zntrack.params(0.05)  # Parameter (force convergence threshold)\n\n    frames_path: Path = zntrack.outs_path(zntrack.nwd / \"frames.traj\")  # Output file path\n\n    def run(self) -\u003e None:\n        atoms = self.data[self.data_id]\n        atoms.calc = self.model.get_calculator()\n        dyn = LBFGS(atoms, trajectory=self.frames_path.as_posix())\n        # Use the tracked `fmax` parameter rather than a hardcoded threshold\n        dyn.run(fmax=self.fmax)\n\n    @property\n    def frames(self) -\u003e list[ase.Atoms]:\n        # Load the optimization trajectory from the output file\n        with self.state.fs.open(self.frames_path, \"rb\") as f:\n            return list(ase.io.iread(f, \":\", format=\"traj\"))\n```\n\n#### Building and Running the Workflow\n\nNow that we’ve defined all the necessary Nodes, we can build and execute the\nworkflow. Follow these steps:\n\n1. **Initialize a new directory** for your project:\n\n   ```bash\n   git init\n   dvc init\n   ```\n\n1. 
**Create a Python module** for the Node definitions:\n\n   - Create a file `src/__init__.py` and place the Node definitions inside it.\n\n1. **Define and execute the workflow** in a `main.py` file:\n\n   ```python\n    from src import MACE_MP, Pack, Smiles2Conformers, StructureOptimization\n\n    import zntrack\n\n    # Initialize the ZnTrack project\n    project = zntrack.Project()\n\n    # Define the MACE-MP model\n    model = MACE_MP()\n\n    # Build the workflow graph\n    with project:\n        etoh = Smiles2Conformers(smiles=\"CCO\", numConfs=32)\n        box = Pack(data=[etoh.frames], counts=[32], density=789)\n        optm = StructureOptimization(model=model, data=box.frames, data_id=-1, fmax=0.5)\n\n    # Execute the workflow\n    project.repro()\n   ```\n\n\u003e [!TIP]\n\u003e If you don’t want to execute the graph immediately, use\n\u003e `project.build()` instead. You can run the graph later using `dvc repro` or\n\u003e the [paraffin](https://github.com/zincware/paraffin) package.\n\n#### Accessing Results\n\nOnce the workflow has been executed, the results are stored in the respective\nfiles. 
For example, the optimized trajectory is saved in\n`nodes/StructureOptimization/frames.traj`.\n\nYou can load the results directly using ZnTrack, without worrying about file\npaths or formats:\n\n```python\nimport zntrack\n\n# Load the StructureOptimization Node\noptm = zntrack.from_rev(name=\"StructureOptimization\")\n# you can pass `remote: str` and `rev: str` to access data from\n# a different commit or a remote repository.\n\n# Access the optimization trajectory\nprint(optm.frames)\n```\n\n______________________________________________________________________\n\n### More Examples\n\nFor additional examples and advanced use cases, check out these packages built\non top of ZnTrack:\n\n- [mlipx](https://mlipx.readthedocs.io/en/latest/) - Machine Learned Interatomic Potential eXploration.\n- [IPSuite](https://github.com/zincware/IPSuite) - Machine Learned **I**nteratomic **P**otential Tools.\n\n______________________________________________________________________\n\n## References\n\nIf you use ZnTrack in your research, please cite us:\n\n```bibtex\n@misc{zillsZnTrackDataCode2024,\n  title = {{{ZnTrack}} -- {{Data}} as {{Code}}},\n  author = {Zills, Fabian and Sch{\\\"a}fer, Moritz and Tovey, Samuel and K{\\\"a}stner, Johannes and Holm, Christian},\n  year = {2024},\n  eprint={2401.10603},\n  archivePrefix={arXiv},\n}\n```\n\n______________________________________________________________________\n\n## Copyright\n\nThis project is distributed under the\n[Apache License Version 2.0](https://github.com/zincware/ZnTrack/blob/main/LICENSE).\n\n______________________________________________________________________\n\n## Similar Tools\n\nHere’s a list of other projects that either work together with ZnTrack or\nachieve similar results with slightly different goals or programming languages:\n\n- [DVC](https://dvc.org/) - Main dependency of ZnTrack for Data Version Control.\n- [dvthis](https://github.com/jcpsantiago/dvthis) - Introduce DVC to R.\n- [DAGsHub 
Client](https://github.com/DAGsHub/client) - Logging parameters from\n  within Python.\n- [MLFlow](https://mlflow.org/) - A Machine Learning Lifecycle Platform.\n- [Metaflow](https://metaflow.org/) - A framework for real-life data science.\n- [Hydra](https://hydra.cc/) - A framework for elegantly configuring complex\n  applications.\n- [Snakemake](https://snakemake.readthedocs.io/en/stable/) - Workflow management\n  system for reproducible and scalable data analyses.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzincware%2Fzntrack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzincware%2Fzntrack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzincware%2Fzntrack/lists"}