{"id":37049296,"url":"https://github.com/itislab/alpheus","last_synced_at":"2026-01-14T05:43:57.663Z","repository":{"id":38051207,"uuid":"159497564","full_name":"itislab/alpheus","owner":"itislab","description":"Alpheus: A tool for organizing and managing computational experiments","archived":false,"fork":false,"pushed_at":"2022-12-07T14:59:14.000Z","size":832,"stargazers_count":6,"open_issues_count":24,"forks_count":5,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-08-22T19:46:09.707Z","etag":null,"topics":["computational-science","dotnet-cli","dotnet-core","machine-learning","nuget"],"latest_commit_sha":null,"homepage":"","language":"F#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/itislab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-11-28T12:18:57.000Z","updated_at":"2022-11-02T21:13:14.000Z","dependencies_parsed_at":"2023-01-24T17:45:53.169Z","dependency_job_id":null,"html_url":"https://github.com/itislab/alpheus","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/itislab/alpheus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itislab%2Falpheus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itislab%2Falpheus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itislab%2Falpheus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itislab%2Falpheus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/itislab","download_url":"https://codeload.github.com/itislab/alpheus/tar.gz/refs/head
s/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itislab%2Falpheus/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28411234,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T05:26:33.345Z","status":"ssl_error","status_checked_at":"2026-01-14T05:21:57.251Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computational-science","dotnet-cli","dotnet-core","machine-learning","nuget"],"created_at":"2026-01-14T05:43:57.145Z","updated_at":"2026-01-14T05:43:57.657Z","avatar_url":"https://github.com/itislab.png","language":"F#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Alpheus\n\n| Platform        | Build           |\n| ------------- |:-------------:|\n| Windows   | [![VS2019 Build Status](https://ci.appveyor.com/api/projects/status/2q24972vyycmjdxx/branch/master?svg=true)](https://ci.appveyor.com/project/dgrechka/alpheus) |\n| Linux     | [![Linux Build Status](https://drone.k8s.grechka.family/api/badges/itislab/alpheus/status.svg)](https://drone.k8s.grechka.family/itislab/alpheus)  
|\n\n[![codecov](https://codecov.io/gh/itislab/alpheus/branch/master/graph/badge.svg)](https://codecov.io/gh/itislab/alpheus)\n[![Tests](https://img.shields.io/appveyor/tests/dgrechka/alpheus/master)](https://ci.appveyor.com/project/dgrechka/alpheus/build/tests)\n![NuGet](https://img.shields.io/nuget/v/Alpheus-cli.svg)\n[![NuGet](https://img.shields.io/nuget/dt/Alpheus-cli.svg)](https://www.nuget.org/packages/Alpheus-cli/)\n\n\nA tool for organizing and managing computational experiments. Reproducible research is a way of conducting research that lets you trace the provenance of any result and compute it again. Alpheus forces a researcher to work this way.\n\nAdvantages:\n_todo: complete the list_\n\n- Builds a dependency graph of operations.\n- Incrementally recomputes only the data affected by a change.\n- Allows reproducing the data.\n- Allows using your usual tools and is based on folders/files.\n\n## Why Alpheus?\nAlpheus uses the full power of .NET Core thread pool processing, asynchronous IO and asynchronous computations. 
You will be surprised how effectively Alpheus deals with large collections of small files, saving them to artefact storages and restoring them when needed.\n\n## Installation\nYou need to have [.NET Core SDK 6](https://dotnet.microsoft.com/en-us/download) installed.\n\nIf you have it, install the latest version of Alpheus with\n\n```\ndotnet tool install --global Alpheus-cli\n```\n\n## Usage\n\nYou can call `alpheus` in the command line\n\n```\nC:\\project1\u003ealpheus help\nUSAGE: alpheus [help] [\u003csubcommand\u003e [\u003coptions\u003e]]\n\nSUBCOMMANDS:\n\n    init \u003coptions\u003e        Make the current directory an Alpheus experiment directory\n    config \u003coptions\u003e      Modify configuration of research directory\n    build \u003coptions\u003e       Creates an experiment graph node\n    compute \u003coptions\u003e     Tries to compute the graph to make the outdated node up to date\n    status \u003coptions\u003e      Prints the graph status for particular .alph file\n    save \u003coptions\u003e        Save a copy of file/directory to the storage(s)\n    restore \u003coptions\u003e     Restore a copy of the file/directory from storage\n\n    Use 'alpheus \u003csubcommand\u003e help' for additional information.\n\nOPTIONS:\n\n    help                  display this list of options.\n```\n\n## Build\n\nBefore building the code, you need to make sure the machine has the following tools installed:\n\n1. 
[.NET Core SDK 6](https://dotnet.microsoft.com/en-us/download)\n\n\nClone the repository and run the following command in the root of the repository:\n\n```\ndotnet build\n```\n\nYou can also open Alpheus.sln using Visual Studio 2019 (or newer) and build the solution.\n\n## Tests\n\nRun the following command in the root of the repository:\n\n```\ndotnet test\n```\n\n## Documentation\n\n### Initialization\n\nIn the root folder of the experiment run the following command:\n\n```\nalpheus init\n```\n\nThis creates a new folder `.alpheus` with default settings described in `.alpheus/config.json`. This folder should be committed to the git repository.\n\nNow the root folder can be called the _experiment folder_.\n\n### Adding new method\n\nAn experiment is a composition of _methods_ producing and consuming _artefacts_. Each method is a command line operation registered using the command `alpheus build`. An artefact is a file or a folder located within the experiment folder. Folder artefacts are denoted by a path with a trailing '\\\\' on Windows and '/' on Linux.\n\nFor example, the following command registers a method which produces an output artefact `author.txt` by running the command `whoami \u003e author.txt`:\n\n```\nalpheus build -o \"author.txt\" \"cmd /c whoami \u003e $out1\"\n```\n\nNote that this command doesn't actually run anything, but just creates the `author.txt.alph` file, which describes how `author.txt` can be produced. When there are many methods, these description files allow Alpheus to build a dependency graph for the methods of the experiment.\n\nSuppose the `scripts/count.py` script takes two arguments, an input file and an output file, and writes the number of characters in the input file to the output file. 
The following command registers a method which runs the script for the `author.txt` and builds `count.txt`:\n\n```\nalpheus build -d \"scripts/count.py\" -d \"author.txt\" -o \"count.txt\" \"python $in1 $in2 $out1\"\n```\n\nNote that we declare that the new method depends on the output of the first method, `author.txt`. This information is stored in the created file `count.txt.alph`. \n\nAll `*.alph` files must be committed to the git repository, so the experiment workflow is shared.\n\n### Computing an artefact\n\nTo compute an artefact, use `alpheus compute`. For instance, the following command computes `count.txt`:\n\n```\nalpheus compute count.txt\n```\n\nAlpheus builds the dependency graph of methods needed in order to produce the required file and then runs only those methods which have no up-to-date outputs. Alpheus automatically determines changes in files/directories, so you don't need to worry whether the output is consistent. As a result, we get both `author.txt` and `count.txt`. \n\nIt is up to you whether you want to commit these files to the git repository, push them to an external artefact storage (described later), or keep them just on the local machine. In the latter case, on other machines these files must be recomputed, if needed.\n\n\n### Removing an artefact/method\n\nJust delete the corresponding `*.alph` files. Note that you can break the dependencies by deleting artefacts required by other methods. 
In this case, the computation of those methods will fail.\n\n### Getting status of an experiment\n\nYou can see the status of an artefact in your experiment folder with the command `alpheus status`, for example:\n\n```\nalpheus status count.txt\n```\n\nThe command will show you the status of the artefact that you've specified and also of the artefacts that are used to produce it.\n\nThere are two statuses of an artefact:\n\n- *Needs recomputation* indicates that the artefact is obsolete because of one of the following reasons:\n\n    - An input artefact is outdated.\n    - The output artefact was modified outside of Alpheus.\n    - The command producing the artefact is modified.\n    - The order of the inputs is changed.\n  \n- *Up-to-date* indicates that the correct version of the artefact is available either locally or on the artefact storage, such that this artefact is a result of the command applied to the current inputs.\n  \n\n### Vector operations [Experimental - not stable]\n\nIf you need to perform an identical operation with multiple artefacts, you should provide both input and output paths with an asterisk (*) when declaring a method:\n\n```\nalpheus build -d \"scripts/count.py\" -d \"files/*.txt\" -o \"counts/*.txt\" \"python $in1 $in2 $out1\"\n```\n\nIn the given example, the script `count.py` will be executed for each text file in the `files` folder, and \nthere will be text files with the same names containing counts in the `counts` folder.\n\nThe maximum number of asterisks in the inputs and the number of asterisks in the outputs must be the same.\nThe number of asterisks defines the dimensionality of the vector operation.\n\nNotes:\n- The rule is: an asterisk in a path item (i.e. between directory separators) indicates a directory (e.g. `*/data.csv`);\nan asterisk with an extension means a file (e.g. `data/*.csv`)\n- An asterisk in the input can change its type in the output (e.g. 
the input files `data/*.csv` to the output directories `metadata/*/`).\n- When a vector method instance runs, an asterisk in the output path is replaced with the value of the corresponding asterisk in the input.\nThis also means that it is possible to change an extension of the output file (e.g. the input `data/*.csv` to the output `metadata/*.json`).\n- Only one input can be a vector (i.e. contain one or more asterisks), others must be scalar. The reason is\ntwo issues that are unresolved so far: (1) if there are two vector inputs, what is the joint input? is it the Cartesian product of the two or just\npairs with the same indices? if the latter, how to order input items? (2) how to name outputs in this case?\n\nTo aggregate results of a vector operation, just don't specify an asterisk in the output:\n\n```\nalpheus build -o \"summary.txt\" -d \"scripts/build_summary.py\" -d \"files/*.txt\" \"python $in1 $in2 $out1\"\n```\n\nIn this case, the script gets the input pattern as an argument, e.g. a full path for \"files/*.txt\" as `$in2`.\n\n\n\n#### When to use vector operations [Experimental - not stable]\n\nYou may wonder why not create a method that enumerates the needed files itself. You may, and are advised to, do so if you have a lightweight operation on many small files (e.g. image format conversion). \nUse Alpheus vector operations if processing each vector element is considerably heavy.\n\nBecause Alpheus keeps track of the readiness status of every vector element (each file/dir that matches the pattern with \\*), it imposes overhead. \nOn the other hand, Alpheus will skip computation of up-to-date vector elements and process only those vector elements that require (re)computation. \nThis can serve as a checkpoint when resuming an interrupted computation.\n\n### Using external storage for artefacts\n\n\n## Support for standard tools and languages\n\n_todo_\n\n## Common workflow\n\n_todo_: how the user builds the experiment, saves, shares. 
We recommend starting by adding and debugging scripts manually, then registering them in the dependency graph.\n\n## Migration from a bunch of scripts and data files\n\n_todo_: builds alpheus experiment when you already have a bunch of scripts and files.\n\n## Sharing and collaborating\n\n_todo_\n\n## Running in Cloud\n\n_todo_\n\n## Guidelines\n\n### Save intermediate artefacts \u0026 delete them from disk\nTo speed up the reproducibility checks even more, it is advised to save and remove from disk any artefact that is not meant to be used directly (as an input to newly created methods or for manual inspection). This frees Alpheus from the obligation to check that the disk version of the artefact has not changed since the artefact was produced.\n\n### File structure\n\n**Shared or exclusive code?**\nWorking on a problem, you eventually get solid results you want to keep, but fork another experiment to try another approach. \nAfter some time, you might have several experiments (e.g. with different models) with some code that can be re-used across them.\n\nThe conflict is that, on the one hand, shared code is useful since a code improvement automatically affects all the experiments. On \nthe other hand, modifying it invalidates the computed results, which is bad.\n\nTherefore we suggest not sharing the code, and instead using copy/paste when you explicitly want to propagate changes from one experiment to another. git will help to resolve changes.\n\n## Questions and Answers\n\n#### Why not use a build system, like make, cmake, etc.?\nBuild systems usually use modification time to determine that a file/folder needs to be recalculated. This will not work if you push artefacts to some storage and later pull them back to your hard drive. 
Alpheus uses modification timestamps along with the data checksums to verify that the actual disk version matches the expected version of the artefact.\nAlpheus gives you the ability to offload most of the experiment artefacts to external storages to free your local disk drive. This would be impossible with build tools like cmake.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitislab%2Falpheus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fitislab%2Falpheus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitislab%2Falpheus/lists"}