https://github.com/ivotron/vio
Versioning for input/output files of version-controlled projects
- Host: GitHub
- URL: https://github.com/ivotron/vio
- Owner: ivotron
- License: bsd-3-clause
- Created: 2015-12-12T21:20:48.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-03-28T07:26:47.000Z (about 10 years ago)
- Last Synced: 2025-02-11T12:18:57.651Z (over 1 year ago)
- Language: Go
- Homepage:
- Size: 19.5 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
# vio
Versioning for input/output files.
When working with a version-controlled project, we often feed artifacts
(configuration files, logs, measurements, figures, etc.) to, or obtain
them from, programs that correspond to a particular version of the
project, even though those artifacts are not part of it (i.e. they are
not tracked by the VCS). After a few executions, it quickly becomes
hard to tell which versions of the project consumed or generated which
files. `vio` addresses this by letting a user take a snapshot of the
unversioned files after a program has run, then store that snapshot and
associate it with the latest revision of the project.
## Example
```bash
git clone https://project.git
cd project
# work, work, work
git add -u
git commit -m "I worked hard and implemented many things"
# parameterize the execution
echo "my configs for a particular execution" > params.conf
# execute and generate some results
./program -c params.conf > execution.out
# snapshot everything that git does not track; in this
# case, the files params.conf and execution.out
vio commit -m "the result of my hard work"
```
## High-level
In a nutshell, vio:
1. Finds all files that are not tracked by the VCS.
2. Creates a dataset from those unversioned files.
3. Puts the dataset in a storage backend, associating it with an
execution ID (`commit_id + timestamp`).
4. Provides versioning semantics for datasets, allowing users to
compare distinct versions.
5. Stores metadata for datasets, allowing users to annotate and
contextualize them for later introspection.
vio's "database" has the following schema:
```
commit_id | execution_id | vio_commit_message | files | metadata
```
`commit_id` is the revision in the VCS, while `execution_id` is a
timestamp taken at the moment the snapshot is created.
`vio_commit_message` is the message given to `vio commit -m`, `files`
is the snapshot of all unversioned files in the working directory, and
`metadata` is a collection of key-value pairs.
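One row of this schema, together with the version comparison mentioned in step 4, might look like the Go sketch below. It is an illustration under stated assumptions, not vio's actual code: the `Record` type and `diffFiles` function are hypothetical, and representing `files` as a map from path to content hash is an assumption (the README only says it is a snapshot of the unversioned files). The hash values in `main` are placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// Record mirrors one row of the schema:
// commit_id | execution_id | vio_commit_message | files | metadata
// (Hypothetical type; Files as path -> content hash is an assumption.)
type Record struct {
	CommitID    string
	ExecutionID string
	Message     string
	Files       map[string]string // path -> content hash
	Metadata    map[string]string // free-form key-value annotations
}

// diffFiles reports which paths were added, removed, or changed between
// two snapshots. Each result list is sorted for stable output.
func diffFiles(before, after Record) (added, removed, changed []string) {
	for path, h := range after.Files {
		oldHash, ok := before.Files[path]
		switch {
		case !ok:
			added = append(added, path)
		case oldHash != h:
			changed = append(changed, path)
		}
	}
	for path := range before.Files {
		if _, ok := after.Files[path]; !ok {
			removed = append(removed, path)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	sort.Strings(changed)
	return added, removed, changed
}

func main() {
	first := Record{
		CommitID:    "a1b2c3d",
		ExecutionID: "a1b2c3d_2016-03-28T07:26:47Z",
		Message:     "the result of my hard work",
		Files:       map[string]string{"params.conf": "h1", "execution.out": "h2"},
		Metadata:    map[string]string{"host": "laptop"},
	}
	// A second execution at the same commit, with new results.
	second := first
	second.ExecutionID = "a1b2c3d_2016-03-28T09:00:00Z"
	second.Files = map[string]string{"params.conf": "h1", "execution.out": "h3", "plot.png": "h4"}

	added, removed, changed := diffFiles(first, second)
	fmt.Println(added, removed, changed) // [plot.png] [] [execution.out]
}
```

Comparing digests rather than file contents keeps the comparison cheap; a real backend could store the contents elsewhere and look them up by hash.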
# vio vs. other tools
## `git-lfs`
`git-lfs` allows large files to be included in a git repo. The main
difference between `vio` and `git-lfs` is that `vio` lets you
associate multiple datasets (or filesystem snapshots) with a single
version of the git repo, while `git-lfs` can associate only one. In
other words, the relationship between git commits and commits in the
storage backend is one-to-one for `git-lfs` and one-to-many for
`vio`.
Because of this, `vio` can use `git-lfs` as a storage backend, just as
it uses `git` as one. Other tools, such as `git-annex`, fall in the
same category.
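To make the one-to-many relationship concrete, here is a minimal Go sketch of an in-memory index in which one commit ID maps to several snapshots. The `Snapshot` and `Index` types are hypothetical illustrations, not vio's data structures or its storage-backend API; the execution IDs reuse the assumed `<commit>_<RFC 3339 timestamp>` format, which sorts chronologically as plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot is a minimal stand-in for a vio dataset: the git commit it
// belongs to and the execution that produced it. (Hypothetical type.)
type Snapshot struct {
	CommitID    string
	ExecutionID string
}

// Index groups snapshots by commit: one commit, many snapshots.
type Index map[string][]Snapshot

// Add records a snapshot under its commit.
func (ix Index) Add(s Snapshot) {
	ix[s.CommitID] = append(ix[s.CommitID], s)
}

// ForCommit returns the sorted execution IDs recorded for a commit.
func (ix Index) ForCommit(commitID string) []string {
	var ids []string
	for _, s := range ix[commitID] {
		ids = append(ids, s.ExecutionID)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	ix := Index{}
	ix.Add(Snapshot{"a1b2c3d", "a1b2c3d_2016-03-28T09:00:00Z"})
	ix.Add(Snapshot{"a1b2c3d", "a1b2c3d_2016-03-28T07:26:47Z"})
	ix.Add(Snapshot{"e4f5a6b", "e4f5a6b_2016-03-29T10:00:00Z"})

	// Two snapshots share commit a1b2c3d; a one-to-one scheme could keep only one.
	fmt.Println(ix.ForCommit("a1b2c3d"))
	// [a1b2c3d_2016-03-28T07:26:47Z a1b2c3d_2016-03-28T09:00:00Z]
}
```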
## artifact repositories
**TODO**
## CI tools
**TODO**