https://github.com/calkit/calkit

A framework and toolkit for analytical/research projects, with an emphasis on reproducibility.

conda docker dvc environments git latex open-science overleaf pipelines reproducibility reproducible-research reproducible-science research version-control workflow-management


Host: GitHub
URL: https://github.com/calkit/calkit
Owner: calkit
License: mit
Created: 2024-08-21T23:10:26.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-05-07T13:22:17.000Z (about 1 year ago)
Last Synced: 2025-05-07T14:27:06.299Z (about 1 year ago)
Topics: conda, docker, dvc, environments, git, latex, open-science, overleaf, pipelines, reproducibility, reproducible-research, reproducible-science, research, version-control, workflow-management
Language: Python
Homepage: https://docs.calkit.org
Size: 8.67 MB
Stars: 20
Watchers: 2
Forks: 0
Open Issues: 170
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE

Calkit helps you manage and automate research projects like a software
engineer.

Define computational environments,
steps that process your data, create figures,
presentations, and publications, connect to external tools,
then iterate quickly and painlessly until your research questions are
answered, tracking changes to all files along the way.
At the end, deliver your entire project as a self-contained, self-documenting,
version-controlled, and
[single button reproducible](https://doi.org/10.1190/1.1822162)
"calculation kit" so you and others can easily verify
and build upon the results.

## Guiding principles

- Quality comes from iteration. Automation reduces the time and effort
needed to iterate, thereby increasing iteration and quality.
- Automating a step can and should take roughly
the same amount of time as doing it once manually,
so it's almost always worth it.
- Working in a "quick and dirty" way can easily become _not quick_ when
the dirtiness results in mistakes and/or discourages working in small steps.

## Features

- A simplified [version control](https://docs.calkit.org/version-control)
interface that unifies Git and DVC (Data Version Control),
so all materials can be kept in the same project repository.
This way, code doesn't need to be siloed away from other
important artifacts like datasets, models, figures, or article PDFs,
allowing you to work on all parts of a project without hopping around to
different tools.
- [Computational environment management](https://docs.calkit.org/environments) with support for many
languages and environment managers: Conda, Docker, uv, Julia, Renv, and more.
No need to create and update environments on your own. Calkit will handle
them as needed.
- An environment-aware build system or [pipeline](https://docs.calkit.org/pipeline) with
a simple declarative syntax and
output caching so you don't need to think about which steps or stages
need to be rerun after changing any part of the project.
Simply call `calkit run`.
Compose your pipeline from many different kinds of stages,
including simple scripts, commands, Jupyter Notebooks, LaTeX, and more.
- A complementary self-hostable and GitHub-integrated
[cloud platform](https://github.com/calkit/calkit-cloud)
to facilitate backup, collaboration,
and sharing throughout the entire research lifecycle.
- [Overleaf integration](https://docs.calkit.org/overleaf/), so
analysis, visualization, and writing can all stay in sync
(no more manual uploads!)
- Support for running on high performance computing (HPC) systems that use
[SLURM schedulers](https://docs.calkit.org/pipeline/slurm).
- Support for running with [GitHub Actions](https://docs.calkit.org/tutorials/github-actions).
- Extensions for doing all of the above graphically in
[JupyterLab](https://docs.calkit.org/jupyterlab) and
[VS Code](https://marketplace.visualstudio.com/items?itemName=Calkit.calkit-vscode).
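
To illustrate the idea behind output caching in the pipeline feature above, here is a minimal sketch of how a build system can decide whether a stage is out-of-date by comparing content hashes of its dependencies. This is not Calkit's actual implementation; the function names, the JSON lock file, and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_deps(deps: list[Path], lock_file: Path) -> list[str]:
    """List dependencies whose current hash differs from the recorded one."""
    recorded = json.loads(lock_file.read_text()) if lock_file.exists() else {}
    return [str(p) for p in deps if recorded.get(str(p)) != file_hash(p)]


def record_deps(deps: list[Path], lock_file: Path) -> None:
    """Store current hashes, as would happen after a stage runs successfully."""
    hashes = {str(p): file_hash(p) for p in deps}
    lock_file.write_text(json.dumps(hashes, indent=2))


if __name__ == "__main__":
    # Hypothetical demo in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "analyze.py"
        lock = Path(tmp) / "analyze.lock.json"
        script.write_text("print('v1')\n")
        print("before first run:", changed_deps([script], lock))
        record_deps([script], lock)
        print("after run:", changed_deps([script], lock))
        script.write_text("print('v2')\n")
        print("after edit:", changed_deps([script], lock))
```

A stage with an empty list of changed dependencies can be skipped, which is why only the affected stages need to rerun after an edit.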

## Installation

On Linux, macOS, or Windows Git Bash,
install Calkit and [uv](https://docs.astral.sh/uv/)
(if not already installed) with:

```sh
curl -LsSf install.calkit.org | sh
```

Or with Windows Command Prompt or PowerShell:

```powershell
powershell -ExecutionPolicy ByPass -c "irm install-ps1.calkit.org | iex"
```

If you already have uv installed, install Calkit with:

```sh
uv tool install calkit-python
```

You can also install with your system Python:

```sh
pip install calkit-python
```

To effectively use Calkit, you'll want to ensure [Git](https://git-scm.com)
is installed and properly configured.
You may also want to install [Docker](https://docker.com),
since that is the default method by which LaTeX environments are created.
If you want to use the [Calkit Cloud](https://calkit.io)
for collaboration and backup as a DVC remote,
you can [set up cloud integration](https://docs.calkit.org/cloud-integration) with:

```sh
calkit cloud login
```
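
Since Git is needed and Docker is recommended, it can help to confirm that these tools are on your `PATH` before the first run. The following is a small sketch using only the Python standard library; the `find_tools` helper is hypothetical and not part of Calkit.

```python
import shutil
from typing import Optional


def find_tools(names=("git", "docker", "uv")) -> dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

This only checks that the executables can be found. It does not verify that Git is configured or that the Docker daemon is running.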

If you use AI agents like Claude, Copilot, or Codex,
see [AI tools](https://docs.calkit.org/ai-tools)
to learn how to install agent skills for working with Calkit.

### Use without installing

If you want to use Calkit without installing it,
you can use uv's `uvx` command to run it directly:

```sh
uvx calk9 --help
```

### Calkit Assistant

For Windows users, the
[Calkit Assistant](https://github.com/calkit/calkit-assistant)
app is the easiest way to get everything set up and ready to work in
VS Code, which can then be used as the primary app for working on
all scientific or analytical computing projects.

![Calkit Assistant](https://github.com/calkit/calkit-assistant/blob/main/resources/screenshot.png?raw=true)

## Quickstart

!!! note
    `ck` is an abbreviated alias for the `calkit` executable.
    All `calkit` commands can be run as `ck` instead, e.g., `ck save -am "..."`.

### From an existing project

If you want to use Calkit with an existing project,
navigate into its working directory and use the `xr` command to start
executing and recording your scripts, notebooks, LaTeX files, etc.,
as reproducible pipeline stages.
For example:

```sh
calkit xr scripts/analyze.py

calkit xr notebooks/plot.ipynb

calkit xr paper/main.tex
```

Calkit will attempt to detect environments, inputs, and outputs and
save them in `calkit.yaml`.
If successful,
you'll be able to run the full pipeline with:

```sh
calkit run
```

Next, change a file (e.g., a script) and check the output of
`calkit status`.
You'll see that the pipeline has a stage that is out-of-date:

```sh
---------------------------- Pipeline ----------------------------
analyze:
    changed deps:
        modified: scripts/analyze.py
```

This can be fixed with another call to `calkit run`.

You can save (add and commit) all changes with:

```sh
calkit save -am "Add to pipeline"
```
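
The steps above (`calkit xr` for each file, then `calkit run`, then `calkit save`) can also be scripted. The sketch below only assembles and optionally runs the commands shown in this section; the helper names and the `dry_run` option are hypothetical, and it assumes the `calkit` executable is on your `PATH` when `dry_run` is `False`.

```python
import shlex
import subprocess


def workflow_commands(paths: list[str], message: str) -> list[list[str]]:
    """Build the xr -> run -> save sequence as argument lists."""
    cmds = [["calkit", "xr", p] for p in paths]
    cmds.append(["calkit", "run"])
    cmds.append(["calkit", "save", "-am", message])
    return cmds


def run_workflow(
    paths: list[str], message: str, dry_run: bool = True
) -> list[list[str]]:
    """Print each command and, unless dry_run is set, execute it."""
    cmds = workflow_commands(paths, message)
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            # Stop at the first failing command.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Dry run: prints the commands without executing anything.
    run_workflow(
        ["scripts/analyze.py", "notebooks/plot.ipynb", "paper/main.tex"],
        "Add to pipeline",
    )
```

Keeping `dry_run=True` as the default lets you review the exact commands before anything is executed or committed.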

### Fresh from a Calkit project template

Create a new project from the
[`calkit/example-basic`](https://github.com/calkit/example-basic)
template with:

```sh
calkit new project my-research \
    --title "My research" \
    --template calkit/example-basic \
    --cloud
```

Note that the `--cloud` flag requires [cloud integration](https://docs.calkit.org/cloud-integration)
to be set up. It can be omitted if the project doesn't need to be backed up to
the cloud or shared with collaborators.
Cloud integration can also be set up later.

Next, move into the project folder and run the pipeline,
which consists of several stages defined in `calkit.yaml`:

```sh
cd my-research
calkit run
```

Next, make some edits to a script or LaTeX file and run `calkit status` to
see which stages are out-of-date.
For example:

```sh
---------------------------- Pipeline ----------------------------
build-paper:
    changed deps:
        modified: paper/paper.tex
```

Execute `calkit run` again to bring everything up-to-date.

To back up or save the project, call:

```sh
calkit save -am "Run pipeline"
```

## Get involved

We welcome all kinds of contributions!
See [CONTRIBUTING.md](CONTRIBUTING.md) to learn how to get involved.
