https://github.com/dennwc/cas

# Content Addressable Storage

[![Go Reference](https://pkg.go.dev/badge/github.com/dennwc/cas.svg)](https://pkg.go.dev/github.com/dennwc/cas)
[![Join the chat at https://app.gitter.im/#/room/#dennwc_cas:gitter.im](https://badges.gitter.im/dennwc/cas.svg)](https://app.gitter.im/#/room/#dennwc_cas:gitter.im)

This project implements a simple and pragmatic approach to Content Addressable Storage (CAS).
It was heavily influenced by [Perkeep](https://perkeep.org/) (aka Camlistore) and Git.

For more details, see [concepts](./docs/concepts.md) and [comparison](./docs/comparison.md) with other systems.
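
As a rough illustration of the general idea behind content addressing (not of this project's actual API), the sketch below stores blobs under the SHA-256 of their bytes. The `Ref` and `MemStore` names are hypothetical and exist only for this example; it uses nothing beyond the Go standard library.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// Ref is a hypothetical content reference: the hex-encoded SHA-256 of a blob.
type Ref string

// MemStore is an illustrative in-memory blob store keyed by content hash.
// It is an assumption made for this sketch, not a type from this project.
type MemStore struct {
	blobs map[Ref][]byte
}

// NewMemStore returns an empty store.
func NewMemStore() *MemStore {
	return &MemStore{blobs: make(map[Ref][]byte)}
}

// Put stores data under its own hash and returns that hash.
// Storing the same bytes twice yields the same Ref and a single stored copy.
func (s *MemStore) Put(data []byte) Ref {
	sum := sha256.Sum256(data)
	ref := Ref(hex.EncodeToString(sum[:]))
	if _, ok := s.blobs[ref]; !ok {
		// Copy so that callers cannot mutate stored content afterwards.
		s.blobs[ref] = append([]byte(nil), data...)
	}
	return ref
}

// Get returns the blob for ref after re-checking that its hash matches ref.
func (s *MemStore) Get(ref Ref) ([]byte, error) {
	data, ok := s.blobs[ref]
	if !ok {
		return nil, errors.New("blob not found")
	}
	sum := sha256.Sum256(data)
	if Ref(hex.EncodeToString(sum[:])) != ref {
		return nil, errors.New("content does not match its reference")
	}
	return data, nil
}

func main() {
	s := NewMemStore()
	a := s.Put([]byte("hello"))
	b := s.Put([]byte("hello"))
	fmt.Println("same ref:", a == b, "stored blobs:", len(s.blobs))

	data, err := s.Get(a)
	fmt.Println(string(data), err)
}
```

Because the key is derived from the content, identical data is deduplicated automatically and any reader can verify a blob against its reference.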

## Status

The project is stable. Work is ongoing on the design of CAS2, a more flexible and performant version.
This project will receive bug fixes and maintenance work. New features will likely end up in CAS2.

Check the [Quick start guide](./docs/quickstart.md) for a list of basic commands.

## Goals

- **Simplicity:** the core specification should be trivial to implement.

- **Interop:** CAS should play nicely with existing tools and technologies,
either content-addressable or not.

- **Easy to use:** CAS should be a single command away, similar to `git init`.

## Use cases

- Immutable and versioned archives: CAS supports files with multiple
TBs of data and folders with millions of files, and it can index and use
remote data without storing it locally.

- Data processing pipelines: the caching capabilities of CAS make it
suitable for incremental data pipelines.

- Git for large files: CAS stores files on the assumption that they may
be multiple TBs in size and is optimized for this use case, while still
supporting tags and branches, like Git.
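
To make the incremental pipeline use case more concrete, here is a minimal sketch of one way to cache a pipeline step by the hash of its input, so that unchanged inputs are not reprocessed. The `Step` and `CachedStep` types are hypothetical and do not reflect how this project implements pipelines.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// Step is a hypothetical pipeline stage: a pure function from input bytes to output bytes.
type Step func(in []byte) []byte

// CachedStep wraps a Step with a cache keyed by the SHA-256 of the input.
// This is an illustrative assumption, not this project's pipeline design.
type CachedStep struct {
	run   Step
	cache map[[32]byte][]byte
	Runs  int // number of times the wrapped step actually executed
}

// NewCachedStep wraps run with an empty cache.
func NewCachedStep(run Step) *CachedStep {
	return &CachedStep{run: run, cache: make(map[[32]byte][]byte)}
}

// Apply returns the cached output when the same input content was seen before,
// and otherwise runs the step and records its result.
func (c *CachedStep) Apply(in []byte) []byte {
	key := sha256.Sum256(in)
	if out, ok := c.cache[key]; ok {
		return out
	}
	out := c.run(in)
	c.Runs++
	c.cache[key] = out
	return out
}

func main() {
	upper := NewCachedStep(func(in []byte) []byte {
		return []byte(strings.ToUpper(string(in)))
	})
	inputs := []string{"a.txt contents", "b.txt contents", "a.txt contents"}
	for _, in := range inputs {
		fmt.Printf("%s\n", upper.Apply([]byte(in)))
	}
	// The third input repeats the first, so the step body runs only twice.
	fmt.Println("step executions:", upper.Runs)
}
```

The sketch only works for deterministic steps: the cache key covers the input content, so a step whose output depends on anything else would return stale results.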

## Features and the roadmap

**Implemented:**

- Fast file hashing
  - SHA-256, others can be used
  - Stores results in file attributes (cache)
- Support for large archives
  - Large contiguous files (> TB)
  - Large multipart files (> TB)
  - Large directories (> millions of files)
  - Zero-copy file fetch (BTRFS)
- Integrations
  - Can index and sync web content
  - HTTP(S) caching (as a Go library)
- Remote storage
  - Self-hosted HTTP CAS server (read-only)
  - Google Cloud Storage
- Usability
  - Mutable objects (pins)
  - Local storage in Git fashion
- Data pipelines
  - Extendable
  - Caches results
  - Incremental

**Planned (for CAS2):**

- Support for large multipart files (> TB)
  - Support multilevel parts
  - Support blob splitters (rolling checksum, new line, etc)
- Remote storage
  - AWS, etc
  - Self-hosted HTTP CAS server (read-write)
- Integration with Git
  - Zero-copy fetch from Git (either remote or local)
  - LFS integration
- Integration with Docker
  - Zero-copy fetch of an image from Docker
  - Unpack FS images to CAS
  - Use containers in pipelines
- Integration with BitTorrent
  - Store torrent files
  - Download torrent data directly to CAS
  - To consider: expose CAS as a peer
- Integration with other CAS systems
  - Perkeep
  - Upspin
  - IPFS
- Windows and OSX support
- Better support for pipelines