https://github.com/databiosphere/ssds
Simple data storage system for AWS and GCP
- Host: GitHub
- URL: https://github.com/databiosphere/ssds
- Owner: DataBiosphere
- License: MIT
- Created: 2020-05-29T12:20:44.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2022-06-30T17:21:08.000Z (over 3 years ago)
- Last Synced: 2025-06-01T06:52:36.137Z (10 months ago)
- Language: Python
- Size: 304 KB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
# ssds
The s1 simple data store.
Upload directory trees to S3 or GS cloud buckets as "submissions". Each submission takes a user-assigned
identifier and a human-readable name. The cloud location of the submission has the key structure
`submissions/{uuid}--{name}/{tree}`.
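For example, the key for one file in a submission is built from those three parts (the identifiers and file path below are hypothetical, for illustration only):

```python
# Sketch of the submission key layout; all values here are made up.
submission_id = "2c1a9b1e-0000-4000-8000-000000000000"  # user-assigned UUID
name = "my_cool_submission_name"                        # human-readable name
tree_path = "data/reads.fastq"                          # path within the uploaded tree

key = f"submissions/{submission_id}--{name}/{tree_path}"
```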
All uploads are checksummed and verified. Multipart uploads use defined chunk sizes so that composite S3
ETags can be tracked consistently.
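The reason fixed chunk sizes matter is that S3 computes a multipart ETag from the per-part MD5 digests, so the same data split the same way always yields the same ETag. A minimal sketch of that scheme (an illustration of the S3 convention, not the ssds implementation):

```python
import hashlib


def composite_s3_etag(data: bytes, chunk_size: int) -> str:
    """Compute the ETag S3 assigns to a multipart upload of `data`
    split into fixed-size chunks: the MD5 of the concatenated
    per-part MD5 digests, suffixed with the part count."""
    part_digests = [
        hashlib.md5(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    if len(part_digests) == 1:
        # Single-part uploads get the plain MD5 hex digest instead.
        return part_digests[0].hex()
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Because the ETag depends on the chunk boundaries, uploading with a different chunk size produces a different ETag for identical data; pinning the chunk size keeps verification deterministic.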
# Installation
```
pip install git+https://github.com/DataBiosphere/ssds
```
# Usage
Make a new submission
```
ssds staging upload --submission-id my_submission_id --name my_cool_submission_name /local/path/to/my/submission
```
Update an existing submission
```
ssds staging upload --submission-id my_existing_submission_id /local/path/to/my/submission
```
List all staging submissions
```
ssds staging list
```
List contents of a staging submission
```
ssds staging list-submission --submission-id my_submission_id
```
The above commands can target staging deployments other than the default with the `--deployment` argument.
Available deployments can be listed with
```
ssds deployment list-staging
```
and
```
ssds deployment list-release
```
Submissions can be synced between staging deployments with
```
ssds staging sync --submission-id my_existing_submission_id --dst-deployment my_dst_deployment
```
## Configuring Billing Projects for Requester Pays Google Buckets
For working with requester pays Google Storage buckets, the billing project is specified by setting the
environment variable `GOOGLE_PROJECT`, e.g.
```
export GOOGLE_PROJECT="my-gcp-billing-project"
```
# Developing
Run tests with
```
make test
```
If `mypy` linting fails, you may need to run
```
mypy --install-types
```
Many tests require access to test buckets listed in `ssds/deployment.py`.
These buckets are in the `pangenomics` AWS account.
Be sure your S3 and GS credentials are configured with access to these buckets.
## Links
Project home page: [GitHub](https://github.com/DataBiosphere/ssds)
### Bugs
Please report bugs, issues, feature requests, etc. on [GitHub](https://github.com/DataBiosphere/ssds).

1 super, splendidly, serendipitous, sometimes, sporadically, etc.