https://github.com/databiosphere/ssds
Simple data storage system for AWS and GCP
- Host: GitHub
- URL: https://github.com/databiosphere/ssds
- Owner: DataBiosphere
- License: MIT
- Created: 2020-05-29T12:20:44.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2022-06-30T17:21:08.000Z (over 3 years ago)
- Last Synced: 2025-06-01T06:52:36.137Z (10 months ago)
- Language: Python
- Size: 304 KB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
# ssds
The s1 simple data store.
Upload directory trees to S3 or GS cloud buckets as "submissions". Each submission takes a user-assigned
identifier and a human-readable name. The cloud location of the submission has the key structure
`submissions/{uuid}--{name}/{tree}`.
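For example, the key for one file in a submission is built from those three parts (the identifiers and file path below are hypothetical, for illustration only):

```python
# Sketch of the submission key layout; all values here are made up.
submission_id = "2c1a9b1e-0000-4000-8000-000000000000"  # user-assigned UUID
name = "my_cool_submission_name"                        # human-readable name
tree_path = "data/reads.fastq"                          # path within the uploaded tree

key = f"submissions/{submission_id}--{name}/{tree_path}"
```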
All uploads are checksummed and verified. Multipart uploads use defined chunk sizes so that composite S3
ETags can be tracked consistently.
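The reason fixed chunk sizes matter is that S3 computes a multipart ETag from the per-part MD5 digests, so the same data split the same way always yields the same ETag. A minimal sketch of that scheme (an illustration of the S3 convention, not the ssds implementation):

```python
import hashlib


def composite_s3_etag(data: bytes, chunk_size: int) -> str:
    """Compute the ETag S3 assigns to a multipart upload of `data`
    split into fixed-size chunks: the MD5 of the concatenated
    per-part MD5 digests, suffixed with the part count."""
    part_digests = [
        hashlib.md5(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    if len(part_digests) == 1:
        # Single-part uploads get the plain MD5 hex digest instead.
        return part_digests[0].hex()
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Because the ETag depends on the chunk boundaries, uploading with a different chunk size produces a different ETag for identical data; pinning the chunk size keeps verification deterministic.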
# Installation
```
pip install git+https://github.com/DataBiosphere/ssds
```
# Usage
Make a new submission
```
ssds staging upload --submission-id my_submission_id --name my_cool_submission_name /local/path/to/my/submission
```
Update an existing submission
```
ssds staging upload --submission-id my_existing_submission_id /local/path/to/my/submission
```
List all staging submissions
```
ssds staging list
```
List contents of a staging submission
```
ssds staging list-submission --submission-id my_submission_id
```
The above commands can target staging deployments other than the default with the `--deployment` argument.
Available deployments can be listed with
```
ssds deployment list-staging
```
and
```
ssds deployment list-release
```
Submissions can be synced between staging deployments with
```
ssds staging sync --submission-id my_existing_submission_id --dst-deployment my_dst_deployment
```
## Configuring Billing Projects for Requester Pays Google Buckets
For working with requester pays Google Storage buckets, the billing project is specified by setting the
environment variable `GOOGLE_PROJECT`, e.g.
```
export GOOGLE_PROJECT="my-gcp-billing-project"
```
# Developing
Run tests with
```
make test
```
If `mypy` linting fails, you may need to run
```
mypy --install-types
```
Many tests require access to test buckets listed in `ssds/deployment.py`.
These buckets are in the `pangenomics` AWS account.
Be sure your S3 and GS credentials are configured with access to these buckets.
## Links
Project home page: [GitHub](https://github.com/DataBiosphere/ssds)
### Bugs
Please report bugs, issues, feature requests, etc. on [GitHub](https://github.com/DataBiosphere/ssds).

1 super, splendidly, serendipitous, sometimes, sporadically, etc.