[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/hdfgroup/hsds)

# HSDS (Highly Scalable Data Service) - REST-based service for HDF5 data

## Introduction

HSDS is a REST-based web service for HDF5 data stores.
Data can be stored either in a POSIX file system or in object-based storage such as
AWS S3, Azure Blob Storage, or [MinIO](https://min.io).
HSDS can be run on a single machine, with or without Docker, or on a cluster using Kubernetes (or AKS on Microsoft Azure).

## Quick Start

### With GitHub Codespaces

Launch a Codespaces environment by clicking the banner __["Open in GitHub Codespaces"](https://codespaces.new/HDFGroup/hsds)__. Once the codespace is ready, enter `python testall.py` in the terminal window to run the test suite.

### On your desktop/laptop

Make sure you have Python 3 and Pip installed, then:

1. Install: run `$ ./build.sh --no-lint --no-docker` from the source tree, or install from PyPI: `$ pip install hsds`
2. Create a directory the server will use to store data, for example: `$ mkdir ~/hsds_data`
3. Start the server: `$ hsds --root_dir ~/hsds_data`
4. Run the test suite. In a separate terminal:
   - Set the user name: `$ export USER_NAME=$USER`
   - Set the user password: `$ export USER_PASSWORD=$USER`
   - Set the admin name: `$ export ADMIN_USERNAME=$USER`
   - Set the admin password: `$ export ADMIN_PASSWORD=$USER`
   - Run the test suite: `$ python testall.py --skip_unit`
5. (Optional) Install the h5pyd package for an h5py-compatible API and tool suite: https://github.com/HDFGroup/h5pyd
6. (Optional) Post-install setup (test data, home folders, CLI tools, etc.): [docs/post_install.md](docs/post_install.md)
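The credentials exported in step 4 can also be reused from your own Python code. The following is a minimal sketch, not part of HSDS itself, and it rests on several assumptions: that HSDS accepts HTTP Basic authentication with those credentials, that the server started in step 3 listens on `http://localhost:5101` (an assumed default; substitute your own endpoint), and that it exposes an `/about` route whose JSON response includes a `state` field equal to `"READY"` once it can accept requests. `basic_auth_header`, `credentials_from_env`, `fetch_about`, and `wait_until_ready` are hypothetical helper names.

```python
import base64
import json
import os
import time
import urllib.request


def basic_auth_header(username, password):
    """Return an HTTP Basic `Authorization` header value (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def credentials_from_env():
    """Read the USER_NAME / USER_PASSWORD variables exported in step 4."""
    return os.environ["USER_NAME"], os.environ["USER_PASSWORD"]


def fetch_about(endpoint):
    """GET <endpoint>/about and decode its JSON body (assumed status route)."""
    with urllib.request.urlopen(f"{endpoint.rstrip('/')}/about", timeout=5) as resp:
        return json.load(resp)


def wait_until_ready(endpoint="http://localhost:5101", fetch=fetch_about,
                     attempts=10, delay=1.0):
    """Poll the server until it reports state == "READY"; return False if it never does."""
    for _ in range(attempts):
        try:
            if fetch(endpoint).get("state") == "READY":
                return True
        except OSError:
            pass  # connection refused / not listening yet: retry after a pause
        time.sleep(delay)
    return False
```

For example, calling `wait_until_ready()` before running `testall.py` avoids racing a server that is still starting. The `fetch` argument is injectable so the polling logic can be exercised without a live server.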

If the server is not running in Docker, press Control-C in its terminal to shut it down.

If using Docker, run: `$ ./stopall.sh`

Note: passwords can (and, for production use, should) be changed by editing hsds/admin/config/password.txt and rebuilding the Docker image. Alternatively, an external identity provider such as Azure Active Directory or KeyCloak can be used. See [docs/azure_ad_setup.md](docs/azure_ad_setup.md) for Azure AD setup instructions or [docs/keycloak_setup.md](docs/keycloak_setup.md) for KeyCloak.
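Before rebuilding the Docker image, it can help to sanity-check the edited password file. This sketch assumes one `username:password` pair per line (check `hsds/admin/config/password.txt` for the actual layout). `parse_password_file` and `weak_entries` are hypothetical helpers, and the "weak" rule (a password equal to the user name, or shorter than 8 characters) is an arbitrary illustration, chosen because the Quick Start sets each password to `$USER`.

```python
def parse_password_file(text):
    """Parse `username:password` lines into a dict, ignoring blank lines.

    Assumed layout: one pair per line. The password is everything after the
    first colon, so it may itself contain colons.
    """
    users = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        username, sep, password = line.partition(":")
        if not sep or not username or not password:
            raise ValueError(f"line {lineno}: expected 'username:password', got {raw!r}")
        if username in users:
            raise ValueError(f"line {lineno}: duplicate user {username!r}")
        users[username] = password
    return users


def weak_entries(users, min_length=8):
    """Return user names whose password equals the name or is shorter than min_length."""
    return sorted(u for u, p in users.items() if p == u or len(p) < min_length)
```

Running `weak_entries(parse_password_file(text))` on the file contents lists the accounts whose passwords should be changed before production use.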

## Detailed Install Instructions

### On AWS

For complete instructions to install on a single AWS EC2 instance with Docker:

- See: [docs/docker_install_aws.md](docs/docker_install_aws.md)

For complete instructions to install on Amazon Elastic Kubernetes Service (EKS):

- See: [docs/kubernetes_install_aws.md](docs/kubernetes_install_aws.md)

For complete instructions to install on AWS Lambda:

- See: [docs/aws_lambda_setup.md](docs/aws_lambda_setup.md)

### On Azure

For complete instructions to install on a single Azure VM with Docker:

- See: [docs/docker_install_azure.md](docs/docker_install_azure.md)

For complete instructions to install on Azure Kubernetes Service (AKS):

- See: [docs/kubernetes_install_azure.md](docs/kubernetes_install_azure.md)

### On Prem (POSIX-based storage)

For complete instructions to install on a desktop or local server:

- See: [docs/docker_install_posix.md](docs/docker_install_posix.md)

### On DCOS (BETA)

For complete instructions to install on DCOS:

- See: [docs/docker_install_dcos.md](docs/docker_install_dcos.md)

## General Install Topics

Setting up docker:

- See [docs/setup_docker.md](docs/setup_docker.md)

Post install setup and testing:

- See [docs/post_install.md](docs/post_install.md)

Authorization, ACLs, and Role Based Access Control (RBAC):

- See [docs/authorization.md](docs/authorization.md)

## Writing Client Applications

As HSDS is a REST service, clients can be developed in almost any programming language. The
test programs under `hsds/test/integ` illustrate some of the methods for performing
different operations using Python and the HSDS REST API (via the requests package).
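As a rough illustration of this REST style, the sketch below builds, but does not send, authenticated requests using only the Python standard library rather than requests. It assumes that HSDS identifies the target HDF5 file through a `domain` query parameter, accepts HTTP Basic authentication, and exposes routes such as `/` (domain information) and `/datasets/<id>/value` with an optional `select` hyperslab parameter. The endpoint URL, domain path, dataset id, and helper names are hypothetical; consult the REST API documentation for the authoritative routes.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_hsds_request(endpoint, path, domain, username, password, params=None):
    """Build (but do not send) an authenticated GET request for an HSDS route.

    The HDF5 "file" is identified by the `domain` query parameter (assumption);
    any extra query parameters (e.g. a hyperslab `select`) are appended after it.
    """
    query = {"domain": domain}
    query.update(params or {})
    url = f"{endpoint.rstrip('/')}{path}?{urllib.parse.urlencode(query)}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}, method="GET"
    )


def send_json(request, timeout=10.0):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical values: adjust endpoint, domain, and credentials to your setup.
    domain_req = build_hsds_request(
        "http://localhost:5101", "/", "/home/test_user1/test.h5", "test_user1", "test"
    )
    print(domain_req.full_url)
    # Reading part of a dataset, assuming an id obtained from earlier responses:
    value_req = build_hsds_request(
        "http://localhost:5101", "/datasets/d-1234/value", "/home/test_user1/test.h5",
        "test_user1", "test", params={"select": "[0:4,0:4]"},
    )
    print(value_req.full_url)
    # send_json(domain_req) would perform the call against a running server.
```

Separating request construction from sending keeps the URL and header logic testable offline; the same pattern carries over to the requests package used by the integration tests.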

The related project [h5pyd](https://github.com/HDFGroup/h5pyd) provides a (mostly) h5py-compatible
interface to the server for Python clients.

For C/C++ clients, the HDF REST VOL is an HDF5 library plugin that enables the HDF5 API to read and write data
using HSDS. See: . Note: it requires HDF5 library version 1.12.0 or later.

## Uninstalling

HSDS only modifies the storage location it is configured to use, so to uninstall, just remove the
source files, the Docker images, and the files in the S3 bucket, Azure Blob container, or storage directory.

## Reporting bugs (and general feedback)

Create new issues at for any problems you find.

For general questions/feedback, please use the HSDS forum: .

## License

HSDS is licensed under the Apache 2.0 license. See the LICENSE file in this directory.

## Azure Marketplace

HSDS is available as a VM offer on the Azure Marketplace, which provides an easy way to
set up an Azure instance with HSDS. See: for more information.

## Websites

- Main website:
- Source code:
- Forum:
- Documentation: (For REST API)

## Other useful resources

### HDF Group Blog Posts

- Web Caching:
- HSDS Streaming:
- Cloud Storage Options for HDF5:
- HSDS Docker Images:
- HSDS Container Types:
- Using Multiprocessing in Python:
- Biosimulations - case study with HSDS and Vega:
- HSDS for Microsoft Azure:
- New Features in HSDS v0.6:
- HSDS Security:
- HDF for the Web: HDF Server:

### External Blogs and Articles

- A RESTful Meeting Between MATLAB and HDF Server:
- AWS Big Data Blog:

### Slide Decks

- HSDS v0.7 New Features, EUHUG 2022:
- HSDS Serverless, EUHUG 2021:
- HSDS REST, HUG 2020:
- HSDS with Jupyter, ESIP 2018:
- HDF Data Services, SciPy17:

### Videos

- HSDS Webinar:
- HSDS Overview, Allotrope Connect Day:
- The Use of HSDS on SlideRule, HUG 2020:
- HDF Data Services, SciPy 2017:
- RESTful HDF, SciPy 2015:

### Papers

- restfulSE: A semantically rich interface for cloud-scale genomics with Bioconductor:
- RESTful HDF5 White Paper: