https://github.com/quiltdata/quilt-renovate
Destination for renovate PRs to keep PR volume down
- Host: GitHub
- URL: https://github.com/quiltdata/quilt-renovate
- Owner: quiltdata
- License: apache-2.0
- Created: 2019-11-24T21:29:54.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T06:56:28.000Z (over 3 years ago)
- Last Synced: 2025-05-15T19:09:27.961Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 44 MB
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 42
Metadata Files:
- Readme: README.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: docs/CODE_OF_CONDUCT.md
README
[Documentation](https://docs.quiltdata.com/)
[Slack](https://slack.quiltdata.com/)
[Codecov](https://codecov.io/gh/quiltdata/quilt)
[PyPI](https://pypi.org/project/quilt3/)
> Below is the documentation for [Quilt 3](https://quiltdata.com/). For Quilt 2, see the [Quilt 2 documentation](https://docs.quiltdata.com/v/quilt-2-master/) and the [Quilt 2 source branch](https://github.com/quiltdata/quilt/tree/quilt-2-master).
# Quilt is a versioned data portal for AWS
* [open.quiltdata.com](https://open.quiltdata.com/) is a petabyte-scale open data portal that runs on Quilt
* [quiltdata.com](https://quiltdata.com) includes case studies, use cases, videos, and information on how you can run a private Quilt instance
## Who is Quilt for?
Quilt is for data-driven teams of both technical
and non-technical members (executives, data scientists,
data engineers, sales, product, etc.).
## What does Quilt do?
Quilt adds search, visual content preview, and
versioning to every file in S3.
## How does Quilt work?
Quilt consists of a Python client, a web catalog, and Lambda
functions (all of which are open source), plus
a suite of backend services and Docker containers
orchestrated by CloudFormation.
The latter are available under a paid license for
private use on [quiltdata.com](https://quiltdata.com).
## Use cases
Quilt addresses five key use cases:
* **Share** data at scale. Quilt wraps AWS S3 to add simple URLs, web preview for large files, and sharing via email address (no need to create an IAM role).
* **Understand** data better through inline documentation (Jupyter notebooks, Markdown) and visualizations (Vega, Vega-Lite).
* **Discover** related data by indexing objects in Elasticsearch.
* **Model** data by providing a home for large data and models that don't fit in git, and by providing immutable versions for objects and data sets (a.k.a. "Quilt Packages").
* **Decide** by broadening data access within the organization and supporting the documentation of decision processes through auditable versioning and inline documentation.
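
The idea of immutable versions for data sets can be illustrated with a small, self-contained sketch. It is not Quilt's implementation: the manifest layout, the field names (`entries`, `top_hash`), and the `build_manifest` helper are all assumptions made for illustration, and Quilt's real package format and hashing scheme may differ. The sketch only shows the general technique of deriving a version identifier from content hashes, so that identical data always yields the same identifier and any change yields a new one.

```python
import hashlib
import json
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: Dict[str, bytes], message: str) -> dict:
    """Build a hypothetical manifest mapping logical keys to content hashes.

    The layout is illustrative only and is not Quilt's manifest format.
    """
    entries = {
        key: {"size": len(data), "sha256": sha256_hex(data)}
        for key, data in sorted(files.items())
    }
    body = {"message": message, "entries": entries}
    # Hash a canonical JSON encoding so that equal manifests get equal IDs.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return {"top_hash": sha256_hex(canonical.encode("utf-8")), **body}


readme = b"# Demo\n"
v1 = build_manifest({"data/train.csv": b"a,b\n1,2\n", "README.md": readme}, "initial")
v2 = build_manifest({"data/train.csv": b"a,b\n1,3\n", "README.md": readme}, "initial")
same = build_manifest({"README.md": readme, "data/train.csv": b"a,b\n1,2\n"}, "initial")

print(v1["top_hash"] == same["top_hash"])  # True: same content, same version ID
print(v1["top_hash"] == v2["top_hash"])    # False: one byte changed, new version ID
```

Because the identifier depends only on the content, a reference to a given `top_hash` in this sketch always resolves to the same set of files, which is the property that makes versioning auditable.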
## Roadmap
### I - Performance and core services
* [ ] Address performance issues with push (e.g. re-hash)
* [ ] Refactor `s3://bucket/.quilt` for improved listing and delete performance
* [ ] Metadata services for filtering packages
### II - CI/CD for data
* [ ] Ability to fork/merge packages
* [ ] Data quality monitoring
### III - Storage agnostic (support Azure, GCP buckets)
* [ ] Evaluate min.io and ceph.io
* [ ] Evaluate feasibility of local storage (e.g. NAS)
### IV - Cloud agnostic
* [ ] K8s deployment for Azure, GCP
* [ ] Shim lambdas (consider serverless.com)
* [ ] Shim Elasticsearch (consider Solr)
* [ ] Shim IAM via RBAC