https://github.com/MVP-Labs/compute-to-data
Privacy-preserving data sandbox for on-premise computation
cryptography deep-learning on-premise privacy traceability
- Host: GitHub
- URL: https://github.com/MVP-Labs/compute-to-data
- Owner: MVP-Labs
- License: lgpl-2.1
- Archived: true
- Created: 2021-02-01T10:13:12.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-06-15T05:59:39.000Z (over 3 years ago)
- Last Synced: 2024-05-18T19:10:33.283Z (9 months ago)
- Topics: cryptography, deep-learning, on-premise, privacy, traceability
- Language: Python
- Homepage:
- Size: 14.4 MB
- Stars: 11
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-federated-computing - Compute-to-Data
README
# Compute-to-Data
[Chinese version](./README_CN.md)
## Overview
This project implements an on-premise data sandbox for private computation on sensitive data. Third-party scientists can execute code remotely and get results on data they cannot see. The data grid automatically verifies the data service terms on behalf of the data owner. The whole process of data sharing and utilization is traceable and auditable.
We provide `dsb` and `dt_cli` toolkits for data owners and scientists. The `dsb` is a Flask-based service deployment tool for data assets, allowing data owners to quickly define computing services and verify external job requests according to agreements. The `dt_cli` is a client tool for datatoken services and remote execution.
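To make the idea of verifying external job requests against an agreement concrete, here is a minimal sketch in plain Python. It is not the `dsb` API: the names `ServiceTerms`, `JobRequest`, and `verify_job_request`, and the fields they carry, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceTerms:
    """Hypothetical agreement a data owner attaches to one data asset."""
    asset_id: str
    allowed_operations: frozenset
    authorized_requesters: frozenset


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical external request to run an operation on an asset."""
    requester: str
    asset_id: str
    operation: str


def verify_job_request(terms, request):
    """Return (accepted, reason) after checking the request against the terms."""
    if request.asset_id != terms.asset_id:
        return False, "unknown asset"
    if request.requester not in terms.authorized_requesters:
        return False, "requester not authorized"
    if request.operation not in terms.allowed_operations:
        return False, "operation not permitted"
    return True, "ok"


# Assumed example values: bank A lets fintech company C train a model only.
terms = ServiceTerms(
    asset_id="bank_a_customers",
    allowed_operations=frozenset({"train_model"}),
    authorized_requesters=frozenset({"company_c"}),
)
accepted, reason = verify_job_request(
    terms, JobRequest("company_c", "bank_a_customers", "train_model")
)
```

In a real deployment the check would presumably sit behind the service endpoint and be logged, so that every accepted or rejected request remains auditable.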
## Play With It
### user story
Consider a joint risk management scenario: a third-party fintech company C provides model solutions for two banks, A and B. Sensitive customer data is stored in each bank's private database. Banks A and B accept and authorize the third party's model for on-premise computation only when data privacy is guaranteed and external operations are auditable. With the DataToken SDK, data owners can trade the computation rights to their private data, so the data becomes an asset in the marketplace.
### run tests
We provide [dt-examples](https://github.com/ownership-labs/dt-examples) for testing. It includes the required config files, datasets, asset metadata, and federated models. For each on-premise computation, a separate folder such as `tests/job_id/` is created to store runtime resources and logs, and the datasets, models, and parameters are fetched into it automatically. This is a simple simulation of a private computation sandbox.
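The per-job folder layout described above can be sketched as follows. This is an illustrative approximation using only the standard library, not the project's own code: the function `prepare_job_dir` and the file names `model.py`, `params.json`, and `job.log` are assumptions.

```python
import json
import tempfile
from pathlib import Path


def prepare_job_dir(root, job_id, datasets, model_code, params):
    """Create <root>/<job_id>/ and place the job's resources inside it.

    `datasets` maps a file name to raw bytes; in a real sandbox these
    would be fetched from the owner's storage rather than passed in.
    """
    job_dir = Path(root) / job_id
    # Fail loudly if the job id was already used, so runs never mix.
    job_dir.mkdir(parents=True, exist_ok=False)

    data_dir = job_dir / "datasets"
    data_dir.mkdir()
    for name, content in datasets.items():
        (data_dir / name).write_bytes(content)

    (job_dir / "model.py").write_text(model_code)
    (job_dir / "params.json").write_text(json.dumps(params))
    (job_dir / "job.log").write_text("job prepared\n")
    return job_dir


if __name__ == "__main__":
    # Use a temporary directory in place of the real `tests/` root.
    with tempfile.TemporaryDirectory() as root:
        job_dir = prepare_job_dir(
            root,
            "job_id",
            {"customers.csv": b"id,score\n1,0.7\n"},
            "print('training')",
            {"epochs": 1},
        )
        print(sorted(p.name for p in job_dir.iterdir()))
```

Keeping every resource and log of a job under one folder makes it straightforward to audit or delete a single run without touching the others.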