https://github.com/hydrospheredata/hydro-visualization
Service for visualisation of high-dimensional data for Hydrosphere
- Host: GitHub
- URL: https://github.com/hydrospheredata/hydro-visualization
- Owner: Hydrospheredata
- Created: 2019-12-12T12:29:03.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-12T21:02:25.000Z (about 2 years ago)
- Last Synced: 2024-04-14T07:52:52.431Z (9 months ago)
- Topics: hydrosphere, python
- Language: Python
- Size: 72.4 MB
- Stars: 2
- Watchers: 5
- Forks: 2
- Open Issues: 11
Metadata Files:
- Readme: README.md
README
# hydro-visualization
Service for visualization of high-dimensional data for Hydrosphere.

## DEPENDENCIES
```python
import os

# Service settings. Every value can be overridden through the environment.
# Note: bool() of any non-empty string is True, so DEBUG_ENV=false still enables debug.
DEBUG_ENV = bool(os.getenv("DEBUG_ENV", False))
APP_PORT = int(os.getenv("APP_PORT", 5000))
GRPC_PORT = os.getenv("GRPC_PORT", 5001)  # a str when read from the environment
GRPC_UI_ADDRESS = os.getenv("GRPC_UI_ADDRESS", "localhost:9090")
HS_CLUSTER_ADDRESS = os.getenv("HTTP_UI_ADDRESS", "http://localhost")  # read from HTTP_UI_ADDRESS
SECURE = os.getenv("SECURE", False)  # a str when read from the environment

# MongoDB connection
MONGO_URL = os.getenv("MONGO_URL", "mongodb")
MONGO_PORT = int(os.getenv("MONGO_PORT", 27017))
MONGO_AUTH_DB = os.getenv("MONGO_AUTH_DB", "admin")
MONGO_USER = os.getenv("MONGO_USER")
MONGO_PASS = os.getenv("MONGO_PASS")

# AWS (S3) storage
AWS_STORAGE_ENDPOINT = os.getenv('AWS_STORAGE_ENDPOINT', '')
AWS_REGION = os.getenv('AWS_REGION', '')
HYDRO_VIS_BUCKET_NAME = os.getenv('AWS_BUCKET', 'hydro-vis')
```

## Assumptions:
- The model's contract must include an **'embedding'** output.
- If the model returns a class prediction and a confidence, these fields should be named **'class'** and **'confidence'** respectively.
- Only data (embeddings) from requests will be visualized. Training data is used only to fit an accurate transformation.

## API
### Task states:
![Task states](docs/status.png)
The full API description is available [here](openapi.yaml).

1. **POST** /visualization/plottable_embeddings/umap?model_version_id=2

transformer: the manifold learning transformer, one of ["umap", "trimap", "tsne"]. For now only "umap" is supported.
response json:
```json
{"task_id": "22e86484-7d90-49fd-a3e1-329b978ee18c"}
```

2. **POST** /visualization/jobs/?model_version_id=2
response json:
```json
{"task_id": "22e86484-7d90-49fd-a3e1-329b978ee18c"}
```
3. **GET** /visualization/jobs?task_id=22e86484-7d90-49fd-a3e1-329b978ee18c

Returns the state of a task, and its result if it is ready.

response json (SUCCESS):
```json
{
  "result": {
    "data_shape": [2, 2],
    "data": [[0.1, 0.2], [0.3, 0.4]],
    "request_ids": [200, 2001],
    "class_labels": {
      "confidence": {
        "data": [0.1, 0.2, 0.3],
        "coloring_type": "gradient"
      },
      "class": {
        "data": [1, 2, 1, 3, 1],
        "coloring_type": "class",
        "classes": [1, 2, 3]
      }
    },
    "metrics": {
      "anomality": {
        "scores": [0.1, 0.2, 0.5, 0.2],
        "threshold": 0.5,
        "operation": "Eq",
        "coloring_type": "gradient"
      }
    },
    "top_100": [[2, 3, 4], []],
    "visualization_metrics": {
      "global_score": 0.9,
      "sammon_error": 0.1,
      "msid_score": 200
    }
  },
  "state": "SUCCESS",
  "task_id": "22e86484-7d90-49fd-a3e1-329b978ee18c",
  "description": ""
}
```

response json (PENDING):
```json
{
  "state": "PENDING",
  "task_id": "22e86484-7d90-49fd-a3e1-329b978ee18c"
}
```

4. **POST** /visualization/params/?model_version_id=2
**request format**:
```json
{
  "parameters": {
    "metric": "euclidean",
    "min_dist": 0.1,
    "n_components": 2,
    "n_neighbours": 15
  },
  "production_data_sample_size": 500,
  "training_data_sample_size": 5000,
  "visualization_metrics": [
    "global_score"
  ]
}
```
- parameters: dict of transformer parameters. The set of parameters depends on the transformer used.

**response**:
200 - Success

5. **GET** /visualization/params/?model_version_id=2
**response format**:
```json
{
  "parameters": {
    "metric": "euclidean",
    "min_dist": 0.1,
    "n_components": 2,
    "n_neighbours": 15
  },
  "production_data_sample_size": 500,
  "training_data_sample_size": 5000,
  "visualization_metrics": [
    "global_score"
  ]
}
```
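
Before posting a configuration to /visualization/params, a client can check it against the limits listed under **Projector params** in this README. The following is a minimal sketch under stated assumptions: the bounds are copied from that schema, while the names `validate_params`, `PARAMETER_LIMITS`, `SAMPLE_SIZE_LIMITS` and the error-message wording are illustrative and not part of the service. The distance `metric` enum is not checked here.

```python
# Bounds copied from the "Projector params" schema in this README.
PARAMETER_LIMITS = {
    "min_dist": (0.0, 0.99),
    "n_components": (2, 3),
    "n_neighbours": (2, 200),
}
SAMPLE_SIZE_LIMITS = {
    "production_data_sample_size": (20, 5000),
    "training_data_sample_size": (20, 10000),
}
VISUALIZATION_METRICS = {
    "global_score", "sammon_error", "auc_score",
    "stability_score", "msid", "clustering",
}


def _check_ranges(values, limits, errors):
    """Append one message per value that falls outside its [low, high] range."""
    for name, (low, high) in limits.items():
        if name in values and not low <= values[name] <= high:
            errors.append(f"{name}={values[name]} is outside [{low}, {high}]")


def validate_params(payload):
    """Return a list of problems found in a params payload (empty if none)."""
    errors = []
    _check_ranges(payload.get("parameters", {}), PARAMETER_LIMITS, errors)
    _check_ranges(payload, SAMPLE_SIZE_LIMITS, errors)
    unknown = set(payload.get("visualization_metrics", [])) - VISUALIZATION_METRICS
    if unknown:
        errors.append(f"unsupported visualization_metrics: {sorted(unknown)}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every out-of-range field at once.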
6. **GET** /visualization/supported?model_version_id=2
**response**:
```json
{"supported": true, "message":""}, 200
{"supported": false, "message":"Some message"}, 200
```

## **Projector params**:
```json
{
  "properties": {
    "metric": {
      "type": "string",
      "default": "euclidean",
      "enum": ["euclidean", "manhattan", "chebyshev", "minkowski", "canberra", "braycurtis", "haversine",
               "mahalanobis", "wminkowski", "seuclidean", "cosine", "correlation", "hamming", "jaccard",
               "dice", "russellrao", "kulsinski", "rogerstanimoto", "sokalmichener", "sokalsneath", "yule"]
    },
    "min_dist": {
      "type": "number",
      "default": 0.1,
      "examples": [0.1],
      "maximum": 0.99,
      "minimum": 0.0
    },
    "n_components": {
      "type": "integer",
      "default": 2,
      "examples": [2],
      "maximum": 3,
      "minimum": 2
    },
    "n_neighbours": {
      "type": "integer",
      "default": 15,
      "examples": [15],
      "maximum": 200,
      "minimum": 2
    },
    "production_data_sample_size": {
      "type": "integer",
      "default": 500,
      "examples": [500],
      "maximum": 5000,
      "minimum": 20
    },
    "training_data_sample_size": {
      "type": "integer",
      "default": 5000,
      "examples": [5000],
      "maximum": 10000,
      "minimum": 20
    },
    "visualization_metrics": {
      "type": "array",
      "default": ["global_score"],
      "enum": ["global_score", "sammon_error", "auc_score", "stability_score", "msid", "clustering"]
    }
  }
}
```

## Demo
1. Set the environment variables AWS_ACCESS_KEY and AWS_SECRET_KEY.
2. Upload demo/adult/model and demo/adult/monitoring_model.
3. Send the request: POST /visualization/plottable_embeddings/umap?model_version_id=2
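
The last step, together with the job-status endpoint from the API section, can be driven from a small client. The following is a minimal sketch, not part of the service: the base URL `http://localhost:5000` (derived from the default `APP_PORT`), the helper names, the polling interval and the poll limit are all assumptions. Only the paths, the `task_id` field and the `PENDING`/`SUCCESS` states come from this README.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: service reachable on the default APP_PORT


def build_url(path, **query):
    """Join BASE_URL, an endpoint path and URL-encoded query parameters."""
    return f"{BASE_URL}{path}?{urllib.parse.urlencode(query)}"


def http_json(method, url):
    """Send a request without a body and decode the JSON response."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_result(task_id, fetch=http_json, interval=2.0, max_polls=30):
    """Poll GET /visualization/jobs until the task leaves the PENDING state."""
    url = build_url("/visualization/jobs", task_id=task_id)
    for _ in range(max_polls):
        status = fetch("GET", url)
        if status["state"] != "PENDING":
            return status  # SUCCESS carries "result"; other states are returned as-is
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still PENDING after {max_polls} polls")


def run_demo(model_version_id=2, fetch=http_json, interval=2.0):
    """Submit the UMAP job for a model version and wait for its outcome."""
    submit_url = build_url("/visualization/plottable_embeddings/umap",
                           model_version_id=model_version_id)
    task_id = fetch("POST", submit_url)["task_id"]
    return wait_for_result(task_id, fetch=fetch, interval=interval)
```

Calling `run_demo(2)` against a running instance would submit the job and block until the task is no longer pending. The `fetch` argument exists so the HTTP layer can be swapped out, for example in tests.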
### Database schema
documents: key - model_name: model_version
collection: umap, trimap, tsne
- add created field
```json
{
"model_version_id": "2",
"result_file": "s3://hydro-vis/adult_scalar/2/result.json",
"transformer_file": "s3://hydro-vis/adult_scalar/2/umap_transformer",
"parameters": {
"metric": "euclidean",
"min_dist": 0.1,
"n_components": 2,
"n_neighbours": 15
},
"production_data_sample_size": 500,
"training_data_sample_size": 5000,
"visualization_metrics": ["global_score"]
}
```

transformed_embeddings - files that store transformed embeddings with labels and other monitoring numbers

transformer structure

transformed embeddings file format: parquet

label, confidence, transformed_embedding(vec), score1, score1_thresh, score2, score2_thresh, …
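
The `result_file` and `transformer_file` fields of the stored document shown earlier in this section are S3 URIs. A client that needs the bucket and object key separately can split them with the standard library. This is a minimal sketch; the helper name `split_s3_uri` is hypothetical and not taken from the service code.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Values taken from the example document in this README.
document = {
    "result_file": "s3://hydro-vis/adult_scalar/2/result.json",
    "transformer_file": "s3://hydro-vis/adult_scalar/2/umap_transformer",
}
bucket, key = split_s3_uri(document["result_file"])
print(bucket, key)  # hydro-vis adult_scalar/2/result.json
```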