https://github.com/bcebere/elastic-surv
Survival analysis for Big Data
automl bigdata coxph deephit elasticsearch hyperband survival-analysis
- Host: GitHub
- URL: https://github.com/bcebere/elastic-surv
- Owner: bcebere
- License: bsd-3-clause
- Created: 2021-12-13T10:36:27.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-03-20T07:18:39.000Z (over 2 years ago)
- Last Synced: 2023-03-09T01:32:01.919Z (over 1 year ago)
- Topics: automl, bigdata, coxph, deephit, elasticsearch, hyperband, survival-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 77.1 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# elastic-surv

Survival analysis on Big Data

[![elastic-surv Tests](https://github.com/bcebere/elastic-surv/actions/workflows/test.yml/badge.svg)](https://github.com/bcebere/elastic-surv/actions/workflows/test.yml)
[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://github.com/bcebere/elastic-surv/blob/main/LICENSE)
elastic-surv is a library for training risk estimation models on ElasticSearch backends. Potential use cases include user churn prediction and survival probability estimation.
- :key: Survival models include CoxPH, DeepHit, and LogisticHazard ([pycox](https://github.com/havakv/pycox)).
- :fire: ElasticSearch support using [eland](https://github.com/elastic/eland).
- :cyclone: Automatic model selection using HyperBand.
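As a rough illustration of the idea behind HyperBand, the sketch below implements successive halving, the inner loop that HyperBand repeats over several brackets. It is not taken from elastic-surv: `successive_halving`, `toy_loss`, and the candidate learning rates are all hypothetical names and values chosen for the example.

```python
import math


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configurations each round, multiplying the budget by eta."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Rank the remaining configurations at the current budget (lower loss is better).
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        keep = max(1, len(ranked) // eta)
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0]


def toy_loss(cfg, budget):
    """Hypothetical objective: best at lr = 0.01, and improving as the budget grows."""
    return abs(math.log10(cfg["lr"]) + 2) + 1.0 / budget


# Nine hypothetical candidates: lr = 0.1, 0.01, ..., 1e-9.
candidates = [{"lr": 10 ** -k} for k in range(1, 10)]
best = successive_halving(candidates, toy_loss)
print(best)
```

In a real search, `evaluate` would train a survival model for `budget` epochs and return a validation loss, so weak configurations are dropped before they consume much compute.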
## Problem formulation
Risk estimation tasks require:
- A set of covariates/features (`X`).
- An outcome/event column (`Y`): 0 means right censoring, 1 means that the event occurred.
- A time-to-event column (`T`): the duration until the event or the censoring occurred.

The risk estimation task outputs a survival function: for N time horizons, it gives the probability of "survival" (the event not occurring) up to each horizon.
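To make the roles of `T` and `Y` concrete, here is a minimal sketch of a Kaplan-Meier estimator, a classical non-parametric survival curve that ignores the covariates `X`. It is an illustration only, not one of the models shipped with elastic-surv (those condition on `X`); the function name `kaplan_meier` and the toy data are assumptions made for this example.

```python
def kaplan_meier(durations, events, horizons):
    """Estimate the survival probability at each requested time horizon."""
    # Distinct times at which at least one event (not a censoring) was observed.
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = []
    for horizon in horizons:
        prob = 1.0
        for t in event_times:
            if t > horizon:
                break
            # Subjects still under observation just before time t.
            at_risk = sum(1 for d in durations if d >= t)
            # Subjects whose event happened exactly at time t.
            observed = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
            prob *= 1.0 - observed / at_risk
        survival.append(prob)
    return survival


# Toy data: T = months until churn or censoring, Y = 1 if churn was observed, 0 if censored.
T = [2, 3, 3, 5, 8, 8, 12]
Y = [1, 0, 1, 1, 0, 1, 0]
curve = kaplan_meier(T, Y, horizons=[3, 6, 12])
print(curve)
```

Censored rows (`Y = 0`) never count as events, but they stay in the at-risk set until their recorded duration, which is why right censoring has to be encoded explicitly rather than dropped.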
## Installation

For configuring the ELK stack, please follow the instructions [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html).
The library can be installed using
```bash
$ pip install .
```

## Sample Usage
For each ElasticSearch data backend, we need to specify:
- the `es_index_pattern` and the `es_client` for the ES connection.
- which keys in the ES index hold the time-to-event and outcome data.
- optionally, which features to include from the index.

```python
from elastic_surv.dataset import ESDataset
from elastic_surv.models import CoxPHModel

# Point the dataset at the ES index and name the time-to-event and outcome keys.
dataset = ESDataset(
    es_index_pattern='churn-prediction',
    time_column='months_active',
    event_column='churned',
    es_client="localhost",
)

# Build the model from the dataset's features, then train and score it.
model = CoxPHModel(in_features=dataset.features())
model.train(dataset)
model.score(dataset)
```
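The README does not say which metric `model.score` reports. One metric commonly used for survival models is Harrell's concordance index, sketched below in plain Python as an assumption about what such a score could measure, not as the library's implementation. The function name `concordance_index` and the toy values are hypothetical.

```python
def concordance_index(durations, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk subject had the event first."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event strictly before j's recorded time.
            if events[i] == 1 and durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs: at least one observed event is required")
    return concordant / comparable


# Toy data: a higher risk score should correspond to an earlier event.
T = [2, 4, 6, 8]
Y = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(T, Y, risk)
print(c_index)
```

A value of 1.0 means the predicted risks order every comparable pair correctly, and 0.5 is what random scores would achieve on average.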
For this example, we use a local ES index, `churn-prediction`. This can be generated using the following snippet:

```python
from pysurvival.datasets import Dataset
import eland as ed

raw_dataset = Dataset('churn').load()
ed.pandas_to_eland(raw_dataset,
    es_client='localhost',
    es_dest_index='churn-prediction',
    es_if_exists='replace',
    es_dropna=True,
    es_refresh=True,
)
```

## Tutorials
- [Tutorial 1: Data backends](tutorials/tutorial_1_data_backends.ipynb)
- [Tutorial 2: Training a survival model over ElasticSearch](tutorials/tutorial_2_model_training.ipynb)
- [Tutorial 3: AutoML for survival analysis over ElasticSearch](tutorials/tutorial_3_automl.ipynb)
## Tests

Install the testing dependencies using
```bash
pip install ".[testing]"
```
The tests can be executed using
```bash
pytest -vsx
```