{"id":17063942,"url":"https://github.com/rk2900/drsa","last_synced_at":"2025-07-23T04:03:40.721Z","repository":{"id":70335369,"uuid":"120989515","full_name":"rk2900/DRSA","owner":"rk2900","description":"Deep Recurrent Survival Analysis, an auto-regressive deep model for time-to-event data analysis with censorship handling. An implementation of our AAAI 2019 paper and a benchmark for several (Python) implemented survival analysis methods.","archived":false,"fork":false,"pushed_at":"2021-01-28T07:30:50.000Z","size":9861,"stargazers_count":141,"open_issues_count":5,"forks_count":57,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-12T18:14:10.576Z","etag":null,"topics":["data-science","deep-learning","machine-learning","survival-analysis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rk2900.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-02-10T06:02:42.000Z","updated_at":"2025-04-10T17:49:58.000Z","dependencies_parsed_at":"2023-04-25T23:18:25.685Z","dependency_job_id":null,"html_url":"https://github.com/rk2900/DRSA","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rk2900/DRSA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rk2900%2FDRSA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rk2900%2FDRSA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rk2900%2FDRSA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rk2900%2FDRSA/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rk2900","download_url":"https://codeload.github.com/rk2900/DRSA/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rk2900%2FDRSA/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266614306,"owners_count":23956341,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","deep-learning","machine-learning","survival-analysis"],"created_at":"2024-10-14T10:53:23.091Z","updated_at":"2025-07-23T04:03:40.650Z","avatar_url":"https://github.com/rk2900.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deep Recurrent Survival Analysis (DRSA)\nA `tensorflow` implementation of DRSA model. 
This is the experiment code for our AAAI 2019 paper \"[Deep Recurrent Survival Analysis](https://arxiv.org/abs/1809.02403)\".\n\nIf you have any problems, please feel free to contact the authors [Kan Ren](http://saying.ren), [Jiarui Qin](mailto:qinjr@icloud.com) and [Lei Zheng](mailto:zhenglei2016@sjtu.edu.cn).\n\n### Abstract\n\u003e Survival analysis is a hotspot in statistical research for modeling time-to-event information with data censorship handling, which has been widely used in many applications such as clinical research, information system and other fields with survivorship bias. Many works have been proposed for survival analysis ranging from traditional statistic methods to machine learning models. However, the existing methodologies either utilize counting-based statistics on the segmented data, or have a pre-assumption on the event probability distribution w.r.t. time. Moreover, few works consider sequential patterns within the feature space. In this paper, we propose a Deep Recurrent Survival Analysis model which combines deep learning for conditional probability prediction at fine-grained level of the data, and survival analysis for tackling the censorship. By capturing the time dependency through modeling the conditional probability of the event for each sample, our method predicts the likelihood of the true event occurrence and estimates the survival rate over time, i.e., the probability of the non-occurrence of the event, for the censored data. Meanwhile, without assuming any specific form of the event probability distribution, our model shows great advantages over the previous works on fitting various sophisticated data distributions. In the experiments on the three real-world tasks from different fields, our model significantly outperforms the state-of-the-art solutions under various metrics.\n\n### Model Description\nOur model is `DRSA` model. 
The baseline models are `Kaplan-Meier`, `Lasso-Cox`, `Gamma`, `MTLSA`, `STM`, `DeepSurv`, `DeepHit`, and `DRN`.\nAmong the baseline implementations, we forked the code of [STM](https://github.com/zeromike/bid-lands) and [MTLSA](https://github.com/MLSurvival/MTLSA).\nWe made some minor modifications to the two projects to fit our experiments. To get the modified code, you may click MTLSA @ ba353f8 and STM @ df57e70. Many thanks to the authors of `STM` and `MTLSA`.\nThe other baselines' implementations are in the `python` directory.\n\n### Data Preparation\nWe have uploaded a tiny data sample for training and evaluation.\n\nThe **full dataset** for this project can be directly downloaded from this link: https://goo.gl/nUFND4.\n(The full dataset is also uploaded to this repo as three split compressed ZIP files using Git LFS.)\nIt contains three large-scale datasets from three real-world tasks and is the first dataset of such scale for experiment reproduction in survival analysis.\n\nAfter downloading, please replace the sample data in the `data/` folder with the full data files.\n\n| Dataset  | MD5 Code  | Size |\n| ------------ | ------------ | --- |\n| drsa.**zip** | b63c53559f58e6afa62c121b0dd1997d  | 2.6 GB |\n\n#### Data specification\nWe have three datasets and each of them contains `.yzbx.txt`, `featindex.txt` and `.log.txt`.\nWe created the first data file `.log.txt` from the raw data of the original data source (please refer to our paper).\nThen we performed feature engineering according to the created feature dictionary `featindex.txt`.\nThe corresponding feature-engineered data are in `.yzbx.txt`.\n\nIf you need to reproduce the experiments, you may run the code on `.yzbx.txt`.\nIf you want to dive deep and explain the observations of the experiments, you would need to look into the other files like `.log.txt` and `featindex.txt`. 
\n\nIn each `yzbx.txt` file, each line is a sample containing the \"`yztx`\" data (here we use `t` and `b` interchangeably); the fields are separated by spaces.\nHere `z` is the true event time, `t` is the observation time and `x` is the list of features (multi-hot encoded as `feat_id:1`).\nIn the experiments, we use only the `ztx` data.\nNote that, for the uncensored data, `z \u003c= t`, while for the censored data, `z \u003e t`.\n\nWe simulate observation experiments that span the whole timeline of each dataset. The end of each observation (in the right-censored setting) is then recorded as `t` in the final data `yztx`, along with the true event time `z`.\nThe true event time `z` is originally logged in the raw data file.\nThe raw data file (without any feature engineering) comes from related works, as described in the experiments section of our paper. The download links are listed below:\n* clinic: http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets  (this should be support2csv.zip, but the raw CLINIC dataset we used differs somewhat, so we have uploaded the raw dataset to this repository.)\n* music: https://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html  http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html\n* bidding: https://github.com/rk2900/make-ipinyou-data\n\n### Installation and Reproduction\n[TensorFlow](https://www.tensorflow.org/) (\u003e=1.3) and the other dependent packages (e.g., `numpy`, `sklearn` and `matplotlib`) should be installed before running the code. The Python version we used is 2.7.6.\n\nAfter installing the packages, you can simply run the code in the `python` directory on the tiny demo dataset (sampled from the BIDDING dataset). 
The outputs of the code are written to the `python/output` directory.\n\nThe commands to run are listed below.\n```\npython km.py             # for Kaplan-Meier\npython gamma_model.py    # for Gamma\npython cox.py            # for Lasso-Cox and DeepSurv\npython deephit.py        # for DeepHit\npython DRSA.py 0.0001    # for DRSA\n```\nWe have set default hyperparameters in the model implementation, so the parameter arguments are optional when running the code.\n\nThe results are printed to the screen in the following format:\nSubset, Train/Test, Step, Cross Entropy, AUC(C-index), ANLP, Total Loss, batch size, hidden state size, learning rate, anlp learning rate, alpha, beta.\n\n### Citation\nYou are more than welcome to cite our paper:\n```\n@inproceedings{ren2019deep,\n  title={Deep recurrent survival analysis},\n  author={Ren, Kan and Qin, Jiarui and Zheng, Lei and Yang, Zhengyu and Zhang, Weinan and Qiu, Lin and Yu, Yong},\n  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n  volume={33},\n  number={01},\n  pages={4798--4805},\n  year={2019}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frk2900%2Fdrsa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frk2900%2Fdrsa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frk2900%2Fdrsa/lists"}