https://github.com/pr38/cox_ph_estimation_notebooks
Personal discovery work on estimating Cox proportional hazards coefficients for both Breslow and Efron ties, using both autograd and direct calculation of the gradient and Hessian
- Host: GitHub
- URL: https://github.com/pr38/cox_ph_estimation_notebooks
- Owner: pr38
- Created: 2025-07-15T22:03:43.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-07-15T22:35:46.000Z (3 months ago)
- Last Synced: 2025-07-16T21:04:53.258Z (3 months ago)
- Topics: cox-regression, dask, data-science, machine-learning, numpy, pytensor, statistics, survival-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 14.6 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
README
# cox_ph_estimation_notebooks
Personal discovery work on estimating Cox proportional hazards coefficients for both Breslow and Efron ties, using both autograd and direct calculation of the gradient and Hessian.

This repo contains some of my personal work on minimizing the Cox proportional hazards "negative log partial likelihood" loss functions for Breslow and Efron ties. I used pytensor's autograd engine to derive the gradient/Jacobian and Hessian matrices; pytensor seems to be the only library whose autograd engine covers all the vector operations I needed. To solve for the coefficients I used Newton-Raphson, as recommended in the literature, with half-stepping (as done in R's survival package and elsewhere). Because my code is fully vectorized (and runs mostly outside the Python runtime), I was able to get up to a 30x speedup in training time over the primary Python survival analysis libraries (lifelines and scikit-survival).
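As an illustration of the solver idea (a minimal sketch of my own, not the repo's code), Newton-Raphson with half-stepping shrinks the Newton step by factors of two until the loss actually decreases. The `loss_grad_hess` callback here stands in for any twice-differentiable objective such as the negative log partial likelihood; the demo objective is a toy quadratic:

```python
import numpy as np

def newton_with_half_stepping(loss_grad_hess, beta0, max_iter=50, tol=1e-10):
    """Newton-Raphson where each step is halved until the loss decreases."""
    beta = np.asarray(beta0, dtype=float)
    loss, grad, hess = loss_grad_hess(beta)
    for _ in range(max_iter):
        step = np.linalg.solve(hess, grad)   # full Newton direction
        scale = 1.0
        while True:
            candidate = beta - scale * step
            new_loss, new_grad, new_hess = loss_grad_hess(candidate)
            if new_loss <= loss or scale < 1e-10:
                break
            scale *= 0.5                     # half-step until improvement
        converged = abs(loss - new_loss) < tol
        beta, loss, grad, hess = candidate, new_loss, new_grad, new_hess
        if converged:
            break
    return beta

# demo on a toy quadratic 0.5*b'Ab - c'b, whose minimizer is A^{-1}c
A = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])
f = lambda b: (0.5 * b @ A @ b - c @ b, A @ b - c, A)
beta_hat = newton_with_half_stepping(f, np.zeros(2))
```

On a strictly convex quadratic the full Newton step is exact, so the line search never needs to halve; the halving only kicks in when a full step overshoots, which is the failure mode it guards against in Cox fitting.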
I have also translated the known closed-form solutions for the Jacobian and Hessian of the Breslow log partial likelihood into pure numpy (without autograd). The pure numpy solution is slightly faster than the pytensor autograd version; with extra effort I could optimize it further (and perhaps also add Efron).
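For reference, the analytic gradient and Hessian of the Breslow negative log partial likelihood can be vectorized in pure numpy roughly as follows (my own sketch under the usual textbook formulas; the function name and toy data are mine, not from the repo). Sorting by descending time lets cumulative sums produce the risk-set totals, and tied times share a risk set:

```python
import numpy as np

def breslow_loss_grad_hess(beta, X, time, event):
    """Breslow negative log partial likelihood with analytic gradient/Hessian."""
    # sort by descending time so cumulative sums give risk-set sums
    order = np.argsort(-time, kind="stable")
    X, time, event = X[order], time[order], event[order]
    theta = np.exp(X @ beta)                       # exp(linear predictor)
    S0 = np.cumsum(theta)                          # sum of theta over risk set
    S1 = np.cumsum(theta[:, None] * X, axis=0)     # theta-weighted feature sums
    S2 = np.cumsum(theta[:, None, None]
                   * X[:, :, None] * X[:, None, :], axis=0)
    # tied times share a risk set: use the last index of each tie group
    last = np.searchsorted(-time, -time, side="right") - 1
    S0, S1, S2 = S0[last], S1[last], S2[last]
    d = event.astype(bool)
    loss = -np.sum(X[d] @ beta - np.log(S0[d]))
    grad = -(X[d] - S1[d] / S0[d, None]).sum(axis=0)
    Ebar = S1[d] / S0[d, None]                     # risk-set weighted mean of X
    hess = (S2[d] / S0[d, None, None]
            - Ebar[:, :, None] * Ebar[:, None, :]).sum(axis=0)
    return loss, grad, hess

# toy data for a quick sanity check
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
time = rng.exponential(size=30)
event = rng.integers(0, 2, size=30)
beta = 0.1 * rng.normal(size=3)
loss, grad, hess = breslow_loss_grad_hess(beta, X, time, event)
```

A useful sanity check for any hand-derived gradient like this is to compare it against a central finite difference of the loss, which is also an easy way to validate an autograd result against the closed form.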
Finally, I have included my attempt to translate the pure numpy solution for the Breslow Jacobian and Hessian to dask array. Due to an issue I ran into with dask indexing/slicing/take, I was unable to get correct results. Even if I finished the dask implementation, the number of shuffles required to compute the Jacobian and Hessian would be unwieldy, on top of the usual headaches of working with a distributed system.