https://github.com/ragulpr/wtte-rnn

WTTE-RNN a framework for churn and time to event prediction

churn-prediction failure-rate keras machine-learning-algorithms neural-network rnn tensorflow weibull

# WTTE-RNN

[![Build Status](https://travis-ci.org/ragulpr/wtte-rnn.svg?branch=master)](https://travis-ci.org/ragulpr/wtte-rnn)

Weibull Time To Event Recurrent Neural Network

A less hacky machine-learning framework for predicting events and the time until they occur.

Forecasting problems as diverse as server monitoring, earthquake prediction, and churn prediction can all be posed as predicting the time to an event.
WTTE-RNN is an algorithm for how such problems should be handled.

* [Blog post](https://ragulpr.github.io/2016/12/22/WTTE-RNN-Hackless-churn-modeling/)
* [Master's thesis](https://ragulpr.github.io/assets/draft_master_thesis_martinsson_egil_wtte_rnn_2016.pdf)
* Quick visual intro to the [model](https://imgur.com/a/HX4KQ)
* Jupyter notebooks: [simple example](examples/keras/simple_example.ipynb), [end-to-end data pipeline](examples/data_pipeline/data_pipeline.ipynb)

# Installation

## Python

See the [README for the Python package](python/README.md).

If that seems like more than you need, the basic implementation is available inline in a [jupyter notebook](examples/keras/standalone_simple_example.ipynb).

# Ideas and Basics

Suppose you have time-series data made up of many events and want to use past data to predict the time to the next event (TTE). If the last event has not been observed yet, all we have to train on is a lower bound on the TTE. This is called *censored data* (shown in red):

![Censored data](./readme_figs/data.gif)
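To make the idea concrete, here is a minimal sketch of how a TTE and a censoring indicator could be derived from a single event sequence. The function name `tte_and_censoring`, the 0/1 list input, and the convention that a censored TTE equals the number of steps left in the record are assumptions for illustration, not the package's API:

```python
def tte_and_censoring(events):
    """Return (tte, observed) for a list of 0/1 event flags, one per timestep.

    tte[t] is the number of steps from t to the next event (0 if an event
    happens at t). If no later event was recorded, tte[t] is the number of
    steps left in the record, which is only a lower bound on the true TTE,
    and observed[t] is 0 (censored).
    """
    n = len(events)
    tte = [0] * n
    observed = [0] * n
    steps_to_next = None  # stays None until an event is seen scanning backwards
    for t in range(n - 1, -1, -1):
        if events[t]:
            steps_to_next = 0
        elif steps_to_next is not None:
            steps_to_next += 1
        if steps_to_next is None:
            tte[t] = n - t      # the event, if any, happens after the record ends
            observed[t] = 0
        else:
            tte[t] = steps_to_next
            observed[t] = 1
    return tte, observed


tte, observed = tte_and_censoring([0, 1, 0, 0, 1, 0, 0])
print(tte)       # [1, 0, 2, 1, 0, 2, 1]
print(observed)  # [1, 1, 1, 1, 1, 0, 0]
```

The last two steps have no later event in the record, so their TTE values are lower bounds and they are flagged as censored.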

The trick is to let the machine-learning model output the *parameters of a probability distribution* instead of predicting the TTE itself. Any distribution would do, but we use the *Weibull distribution* because it is [awesome](https://ragulpr.github.io/2016/12/22/WTTE-RNN-Hackless-churn-modeling/#embrace-the-Weibull-euphoria). Likewise, the learning algorithm could be anything gradient-based, but we chose RNNs because they are [awesome](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) too.

![Example WTTE-RNN architecture](./readme_figs/fig_rnn_weibull.png)
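A rough sketch of what "outputting distribution parameters" can look like: the network's last layer produces two unconstrained numbers per timestep, and an activation maps them to positive Weibull parameters. The names `to_weibull_params` and `init_alpha`, and the choice of `exp` and softplus, are assumptions made here for illustration; the Weibull output layer shipped in this repository may use a different mapping.

```python
import math


def softplus(x):
    """log(1 + exp(x)), computed in a numerically stable way."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def to_weibull_params(raw_a, raw_b, init_alpha=1.0):
    """Map two unconstrained network outputs to Weibull (alpha, beta).

    alpha (scale) uses exp so it stays positive and can span orders of magnitude;
    beta (shape) uses softplus so it stays positive and grows roughly linearly.
    init_alpha is a hypothetical scaling constant, e.g. a typical TTE in the data.
    """
    alpha = init_alpha * math.exp(raw_a)
    beta = softplus(raw_b)
    return alpha, beta


alpha, beta = to_weibull_params(raw_a=0.0, raw_b=0.0, init_alpha=10.0)
print(alpha, beta)
```

Scaling by something like `init_alpha` means a freshly initialized network already predicts TTEs of a plausible magnitude.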

The next step is to train the model with a special log-loss that also works with censored data. The intuition is that we want to assign high probability to the time of the *next* event or, for censored data, low probability to the times where there *wasn't* any event:

![WTTE-RNN prediction over a timeline](./readme_figs/solution_beta_2.gif)

The result is a pretty neat prediction of the *distribution of the TTE* at each step (shown here for a single event):

![WTTE-RNN prediction](./readme_figs/it_61786_pmf_151.png)

A neat side result is that the predicted parameters form a 2-D embedding that can be used to visualize and group predictions about *how soon* (alpha) and *how sure* (beta) an event is. Stacking timelines of predicted alpha (left) and beta (right) gives:

![WTTE-RNN alphabeta.png](./readme_figs/alphabeta.png)
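The "how soon" and "how sure" reading can be checked with the standard continuous Weibull formulas. The sketch below uses only the Python standard library; the function names are chosen here for illustration and are not necessarily those of the package's Weibull helpers.

```python
import math


def weibull_cdf(t, alpha, beta):
    """P(TTE <= t) for scale alpha and shape beta."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def weibull_quantile(p, alpha, beta):
    """Time t such that P(TTE <= t) = p, for 0 <= p < 1."""
    return alpha * (-math.log(1.0 - p)) ** (1.0 / beta)


def weibull_mean(alpha, beta):
    """Expected TTE."""
    return alpha * math.gamma(1.0 + 1.0 / beta)


def spread(alpha, beta):
    """Width of the central 80% interval, a rough measure of uncertainty."""
    return weibull_quantile(0.9, alpha, beta) - weibull_quantile(0.1, alpha, beta)


# alpha sets the location: about 63% of the mass lies below alpha for any beta.
for beta in (0.5, 1.0, 4.0):
    print(beta, weibull_cdf(10.0, 10.0, beta))

# beta sets the concentration: a larger beta gives a narrower interval.
print(spread(10.0, 1.0), spread(10.0, 4.0))
```

Quantiles scale linearly with alpha, so doubling alpha doubles the predicted median, while raising beta tightens the distribution around alpha.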

## Warnings

There is a body of mathematical theory that justifies using this nice loss function in certain situations:

![loss-equation](./readme_figs/equation.png)

So for censored data, the loss only rewards pushing the distribution up beyond the point of censoring. For this to work, the censoring mechanism must be completely independent of the feature data. If the features contain information about the point of censoring, the algorithm will learn to predict based on the probability of censoring instead of the TTE, which is a form of overfitting/artifact learning. Global features can have this effect if not treated properly.
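The behaviour described above can be seen in a small sketch of the standard log-likelihood for right-censored data under a continuous Weibull distribution. This is a textbook form written out as an assumption; the equation in the figure may differ (for example, a discrete variant), and the function name is hypothetical.

```python
import math


def weibull_censored_loglik(y, u, alpha, beta):
    """Log-likelihood of one observation under a continuous Weibull(alpha, beta).

    y: observed TTE if u == 1, or time survived without an event if u == 0 (y > 0).
    u: 1 if the event was observed, 0 if the observation is censored.
    """
    cum_hazard = (y / alpha) ** beta  # equals -log P(TTE > y)
    log_hazard = math.log(beta / alpha) + (beta - 1.0) * math.log(y / alpha)
    # Observed: log density = log_hazard - cum_hazard.
    # Censored: log survival = -cum_hazard.
    return u * log_hazard - cum_hazard


# Censored at y = 5: the log-likelihood keeps rising as alpha grows.
for alpha in (5.0, 10.0, 100.0):
    print("censored", alpha, weibull_censored_loglik(5.0, 0, alpha, 2.0))

# Observed at y = 5: the log-likelihood peaks near alpha = 5.
for alpha in (2.5, 5.0, 10.0):
    print("observed", alpha, weibull_censored_loglik(5.0, 1, alpha, 2.0))
```

Because the censored term has no upper limit on alpha, a model that can guess from its features which steps are censored is rewarded for predicting arbitrarily far away on those steps, which is the cheating effect described above.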

# Status and Roadmap

This project is under development. The goal is a model framework that is easy to fork and deploy. WTTE-RNN is the algorithm; churn_watch is the deployment being built around it, an opinionated idea of how churn monitoring and reporting can be made beautiful and easy. Pull requests, recommendations, comments, and contributions are very welcome.

# What's in the repository

* Transformations
    * Data pipeline transformations (pandas `DataFrame` of expected format to numpy)
    * Time-to-event and censoring calculations
* Weibull functions (cdf, pdf, quantile, mean, etc.)
* Objective functions written for each framework
    * tensorflow
    * keras
* Neural network layers
    * Weibull output layer implemented in Keras

## Multi-framework support

The core technology is the objective functions. We aim to make them reusable across machine-learning frameworks.

* TensorFlow ✔
* Keras (TensorFlow wrapper) ✔
* MXnet
* Theano
* Torch
* h2o
* scikitFlow
* MLlib

## Model inputs and outputs

Using the model requires transforming raw data into TTEs for training, and Weibull functions are needed to work with the final output.

* Helper functions in SQL, R, and Python.

## Monitoring

WTTE-RNN is as much a machine-learning algorithm as a visual language for discussing the shape and properties of the data and the predictions.

* Plots (partly done)
* Shiny web app or similar (partly done elsewhere)
* Alerts (e.g., Slack/e-mail bots & summaries)
* API

# License

* MIT license

## Citation

```
@MastersThesis{martinsson:Thesis:2016,
author = {Egil Martinsson},
title = {{WTTE-RNN : Weibull Time To Event Recurrent Neural Network}},
school = {Chalmers University Of Technology},
year = {2016},
}
```

## Contributing

Send questions to egil.martinsson[at]gmail.com.
Where possible, please open an issue instead so that others can help too.
Contributions, PRs, comments, and the like are all welcome!