https://github.com/DeDeDeDer/Personal_Projects

This holds all my personal data-related projects (Automation, Modelling, Analysis)

actuarial-science actuarial-statistics claims-reserving datascience datascraping excelvba exploratory-data-analysis feature-engineering insurance-claims modelling-framework predictive-modeling python3


# Personal_Projects
This GitHub holds all the personal projects that I have worked on as a pastime.
Projects are mainly focused on the Data Science, Insurance Pricing & Reserving fields.

A mapping of these are laid out below.

# Mapping
| Scraped Data Analysis | Articles | Simulators & Kernels |
| ------ | ------ | ------ |
| Python Scripts | Article Profile | Python Scripts |
| Geo-Visual SG Housing Prices<br>![ScreenShot](images/Geo_Property.png) | Predictive Modelling<br>![ScreenShot](images/MindM_Modelling.png) | Claims Simulator<br>![ScreenShot](images/Line_ClaimSim.png) |
| Box-Plot SG Housing Prices<br>![ScreenShot](images/Box_Property.png) | Web-Scraping Workflow<br>![ScreenShot](images/Flow_Scrape.png) | FOREX & ML Algorithms<br>![](images/Bubb_MLA_2.gif) |
| Private Insurance 14-Years<br>![](images/Bubb_PteInsur.gif) | FOREX ML Algorithms Workflow<br>![ScreenShot](images/Flow_MLA.png) | Pending Stock Screener |
| Public Insurance 14-Years<br>![](images/Bubb_PubInsur.gif) | | |

# For more info..
![ScreenShot](/Pictures/MapBackground_4.png)



> # **Insurance (Pricing) & Data Science**


# **What is Predictive Modelling?**



It is simply a framework for combining past data & statistics to predict
future outcomes or project liabilities. There are 4 main techniques:
Bayesian, Decision Trees, Support Vector Machines & Neural Networks.
My projects mainly use Bayesian & Decision Tree techniques and
focus primarily on linear regression models.

.


## [At Its Simplest, Predictive Modelling](https://medium.com/@DRicky.Ch29/at-its-simplest-predictive-modelling-b3c0c0b0716d)

![ScreenShot](/Pictures/IntroModel_1.png)

An article that aims to explain:

1. A generalised structure for Predictive Modelling

2. Alternative interpretations of various statistical model metrics



The article follows the generalized framework of:


    Data preparation

    - Preliminary data analysis, executing 4 tiers of data cleaning (Correct, Complete, Create, Convert).

    Exploratory Data Analysis

    - Uni-, Bi- & Multi-variate analysis

    Model Preparation

    - Stratified Train/Test data splits, hyperparameter tuning, parameter evaluation metrics.

    - Feature Engineering (Quantity & Quality), Feature evaluation metrics

    Predictive Modelling (Classification Problem)

    - Ensembles (Hard & Soft Voting)
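
To make the last step concrete, here is a minimal sketch of the difference between hard and soft voting. It uses only the Python standard library. The function names and the three sets of model probabilities are hypothetical and are not taken from the article.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Majority vote over the class labels predicted by each model."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(predicted_probas):
    """Average each class probability across models, then pick the largest."""
    n_models = len(predicted_probas)
    n_classes = len(predicted_probas[0])
    mean_proba = [
        sum(p[c] for p in predicted_probas) / n_models for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: mean_proba[c])


# Hypothetical [P(class 0), P(class 1)] from three models for one observation.
probas = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]
# Each model's own label is its most probable class.
labels = [max(range(2), key=lambda c: p[c]) for p in probas]

hard_result = hard_vote(labels)  # counts labels only
soft_result = soft_vote(probas)  # weighs how confident each model is
```

In this made-up case the two schemes disagree: two of three models lean weakly towards class 0, while the third is strongly confident in class 1, so soft voting follows the confident model.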





# **What is Web Scraping?**



In short, it is simply the automated process of extracting data from the web.
The extracted data is then cleaned of any irregularities and explored
(Exploratory Data Analysis) to spot trends & patterns.

.


## Python Web Scraping PDF & Data Cleaning (Part 1)
[Article](https://medium.com/@DRicky.Ch29/web-scraping-pdf-tables-data-cleaning-part-1-cb6d8d47a6de)
or
[Python Code](https://github.com/DeDeDeDer/Personal_Projects/blob/master/Web%20Scraping%20(Data%20Science%20%26%20Insurance%20Pricing)/Web_Scrap_Insurance_Returns.py)

![ScreenShot](/Pictures/WebScrapPart1.png)

A Python Kernel written to automate the repetitive clicking of c. 1,228 URLs &
the conversion of c. 1,000 PDF tables into CSV in order to compile the data.


Contents:

    1. Collate online source code URLs & sub-page URLs

    2. Download online data via URLs

    3. Convert & Neaten PDF Table into CSV

    4. Compile all CSV Tables
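
Steps 3 and 4 can be sketched with the standard library alone. This is an illustration under assumptions, not the linked script: the downloading and PDF extraction steps are left out because they depend on third-party tools, and the function names and sample tables below are hypothetical.

```python
import csv
import io


def clean_row(row):
    """Strip stray whitespace and thousands separators from each cell."""
    return [cell.strip().replace(",", "") for cell in row]


def compile_tables(csv_texts):
    """Stack several CSV tables that share one header into a single table."""
    compiled = []
    header = None
    for text in csv_texts:
        # Drop rows that are empty or contain only whitespace.
        rows = [
            clean_row(row)
            for row in csv.reader(io.StringIO(text))
            if any(cell.strip() for cell in row)
        ]
        if header is None:
            header = rows[0]
            compiled.append(header)
        compiled.extend(rows[1:])  # skip each table's repeated header
    return compiled


# Hypothetical tables, as they might look straight after PDF conversion.
table_2016 = 'Insurer,Year,Premium\n Alpha ,2016,"1,200"\n\n'
table_2017 = 'Insurer,Year,Premium\nBeta,2017," 950 "\n'

compiled = compile_tables([table_2016, table_2017])
```

Keeping the cleaning in one small function means the same rule is applied to every table before the tables are stacked.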




.


## [Python Web Scraping Data Analysis Motor Insurance (Part 2)](https://medium.com/@DRicky.Ch29/python-web-scraping-data-analysis-motor-insurance-part-2-4cd7162ba644)

![ScreenShot](/Pictures/WebScrapPart2.png)

After extracting the Annual Insurance Data Returns in Part 1 of the series, we
proceed to analyze the data.


Contents:

    Patterns

    1. Benchmark Range of ROC on Expense & Loss Ratios

    Trends

    2. Growing reinsurance ceded abroad beyond the ASEAN region

    3. Declining averages for Earned Premiums & Claims Incurred (with falling inflation rates)

    4. Average ROC, Expense & Loss Ratios
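
As a rough illustration of the ratios named above, the sketch below assumes the common definitions loss ratio = claims incurred / earned premium and expense ratio = expenses / earned premium. The article may define them differently, ROC is not computed here, and all figures are made up.

```python
# Hypothetical yearly figures, not taken from the scraped returns.
records = [
    {"year": 2015, "earned_premium": 1000.0, "claims_incurred": 650.0, "expenses": 250.0},
    {"year": 2016, "earned_premium": 900.0, "claims_incurred": 540.0, "expenses": 270.0},
]

# Assumed definitions: each ratio is measured against earned premium.
loss_ratios = {r["year"]: r["claims_incurred"] / r["earned_premium"] for r in records}
expense_ratios = {r["year"]: r["expenses"] / r["earned_premium"] for r in records}

# Simple (unweighted) averages across years, as in item 4 above.
average_loss_ratio = sum(loss_ratios.values()) / len(loss_ratios)
average_expense_ratio = sum(expense_ratios.values()) / len(expense_ratios)
```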





# **What is Exploratory Data Analysis?**



It is simply the analysis of data sets to summarize their characteristics & patterns.
This includes Uni-, Bi- & Multi-variate analysis, which often uncovers underlying
relationships that conventional models overlook.

.


## [EDA & Feature Engineering Focused](https://www.kaggle.com/derrickchua29/feature-engineering-eda-focused/notebook)

![ScreenShot](/Pictures/EDA_article_1.png)


EDA Summary


1. Those who have had past experience of financial distress (target variable):

>Took out fewer loans or exceeded fewer deadlines

>Tend to have fewer dependents and a lower debt ratio & net worth

>As expected, are of lower-tier income, but have a lower debt ratio


2. Ignoring mortality and time value of money (i.e. annuities)

>Debt ratio & net worth show a Gaussian distribution against age


3. Those who had acts of debt delinquency (took out loans or exceeded deadlines)

>Tend to be from the higher-tier income group or retired


4. Others

>The higher the income, the higher the debt ratio

>The higher the income, the fewer the dependents




# **What is General Linear Modelling?**



It simply applies the fundamental straight-line concept Y = mX + C.
In other words, it assumes each variable relationship is linear, with a single
direction (positive or negative).
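
As a small worked example of the straight-line idea, the sketch below fits m and C by ordinary least squares using only the standard library. The function name and data points are hypothetical. The kernel linked below may rely on a modelling library instead.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Hypothetical points that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
m, c = fit_line(xs, ys)
```

The sign of `m` gives the direction of the relationship: positive when Y rises with X, negative when it falls.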

.

## [Ensemble Models Comparison Techniques](https://www.kaggle.com/derrickchua29/ensemble-models-comparison-techniques)

![ScreenShot](/Pictures/GLM_article_1.png)


A Python Kernel that aims to:

    1. Give a better understanding of the simplified predictive modelling framework

    2. Explain the logic behind the different coding methods & concise techniques used

    3. Compare different models



    Coding Techniques:

    A. List comprehensions

    B. Samples to reduce computational cost

    C. Concise 'def' functions that can be used repetitively

    D. Pivoting using groupby

    E. When & how to convert and reshape dictionaries into lists or dataframes

    F. Quickly split dataframe columns

    H. Loop sub-plots

    I. Quick lambda formulae functions

    J. Quick looping print or DataFrame conversion of summative scores

    K. Order plot components

    L. Create & plot bulk ensemble comparative results
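
A few of the techniques above (a reusable `def`, a list comprehension, reshaping a dictionary into a list, and a lambda) can be shown in plain Python. The model names and scores are hypothetical, and the kernel itself works with DataFrames rather than plain lists.

```python
# Hypothetical cross-validation scores per model.
scores = {
    "LogisticRegression": [0.78, 0.80, 0.79],
    "DecisionTree": [0.74, 0.77, 0.75],
}


def mean(values):
    """Small reusable helper (technique C)."""
    return sum(values) / len(values)


# Techniques A and E: a list comprehension that reshapes the dictionary
# into a list of (model, mean score) rows.
summary = [(name, round(mean(vals), 3)) for name, vals in scores.items()]

# Technique I: a lambda as the sort key, best model first.
ranked = sorted(summary, key=lambda row: row[1], reverse=True)
```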









> # **Insurance (Reserving)**


# **Claim Simulations**



In short, this project contains a Python Kernel that automates the probabilistic
claims simulation process for actuarial reserving calculations.


Reserving Method Used: Inflation Adjusted Chain Ladder

.


## Claims Simulation
[Article](https://medium.com/@DRicky.Ch29/inflation-adjusted-chain-ladder-iacl-with-only-python-pandas-module-512914d9a1d)
or
[Python Code Guide](https://www.kaggle.com/derrickchua29/simulating-claim-data-iacl-calculation)
or
[Python Code v2](https://github.com/DeDeDeDer/Personal_Projects/blob/master/Claims%20Simulation%20(Insurance%20Reserving)/Claims_Simulator.py)

![ScreenShot](/Pictures/ClaimsSimu_article_1.png)


Present: Simulation supports Claim Numbers (Poisson, Negative Binomial) & Amounts (Gaussian, LogNormal).

Ongoing:

1. Support Bornhuetter-Ferguson Method (BF).
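
The frequency and severity idea can be sketched with the standard `random` module: Poisson claim numbers and LogNormal claim amounts per accident year. This is a sketch under assumptions and is not the linked simulator. The function names and parameter values are hypothetical, and the Poisson draw uses Knuth's method because `random` has no built-in Poisson sampler.

```python
import math
import random


def draw_poisson(rng, lam):
    """Draw one Poisson(lam) claim count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_year(rng, lam, mu, sigma):
    """Simulate one accident year: a claim count and the claim amounts."""
    n_claims = draw_poisson(rng, lam)
    amounts = [rng.lognormvariate(mu, sigma) for _ in range(n_claims)]
    return n_claims, amounts


rng = random.Random(2024)  # fixed seed so the run is repeatable
years = {
    year: simulate_year(rng, lam=5.0, mu=8.0, sigma=0.5)
    for year in range(2015, 2020)
}
totals = {year: sum(amounts) for year, (n_claims, amounts) in years.items()}
```

Passing an explicit `random.Random` instance keeps each simulation reproducible without touching the global random state.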


Contents:

    0. Assumptions

    1. Development-Year lags

    2. Incremental & Cumulative claim amounts

    3. Uplift past inflation for incremental amounts & Derive cumulative

    4. Individual Loss Development Factors (LDFs)

    5. Raw preliminary view of triangle

    6. Establish predicted lag years data frame

    7. Impute latest cumulative amounts

    8. Simple Mean & Volume Weighted LDFs & 5/3 Year Averages & Select

    9. Predict future cumulative amounts

    10. Calculate incremental amounts

    11. Project future inflation for incremental amounts

    12. Reserve summation
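
The core of steps 8, 9 and 12 can be sketched on a tiny cumulative triangle. This is a basic chain ladder with volume-weighted LDFs only: the inflation uplift and projection (steps 3 and 11) are deliberately left out, and the triangle values and function names are hypothetical. The article and kernel use pandas, while this sketch uses plain lists.

```python
def volume_weighted_ldfs(triangle):
    """One loss development factor per development lag, weighted by volume."""
    n_dev = len(triangle[0])
    ldfs = []
    for j in range(n_dev - 1):
        # Only origin years observed at both lag j and lag j + 1 contribute.
        rows = [row for row in triangle if len(row) > j + 1]
        ldfs.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return ldfs


def project(triangle, ldfs):
    """Fill in each origin year's future cumulative amounts."""
    completed = []
    for row in triangle:
        full = list(row)
        for j in range(len(row) - 1, len(ldfs)):
            full.append(full[-1] * ldfs[j])
        completed.append(full)
    return completed


# Hypothetical cumulative claims; each row is one origin year.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0],
    [120.0],
]

ldfs = volume_weighted_ldfs(triangle)
completed = project(triangle, ldfs)
# Reserve = projected ultimate minus latest observed cumulative amount.
reserves = [full[-1] - row[-1] for full, row in zip(completed, triangle)]
total_reserve = sum(reserves)
```

Here the factors work out to 315/210 = 1.5 and 165/150 = 1.1, giving reserves of 0, 16.5 and 78 for the three origin years.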









> # **Microsoft Package**


# **Microsoft Package**



Before learning Python, I had to refine the basics.
Since Excel & VBA are broadly deemed essential skill sets, I thought
I would build some personal models. The ideas were inspired by my work
placement at a consultancy company. The main objective was to
ease manual & repetitive tasks.

.


## Word Documentations
[Spreadsheet](https://www.dropbox.com/s/b4cgvhjui2mj0qq/Bulk%20MailMerge%20v2.0.xlsm?dl=0)
or
[Excel VBA Code](https://www.dropbox.com/s/b4cgvhjui2mj0qq/Bulk%20MailMerge%20v2.0.xlsm?dl=0)

![ScreenShot](/Pictures/WordExcelLogo_1.png)



A reproducible Excel VBA programme that automates bulk, simultaneous Word
document mail merges. Data entry checks (file exists etc.) & cleaning (excess
spaces, invalid file directory ...) are handled by the code as well. This code
does NOT use the standard mail merge function, which operates on ONLY a single
document. Instead, it runs across many Word documents at once.


Inspiration:

Whilst assisting my previous employer to prepare clients' European
General Data Protection Regulation (GDPR) privacy documentation, I created
this programme to streamline over 30 hours of manual work.

.


## Outlook Communications
[Spreadsheet](https://www.dropbox.com/s/o50up79cttwyfa3/Bulk%20Emailing%20v2.0.xlsm?dl=0)
or
[Excel VBA Code](https://www.dropbox.com/s/o50up79cttwyfa3/Bulk%20Emailing%20v2.0.xlsm?dl=0)

![ScreenShot](/Pictures/OutlookExcelLogo_1.png)



A reproducible Excel VBA programme that automates multiple simultaneous email
communications when recipients receive overlapping or identical attachments or
spreadsheet tables.


Inspiration:

A responsibility of mine at a previous company involved weekly roll-forward
projection updates. I found this repetitive & built this model to automate the
job. It reduced manual input errors & eased the job handover
process.