https://github.com/DeDeDeDer/Personal_Projects
This holds all my personal data-related projects (Automation, Modelling, Analysis)
actuarial-science actuarial-statistics claims-reserving datascience datascraping excelvba exploratory-data-analysis feature-engineering insurance-claims modelling-framework predictive-modeling python3
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/DeDeDeDer/Personal_Projects
- Owner: DeDeDeDer
- Created: 2018-08-26T05:13:33.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-08-24T00:33:52.000Z (over 5 years ago)
- Last Synced: 2024-08-14T07:09:41.513Z (8 months ago)
- Topics: actuarial-science, actuarial-statistics, claims-reserving, datascience, datascraping, excelvba, exploratory-data-analysis, feature-engineering, insurance-claims, modelling-framework, predictive-modeling, python3
- Language: Python
- Homepage:
- Size: 19.6 MB
- Stars: 7
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- jimsghstars - DeDeDeDer/Personal_Projects - This holds all my personal data-related project's (Automation, Modelling, Analysis) (Python)
README
# Personal_Projects
This GitHub repository holds all the personal projects that I have worked on as a pastime.
The projects mainly focus on the Data Science, Insurance Pricing & Reserving fields.
A mapping of these is laid out below.

# Mapping
| Scraped Data Analysis | Articles | Simulators & Kernels |
| ------ | ------ | ------ |
| Python Scripts | Article Profile | Python Scripts |
| Geo-Visual SG Housing Prices | Predictive Modelling | Claims Simulator |
| Box-Plot SG Housing Prices | Web-Scraping Workflow | FOREX & ML Algorithms |
| Private Insurance 14-Years | FOREX ML Algorithms Workflow | Pending Stock Screener |
| Public Insurance 14-Years | | |

# For more info

> # **Insurance (Pricing) & Data Science**
# **What is Predictive Modelling?**
It is simply a framework that integrates past data & statistics to predict
future outcomes or project liabilities. There are 4 main techniques:
Bayesian, Decision Trees, Support Vector Machines & Neural Networks.
My projects mainly use Bayesian & Decision Tree techniques and
focus primarily on linear regression models.
## [At Its Simplest, Predictive Modelling](https://medium.com/@DRicky.Ch29/at-its-simplest-predictive-modelling-b3c0c0b0716d)
An article that aims to explain:
1. A generalised structure for Predictive Modelling
2. Alternative interpretations of various statistical model metrics

The article follows this generalised framework:

Data Preparation
- Preliminary data analysis, executing 4 tiers of data cleaning (Correct, Complete, Create, Convert)

Exploratory Data Analysis
- Uni-, Bi- & Multi-variate analysis

Model Preparation
- Stratified train/test splits, hyperparameter tuning, parameter evaluation metrics
- Feature engineering (quantity & quality), feature evaluation metrics

Predictive Modelling (Classification Problem)
- Ensembles (Hard & Soft Voting)
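
To illustrate the difference between hard and soft voting, here is a minimal pure-Python sketch. It is not the code from the article; the function names and the probabilities below are hypothetical and chosen only to show a case where the two rules disagree.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Majority vote over the class labels predicted by each model."""
    # most_common(1) returns the label with the highest count;
    # ties are resolved by first-seen order.
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(predicted_probas):
    """Average per-class probabilities across models, then pick the largest."""
    n_models = len(predicted_probas)
    n_classes = len(predicted_probas[0])
    mean_proba = [
        sum(p[k] for p in predicted_probas) / n_models
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=lambda k: mean_proba[k])


# Hypothetical [P(class 0), P(class 1)] outputs from three models for one row.
model_probas = [[0.45, 0.55], [0.40, 0.60], [0.90, 0.10]]

# Each model's own label is its most probable class.
model_labels = [max(range(2), key=lambda k: p[k]) for p in model_probas]

hard_result = hard_vote(model_labels)  # two of three models say class 1
soft_result = soft_vote(model_probas)  # one very confident model tips it to class 0
print(hard_result, soft_result)
```

Hard voting counts each model equally, while soft voting lets a confident model outweigh two marginal ones, which is why the two results can differ on the same row.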
# **What is Web Scraping?**
In short, it is simply the automated process of extracting data from the web.
Any irregularities are then cleaned & Exploratory Data Analysis is conducted
to spot Trends & Patterns.
## Python Web Scraping PDF & Data Cleaning (Part 1)
[Article](https://medium.com/@DRicky.Ch29/web-scraping-pdf-tables-data-cleaning-part-1-cb6d8d47a6de)
or
[Python Code](https://github.com/DeDeDeDer/Personal_Projects/blob/master/Web%20Scraping%20(Data%20Science%20%26%20Insurance%20Pricing)/Web_Scrap_Insurance_Returns.py)
A Python Kernel written to automate the repetitive clicking of circa 1,228 URLs &
the conversion of circa 1,000 PDF Tables into CSV to compile the data.

Contents:
1. Collate online source code URLs & sub-page URLs
2. Download online data via URLs
3. Convert & Neaten PDF Table into CSV
4. Compile all CSV Tables
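
Step 1 (collating URLs) can be sketched with the standard library alone. This is not the linked script; the class name `PdfLinkCollector`, the sample HTML and the `example.com` base URL are all hypothetical, and the download and PDF-to-CSV steps (which need a network connection and a PDF table library) are not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of every <a href="...pdf"> link on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links that point at PDF files (case-insensitive).
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page URL.
            self.pdf_urls.append(urljoin(self.base_url, href))


# Hypothetical page content standing in for a downloaded index page.
sample_html = """
<ul>
  <li><a href="/returns/2016/insurer_a.pdf">Insurer A 2016</a></li>
  <li><a href="/returns/2016/notes.html">Notes</a></li>
  <li><a href="insurer_b.PDF">Insurer B 2016</a></li>
</ul>
"""

collector = PdfLinkCollector("https://example.com/returns/2016/")
collector.feed(sample_html)
for url in collector.pdf_urls:
    print(url)
```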
## [Python Web Scraping Data Analysis Motor Insurance (Part 2)](https://medium.com/@DRicky.Ch29/python-web-scraping-data-analysis-motor-insurance-part-2-4cd7162ba644)
After extracting the Annual Insurance Data Returns in Part 1 of the series, we proceed to
analyze the data.

Contents:

Patterns
1. Benchmark Range of ROC on Expense & Loss Ratios

Trends
2. Growing reinsurance ceded abroad beyond the ASEAN region
3. Declining averages for Earned Premiums & Claims Incurred (with falling inflation rates)
4. Average ROC, Expense & Loss Ratios
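
For readers unfamiliar with the ratios named above, here is a small sketch using common textbook definitions. These are assumptions, not necessarily the exact definitions used in the article (for example, some sources divide expenses by written rather than earned premium), and the figures are hypothetical.

```python
def loss_ratio(claims_incurred, earned_premium):
    """Claims incurred as a share of earned premium."""
    return claims_incurred / earned_premium


def expense_ratio(expenses, earned_premium):
    """Expenses as a share of earned premium (one common convention)."""
    return expenses / earned_premium


# Hypothetical figures for one insurer-year, in the same currency unit.
earned_premium = 200.0
claims_incurred = 130.0
expenses = 50.0

lr = loss_ratio(claims_incurred, earned_premium)
er = expense_ratio(expenses, earned_premium)
combined = lr + er  # below 1.0 suggests an underwriting profit
print(f"loss={lr:.2%} expense={er:.2%} combined={combined:.2%}")
```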
# **What is Exploratory Data Analysis?**
It is simply the analysis of data sets to summarize their characteristics & patterns.
This includes Uni-, Bi- & Multi-variate Analysis, which often uncovers underlying
relationships that conventional models overlook.
## [EDA & Feature Engineering Focused](https://www.kaggle.com/derrickchua29/feature-engineering-eda-focused/notebook)
EDA Summary
1. Those who have had past experience of financial distress (target variable):
   - Made fewer loans or exceeded fewer deadlines
   - Tend to have fewer dependents and a lower debt ratio & net worth
   - As expected, are in the lower income tier, but have a lower debt ratio
2. Ignoring mortality and the time value of money (i.e. annuities):
   - Debt ratio & net worth show a Gaussian distribution against age
3. Those who had acts of debt delinquency (made loans or exceeded deadlines):
   - Tend to be from the higher income tier or retired
4. Others:
   - The higher the income, the higher the debt ratio
   - The higher the income, the fewer the dependents

# **What is General Linear Modelling?**
It is simply the application of the fundamental straight-line concept Y = mX + C.
In other words, it rests on the idea that each variable relationship runs in a
single direction (positive or negative).
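
As an illustration of the straight-line idea, the slope m and intercept C can be estimated by ordinary least squares. This is a minimal sketch with made-up data and a hypothetical helper name, not code taken from the linked kernel.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m * x + c for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Made-up points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, c = fit_line(xs, ys)
print(f"slope={m}, intercept={c}")
```

The sign of `m` gives the direction of the relationship: positive when Y rises with X, negative when it falls.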
## [Ensemble Models Comparison Techniques](https://www.kaggle.com/derrickchua29/ensemble-models-comparison-techniques)
A Python Kernel that aims to:
1. Give a better understanding of the simplified predictive modelling framework
2. Explain the logic behind different coding methods & concise techniques used
3. Compare different models

Coding Techniques:

A. List comprehensions
B. Samples to reduce computational cost
C. Concise 'def' functions that can be used repetitively
D. Pivoting using groupby
E. When & how to convert and reshape dictionaries into lists or dataframes
F. Quickly split dataframe columns
H. Loop sub-plots
I. Quick lambda functions
J. Quick looping print or DataFrame conversion of summative scores
K. Order plot components
L. Create & plot bulk ensemble comparative results
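
A few of the listed techniques (comprehensions, a reusable `def`, a lambda sort key, reshaping a dictionary into rows, and a looped score print-out) can be sketched in plain Python. The model names and scores are hypothetical, and the kernel itself works with pandas, which is not used here.

```python
def summarise(scores):
    """Reusable helper: mean and best score for one model."""
    return {"mean": sum(scores) / len(scores), "best": max(scores)}


# Hypothetical cross-validation scores per model.
cv_scores = {
    "logistic": [0.80, 0.82, 0.78],
    "tree": [0.70, 0.80, 0.75],
    "forest": [0.84, 0.86, 0.85],
}

# Dictionary comprehension: apply the helper to every model.
summary = {name: summarise(scores) for name, scores in cv_scores.items()}

# Reshape the nested dictionary into a list of rows,
# then order the rows by mean score using a lambda as the sort key.
rows = sorted(
    [(name, stats["mean"], stats["best"]) for name, stats in summary.items()],
    key=lambda row: row[1],
    reverse=True,
)

# Looped print-out of the summative scores.
for name, mean, best in rows:
    print(f"{name:<10} mean={mean:.3f} best={best:.2f}")
```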
> # **Insurance (Reserving)**
# **Claim Simulations**
In short, this project contains a Python Kernel that automates the probabilistic
claims simulation process for actuarial reserving calculations.
Reserving Method Used: Inflation Adjusted Chain Ladder (IACL).
## Claims Simulation
[Article](https://medium.com/@DRicky.Ch29/inflation-adjusted-chain-ladder-iacl-with-only-python-pandas-module-512914d9a1d)
or
[Python Code Guide](https://www.kaggle.com/derrickchua29/simulating-claim-data-iacl-calculation)
or
[Python Code v2](https://github.com/DeDeDeDer/Personal_Projects/blob/master/Claims%20Simulation%20(Insurance%20Reserving)/Claims_Simulator.py)
Present: Simulation supports Claim Numbers (Poisson, Negative Binomial) & Amounts (Gaussian, LogNormal).
Ongoing:
1. Support Bornhuetter-Ferguson Method (BF).
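
As a rough illustration of the simulation idea, the sketch below draws a Poisson claim count and LogNormal claim amounts using only the standard library. It is not the linked simulator: the function names and parameter values are hypothetical, and the Negative Binomial and Gaussian options are not shown.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson count with Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_year(rng, mean_count, mu, sigma):
    """Simulate one year's claims: a Poisson count, then LogNormal amounts."""
    n_claims = sample_poisson(rng, mean_count)
    amounts = [rng.lognormvariate(mu, sigma) for _ in range(n_claims)]
    return n_claims, amounts


# A seeded generator makes the simulation reproducible.
rng = random.Random(42)

# Hypothetical parameters: 5 expected claims, log-scale mean 8.0, log-scale sd 0.5.
n_claims, amounts = simulate_year(rng, mean_count=5.0, mu=8.0, sigma=0.5)
total = sum(amounts)
print(f"{n_claims} claims, total amount {total:,.0f}")
```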
Contents:

0. Assumptions
1. Development-Year lags
2. Incremental & Cumulative claim amounts
3. Uplift past inflation for incremental amounts & Derive cumulative
4. Individual Loss Development Factors (LDFs)
5. Raw preliminary view of triangle
6. Establish predicted lag years data frame
7. Impute latest cumulative amounts
8. Simple Mean & Volume Weighted LDFs & 5/3 Year Averages & Select
9. Predict future cumulative amounts
10. Calculate incremental amounts
11. Project future inflation for incremental amounts
12. Reserve summation
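
The core of the steps above can be condensed into a short pure-Python sketch. It is a simplification under stated assumptions, not the linked kernel: it uses volume-weighted LDFs only (no simple-mean or 5/3-year averages), a single flat future inflation rate, and a hypothetical 3-year triangle and price index.

```python
from itertools import accumulate


def iacl_reserve(incremental, past_index, future_inflation):
    """Inflation-adjusted chain ladder on a run-off triangle.

    incremental: row i holds origin year i's incremental claims by lag.
    past_index: price index per calendar year offset (origin + lag).
    future_inflation: assumed flat annual rate for future payments.
    """
    n = len(incremental)
    latest = n - 1  # latest observed calendar year offset

    # Uplift past increments to latest-year money, then cumulate each row.
    cumulative = [
        list(accumulate(
            amount * past_index[latest] / past_index[i + lag]
            for lag, amount in enumerate(row)
        ))
        for i, row in enumerate(incremental)
    ]

    # Volume-weighted LDFs: column sums over rows observed at both lags.
    ldfs = []
    for lag in range(n - 1):
        rows = [c for c in cumulative if len(c) > lag + 1]
        ldfs.append(sum(c[lag + 1] for c in rows) / sum(c[lag] for c in rows))

    # Project missing cells, difference into increments, inflate and sum.
    reserve = 0.0
    for i, observed in enumerate(cumulative):
        previous = observed[-1]
        for lag in range(len(observed), n):
            projected = previous * ldfs[lag - 1]
            years_ahead = i + lag - latest
            reserve += (projected - previous) * (1 + future_inflation) ** years_ahead
            previous = projected
    return ldfs, reserve


# Hypothetical triangle (rows: origin years, columns: development lags).
triangle = [[100.0, 50.0, 25.0], [110.0, 55.0], [120.0]]
price_index = [1.00, 1.02, 1.04]

ldfs, reserve = iacl_reserve(triangle, price_index, future_inflation=0.02)
print([round(f, 4) for f in ldfs], round(reserve, 2))
```

Setting a flat `price_index` and `future_inflation=0.0` reduces this to the basic chain ladder, which is a convenient sanity check.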
> # **Microsoft Package**
# **Microsoft Package**
Prior to learning the Python coding language, I had to refine the basics.
Since Excel & VBA are broadly deemed essential skill-sets, I decided
to build some personal models. The ideas were inspired by my work
placement at a consultancy company. The main objective was to
ease manual & repetitive tasks.
## Word Documentations
[Spreadsheet](https://www.dropbox.com/s/b4cgvhjui2mj0qq/Bulk%20MailMerge%20v2.0.xlsm?dl=0)
or
[Excel VBA Code](https://www.dropbox.com/s/b4cgvhjui2mj0qq/Bulk%20MailMerge%20v2.0.xlsm?dl=0)
A reproducible Excel VBA programme that automates bulk simultaneous Word
document mail merges. Data entry checks (file exists etc.) & cleaning (excess
spaces, invalid file directory ...) are done by the code as well. This code
does NOT use the standard mail merge function, which operates on only a single
document. Instead, it allows running on many Word documents at once.
Inspiration:
Whilst assisting my previous employer to prepare clients' privacy documentation
for the European General Data Protection Regulation (GDPR), I created
this programme to streamline over 30 hours of manual work.
## Outlook Communications
[Spreadsheet](https://www.dropbox.com/s/o50up79cttwyfa3/Bulk%20Emailing%20v2.0.xlsm?dl=0)
or
[Excel VBA Code](https://www.dropbox.com/s/o50up79cttwyfa3/Bulk%20Emailing%20v2.0.xlsm?dl=0)
A reproducible Excel VBA programme that automates multiple simultaneous email
communications where recipients receive overlapping or identical attachments or
spreadsheet tables.
Inspiration:
A responsibility of mine at a previous company involved weekly roll-forward
projection updates. I found this repetitive & built this model to automate the
job. It reduced manual input errors & eased the job handover
process.