https://github.com/kdayno/workforce-flux
End-to-end people analytics on an HR dataset: DuckDB + dbt + Evidence. Translates workforce trends into actionable findings on retention, attrition, and pay.
analytics-engineering data-engineering dbt duckdb hr-analytics portfolio
- Host: GitHub
- URL: https://github.com/kdayno/workforce-flux
- Owner: kdayno
- License: mit
- Created: 2026-06-01T23:26:36.000Z (21 days ago)
- Default Branch: main
- Last Pushed: 2026-06-02T00:25:45.000Z (21 days ago)
- Last Synced: 2026-06-02T01:19:24.125Z (21 days ago)
- Topics: analytics-engineering, data-engineering, dbt, duckdb, hr-analytics, portfolio
- Homepage:
- Size: 29.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README

- An end-to-end People Analytics project: a raw HR dataset (sourced from [Kaggle](#data-source)) transformed into decision-useful insight on headcount, attrition, and pay.
- File-based analytics stack (DuckDB + dbt + Evidence), deployed as a static Vercel site.
> Live demo: [workforceflux.kdayno.com](https://workforceflux.kdayno.com)
## Objectives
1. **Surface decision-useful HR insight.** Quantify workforce dynamics
(headcount growth, annualised turnover, attrition drivers, and the
effectiveness of recruitment channels), and translate the findings into
recommended actions.
2. **Apply analytics-engineering best practice.** Transform raw, inconsistent
source data into clean, tested, and documented analytical models through a
layered ELT pipeline and dimensional modelling.
3. **Demonstrate analytical rigour.** Define HR metrics correctly (for example,
annualised versus cumulative turnover), segment responsibly given the sample
size, and state every assumption and limitation transparently.
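
The distinction in objective 3 between annualised and cumulative turnover can be sketched as follows. The formulas are common conventions assumed here for illustration, not necessarily the definitions used in the project's models; the function names and toy numbers are hypothetical.

```python
def annualised_turnover(separations, avg_headcount, months):
    """Separations in a period over average headcount, scaled to 12 months."""
    if avg_headcount <= 0 or months <= 0:
        raise ValueError("avg_headcount and months must be positive")
    return separations / avg_headcount * (12 / months)


def cumulative_turnover(total_separations, total_ever_employed):
    """Share of everyone ever employed who has left, regardless of when."""
    if total_ever_employed <= 0:
        raise ValueError("total_ever_employed must be positive")
    return total_separations / total_ever_employed


# Hypothetical toy numbers, not taken from the dataset.
quarterly = annualised_turnover(separations=5, avg_headcount=200, months=3)
lifetime = cumulative_turnover(total_separations=100, total_ever_employed=300)
print(f"annualised: {quarterly:.1%}, cumulative: {lifetime:.1%}")
```

The cumulative figure grows with the length of the observation window, so it is not comparable to annual benchmarks; the annualised figure is.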
## Key findings
| # | Finding | Headline number |
|---|---|---|
| 1 | Hiring freeze, not attrition crisis | Annual turnover 3.6–9.9% (low relative to [BLS 2019 labour-turnover data](https://www.bls.gov/opub/mlr/2020/article/job-openings-hires-and-quits-set-record-highs-in-2019.htm)) |
| 2 | Voluntary attrition concentrated in Production department | 86% of voluntary exits from 67% of headcount |
| 3 | Production has a structural pay-competitiveness gap | Stayers earn 12.7% more than leavers at 5–10 yrs tenure |
| 4 | Pay equity is healthy; the raw gap is a composition effect | 2.1% raw gap → ~0% within position |
> **Full analysis.** Per-finding tables, methodology, assumptions, and caveats:
> [`docs/full-analysis.md`](docs/full-analysis.md). The subject company is anonymised in
> the source dataset; this README refers to it as **Company X**.
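
Finding 4 rests on comparing a raw pay gap with a within-position gap. The sketch below shows how a raw gap can arise purely from who holds which position. The records, group labels, and function names are hypothetical and chosen for illustration; the project's own method is documented in `docs/full-analysis.md`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records (group, position, salary); not from the dataset.
employees = [
    ("A", "Technician", 50_000),
    ("A", "Technician", 50_000),
    ("A", "Manager", 80_000),
    ("A", "Manager", 80_000),
    ("B", "Technician", 50_000),
    ("B", "Technician", 50_000),
    ("B", "Technician", 50_000),
    ("B", "Manager", 80_000),
]


def raw_gap(rows, high="A", low="B"):
    """Relative difference in mean salary between two groups, ignoring position."""
    mean_high = mean(s for g, _, s in rows if g == high)
    mean_low = mean(s for g, _, s in rows if g == low)
    return (mean_high - mean_low) / mean_high


def within_position_gap(rows, high="A", low="B"):
    """Headcount-weighted average of the per-position gaps."""
    by_position = defaultdict(list)
    for row in rows:
        by_position[row[1]].append(row)
    total = 0
    weighted = 0.0
    for position_rows in by_position.values():
        groups = {g for g, _, _ in position_rows}
        if high in groups and low in groups:  # skip positions held by one group only
            weighted += raw_gap(position_rows, high, low) * len(position_rows)
            total += len(position_rows)
    return weighted / total if total else float("nan")


print(f"raw gap: {raw_gap(employees):.1%}")
print(f"within-position gap: {within_position_gap(employees):.1%}")
```

In the toy data both groups are paid identically within each position, so the whole raw gap comes from group A holding more of the higher-paid roles.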
## Recommendations
The single highest-leverage intervention indicated by the analysis is a
**market-rate salary review for Production roles at 3+ years of tenure**.
This would directly address the 11 explicit "more money" voluntary exits and
likely absorb a portion of the 17 "Another position" exits.
Two supporting recommendations (engagement-survey replacement and a merit-pay
premium for Production) are detailed in [`docs/full-analysis.md#recommendations`](docs/full-analysis.md#recommendations).
## Tech stack
| Layer | Tool | Role |
|-------|------|------|
| Storage | [DuckDB](https://duckdb.org) | Embedded analytical database |
| Transformation | [dbt](https://www.getdbt.com) (`dbt-duckdb`) | Tested, layered SQL models |
| Visualisation | [Evidence](https://evidence.dev) | BI-as-code reports |
| Hosting | [Vercel](https://vercel.com) | Static hosting + auto-deploy on push |
## Data source
[Human Resources Data Set](https://www.kaggle.com/datasets/rhuebner/human-resources-data-set)
by Dr. Rich Huebner & Dr. Carla Patalano (Kaggle). A single CSV,
`HRDataset_v14.csv` (**~311 employees, 36 columns**), one row per employee.
The raw file is **not committed** (see `.gitignore`). Download it from Kaggle
(a free account is required) and place it at:
```
data/raw/HRDataset_v14.csv
```
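
Before building, it can help to confirm the download landed in the right place and has roughly the expected shape (about 311 rows, 36 columns). This is a minimal standard-library sketch; `csv_shape` is a hypothetical helper and the `utf-8-sig` encoding is an assumption about the file.

```python
import csv
from pathlib import Path


def csv_shape(path):
    """Return (data_rows, columns) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = sum(1 for row in reader if row)  # ignore blank lines
    return rows, len(header)


if __name__ == "__main__":
    path = Path("data/raw/HRDataset_v14.csv")
    if path.exists():
        rows, columns = csv_shape(path)
        print(f"{path}: {rows} rows, {columns} columns")
    else:
        print(f"{path} not found; download it from Kaggle first")
```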
## Project structure
```
workforce-flux/
├── LICENSE
├── README.md
├── requirements.txt
├── .gitignore
├── data/
│   └── raw/                      # HRDataset_v14.csv goes here (not committed)
├── docs/
│   └── full-analysis.md          # Full per-finding analysis (tables, caveats)
├── eda/                          # Exploratory data analysis (SQL, run against hr.duckdb)
│   ├── 01_decline_diagnosis.sql
│   ├── 02_retention.sql
│   ├── 03_exit_reasons.sql
│   └── 04_compensation_equity.sql
├── hr_dbt/                       # dbt project
│   ├── dbt_project.yml
│   ├── profiles.yml
│   ├── packages.yml
│   └── models/
│       ├── staging/
│       │   ├── stg_employees.sql
│       │   └── _staging.yml
│       ├── intermediate/
│       │   ├── int_employees_enriched.sql
│       │   ├── int_date_spine.sql
│       │   └── int_headcount_monthly.sql
│       └── marts/
│           ├── dim_employee.sql
│           ├── mart_headcount_monthly.sql
│           ├── mart_attrition.sql
│           ├── mart_recruitment_effectiveness.sql
│           └── _marts.yml
├── reports/                      # Evidence reports (live at workforceflux.kdayno.com)
└── hr.duckdb                     # built by dbt; committed for Vercel
```
## Setup
```bash
# 1. Python environment
# NOTE: dbt does not yet support Python 3.14 (its mashumaro dependency fails
# to import). Use Python 3.13 or earlier.
python3.13 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# 2. Download HRDataset_v14.csv from Kaggle into data/raw/
# 3. Build the pipeline (run dbt from inside hr_dbt/)
cd hr_dbt
dbt deps # installs dbt_utils
dbt build --profiles-dir . # runs models + tests
# 4. Run the Evidence reports locally
cd ../reports
npm install
npm run dev # opens http://localhost:3000
```
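
The Python version constraint in step 1 can be checked before creating the virtual environment. This is a small sketch; `dbt_compatible` is a hypothetical helper that encodes only the "3.13 or earlier" rule from the note above.

```python
import sys


def dbt_compatible(version=sys.version_info):
    """True if the interpreter is older than Python 3.14 (per the setup note)."""
    return tuple(version[:2]) < (3, 14)


if not dbt_compatible():
    print("Python 3.14+ detected; create the venv with Python 3.13 or earlier")
```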
## Pipeline / data model
```
HRDataset_v14.csv
└─ stg_employees ................. clean + type-cast, 1 row per employee
   └─ int_employees_enriched ..... + derived fields (age, tenure, bands…)
      ├─ dim_employee ............ employee dimension
      ├─ mart_attrition .......... department-level separation summary
      ├─ mart_recruitment_effectiveness
      └─ int_headcount_monthly ... employee-month grain (uses int_date_spine)
         └─ mart_headcount_monthly monthly time series + turnover rate
```
Layer materialisation: staging & intermediate are **views**, marts are **tables**.
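
The employee-month grain behind the headcount series can be illustrated outside SQL. The sketch below builds a month spine and counts who is active in each month. It assumes "active" means employed on the first day of the month, which may differ from the rule in `int_headcount_monthly`; the function names and toy records are hypothetical.

```python
from datetime import date


def month_spine(start, end):
    """First-of-month dates from start's month through end's month, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def headcount_by_month(employees, spine):
    """Count employees active on the first day of each spine month.

    employees: iterable of (hire_date, termination_date or None).
    """
    counts = {}
    for month_start in spine:
        counts[month_start] = sum(
            1
            for hired, terminated in employees
            if hired <= month_start and (terminated is None or terminated > month_start)
        )
    return counts


# Hypothetical toy employees, not taken from the dataset.
employees = [
    (date(2020, 1, 15), None),
    (date(2020, 2, 1), date(2020, 3, 20)),
    (date(2020, 3, 10), None),
]
spine = month_spine(date(2020, 1, 1), date(2020, 4, 1))
monthly = headcount_by_month(employees, spine)
for month_start, headcount in monthly.items():
    print(month_start.isoformat(), headcount)
```

Joining every employee to every spine month in which they were active is what lets a monthly turnover rate use a headcount denominator for that same month.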
## Next steps
- **Recruitment-source effectiveness as a fifth finding.** The
`mart_recruitment_effectiveness` model exists but no narrative is built on
it. Which channels (Indeed, LinkedIn, referral, diversity job fair) produce
stayers vs leavers, and at what cost-per-retained-hire?
- **Tenure survival curve for voluntary exits.** A hazard curve by
month-of-tenure, Production vs non-Production, would localise *when* in the
lifecycle attrition happens and sharpen the salary-review recommendation to
a specific tenure-month trigger.
- **Quantify the salary-review intervention.** The primary recommendation
("market-rate review for Production at 3+ years") is qualitative. A
cost-benefit estimate (retained employees and avoided replacement cost vs
the raise bill) would turn it into a business case.
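
For the tenure survival curve, one standard option is a Kaplan-Meier estimate over months of tenure. The sketch below is a plain-Python version under stated assumptions: tenure is measured in whole months, and anyone still employed or separated for other reasons is treated as censored. The function name and toy observations are hypothetical.

```python
def kaplan_meier(observations):
    """Kaplan-Meier survival estimate.

    observations: iterable of (tenure_months, exited), where exited is True for
    a voluntary exit and False for a censored record.
    Returns a list of (tenure_months, survival_probability) at each exit time.
    """
    observations = sorted(observations)
    at_risk = len(observations)
    survival = 1.0
    curve = []
    index = 0
    while index < len(observations):
        tenure = observations[index][0]
        exits = 0
        leaving = 0
        # Group every record that shares this tenure value.
        while index < len(observations) and observations[index][0] == tenure:
            exits += observations[index][1]
            leaving += 1
            index += 1
        if exits:
            survival *= 1 - exits / at_risk
            curve.append((tenure, survival))
        at_risk -= leaving  # exits and censored records both leave the risk set
    return curve


# Hypothetical toy data, not taken from the dataset.
toy = [(6, True), (12, False), (12, True), (24, True), (36, False)]
curve = kaplan_meier(toy)
for tenure, probability in curve:
    print(f"month {tenure}: {probability:.2f}")
```

Running the function once for Production and once for non-Production employees would give the two curves to compare.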