
https://github.com/kdayno/workforce-flux

End-to-end people analytics on an HR dataset: DuckDB + dbt + Evidence. Translates workforce trends into actionable findings on retention, attrition, and pay.

analytics-engineering data-engineering dbt duckdb hr-analytics portfolio


![Workforce Flux Header Image](./docs/workforce-flux-header.png)

- An end-to-end People Analytics project: a raw HR dataset (sourced from [Kaggle](#data-source)) transformed into decision-useful insight on headcount, attrition, and pay.
- File-based analytics stack (DuckDB + dbt + Evidence), deployed as a static Vercel site.

> 🔗 Live demo: [workforceflux.kdayno.com](https://workforceflux.kdayno.com)

## Objectives

1. **Surface decision-useful HR insight.** Quantify workforce dynamics
(headcount growth, annualised turnover, attrition drivers, and the
effectiveness of recruitment channels), and translate the findings into
recommended actions.
2. **Apply analytics-engineering best practice.** Transform raw, inconsistent
source data into clean, tested, and documented analytical models through a
layered ELT pipeline and dimensional modelling.
3. **Demonstrate analytical rigour.** Define HR metrics correctly (for example,
annualised versus cumulative turnover), segment responsibly given the sample
size, and state every assumption and limitation transparently.
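
To make the annualised-versus-cumulative distinction in objective 3 concrete, here is a minimal Python sketch. It assumes a simple definition (separations in a window divided by the average of start and end headcount). The functions and toy records are hypothetical, not taken from the dbt models, whose exact definition may differ.

```python
from datetime import date

# Hypothetical toy records (hire_date, termination_date or None): 11 people hired
# on 2015-01-01, with one leaving in each of 2015, 2016 and 2017.
# Illustrative only; not taken from HRDataset_v14.csv.
records = [(date(2015, 1, 1), None)] * 8 + [
    (date(2015, 1, 1), date(2015, 6, 30)),
    (date(2015, 1, 1), date(2016, 6, 30)),
    (date(2015, 1, 1), date(2017, 6, 30)),
]


def active_on(records, day):
    """Headcount on a given day: hired on/before it and not yet terminated."""
    return sum(1 for hired, termed in records
               if hired <= day and (termed is None or termed > day))


def separations(records, start, end):
    """Terminations falling within [start, end]."""
    return sum(1 for _, termed in records
               if termed is not None and start <= termed <= end)


def turnover_rate(records, start, end):
    """Separations in the window / average of start and end headcount (assumed definition)."""
    avg = (active_on(records, start) + active_on(records, end)) / 2
    return separations(records, start, end) / avg if avg else 0.0


# Annualised: one rate per calendar year.
for year in (2015, 2016, 2017):
    rate = turnover_rate(records, date(year, 1, 1), date(year, 12, 31))
    print(year, f"{rate:.1%}")  # 9.5%, 10.5%, 11.8% in turn

# Cumulative: the same three exits measured over the whole window.
print("2015-2017", f"{turnover_rate(records, date(2015, 1, 1), date(2017, 12, 31)):.1%}")  # 31.6%
```

With one exit per year, each annual rate stays near 10%, while the same exits measured over the three-year window read as roughly 32%. This is why a multi-year cumulative figure should not be compared with annual benchmarks such as the BLS data cited below.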

## Key findings

| # | Finding | Headline number |
|---|---|---|
| 1 | Hiring freeze, not an attrition crisis | Annual turnover 3.6–9.9% (low relative to [BLS 2019 labour-turnover data](https://www.bls.gov/opub/mlr/2020/article/job-openings-hires-and-quits-set-record-highs-in-2019.htm)) |
| 2 | Voluntary attrition concentrated in the Production department | 86% of voluntary exits from 67% of headcount |
| 3 | Production has a structural pay-competitiveness gap | Stayers earn 12.7% more than leavers at 5–10 yrs tenure |
| 4 | Pay equity is healthy; the raw gap is a composition effect | 2.1% raw gap → ~0% within position |
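
Finding 4's composition point can be illustrated with a small Python sketch on hypothetical data. When pay is identical inside each position but one group is concentrated in lower-paid roles, the pooled (raw) gap is large while the within-position gap is zero. The two groups "A" and "B", the positions, the gap definition (relative to group A's mean) and the headcount weighting are all assumptions for illustration, not the method in `docs/full-analysis.md`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (position, group, salary). Illustrative only, not from the dataset.
rows = [
    ("Technician", "B", 60_000), ("Technician", "B", 60_000), ("Technician", "B", 60_000),
    ("Technician", "A", 60_000),
    ("Engineer", "B", 90_000),
    ("Engineer", "A", 90_000), ("Engineer", "A", 90_000), ("Engineer", "A", 90_000),
]


def raw_gap(rows):
    """(mean A salary - mean B salary) / mean A salary, ignoring position."""
    a = mean(s for _, g, s in rows if g == "A")
    b = mean(s for _, g, s in rows if g == "B")
    return (a - b) / a


def within_position_gap(rows):
    """Headcount-weighted average of the gap computed separately inside each position."""
    by_pos = defaultdict(lambda: {"A": [], "B": []})
    for pos, group, salary in rows:
        by_pos[pos][group].append(salary)
    total, weight = 0.0, 0
    for groups in by_pos.values():
        if groups["A"] and groups["B"]:  # only positions containing both groups are comparable
            a, b = mean(groups["A"]), mean(groups["B"])
            n = len(groups["A"]) + len(groups["B"])
            total += n * (a - b) / a
            weight += n
    return total / weight if weight else 0.0


print(f"raw gap: {raw_gap(rows):.1%}")                          # 18.2%
print(f"within-position gap: {within_position_gap(rows):.1%}")  # 0.0%
```

The raw gap here comes entirely from group B being overrepresented among Technicians; comparing like-for-like roles removes it.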

> **Full analysis.** Per-finding tables, methodology, assumptions, and caveats:
> [`docs/full-analysis.md`](docs/full-analysis.md). The subject company is anonymised in
> the source dataset; this README refers to it as **Company X**.

## Recommendations

The single highest-leverage intervention the analysis points to is a
**market-rate salary review for Production roles at 3+ years of tenure**.
It would directly address the 11 voluntary exits that explicitly cite "more money"
and would likely also prevent some of the 17 exits recorded as "Another position".

Two supporting recommendations (engagement-survey replacement and a merit-pay
premium for Production) are detailed in [`docs/full-analysis.md#recommendations`](docs/full-analysis.md#recommendations).

## Tech stack

| Layer | Tool | Role |
|-------|------|------|
| Storage | [DuckDB](https://duckdb.org) | Embedded analytical database |
| Transformation | [dbt](https://www.getdbt.com) (`dbt-duckdb`) | Tested, layered SQL models |
| Visualisation | [Evidence](https://evidence.dev) | BI-as-code reports |
| Hosting | [Vercel](https://vercel.com) | Static hosting + auto-deploy on push |

## Data source

[Human Resources Data Set](https://www.kaggle.com/datasets/rhuebner/human-resources-data-set)
by Dr. Rich Huebner & Dr. Carla Patalano (Kaggle). A single CSV,
`HRDataset_v14.csv` (**~311 employees, 36 columns**), one row per employee.

The raw file is **not committed** (see `.gitignore`). Download it from Kaggle
(a free account is required) and place it at:

```
data/raw/HRDataset_v14.csv
```
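
Once the file is in place, a quick stdlib-only Python check can confirm the download before running dbt. This is a hypothetical helper, not part of the repository; the expected shape (about 311 rows and 36 columns) comes from the description above.

```python
import csv
from pathlib import Path

# Location the project expects (from the README); run from the repo root.
RAW_PATH = Path("data/raw/HRDataset_v14.csv")


def describe_csv(path):
    """Return (data_rows, columns) for a CSV whose first row is a header."""
    # utf-8-sig strips a BOM if present; errors="replace" keeps counting robust
    # if the file contains bytes that are not valid UTF-8.
    with open(path, newline="", encoding="utf-8-sig", errors="replace") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty file -> no header, zero columns
        n_rows = sum(1 for _ in reader)
    return n_rows, len(header)


def check_raw_file(path=RAW_PATH):
    """Human-readable status line for the raw CSV."""
    if not Path(path).exists():
        return f"Missing {path}: download it from Kaggle first."
    rows, cols = describe_csv(path)
    return f"{path}: {rows} rows, {cols} columns (README: ~311 employees, 36 columns)"


if __name__ == "__main__":
    print(check_raw_file())
```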

## Project structure

```
workforce-flux/
├── LICENSE
├── README.md
├── requirements.txt
├── .gitignore
├── data/
│   └── raw/                        # HRDataset_v14.csv goes here (not committed)
├── docs/
│   └── full-analysis.md            # Full per-finding analysis (tables, caveats)
├── eda/                            # Exploratory data analysis (SQL, run against hr.duckdb)
│   ├── 01_decline_diagnosis.sql
│   ├── 02_retention.sql
│   ├── 03_exit_reasons.sql
│   └── 04_compensation_equity.sql
├── hr_dbt/                         # dbt project
│   ├── dbt_project.yml
│   ├── profiles.yml
│   ├── packages.yml
│   └── models/
│       ├── staging/
│       │   ├── stg_employees.sql
│       │   └── _staging.yml
│       ├── intermediate/
│       │   ├── int_employees_enriched.sql
│       │   ├── int_date_spine.sql
│       │   └── int_headcount_monthly.sql
│       └── marts/
│           ├── dim_employee.sql
│           ├── mart_headcount_monthly.sql
│           ├── mart_attrition.sql
│           ├── mart_recruitment_effectiveness.sql
│           └── _marts.yml
├── reports/                        # Evidence reports (live at workforceflux.kdayno.com)
└── hr.duckdb                       # built by dbt; committed for Vercel
```

## Setup

```bash
# 1. Python environment
# NOTE: dbt does not yet support Python 3.14 (its mashumaro dependency fails
# to import). Use Python 3.13 or earlier.
python3.13 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

# 2. Download HRDataset_v14.csv from Kaggle into data/raw/

# 3. Build the pipeline (run dbt from inside hr_dbt/)
cd hr_dbt
dbt deps # installs dbt_utils
dbt build --profiles-dir . # runs models + tests

# 4. Run the Evidence reports locally
cd ../reports
npm install
npm run dev # opens http://localhost:3000
```
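
Because of the Python 3.14 caveat in step 1, a small guard can fail fast before `dbt build`. This is a hypothetical Python helper, not part of the repository; the `(3, 14)` bound simply encodes the README's note and would need updating once dbt supports newer versions.

```python
import sys

# The README reports that dbt fails to import on Python 3.14 (mashumaro dependency),
# so 3.14 is treated as the first unsupported version. Update once dbt supports it.
FIRST_UNSUPPORTED = (3, 14)


def python_ok_for_dbt(version_info=sys.version_info):
    """True if the interpreter's (major, minor) is below FIRST_UNSUPPORTED."""
    return tuple(version_info[:2]) < FIRST_UNSUPPORTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if python_ok_for_dbt():
        print(f"Python {major}.{minor}: OK for dbt")
    else:
        print(f"Python {major}.{minor}: recreate .venv with python3.13 (see step 1)")
```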

## Pipeline / data model

```
HRDataset_v14.csv
└─ stg_employees ................. clean + type-cast, 1 row per employee
   └─ int_employees_enriched ..... + derived fields (age, tenure, bands…)
      ├─ dim_employee ............ employee dimension
      ├─ mart_attrition .......... department-level separation summary
      ├─ mart_recruitment_effectiveness
      └─ int_headcount_monthly ... employee-month grain (uses int_date_spine)
         └─ mart_headcount_monthly monthly time series + turnover rate
```

Layer materialisation: staging & intermediate are **views**, marts are **tables**.
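
The `int_date_spine` → `int_headcount_monthly` → `mart_headcount_monthly` chain can be sketched in plain Python to show the idea: build a month spine, expand each employee into one row per month they were employed, then aggregate to a monthly headcount, separation count and turnover rate. The dbt models are SQL and their exact rules may differ. Here, as assumptions, an employee counts in any month they were employed for at least one day, a separation lands in the month of the termination date, and the monthly rate is separations divided by that month's headcount. All function names are hypothetical.

```python
from datetime import date


def next_month(d):
    """First day of the month after d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def month_spine(start, end):
    """First-of-month dates from start's month through end's month (a date spine)."""
    months, m = [], date(start.year, start.month, 1)
    while m <= end:
        months.append(m)
        m = next_month(m)
    return months


def employee_months(records, spine):
    """Employee-month grain: (employee_id, month, separated_this_month) rows.

    records: (employee_id, hire_date, termination_date or None). An employee is
    included in a month if employed on at least one day of it (assumed rule).
    """
    rows = []
    for emp_id, hired, termed in records:
        for month in spine:
            nxt = next_month(month)
            if hired < nxt and (termed is None or termed >= month):
                separated = termed is not None and month <= termed < nxt
                rows.append((emp_id, month, separated))
    return rows


def headcount_monthly(rows):
    """Aggregate to (month, headcount, separations, turnover_rate) per month."""
    agg = {}
    for _, month, separated in rows:
        headcount, seps = agg.get(month, (0, 0))
        agg[month] = (headcount + 1, seps + int(separated))
    return [(m, hc, s, s / hc) for m, (hc, s) in sorted(agg.items())]


# Hypothetical toy records: (employee_id, hire_date, termination_date).
records = [
    (1, date(2020, 1, 15), None),
    (2, date(2020, 1, 1), date(2020, 2, 10)),
    (3, date(2020, 2, 1), None),
]
spine = month_spine(date(2020, 1, 1), date(2020, 3, 31))
for month, hc, seps, rate in headcount_monthly(employee_months(records, spine)):
    print(month.isoformat(), hc, seps, f"{rate:.1%}")
```

Keeping the employee-month grain as its own step (rather than aggregating directly) mirrors the layered design above: the same rows can feed other monthly breakdowns, such as by department, without re-deriving who was employed when.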

## Next steps

- **Recruitment-source effectiveness as a fifth finding.** The
`mart_recruitment_effectiveness` model exists but no narrative is built on
it. Which channels (Indeed, LinkedIn, referral, diversity job fair) produce
stayers vs leavers, and at what cost-per-retained-hire?
- **Tenure survival curve for voluntary exits.** A hazard curve by
month-of-tenure, Production vs non-Production, would localise *when* in the
lifecycle attrition happens and sharpen the salary-review recommendation to
a specific tenure-month trigger.
- **Quantify the salary-review intervention.** The primary recommendation
("market-rate review for Production at 3+ years") is qualitative. A
cost-benefit estimate (retained employees and avoided replacement cost vs
the raise bill) would turn it into a business case.
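
For the proposed tenure survival curve, one standard approach is a discrete-time (monthly) Kaplan–Meier estimate. The minimal Python sketch below assumes each employee is reduced to whole months of tenure plus a flag for voluntary exit. Current employees and involuntary leavers are treated as censored, which is an assumption (a competing-risks model is an alternative). Running it separately on Production and non-Production employees would give the two curves to compare. The function name and input shape are hypothetical.

```python
from collections import Counter


def tenure_hazard(employees, max_month):
    """Monthly hazard and survival for voluntary exit (discrete-time Kaplan-Meier).

    employees: iterable of (tenure_months, exited_voluntarily). Anyone not flagged
    as a voluntary exit is treated as censored at their observed tenure.
    Returns rows of (month, at_risk, exits, hazard, survival).
    """
    employees = list(employees)
    exits = Counter(t for t, voluntary in employees if voluntary)
    curve, survival = [], 1.0
    for month in range(max_month + 1):
        # Still observed at this tenure month (exits and censoring both occur at t).
        at_risk = sum(1 for t, _ in employees if t >= month)
        hazard = exits[month] / at_risk if at_risk else 0.0
        survival *= 1.0 - hazard
        curve.append((month, at_risk, exits[month], hazard, survival))
    return curve


# Hypothetical toy input: (months of tenure, left voluntarily?).
toy = [(2, True), (2, False), (5, True), (5, False), (1, True)]
for month, at_risk, n_exits, hazard, surv in tenure_hazard(toy, 5):
    print(f"month {month}: at risk {at_risk}, exits {n_exits}, "
          f"hazard {hazard:.2f}, survival {surv:.2f}")
```

A spike in the hazard column at a particular tenure month is the kind of signal that could turn the "3+ years" recommendation into a specific review trigger.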