{"id":30360484,"url":"https://github.com/xnyron/attrition-eda-python","last_synced_at":"2025-08-19T14:22:53.521Z","repository":{"id":310308517,"uuid":"1039319614","full_name":"xNyron/attrition-eda-python","owner":"xNyron","description":"🐙 Employee attrition EDA in Python: Jupyter notebooks using Pandas and Matplotlib to reveal key factors, patterns, and insights for reducing turnover.","archived":false,"fork":false,"pushed_at":"2025-08-17T06:51:49.000Z","size":550,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-17T08:34:02.955Z","etag":null,"topics":["attrition-analysis","attrition-insights-with-python","business-intelligence","data-analysis","eda","eda-employee-attrition-jupyter","employee-data","employee-exit-patterns-eda","employee-registry-dump-rawcsv","hr-analytics","hr-attrition-data-analysis","human-resources","jupyter-notebook","matplotlib","matplotlib-figures","matplotlib-pyplot","python","workforce-analytics"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter 
Notebook","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xNyron.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-17T00:35:51.000Z","updated_at":"2025-08-17T06:51:52.000Z","dependencies_parsed_at":"2025-08-17T08:44:19.905Z","dependency_job_id":null,"html_url":"https://github.com/xNyron/attrition-eda-python","commit_stats":null,"previous_names":["xnyron/attrition-eda-python"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/xNyron/attrition-eda-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xNyron%2Fattrition-eda-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xNyron%2Fattrition-eda-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xNyron%2Fattrition-eda-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xNyron%2Fattrition-eda-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xNyron","download_url":"https://codeload.github.com/xNyron/attrition-eda-python/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xNyron%2Fattrition-eda-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271166847,"owners_count":24710580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-19T02:00:
09.176Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attrition-analysis","attrition-insights-with-python","business-intelligence","data-analysis","eda","eda-employee-attrition-jupyter","employee-data","employee-exit-patterns-eda","employee-registry-dump-rawcsv","hr-analytics","hr-attrition-data-analysis","human-resources","jupyter-notebook","matplotlib","matplotlib-figures","matplotlib-pyplot","python","workforce-analytics"],"created_at":"2025-08-19T14:22:51.979Z","updated_at":"2025-08-19T14:22:53.459Z","avatar_url":"https://github.com/xNyron.png","language":"Jupyter Notebook","readme":"https://github.com/xNyron/attrition-eda-python/releases\n\n# Employee Attrition EDA with Python: Insights \u0026 Visuals Lab\n\n[![Releases · Download](https://img.shields.io/badge/Releases-Download-blue?logo=github\u0026style=for-the-badge)](https://github.com/xNyron/attrition-eda-python/releases)\n[![Python](https://img.shields.io/badge/Python-3.8%2B-brightgreen)](https://www.python.org/)\n[![pandas](https://img.shields.io/badge/pandas-1.0%2B-4BC0C0)](https://pandas.pydata.org/)\n[![matplotlib](https://img.shields.io/badge/matplotlib-3.0%2B-orange)](https://matplotlib.org/)\n[![Jupyter](https://img.shields.io/badge/Jupyter-Notebook-orange)](https://jupyter.org/)\n\n![Hero Image](https://images.unsplash.com/photo-1526378729146-85d6f48b9f20?auto=format\u0026fit=crop\u0026w=2000\u0026q=80)\n\nTable of contents\n- About this repo\n- Quick access (Download \u0026 run)\n- What you will find\n- Data overview\n- Project 
structure\n- Install and setup\n- Run the analysis\n- Notebook walkthrough\n- Key EDA steps\n- Visual examples\n- How to read the plots\n- Reuse and extend\n- Automation and batch runs\n- Performance tips\n- Releasing and versioning\n- Contributing\n- License\n- Contact and support\n- Topics \u0026 keywords\n\nAbout this repo\nThis repository holds a focused exploratory data analysis (EDA) of employee attrition, built with Python, pandas, and matplotlib. It surfaces exit patterns and other HR signals and highlights the features that correlate with attrition. The code runs both in Jupyter and as standalone scripts. The notebooks document each step with clear plots aimed at HR teams and data analysts.\n\nQuick access (Download \u0026 run)\n- The Releases page provides packaged notebooks and scripts. Download the release archive and run the included script.\n- Direct link to the release assets:\n  https://github.com/xNyron/attrition-eda-python/releases\n- The badge above opens the same page.\n\nWhat you will find\n- Cleaned datasets used for EDA.\n- Jupyter notebooks that walk through the analysis.\n- One-click scripts to reproduce charts.\n- Matplotlib figure exports in PNG and SVG.\n- A small helper module for plot styling and common transforms.\n- Sample dashboards exported as images.\n\nData overview\nDataset theme\n- Employee demographic fields.\n- Job role and level fields.\n- Compensation and time at company.\n- Performance ratings.\n- Exit flag (Attrition: Yes/No).\n- Dates: hire date and termination date.\n\nSample fields (common in HR attrition datasets)\n- EmployeeID\n- Age\n- Gender\n- MaritalStatus\n- JobRole\n- JobLevel\n- Department\n- MonthlyIncome\n- YearsAtCompany\n- YearsInCurrentRole\n- YearsSinceLastPromotion\n- PerformanceRating\n- Attrition (Yes/No)\n- OverTime (Yes/No)\n\nData quality notes\n- We handle missing values with explicit rules.\n- We treat categorical codes as categories.\n- We convert 
date columns to datetime type.\n- We cap extreme numeric values for visual clarity.\n- We avoid imputation that hides patterns.\n\nProject structure\n- README.md — this document\n- data/\n  - raw/ — raw datasets (kept read-only)\n  - cleaned/ — cleaned CSVs used by notebooks\n- notebooks/\n  - 01_data_prep.ipynb\n  - 02_univariate_analysis.ipynb\n  - 03_bivariate_analysis.ipynb\n  - 04_time_trends.ipynb\n  - 05_model_ready_features.ipynb\n- scripts/\n  - run_eda.py — script to run the full EDA and export figures\n  - export_figs.py — helper to render and save plots\n- src/\n  - utils.py — helper functions (load, plot style, transforms)\n- outputs/\n  - figures/ — generated PNG and SVG figures\n  - reports/ — HTML export of notebooks\n\nInstall and setup\nSystem prerequisites\n- Linux, macOS, or Windows.\n- Python 3.8 or later.\n- At least 4 GB RAM for medium datasets.\n\nCreate a virtual environment\n```bash\npython -m venv .venv\nsource .venv/bin/activate    # macOS / Linux\n.venv\\Scripts\\activate       # Windows\n```\n\nInstall dependencies\n```bash\npip install -r requirements.txt\n```\nCore dependencies\n- pandas\n- numpy\n- matplotlib\n- seaborn\n- jupyter\n- openpyxl (if you use Excel)\n- scipy (optional, but required for pandas KDE plots)\n\nThe requirements.txt in the repo pins tested versions.\n\nRun the analysis\nYou can run the scripted flow from the repository root, or download the release asset from the Releases page and run its main script. The release bundles the same run_eda.py script with a packaged data folder. 
\n\nRun from source\n```bash\npython scripts/run_eda.py --input data/cleaned/employee_attrition.csv --out outputs/figures\n```\n\nRun from release asset\n- Download the release asset from:\n  https://github.com/xNyron/attrition-eda-python/releases\n- Extract the archive.\n- Run the included run_eda.py file.\n```bash\npython run_eda.py --input employee_attrition_clean.csv --out ./figures\n```\n\nNotebook walkthrough\n01_data_prep.ipynb\n- Load raw data.\n- Convert types.\n- Impute selected missing values.\n- Create derived features:\n  - tenure_months\n  - tenure_years (float)\n  - promoted_last_5y (bool)\n  - income_per_level (MonthlyIncome / JobLevel)\n\n02_univariate_analysis.ipynb\n- Inspect distributions.\n- Plot histograms for numeric features.\n- Show bar charts for categorical features.\n- Flag skew, kurtosis, and outliers.\n\n03_bivariate_analysis.ipynb\n- Attrition vs numeric features using boxplots.\n- Attrition vs categorical features using stacked bar charts.\n- Correlation heatmap for numeric features.\n\n04_time_trends.ipynb\n- Attrition rate by year and quarter.\n- Hire vs exit curves.\n- Rolling averages of attrition.\n\n05_model_ready_features.ipynb\n- Encode categorical features.\n- Create dummy features with controlled cardinality.\n- Save model-ready CSV.\n\nKey EDA steps\n1. Load and inspect\n- Use pandas to read the CSV.\n- Check head, info, and describe.\n- Spot missing values and wrong types.\n\n2. Clean and convert\n- Convert categorical strings to category dtype.\n- Parse dates into datetime.\n- Rename fields for clarity.\n\n3. Derive features\n- Tenure in months and years.\n- Promotion recency.\n- Compensation ratio to department median.\n\n4. Univariate analysis\n- Histogram and density plots.\n- Bar charts for counts.\n- Table of top categories.\n\n5. 
Bivariate analysis\n- Boxplots of income by attrition.\n- Chi-square tests for categorical variables.\n- Point-biserial correlation for binary vs numeric.\n\n6. Multivariate checks\n- Pairwise scatter for numeric subset.\n- Correlation matrix and heatmap.\n- Conditional plots: attrition by job role and level.\n\n7. Time series checks\n- Attrition rate by month.\n- Seasonal patterns by quarter.\n- Hire and exit series comparison.\n\n8. Export\n- Save cleaned CSV.\n- Export all figures as PNG and SVG.\n- Save notebook as HTML.\n\nVisual examples\nThis section shows typical figures and the code to make them. Each plot aims for clarity and reproducibility.\n\nHistograms with KDE\n```python\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\ndf = pd.read_csv('data/cleaned/employee_attrition.csv')\n# Style name for matplotlib \u003e= 3.6; older versions call it 'seaborn-whitegrid'\nplt.style.use('seaborn-v0_8-whitegrid')\nfig, ax = plt.subplots(figsize=(8,4))\ndf['MonthlyIncome'].plot(kind='hist', bins=40, density=True, alpha=0.6, ax=ax)\ndf['MonthlyIncome'].plot(kind='kde', ax=ax)  # KDE requires scipy\nax.set_title('Monthly Income Distribution')\nax.set_xlabel('Monthly Income (USD)')\nplt.savefig('outputs/figures/monthly_income_hist.png', dpi=150)\n```\n\nAttrition rate by job role (stacked bar)\n```python\nrole_table = pd.crosstab(df['JobRole'], df['Attrition'], normalize='index') * 100\nrole_table.sort_values('Yes', ascending=False, inplace=True)\nrole_table[['Yes','No']].plot(kind='bar', stacked=True, figsize=(10,6), color=['#d9534f','#5bc0de'])\nplt.ylabel('Percent')\nplt.title('Attrition Rate by Job Role')\nplt.tight_layout()\nplt.savefig('outputs/figures/attrition_by_role.png', dpi=150)\n```\n\nCorrelation heatmap\n```python\nimport seaborn as sns\nnum_cols = ['Age','MonthlyIncome','YearsAtCompany','YearsInCurrentRole','YearsSinceLastPromotion']\ncorr = df[num_cols].corr()\nplt.figure(figsize=(6,5))\nsns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f')\nplt.title('Numeric Feature Correlation')\nplt.savefig('outputs/figures/corr_heatmap.png', dpi=150)\n```\n\nTime 
series: Attrition rate by quarter\n```python\n# Parse dates; current employees have no ExitDate (NaT)\ndf['HireDate'] = pd.to_datetime(df['HireDate'], errors='coerce')\ndf['ExitDate'] = pd.to_datetime(df['ExitDate'], errors='coerce')\n# Numerator: exits per quarter\nleavers = df[df['Attrition'] == 'Yes']\nexits = leavers.groupby(leavers['ExitDate'].dt.to_period('Q')).size()\n# Denominator: headcount active at the start of each quarter\n# (grouping all rows by exit quarter would drop current staff, whose ExitDate is NaT)\nheadcount = pd.Series(\n    [((df['HireDate'] \u003c= q.start_time) \u0026 (df['ExitDate'].isna() | (df['ExitDate'] \u003e= q.start_time))).sum()\n     for q in exits.index],\n    index=exits.index,\n)\nquarter_rate = (exits / headcount).fillna(0)\nquarter_rate.plot(figsize=(10,4), marker='o')\nplt.title('Quarterly Attrition Rate')\nplt.ylabel('Attrition Rate')\nplt.xlabel('Quarter')\nplt.grid(True)\nplt.savefig('outputs/figures/quarterly_attrition_rate.png', dpi=150)\n```\n\nHow to read the plots\n- Histogram: shows shape and skew. A right skew may mean a few high earners.\n- KDE: a smoothed view of the distribution.\n- Boxplot by attrition: compare medians and IQRs. A large gap suggests a strong signal.\n- Stacked bar: shows the percentage of leavers vs stayers in each category.\n- Correlation heatmap: values near 1 or -1 indicate strong linear correlation.\n- Time series: look for trends and seasonality.\n\nInterpretation guide (practical)\n- If attrition peaks at low job levels, the issue may be limited career progression.\n- High overtime together with high attrition suggests workload risk.\n- Low income combined with high attrition within the same role suggests pay inequity.\n- Sudden spikes in a quarter may point to specific events. 
Investigate HR events in that period.\n\nFeature engineering ideas\n- Tenure bucket (0-1, 1-3, 3-5, 5+ years).\n- Income percentile within department.\n- Promotion gap (years since last promotion \u003e 3).\n- Composite stress index: overtime + years in current role + low performance.\n\nAutomated checks and tests\n- Unit test for loaders: shape and column types.\n- Validation test: attrition rate (share of Yes) within an expected range.\n- Range checks: monthly income non-negative and below a reasonable maximum.\n\nReproducible exports\n- Save figures at a fixed DPI, and export SVG for vector output.\n- Apply a consistent theme via matplotlib.rcParams.\n- Keep raw data separate from cleaned data.\n\nAutomation and batch runs\n- The release includes run_eda.py to export all figures.\n- The script accepts input and output paths and a profile flag.\n- Use cron or CI to regenerate reports monthly.\n\nSample run_eda.py flags\n```\nusage: run_eda.py [-h] --input INPUT --out OUT [--profile PROFILE]\n\n--input   Path to cleaned CSV\n--out     Path to output directory\n--profile Which subset to run: full, quick, demo\n```\n- full: runs all notebooks and exports everything.\n- quick: runs a subset of plots for fast iteration.\n- demo: runs on a small sample dataset for quick demos.\n\nPerformance tips\n- Pass explicit dtypes to read_csv for large files.\n- Read very large datasets in chunks (the chunksize argument of read_csv).\n- Sample rows for plotting when records exceed 200k.\n- Cache intermediate transforms in Parquet.\n\nReleasing and versioning\n- Tag releases with semantic versioning.\n- Attach packaged data and script assets to each release.\n- The Releases page holds artifacts. 
Download the release asset file and run the included script.\n\nReleases and download\n- Release files contain packaged notebooks and scripts.\n- Download the release file and run the included run_eda.py script.\n- Visit:\n  https://github.com/xNyron/attrition-eda-python/releases\n- Use the badge at the top to open the same page.\n\nFigures and examples (embedded)\n- Histogram example:\n  ![Histogram Example](https://raw.githubusercontent.com/mwaskom/seaborn-data/master/README.md)\n- Correlation example:\n  ![Heatmap Example](https://upload.wikimedia.org/wikipedia/commons/1/18/Heatmap.png)\n- Stacked bar example:\n  ![Bar Example](https://images.unsplash.com/photo-1556157382-97eda2d62296?auto=format\u0026fit=crop\u0026w=1200\u0026q=80)\n\nNote: The images above illustrate chart types. The repo exports figures for each analysis step.\n\nReuse and extend\n- Replace the input CSV with your HR data. Keep column names consistent or update the loader.\n- Add new notebooks for churn modeling or survival analysis.\n- Swap matplotlib for plotly if you need interactivity.\n\nCommon extensions\n- Survival analysis with lifelines.\n- Time-to-event modeling.\n- Clustering of exit patterns.\n- Text analysis of exit interviews.\n\nCode style and utility functions\n- src/utils.py contains the loader functions.\n- Use utils.load_data(path) to get a cleaned DataFrame.\n- Use utils.plot_style() to apply a consistent style.\n\nSample loader\n```python\nimport pandas as pd\n\ndef load_data(path):\n    # Parse date columns up front; keep IDs as strings so leading zeros survive\n    df = pd.read_csv(path, parse_dates=['HireDate','ExitDate'], dtype={'EmployeeID':str})\n    # Store the Yes/No attrition flag as a categorical dtype\n    df['Attrition'] = df['Attrition'].astype('category')\n    return df\n```\n\nTesting and CI\n- Add a small test dataset in tests/ for CI runs.\n- Use GitHub Actions to run the quick profile nightly.\n- Save artifacts (figures and HTML) on CI for review.\n\nContributing\n- Fork the repo.\n- Create a branch per feature.\n- Add tests for new loaders and plots.\n- Open a PR with a clear 
description.\n\nIssues\n- Open issues for bugs and data questions.\n- Label issues with \"data\", \"notebook\", or \"script\" for triage.\n- Include steps to reproduce in the issue text.\n\nLicense\n- The repository uses the MIT license.\n- See LICENSE file for details.\n\nContact and support\n- Open a GitHub issue for bugs or questions.\n- Use PRs for code contributions.\n\nTopics \u0026 keywords\n- attrition-insights-with-python\n- data-analysis\n- eda\n- eda-employee-attrition-jupyter\n- employee-exit-patterns-eda\n- hr-attrition-data-analysis\n- matplotlib\n- matplotlib-figures\n- matplotlib-pyplot\n- python\n\nBadges and links\n- Use the Releases badge near the top to access release artifacts. The badge links to:\n  https://github.com/xNyron/attrition-eda-python/releases\n- The Releases page includes built artifacts. Download the release file and run the included script.\n\nAppendix: Example checklist before run\n- [ ] Clone the repo or download a release.\n- [ ] Install dependencies.\n- [ ] Place the cleaned CSV in data/cleaned.\n- [ ] Run the script or open the notebooks.\n- [ ] Export figures to outputs/figures.\n- [ ] Review the HTML exports in outputs/reports.\n\nAppendix: Minimal run commands\n```bash\ngit clone https://github.com/xNyron/attrition-eda-python.git\ncd attrition-eda-python\npython -m venv .venv\nsource .venv/bin/activate    # macOS / Linux; on Windows: .venv\\Scripts\\activate\npip install -r requirements.txt\npython scripts/run_eda.py --input data/cleaned/employee_attrition.csv --out outputs/figures\n```\n\nAppendix: Example outputs to expect\n- outputs/figures/monthly_income_hist.png\n- outputs/figures/attrition_by_role.png\n- outputs/figures/corr_heatmap.png\n- outputs/reports/eda_report.html\n\nAppendix: Quick tips for HR stakeholders\n- Focus on role and level patterns, not just the raw attrition rate.\n- Compare attrition to hiring rates.\n- Consider cohort analysis for new hires.\n- Check for spikes after compensation cycles or reorgs.\n\nEmoji legend\n- 📈 Charts and time series.\n- 🧭 Guidance and steps.\n- 🛠️ Tools 
and scripts.\n- 📦 Releases and downloads.\n- 🧪 Tests and CI.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxnyron%2Fattrition-eda-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxnyron%2Fattrition-eda-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxnyron%2Fattrition-eda-python/lists"}