{"id":50733407,"url":"https://github.com/amirhosseinhonardoust/workout-efficiency-benchmark","last_synced_at":"2026-06-10T11:01:38.897Z","repository":{"id":361653971,"uuid":"1159937009","full_name":"AmirhosseinHonardoust/Workout-Efficiency-Benchmark","owner":"AmirhosseinHonardoust","description":"Streamlit + Python pipeline that benchmarks gym workout efficiency (kcal/min) using present sessions only. Generates sortable workout-type benchmarks, distribution plots, fairness-aware gap analysis with uncertainty/low-sample flags, and a data-quality report to prevent misleading comparisons.","archived":false,"fork":false,"pushed_at":"2026-05-31T16:28:28.000Z","size":417,"stargazers_count":11,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-31T18:18:39.031Z","etag":null,"topics":["analytics","benchmarking","bias-audit","dashboard","data-analysis","data-quality","data-science","eda","fairness","fitness","health-data","pandas","plotly","python","reporting","reproducible-research","statistics","streamlit","visualization","workout"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AmirhosseinHonardoust.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-17T10:50:07.000Z","updated_at":"2026-05-31T16:28:32.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/AmirhosseinHonardoust/Workou
t-Efficiency-Benchmark","commit_stats":null,"previous_names":["amirhosseinhonardoust/workout-efficiency-benchmark"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/AmirhosseinHonardoust/Workout-Efficiency-Benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmirhosseinHonardoust%2FWorkout-Efficiency-Benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmirhosseinHonardoust%2FWorkout-Efficiency-Benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmirhosseinHonardoust%2FWorkout-Efficiency-Benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmirhosseinHonardoust%2FWorkout-Efficiency-Benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AmirhosseinHonardoust","download_url":"https://codeload.github.com/AmirhosseinHonardoust/Workout-Efficiency-Benchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AmirhosseinHonardoust%2FWorkout-Efficiency-Benchmark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34149132,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","benchmark
ing","bias-audit","dashboard","data-analysis","data-quality","data-science","eda","fairness","fitness","health-data","pandas","plotly","python","reporting","reproducible-research","statistics","streamlit","visualization","workout"],"created_at":"2026-06-10T11:01:37.864Z","updated_at":"2026-06-10T11:01:38.892Z","avatar_url":"https://github.com/AmirhosseinHonardoust.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# Workout Efficiency Benchmark Report  \n**Calories per Minute (kcal/min) + fairness-aware comparisons (present sessions only)**\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Python-3.10%2B-blue\" alt=\"Python\"/\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Streamlit-App-FF4B4B\" alt=\"Streamlit\"/\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Pandas-Data%20Wrangling-150458\" alt=\"Pandas\"/\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Plotly-Interactive%20Charts-3F4F75\" alt=\"Plotly\"/\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Status-Report%20Ready-success\" alt=\"Status\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  A decision-safe benchmarking report for gym activity data: \u003cb\u003eefficiency\u003c/b\u003e, \u003cb\u003edistributions\u003c/b\u003e, \u003cb\u003esegment gaps\u003c/b\u003e, and \u003cb\u003edata quality\u003c/b\u003e, shipped as a reproducible pipeline + a Streamlit dashboard.\n\u003c/p\u003e\n \n\u003c/div\u003e\n   \n---\n\n## Why this project exists\n\n**“Calories burned” alone is not a benchmark.**  \nIf one person burns 600 kcal in 60 minutes and another burns 600 kcal in 120 minutes, they produced the same total output but **very different efficiency**.\n\nThis project turns a noisy workout dataset into a **fairer** and **more interpretable** report by:\n\n- Defining a simple benchmark metric: **efficiency = calories / minute**\n- Comparing workout types on **robust 
statistics** (median + IQR, not just means)\n- Showing how “global benchmarks” can mislead when segments differ\n- Adding **fairness-aware “gap analysis”** with uncertainty + low-sample flags\n- Producing a clean, navigable **Streamlit dashboard** and reproducible outputs\n\n\u003e This is **not medical advice** and not a physiological claim. It’s a **data product**: benchmarking and analytics.\n\n---\n\n## What you get (outputs)\n\nWhen you run the pipeline, you get:\n\n- **Figures** saved to: `reports/figures/`\n- **Structured outputs** (JSON/CSV) saved to: `outputs/`\n- A **Streamlit dashboard** to explore report + benchmarks + fairness + data quality\n\nExample pipeline log (typical):\n- Outputs: `outputs/`\n- Figures: `reports/figures/`\n- Present sessions used: `1250` (present rate ≈ `0.48`)\n\nThe report intentionally focuses on **present sessions only** so the efficiency metric stays meaningful.\n\n---\n\n## Metric definition (the core idea)\n\n### Efficiency (kcal/min)\n\n\u003cimg width=\"410\" height=\"82\" alt=\"efficiency = calories burned / workout duration (minutes)\" src=\"https://github.com/user-attachments/assets/d238bade-ebb4-4b79-91e2-f1d11370ca56\" /\u003e\n\nIn terms of the dataset columns: efficiency = `calories_burned` / `workout_duration_minutes`, in kcal/min.\n\nWhy this works:\n- Normalizes for time (a 60-minute and a 120-minute session become comparable)\n- More stable as a *benchmarking* measure than raw calories\n- Still interpretable: **“How many calories per minute did this session yield?”**\n\nWhat it **does not** mean:\n- It does **not** measure “fitness” or “effort” directly\n- It is affected by confounders (intensity, body mass, workout goal) and by measurement error (device estimates)\n\n---\n\n## Dataset notes (how the pipeline interprets rows)\n\nTypical fields used:\n- `visit_date`, `check_in_time`\n- `workout_type`\n- `workout_duration_minutes`\n- `calories_burned`\n- `gender`, `age`, `membership_type`\n- A presence/attendance signal inferred from the row content\n\n### “Present sessions only” rule\nEfficiency is computed only when:\n- duration is valid and \u003e 0\n- 
calories are valid and \u003e 0\n- the session is treated as **present**\n\nRows marked absent that still carry activity fields are handled via a **data quality report** (see below). The default behavior is conservative: **do not mix ambiguous/absent rows into efficiency benchmarking**.\n\n---\n\n## Quickstart\n\n### 1) Install\n\n```bash\npip install -r requirements.txt\n```\n\n### 2) Run the report pipeline\n\n```bash\npython -m src.pipeline --input data/raw/daily_gym_attendance_workout_data.csv\n```\n\n### 3) Launch the dashboard\n\n```bash\nstreamlit run app/app.py\n```\n\n---\n\n## Project structure\n\n```text\nworkout-efficiency-benchmark/\n├─ app/\n│  └─ app.py\n├─ data/\n│  └─ raw/\n│     └─ daily_gym_attendance_workout_data.csv\n├─ outputs/\n│  └─ (report JSON/CSV artifacts written here)\n├─ reports/\n│  └─ figures/\n│     ├─ calories_vs_duration.png\n│     ├─ efficiency_by_age_band.png\n│     ├─ efficiency_by_workout_type.png\n│     └─ heatmap_efficiency_workout_gender.png\n└─ src/\n   ├─ pipeline.py\n   ├─ clean.py\n   ├─ fairness.py\n   └─ plots.py\n```\n\n---\n\n## Streamlit dashboard: what each tab is for\n\nBelow are the five dashboard tabs. 
The goal is to keep a “decision-safe narrative”: **what we measured → how reliable it is → where comparisons break → what to do next**.\n\n\u003e Tip: Put your dashboard screenshots in `reports/screenshots/` and keep the file names stable, so the README stays evergreen.\n\n### 1) Report\n\nThis is the “one-screen summary” you’d show to a stakeholder:\n\n* total rows processed\n* present sessions used (and present rate)\n* number of workout types\n* top workout types by **median efficiency**\n* the core figures in a clean grid\n\n\u003cimg width=\"1689\" height=\"922\" alt=\"Screenshot 2026-02-17 at 12-41-22 Workout Efficiency Benchmark Report\" src=\"https://github.com/user-attachments/assets/d55ee3db-5027-4f0b-98c6-e27041c9328c\" /\u003e\n\n**How to read this page**\n\n* If the present rate is low (e.g., 0.48), treat it as a warning: the benchmarks describe only the subset of rows that represent attended workouts.\n* Rankings use **median efficiency**, not mean, because workout data usually has skew and outliers.\n\n---\n\n### 2) Benchmarks (sortable table + distribution exploration)\n\nThis is your “analyst workbench”:\n\n* A sortable table summarizing each workout type:\n\n  * sessions, members\n  * duration stats (mean/median)\n  * calories stats (mean/median)\n  * efficiency stats (mean/median)\n  * spread stats (IQR, percentiles)\n* Distribution exploration:\n\n  * efficiency histogram for a workout type\n  * box/violin plots by segments (e.g., gender within a workout type)\n\n\u003cimg width=\"1695\" height=\"936\" alt=\"Screenshot 2026-02-17 at 12-41-35 Workout Efficiency Benchmark Report\" src=\"https://github.com/user-attachments/assets/c6024117-3f47-4e56-900f-e7f310970da7\" /\u003e\n\n**Why the table matters**\nTwo workout types can have similar *median efficiency* but different risk profiles:\n\n* One might have a tight IQR (predictable, stable sessions)\n* Another might have a wide IQR (high variance, depending on intensity or session type)\n\n---\n\n### 3) Fairness (gap 
analysis)\n\nThis tab is designed to prevent a common analytics failure:\n\n\u003e “Global benchmarks become unfair benchmarks when populations differ.”\n\nIt provides:\n\n* A group dimension selector (e.g., `age_band`, `gender`, `membership_type`)\n* A group table:\n\n  * sessions per group\n  * group median efficiency\n  * overall median efficiency\n  * gap vs overall (directional)\n  * uncertainty bounds (CI when available)\n  * effect size (e.g., Cliff’s delta)\n  * low-sample flags\n\n\u003cimg width=\"1707\" height=\"808\" alt=\"Screenshot 2026-02-17 at 12-41-49 Workout Efficiency Benchmark Report\" src=\"https://github.com/user-attachments/assets/f4f447d0-5bac-4cd9-85e0-03b0d67848c2\" /\u003e\n\n**How to interpret a “gap” correctly**\n\n* A positive gap means that group’s median efficiency is above the overall median.\n* A negative gap means it’s below.\n* A gap is **descriptive**, not causal:\n\n  * for example, age is correlated with many factors (training goals, injury history, workout selection, schedule constraints)\n* Low-sample rows are explicitly flagged because they can “look dramatic” while being unreliable.\n\n---\n\n### 4) Data Quality (trust contract)\n\nThis tab answers: **“Can I trust these benchmarks?”**\nIt exposes:\n\n* missingness in key columns\n* how many rows appear absent or inconsistent\n* the “present-only” filtering impact\n* recommended actions when the data is messy\n\n\u003cimg width=\"618\" height=\"652\" alt=\"Screenshot 2026-02-17 at 12-41-59 Workout Efficiency Benchmark Report\" src=\"https://github.com/user-attachments/assets/9332f712-a452-4a83-8518-63d0e79882c9\" /\u003e\n\n**The key decision in this project**\nIf `absent_with_activity_rate` is high, you have two legitimate interpretations:\n\n1. **Absent rows mean “scheduled but not attended”**\n\n   * Keep for attendance analytics\n   * Exclude from efficiency benchmarking (default)\n2. 
**Absent rows are mislabeled**\n\n   * Only flip them if you have external validation\n   * Document the rule change explicitly\n\nThis is what makes the report decision-safe: it shows the ambiguity instead of hiding it.\n\n---\n\n### 5) Notes (guardrails + next upgrades)\n\nThis section is the “don’t misuse this report” page:\n\n* It restates the intent: benchmarking, not medical inference\n* It warns about confounders\n* It suggests upgrades for more rigorous analysis\n\n\u003cimg width=\"570\" height=\"342\" alt=\"Screenshot 2026-02-17 at 12-42-07 Workout Efficiency Benchmark Report\" src=\"https://github.com/user-attachments/assets/b985ba57-488a-473b-82f5-81b452bf9f78\" /\u003e\n\n---\n\n## Figures: what each one means\n\nAll figures are saved in `reports/figures/`. The dashboard arranges them in a grid; this README explains each one in detail.\n\n---\n\n### Figure 1 | Calories vs Duration (Present sessions)\n\n**File:** `reports/figures/calories_vs_duration.png`\n\n\u003cimg width=\"1530\" height=\"1080\" alt=\"calories_vs_duration\" src=\"https://github.com/user-attachments/assets/c7ca5274-0263-4e0e-bb0d-8afb4e66beb3\" /\u003e\n\n**What it shows**\n\n* Each point is a session (present only).\n* X-axis: workout duration (minutes)\n* Y-axis: calories burned\n\n**What you should notice**\n\n* The “cloud” usually slopes up: longer workouts burn more calories.\n* The vertical spread at the same duration comes from several sources:\n\n  * intensity differs\n  * workout type differs\n  * device estimation differs (measurement error rather than real variation)\n\n**Why this figure is essential**\nIt reveals why you should not compare total calories directly:\n\n* A person can burn more calories because they stayed longer, not because they were more efficient.\n\nThis motivates the normalized metric: **kcal/min**.\n\n**Common interpretation mistake**\n\n* Mistake: “More calories burned means better workout.”\n* Better framing: “Calories burned reflects *time × intensity*; efficiency isolates a benchmarkable 
rate.”\n\n---\n\n### Figure 2 | Efficiency distribution by workout type (boxplots)\n\n**File:** `reports/figures/efficiency_by_workout_type.png`\n\n\u003cimg width=\"1980\" height=\"1080\" alt=\"efficiency_by_workout_type\" src=\"https://github.com/user-attachments/assets/8c564dec-56fc-45cb-9428-09bbdacf17bd\" /\u003e\n\n**What it shows**\n\n* Each box summarizes sessions for one workout type.\n* The line inside the box is the **median** kcal/min.\n* The box height is the **IQR** (middle 50% of sessions).\n* Whiskers typically extend to the most extreme sessions within 1.5×IQR of the box; sessions beyond that are drawn as individual outlier points.\n\n**Why median + IQR**\nWorkout data is often skewed:\n\n* a few extreme sessions can pull the mean upward\n* medians resist outliers and are more stable for benchmarking\n\n**How to use it**\n\n* If one workout type has a higher median efficiency and a similar IQR:\n\n  * it’s consistently more “efficient” in this dataset\n* If medians are similar but IQR differs:\n\n  * one type is more predictable\n  * another is “high variance” (sometimes very efficient, sometimes not)\n\n**Decision-safe takeaway**\nUse this to set **workout-type-specific benchmarks** instead of one global number.\n\n---\n\n### Figure 3 | Mean efficiency heatmap (workout type × gender)\n\n**File:** `reports/figures/heatmap_efficiency_workout_gender.png`\n\n\u003cimg width=\"1710\" height=\"1080\" alt=\"heatmap_efficiency_workout_gender\" src=\"https://github.com/user-attachments/assets/fa05bb7d-90ef-4c68-91bd-8eba88f68abe\" /\u003e\n\n**What it shows**\n\n* Rows: workout types\n* Columns: gender categories\n* Cell value: **mean** efficiency for that slice\n* Often includes session counts (n=...) 
inside cells\n\n**Why this exists**\nA single overall benchmark can hide:\n\n* workout selection patterns by group\n* different distributions within workout types\n* sample imbalance (small groups creating noisy means)\n\n**How to interpret responsibly**\n\n* Treat this as a *lens*, not a verdict.\n* Always check:\n\n  * sample size (n)\n  * whether a slice is flagged low-sample in the fairness tab\n\n**Why “mean” is okay here**\nFor a heatmap, mean works as a quick summary, but:\n\n* fairness decisions should prioritize medians + uncertainty\n* this is best used to spot “where to look deeper,” not to conclude.\n\n---\n\n### Figure 4 | Median efficiency by age band\n\n**File:** `reports/figures/efficiency_by_age_band.png`\n\n\u003cimg width=\"1620\" height=\"954\" alt=\"efficiency_by_age_band\" src=\"https://github.com/user-attachments/assets/eb61de31-8bd4-4440-8e84-f79bc429b077\" /\u003e\n\n**What it shows**\n\n* X-axis: age bands (e.g., 18–24, 25–34, …, 65+)\n* Y-axis: **median** efficiency (kcal/min)\n* Error bars: bootstrap confidence intervals (when available)\n\n**Why age-band benchmarking is tricky**\nAge correlates with:\n\n* workout type preferences\n* injury constraints\n* training goals\n* session duration and pacing\n\nSo this comparison is **descriptive, not causal**.\n\n**How to use this figure**\n\n* If the confidence intervals of different bands overlap heavily:\n\n  * differences are likely not stable\n* If one band is clearly separated:\n\n  * treat it as a prompt to investigate confounders\n  * check workout-type composition within that band\n\n**Why 65+ often has wider CI**\nUsually because this band has fewer sessions, so uncertainty grows.\nThat’s exactly why this plot includes uncertainty: to prevent overconfident claims.\n\n---\n\n## Interpretation checklist (what to conclude vs what not to conclude)\n\nSafe conclusions:\n\n* “Efficiency varies by workout type; medians and spreads differ.”\n* “Some segments have different medians vs overall; sample size affects confidence.”\n* “Data 
quality issues (absent/inconsistent rows) materially change benchmarking.”\n\nUnsafe conclusions:\n\n* “Group X is fitter than Group Y.”\n* “Workout type A is objectively better than B.”\n* “Efficiency is a medical metric.”\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famirhosseinhonardoust%2Fworkout-efficiency-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famirhosseinhonardoust%2Fworkout-efficiency-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famirhosseinhonardoust%2Fworkout-efficiency-benchmark/lists"}