{"id":30687593,"url":"https://github.com/seguradevinn/data-project","last_synced_at":"2025-09-02T00:04:25.757Z","repository":{"id":311779550,"uuid":"1042865273","full_name":"SeguraDevinn/Data-Project","owner":"SeguraDevinn","description":"A healthcare data audit demo using CMS SynPUF and DuckDB, showing how raw claims are cleaned, validated, and transformed into a 2009 cohort with descriptives and a RADV-style chase list.","archived":false,"fork":false,"pushed_at":"2025-08-26T14:08:51.000Z","size":30557,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-26T19:19:00.241Z","etag":null,"topics":["auditing","cms","data","duckdb","sql"],"latest_commit_sha":null,"homepage":"https://www.cms.gov/data-research/statistics-trends-and-reports/medicare-claims-synthetic-public-use-files/cms-2008-2010-data-entrepreneurs-synthetic-public-use-file-de-synpuf/de10-sample-1","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SeguraDevinn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-22T17:50:10.000Z","updated_at":"2025-08-26T14:08:54.000Z","dependencies_parsed_at":"2025-08-26T19:29:37.527Z","dependency_job_id":null,"html_url":"https://github.com/SeguraDevinn/Data-Project","commit_stats":null,"previous_names":["seguradevinn/data-project"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/SeguraDevinn/Data-Project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeguraDevinn%2FData-Project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeguraDevinn%2FData-Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeguraDevinn%2FData-Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeguraDevinn%2FData-Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SeguraDevinn","download_url":"https://codeload.github.com/SeguraDevinn/Data-Project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeguraDevinn%2FData-Project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273208777,"owners_count":25064204,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auditing","cms","data","duckdb","sql"],"created_at":"2025-09-02T00:03:21.356Z","updated_at":"2025-09-02T00:04:25.749Z","avatar_url":"https://github.com/SeguraDevinn.png","language":null,"readme":"# Medicare SynPUF Mini-Audit Project\n\n## Dataset\n- Source: [CMS SynPUF](https://www.cms.gov/data-research/statistics-trends-and-reports/medicare-claims-synthetic-public-use-files/cms-2008-2010-data-entrepreneurs-synthetic-public-use-file-de-synpuf/de10-sample-1)\n- Files used: Beneficiary Summary (2008–2010), Inpatient Claims (2008–2010, Sample 1)\n- Disclaimer: This project replicates/extends SynPUF documentation for educational purposes. No PHI is used.\n\n## Milestones\n1. **Data loading**: DuckDB external views created from raw CSVs (`sql/01_load_raw.sql`).\n2. **Sanity checks**: Row counts, null checks, date ranges (`sql/02_sanity.sql`).\n3. **Cohort definition**: 2009 beneficiaries, enriched with age/sex/conditions (`sql/03_cohort.sql`).\n4. **Descriptives**: Age × sex, condition prevalence, utilization (`sql/04_descriptives.sql`).\n5. **Chase list**: Sampled 200 diabetes beneficiaries with claim checks (`sql/05_chase_list.sql`).\n\n## Outputs\n- `/data/clean/` → clean tables \u0026 descriptives (CSV exports).\n- `/figures/` → optional charts.\n- `chase_list_2009` → simulated RADV chase list, prioritized by missing/erroneous elements.\n\n\n\n## Traceability Matrix\n\n| Requirement                           | Source columns                  | SQL/Logic (file)           | Validation                        | Output table/view         |\n|---------------------------------------|---------------------------------|----------------------------|-----------------------------------|---------------------------|\n| Cohort = 2009 beneficiaries           | Beneficiary Summary: DESYNPUF_ID | sql/03_cohort.sql          | Row count vs. raw file            | cohort_2009               |\n| Add age, sex, chronic flags           | BENE_BIRTH_DT, SEX_IDENT_CD, condition flags | sql/03_enrichment.sql     | Age min/max, sex distribution     | cohort_2009_enriched      |\n| Age × sex descriptives                | Derived age, sex_code            | sql/04_agebands.sql        | Counts by group sum to cohort size | ageband_sex_counts_2009   |\n| Chronic condition prevalence          | DIABETES_FLAG, CHF_FLAG, COPD_FLAG | sql/04_conditions.sql     | Prevalence rates plausible        | chronic_prev_2009         |\n| Utilization (IP/OP claims per person) | CLM_FROM_DT, DESYNPUF_ID         | sql/04_utilization.sql     | Avg claims \u003e0, no null IDs        | utilization_avgs_2009     |\n| Chase list generation (sample 200)    | DESYNPUF_ID, dates, dx flags     | sql/05_chase_list.sql      | Sample size=200, flags distribution | chase_list_2009          |\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseguradevinn%2Fdata-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseguradevinn%2Fdata-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseguradevinn%2Fdata-project/lists"}