# Model Evaluation Lab

This is a compact undergraduate teaching module on model evaluation and data leakage.

The lab is designed for an introductory data science, machine learning, scientific computing, or applied computer science course. Students compare a random train/test split with a group held-out split, which keeps every observation from a site or source on one side of the split. They then explain why the random split can produce an overly optimistic estimate when observations from the same site or source appear in both the training and the test data.

## Why this project exists

Many students learn to call `train_test_split` before they learn to ask what kind of independence the test set represents. This lab gives them a concrete failure case:

1. A model looks strong under a random split.
2. The same model performs worse when tested on a held-out site.
3. Students diagnose the difference as leakage from grouped observations.
4. Students propose a validation design that matches the real question.

## Files

- `index.html`: public project page for GitHub Pages.
- `assignment.md`: student-facing assignment handout.
- `instructor_notes.md`: teaching notes and expected discussion points.
- `data/synthetic_sensor_microbiome.csv`: synthetic data for the lab.
- `notebooks/model_evaluation_lab.ipynb`: starter notebook using pandas and NumPy.
- `tools/generate_data_and_notebook.py`: reproducible build script.

## Learning goals

After completing the lab, students should be able to:

- explain why a random split can overestimate performance,
- choose a validation design that matches the deployment question,
- compute and compare simple regression metrics,
- connect cross-validation choices to scientific and applied computing claims,
- write a short technical interpretation of model results.

## Local use

Open `index.html` directly in a browser, or serve this folder locally:

```bash
python3 -m http.server 8000
```

Then visit `http://localhost:8000`.

To regenerate the synthetic dataset and notebook:

```bash
python3 tools/generate_data_and_notebook.py
```

The data are synthetic and are not derived from St. Mary's College of Maryland, NAWCAD, or any real student, biological, or defense dataset.
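The core comparison in this lab, a random split versus a held-out site, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the lab's notebook. It does not read `data/synthetic_sensor_microbiome.csv`, whose columns are not described here. Instead, it generates its own hypothetical grouped data in which every site has its own feature center and its own additive offset in the target. The model is a hand-written k-nearest-neighbour regressor, an assumed choice made because it can exploit rows from the same site when they leak into training. The helper names (`make_grouped_data`, `knn_predict`, `cv_rmse`) are illustrative only.

```python
import numpy as np


def make_grouped_data(n_sites=10, n_per_site=30, seed=0):
    """Hypothetical synthetic data (an assumption, not the lab's CSV).

    Each site has its own feature center and its own offset in the target,
    so rows from the same site are correlated with each other.
    """
    rng = np.random.default_rng(seed)
    w = np.array([1.5, -2.0])                         # shared linear signal (assumed)
    centers = rng.normal(0.0, 4.0, size=(n_sites, 2))
    offsets = rng.normal(0.0, 2.0, size=n_sites)      # site-level effect
    groups = np.repeat(np.arange(n_sites), n_per_site)
    X = centers[groups] + rng.normal(0.0, 0.5, size=(groups.size, 2))
    y = X @ w + offsets[groups] + rng.normal(0.0, 0.3, size=groups.size)
    return X, y, groups


def knn_predict(X_train, y_train, X_test, k=5):
    """k-nearest-neighbour regression: average the targets of the k closest training rows."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def cv_rmse(X, y, fold_ids, k=5):
    """Mean RMSE over folds; fold_ids[i] is the fold in which row i is tested."""
    scores = []
    for f in np.unique(fold_ids):
        test = fold_ids == f
        pred = knn_predict(X[~test], y[~test], X[test], k=k)
        scores.append(rmse(y[test], pred))
    return float(np.mean(scores))


X, y, groups = make_grouped_data()

# Random 5-fold CV: rows from each site are scattered across folds, so sites leak.
rng = np.random.default_rng(1)
random_folds = rng.permutation(len(y)) % 5
rmse_random = cv_rmse(X, y, random_folds)

# Leave-one-site-out CV: each fold is one whole site, unseen during training.
rmse_group = cv_rmse(X, y, groups)

print(f"random 5-fold RMSE:      {rmse_random:.2f}")
print(f"leave-one-site-out RMSE: {rmse_group:.2f}")
```

When rows from the same site sit on both sides of the split, the nearest neighbours of a test row usually come from its own site and share its offset. The random 5-fold RMSE should therefore come out noticeably lower than the leave-one-site-out RMSE. Only the second number estimates performance at a site the model has never seen.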