{"id":40699879,"url":"https://github.com/dssg/mlforpublicpolicylab","last_synced_at":"2026-01-21T12:03:02.595Z","repository":{"id":46119086,"uuid":"220390574","full_name":"dssg/mlforpublicpolicylab","owner":"dssg","description":"Repo for ML for Public Policy Lab course at CMU","archived":false,"fork":false,"pushed_at":"2025-11-18T03:30:49.000Z","size":159112,"stargazers_count":117,"open_issues_count":0,"forks_count":37,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-11-18T05:28:59.603Z","etag":null,"topics":["ai","machine-learning","public-policy","social-good"],"latest_commit_sha":null,"homepage":"https://dssg.github.io/mlforpublicpolicylab/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dssg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-11-08T05:04:51.000Z","updated_at":"2025-11-18T03:30:52.000Z","dependencies_parsed_at":"2025-10-22T19:22:26.349Z","dependency_job_id":null,"html_url":"https://github.com/dssg/mlforpublicpolicylab","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/dssg/mlforpublicpolicylab","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dssg%2Fmlforpublicpolicylab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dssg%2Fmlforpublicpolicylab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dssg%2Fmlforpublicp
olicylab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dssg%2Fmlforpublicpolicylab/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dssg","download_url":"https://codeload.github.com/dssg/mlforpublicpolicylab/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dssg%2Fmlforpublicpolicylab/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28632781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-21T04:47:28.174Z","status":"ssl_error","status_checked_at":"2026-01-21T04:47:22.943Z","response_time":86,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","machine-learning","public-policy","social-good"],"created_at":"2026-01-21T12:02:53.302Z","updated_at":"2026-01-21T12:03:02.585Z","avatar_url":"https://github.com/dssg.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 94889: Machine Learning for Public Policy Lab\n\n**Previous Versions:** [Fall 2023](https://github.com/dssg/mlforpublicpolicylab/releases/tag/Fall2023) |[Fall 2022](https://github.com/dssg/mlforpublicpolicylab/releases/tag/Fall2022) | [Fall 2021](https://github.com/dssg/mlforpublicpolicylab/releases/tag/Fall2021) | [Fall 
2020](https://github.com/dssg/mlforpublicpolicylab/releases/tag/Fall2020) | [Spring 2020](https://github.com/dssg/mlforpublicpolicylab/tree/Spring2020)\n \n### Fall 2025: Tues \u0026 Thurs, 11am-12:20pm (HBH 2008), Lab Section: Wednesday 9:30-10:50am (HBH 2008)\n\n### Important\n - **All content will be on github in this repo including the [schedule](README.md#schedule) and [tech setup instructions](techhelp/)**\n - **All assignments will be on and submitted through [canvas](https://canvas.cmu.edu/courses/49133)** \n - **Class communication and announcements will be primarily through Slack and email**\n\n### Wednesday Sessions\nThe first few weeks will be hands-on tech sessions; for the remainder of the semester, we'll use the time on Wednesdays to meet with teams and check in about their progress on the project. \n\n### Course Description\nThis is a project-based course designed to provide training and experience in solving real-world problems using machine learning, with a focus on problems from public policy and social good.\n\nThrough lectures, discussions, readings, and project assignments, students will learn about and get hands-on experience building end-to-end machine learning systems, starting from project definition and scoping, to modeling, to field validation and turning their analysis into action. Through the course, students will develop skills in problem formulation, working with messy data, communicating about machine learning with non-technical stakeholders, model interpretability, understanding and mitigating algorithmic bias \u0026 disparities, evaluating the impact of deployed models, and understanding the ethical implications of design choices made throughout the ML pipeline.\n\n**Pre-Requisites**: Students will be expected to know Python (for data analysis and machine learning) and SQL, and to have prior graduate coursework in machine learning. 
This course assumes that you have taken graduate Machine Learning courses before and is focused on teaching how to use ML to solve real-world problems. Experience with *nix command line, git(hub), and working on remote machines will be helpful and is highly recommended.\n\n**[DRAFT SYLLABUS](https://github.com/dssg/mlforpublicpolicylab/blob/master/README.md#applied-ml-project)**\n\n## People\n\n### Instructor\n\n| Rayid Ghani | \n| --- | \n| \u003cimg src=\"img/rayidphoto.png\" width=\"200\" /\u003e\u003cbr /\u003e GHC 8023 \u003cbr /\u003e **Office Hours:** \u003cbr /\u003e   Tuesday 12:30-1:30pm  \u003cbr /\u003e Wednesday 4-5pm  \u003cbr /\u003e Email me if you want to meet outside these hours | \n\n### Teaching Assistant\n\n| Logan Crowl | \n| --- | \n| \u003cimg src=\"img/logan.jpeg\" width=\"200\" /\u003e\u003cbr /\u003e HBH 3024 (slack if you can't find it) \u003cbr /\u003e Office Hours: Monday 1:30-2:30pm \u003cbr/\u003e Thursday 2:15-3:15pm \u003cbr /\u003e  | \n\n## Grading\nThroughout the semester, students will work together in small groups on\na policy project using machine learning that will illustrate the concepts\ndiscussed in class and readings.  \n\nGraded components will include:\n\n  - Written scope and proposal for their project work (10%)\n\n  - Peer reviews of three peer project proposals (5%)\n\n  - Brief project progress update assignments throughout the semester (25%)\n\n  - Final group presentation of results targeted toward policy stakeholders (10%)\n\n  - Written final project report and code (25%)\n\n  - Class attendance and participation (20%)\n\n  - Submitting weekly check-in and feedback forms (5%)  \n\nThe data used for the course projects should be considered sensitive and\nprivate and must remain in the secure computing environment provided for\nthe course. 
**\u003cspan class=\"underline\"\u003eAny attempt to download any\nportion of the project data to a machine outside this environment will\nresult in automatic failure of the class.\u003c/span\u003e** Note that you may use\ntools like SQL clients, jupyter notebooks, etc. to interact with the\ndata on the remote servers, but may not save the dataset (or a portion\nof it) to disk on a local machine.  \n\n## Schedule\n\nSee the **[syllabus](https://github.com/dssg/mlforpublicpolicylab/blob/master/README.md#applied-ml-project)** for much more detail as well, including **links to required readings** as well as information about group projects, grading, and helpful optional readings.\n  | Week | Dates                    | Tuesday                                                                                                                                  | Wednesday                                                                                                                                                                         | Thursday                                                                                                                                                                 | Assignments                                                            | Project Focus                                                                  |\n| ---- | ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------- | 
------------------------------------------------------------------------------ |\n| 1    | Tu: Aug 26\u003cbr\u003eTh: Aug 28 | [Intro/Overview + Project Overviews](https://github.com/dssg/mlforpublicpolicylab/raw/master/Lectures/01-ClassOverview.pptx)             | [Basic Tech Setup](#techsessions/session1.md): Make sure students can connect to the server through ssh, have access to github, and access the db both from psql and from dbeaver | Th: [Scoping, Problem Definition, Balancing goals](https://github.com/dssg/mlforpublicpolicylab/raw/master/Lectures/02-Scoping.pptx) (equity, efficiency, effectiveness) | 1\\. Survey (Monday)\u003cbr\u003e2\\. Project preferences + signature (Wednesday) | Get familiar with the class, goals, and understand project choices             |\n| 2    | Tu: Sep 2\u003cbr\u003eTh: Sep 4   | [Case Studies + Discussion](#case-studies-and-acquiring-data)                                                                            | [Remote Tech Workflows](https://github.com/dssg/mlforpublicpolicylab/blob/master/techsessions/remote-session2.md)                                                                 | [Acquiring Data, Privacy, Record Linkage](#case-studies-and-acquiring-data)                                                                                              |                                                                        | Data Audit and Exploration                                                     |\n| 3    | Tu: Sep 9\u003cbr\u003eTh: Sep 11  | [Data Exploration](#data-exploration-and-teamwork)\u003cbr\u003e\\+ 30 min project team meeting/coordination                                        | [Git + GitHub](https://docs.google.com/presentation/d/1qIlxS9vJdAgsu-CJK-bx1tTg4p_y0mYfokDyHJRhfAI/edit)                                                                          | Project Work                                                                                                                                        
                     |                                                                        | Data Stories and Finalize Project Scope                                        |\n| 4    | Tu: Sep 16\u003cbr\u003eTh: Sep 18 | [Analytical Formulation / Baselines](#analytical-formulation-baselines-and-machine-learning-pipeline-overview)                           | [Python + SQL](https://github.com/dssg/mlforpublicpolicylab/blob/master/techsessions/python_sql_tech_session.ipynb)                                                               | [Building ML Pipelines](#analytical-formulation-baselines-and-machine-learning-pipeline-overview)                                                                        | Project Proposal (tuesday)                                             | Initial ML Pipeline Setup\u003cbr\u003eAnalytical Formulation and Baselines              |\n| 5    | Tu: Sep 23\u003cbr\u003eTh: Sep 25 | [Performance Metrics / Evaluation Part 1: Choosing Metrics](#choosing-performance-metrics--evaluating-classifiers-part-1)                | Triage Configuration Tech Session                                                                                                                                                 | Project Work                                                                                                                                                             | Proposal Reviews (Wednesday)                                           | Iteration 1 - Build End to End Code Pipeline\u003cbr\u003e(Focus on end-to-end shell)    |\n| 6    | Tu: Sep 30\u003cbr\u003eTh: Oct 2  | [Performance Metrics / Evaluation Part 2: Model Selection and Validation](#choosing-performance-metrics--evaluating-classifiers-part-ii) | Group Check-Ins                                                                                                                                                                   | Temporal Deep Dive with projects                       
                                                                                                                  | Analytic Formulation, Baselines, and Cohort/Label Queries (monday)     |                                                                                |\n| 7    | Tu: Oct 7\u003cbr\u003eTh: Oct 9   | [Feature Engineering / Imputation](#feature-engineering-and-imputation)                                                                  | Group Check-Ins                                                                                                                                                                   | Project Work                                                                                                                                                             |                                                                        | Iteration 2 - End to End Code Pipeline\u003cbr\u003e(Focus on feature development)       |\n|      | Tu: Oct 14\u003cbr\u003eTh: Oct 16 | FALL BREAK                                                                                                                               | FALL BREAK                                                                                                                                                                        | FALL BREAK                                                                                                                                                               |                                                                        |                                                                                |\n| 8    | Tu: Oct 21\u003cbr\u003eTh: Oct 23 | [Features and Triage](#feature-engineering-in-triage)                                                                                    | Group Check-Ins                                                                                                                                         
                          | triage office hours and Q\u0026A                                                                                                                                              | Modeling Plan and Temporal Validation Configuration (Monday)           |                                                                                |\n| 9    | Tu: Oct 28\u003cbr\u003eTh: Oct 30 | [ML Modeling in Practice](#ml-modeling-in-practice)                                                                                      | Group Check-Ins                                                                                                                                                                   | Project Work                                                                                                                                                             | V0 Baseline Results and (Planned) Feature List (Monday)                | Iteration 3 - End to End Code Pipeline\u003cbr\u003e(Focus on models and evaluation)     |\n| 10   | Tu: Nov 4\u003cbr\u003eTh: Nov 6   | No Classes                                                                                                                               | Group Check-Ins                                                                                                                                                                   | [Performance Metrics / Evaluation Pt. 
3 (audition)](#choosing-performance-metrics--evaluating-classifiers-part-iii)                                                      | V0 Modeling Results (Monday)                                           |                                                                                |\n| 11   | Tu: Nov 11\u003cbr\u003eTh:Nov 13  | [Model Interpretability](#model-interpretability-and-ethics-workshop-thursday)                                                           | Group Check-Ins                                                                                                                                                                   | [Ethics Workshop](#ethics-workshop-thursday)                                                                                                                             | Weekly Update Assignment (Monday) - More complete results over time    | Iteration 4 - End to End Code Pipeline\u003cbr\u003e(Focus on interpreting the models)   |\n| 12   | Tu: Nov 18\u003cbr\u003eTh: Nov 20 | [Bias and Fairness](#bias-and-fairness)                                                                                                  | Group Check-Ins                                                                                                                                                                   | Project Work                                                                                                                                                             | Weekly Update Assignment (Monday) - Feature Importances + Crosstabs    |                                                                                |\n| 13   | Tu: Nov 25\u003cbr\u003eTh: Nov 27 | [Field Trials](#field-trials-validating-ml-models)                                                                                       | HOLIDAY                                                                                                                                
                                           | HOLIDAY                                                                                                                                                                  | Weekly Update Assignment (Monday) - Bias                               | Final model choice and understanding its performance and impact on disparities |\n| 14   | Tu: Dec 2\u003cbr\u003eTh: Dec 4   | Wrap-up and Project Work                                                                                                                 |                                                                                                                                                                                   | Final Presentations                                                                                                                                                      | Presentations                                                          | Project Report and Presentations                                               |\n|      |                          | Finals Week                                                                                                                              | Final Report Due (Wednesday)                                                                                                                                                      |                                                                                                                                                                          | Final Report Due (Wednesday)                                           | Final Report, Code, Repo, Documentation                                        |\n\n\n\n\n## Textbook \u0026 Software\n\n**Textbook:** The course will rely on selected readings from various\nsources and has no required textbook – each week, we’ll have selected\nreadings from a variety of sources, listed below.  
\n\n**Software:** For project work, we will provide students with access to\na shared data and ML infrastructure. Data will be available in a\npostgreSQL database and SQL and python will be used throughout the\ncourse. Students will be expected to store project code in a shared\ngithub repository, so you should create an account if you do not already\nhave one ([github.com](https://www.github.com/)). Additionally, we will\nbe making use of the machine learning pipeline package\n[triage](https://dssg.github.io/triage/) for modeling. \n  \n**Phone, Laptop, and Device Policy**  \nBecause much of the work in this course involves group discussions and\nresponding thoughtfully to your colleagues’ progress reports, mobile\ndevices are not permitted for use during the class. If you have a disability or other reason that necessitates\nthe use of a mobile device, please speak to one of the instructors or\nteaching assistants.  \n  \n## Class Project\n\nBeginning in the second week of class, groups of 3-4 students will\nwork together on a machine learning project throughout the semester with\none of several real-world public policy problems. Each week, every group\nwill be expected to provide an update on its current status. In\naddition to helping connect readings and discussion topics to the policy\ndomain, these updates and discussions will give you a chance to elicit\ninput and feedback from your classmates about challenges you’re facing\n(and they likely are too\\!) in your analyses.  \nThroughout the semester, students will be responsible for several\nintermediate deliverables as they work on their group projects:\n\n  - An initial project proposal, submitted as a group, including the\n    project scope and preliminary descriptive statistics about the\n    entities in their dataset. 
The proposal should be 4-5 pages in\n    length, not including figures, tables, or references, and should\n    include the provided scoping sheet as an appendix.\n\n  - A technical ML plan, submitted as a group, detailing how the scope\n    described in their proposal can be formulated as a machine learning\n    problem and the elements of the pipeline the group will be building.\n\n  - A description of specific features to be built for the modeling\n    project, submitted as a group and describing the underlying data,\n    level at which information is available, aggregation strategies\n    (e.g., over time or geography), and plan for handling missing\n    values.\n\n  - An in-class project mid-term update presentation (approximately 7\n    minutes in length plus 3 minutes for questions), describing the\n    problem setting, approach, pipeline, and initial results.\n\n  - Brief weekly update assignments to guide our check-in discussions.\n    These typically take the form of filling results or modeling details\n    into a handful of template slides. These updates will be graded for\n    completeness and correctness; however, we expect this work to be\n    iterative, and errors identified in one week’s update that are\n    corrected by the next week can result in revision of the previous\n    score up to 80% of the total possible.  \n\nAt the end of the semester, each group will be responsible for a final\npresentation (10 minutes in length plus 3 minutes for questions). While\nthe deep dive presentations should be more technical in nature, the\nfinal presentation should be geared towards the relevant decision makers\nfor your project, including an overview of the problem and approach,\nyour results, policy recommendations, and limitations of the work.  
\nAccompanying the final presentation is a written report, approximately\n15 pages in length, which should include:\n\n  - An executive summary not to exceed 1 page that succinctly describes\n    the project, results, and recommendations.\n\n  - An overview of the problem, its significance, and the scope and\n    goals of the current work.\n\n  - A description of the methodology and results of the analysis. The\n    report should also provide a link to well-documented code in your\n    group’s course github repository.\n\n  - Brief (1-2 paragraph) design of a field trial to evaluate the\n    accuracy of the resulting model in practice as well as its ability\n    to help the organization achieve its goals.\n\n  - Concluding lessons and recommendations for the partner organization.\n\n  - Optionally, you may also wish to include a proposal for future\n    avenues of research beyond the scope of this work, for instance on\n    novel machine learning methods to improve on the current work, new\n    policy interventions to evaluate or explore, or other related\n    research opportunities.  
\n\n## Tentative Schedule\n\nIn general, the course will be structured around three sessions each\nweek:\n\n  - During the Tuesday sessions, we’ll focus on structured lectures and\n    discussions of the weekly topic (including a mix of live lectures\n    and discussions of pre-recorded content throughout the semester).\n\n  - During the Wednesday lab/recitation sessions, we’ll discuss\n    technical skills and tools you’ll need for the project work early in\n    the semester and then shift to check-ins with each team to discuss\n    the status of their project work, generally around short update\n    assignments due on Monday (each team should review the updates of\n    all teams working on the same project, and the discussion will\n    involve feedback from your peers and the instructors).\n\n  - Early in the semester, Thursday sessions will also focus on lectures\n    and discussions, but once the projects are underway, most weeks will\n    reserve this time for group meetings and project work.\n      \n    Although we’re dedicating some time in class to work with your\n    group, please note that successfully completing the project will\n    require considerable work outside of class time as well, and will\n    constitute the majority of the “homework” for the course.\n\nBelow is a preliminary schedule of the course, including the readings\nthat will be assigned for each week. Please be sure to have read and be\nprepared to discuss the readings before the specified class session.\nMost of these topics can be (and often are) the focus of entire courses\nand generally we’ll only scratch the surface, but hopefully inspire you\nto delve deeper into areas that interest you (and you’ll find plenty of\nopen research questions in each). 
Optional readings are also listed for\nmost sessions which may be of interest to students who wish to delve\ndeeper into a given area, as well as provide additional context for your\nrelated project work.\n\n  - **Week 1 (Aug 26, Aug 28)** Introduction and Project Scoping\n    ### Introduction and Project Scoping\n\n    **[Lecture Slides for Tuesday](https://github.com/dssg/mlforpublicpolicylab/raw/master/Lectures/01-ClassOverview.pptx)**\n\n    On Tuesday, we’ll provide an introduction to the class, its goals,\n    and an overview of the project options to help you decide what\n    you’re interested in working on for the remainder of the\n    semester.  \n      \n    **[Slides for Wednesday Lab Session](#techsessions/session1.md)**\n    \n    During the Wednesday session, we’ll help ensure everyone is set up\n    to access the class technical resources.  \n      \n    **[Lecture Slides for Thursday](https://github.com/dssg/mlforpublicpolicylab/raw/master/Lectures/02-Scoping.pptx)**\n\n    On Thursday, we’ll talk about scoping, problem definition, and\n    understanding and balancing organizational goals. Well before the\n    outset of technical work, a decision needs to be made about whether\n    a given policy problem can and should be addressed with machine\n    learning: is the problem significant, feasible to solve with a\n    technical approach, and of sufficient importance to policy makers\n    that they will devote resources to implementing the solution? How\n    will success be measured? How will (often competing) goals of\n    efficiency, effectiveness, and equity be balanced?  \n      \n    **Required Readings for Thursday**:\n    \n      - *Data Science Project Scoping Guide* [Available\n        Online](https://datasciencepublicpolicy.org/our-work/tools-guides/data-science-project-scoping-guide//)\n    \n      - *Using Machine Learning to Assess the Risk of and Prevent Water\n        Main Breaks* by Kumar, A, Rizvi, SAA, et al. 
KDD 2018.\n        [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/KumarWaterMains.pdf)\n    \n    Optional Reading:\n    \n      - *Deconstructing Statistical Questions* by Hand, D.J. J. Royal\n        Stat Soc. A 157(3) 1994. [Available\n        Online](https://www.jstor.org/stable/pdf/2983526.pdf)\n\n  - **Week 2 (Sep 2, 4)**\n    ### Case Studies and Acquiring Data\n    This week, we’ll organize groups and begin project work.  \n      \n    **[Lecture Slides for Tuesday](https://github.com/dssg/mlforpublicpolicylab/raw/master/Lectures/03-casestudies.pptx)**\n    \n    Practical examples can provide a great way to gain an understanding\n    of the nuance of applying machine learning to policy problems, so\n    Tuesday will focus on a class discussion of a case study of a recent\n    application, scoping the case together in breakout sessions.\n  \n    **Required Reading for Tuesday**:\n    \n      - *Fine-grained dengue forecasting using telephone triage\n        services* by Rehman, NA, et al. Sci. Adv. 2016. [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/RehmanDengue.pdf)\n    \n    **[Slides for Wednesday session](techsessions/remote-session2.md)**\n    During the Wednesday session, we will lead a workshop on using\n    remote workflow tools for your class project.  \n      \n    **[Lecture Slides for Thursday](https://github.com/dssg/mlforpublicpolicylab/raw/master/Lectures/04-data.pptx)**\n\n    On Thursday, we’ll delve into some of the details of acquiring data,\n    protecting privacy, and linking records across data sources.\n    Acquiring data from a project partner is often an involved process\n    with a number of legal and technical aspects. 
Researchers need to\n    understand how the data acquired may and may not be used (typically\n    formalized in a data use agreement as well as underlying law) and\n    ensure that the privacy of individuals in the dataset is protected\n    (potentially both through access restrictions and techniques like\n    anonymization). Once data has been acquired, it often needs to be\n    transformed to ingest into the system used for analysis, records\n    from multiple data sources linked, and data structured for further\n    analysis.  \n      \n    During class on Thursday, we’ll also talk a little bit about working\n    together with your project team.\n      \n    Optional Readings:\n    \n      - *Predictive Modeling for Public Health: Preventing Childhood\n        Lead Poisoning* by Potash, E, et al. KDD 2015. [Available\n        Online](http://www.dssgfellowship.org/wp-content/uploads/2016/01/p2039-potash.pdf)\n    \n      - *What Happens When an Algorithm Cuts Your Health Care* by\n        Lecher, C. 2018. (The Verge) [Available\n        Online](https://www.theverge.com/2018/3/21/17144260/healthcare-medicaid-algorithm-arkansas-cerebral-palsy)\n    \n      - *Broken Promises of Privacy* by Ohm, P. UCLA Law Review. 2009.\n        Introduction and Section 1. [Available\n        Online](https://heinonline.org/HOL/Page?handle=hein.journals/uclalr57\u0026div=48\u0026g_sent=1\u0026casa_token=\u0026collection=journals)\n    \n      - *Data Matching* by Christen, P. Springer (2012). 
Chapter 2: The\n        Data Matching Process [Available\n        Online](https://link.springer.com/book/10.1007%2F978-3-642-31164-2)\n    \n      - *Big Data and Social Science* edited by Foster, Ghani, et al.\n        Chapter 4: Databases.\n\n  - **Week 3 (Sep 9,11)**\n    ### Data Exploration and Teamwork\n    Work on your project during this week should include continuing to\n    develop and refine your scope as you begin to explore the data.\n\n    **[Lecture Slides for Tuesday](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/06-data-exploration.pptx)**\n    \n    Tuesday of this week will provide an overview of exploratory data\n    analysis, particularly focusing on downstream ML analysis. Data exploration is fundamental to developing an\n    understanding of the nuances of the data and how the policy problem\n    you initially scoped can be specifically formulated as a machine\n    learning problem. This process involves generating and plotting\n    summary statistics, exploring trends over time and understanding\n    rapid changes in distributions, as well as identifying missing data\n    and outliers. Typically, data exploration should involve\n    considerable input from domain experts as you develop an\n    understanding of how the data relates to the underlying generative\n    process, as well as its idiosyncrasies and limitations.  \n      \n    We’ll also dedicate about 30 minutes during class on Tuesday for you\n    to meet with your project teams and discuss your project scope.  \n\n    During the Wednesday session, we’ll lead a tutorial/refresher about using\n    GitHub for your project.  \n    \n    Thursday class time will be dedicated to project work; use this time to do data exploration, scoping, and proposal planning.  
\n    \n\n  - **Week 4 (Sep 16, 18)**\n    **Due Tuesday, Sep 19: Project proposal with scope and descriptive statistics**\n    \n    ### Analytical Formulation, Baselines, and Machine Learning Pipeline Overview\n\n    **[Lecture Slides for Tuesday](https://github.com/dssg/mlforpublicpolicylab/raw/refs/heads/master/Lectures/07-formulation-and-baselines.pptx)**\n     \n    On Tuesday, we’ll discuss analytical formulation of policy\n    projects. Distinct from the initial scoping, a true analytical\n    formulation of your policy problem can only come after you have\n    developed an understanding of the data at hand, which in turn will\n    often result in a greater understanding of the problem itself. Here,\n    you’ll ask how specifically your target variable (if relevant) is\n    defined in the data, what types of information are available as\n    predictors, and what baseline you’ll be measuring performance against.\n    Very rarely is the appropriate baseline as simple as \"random choice\"\n    or the population prevalence. Rather, it should reflect what would\n    be expected to happen otherwise: perhaps a simple decision rule that\n    an expert would come up with or even a pre-existing statistical\n    model that the current effort is seeking to replace.  \n      \n    Required Readings for Tuesday:\n    \n      - *Dissecting Racial Bias in an Algorithm Used to Manage the\n        Health of Populations* by Obermeyer, Z., Powers, B., et al.\n        Science. 2019. [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/ObermeyerBias.pdf)\n    \n      - *Problem Formulation and Fairness* by Passi and Barocas. FAT\\*\n        2019. [Available Online](https://arxiv.org/pdf/1901.02547.pdf)\n    \n    Optional Readings:\n    \n      - *Always Start with a Stupid Model, No Exceptions* by Ameisen, E.\n        Medium. 
[Available\n        Online](https://blog.insightdatascience.com/always-start-with-a-stupid-model-no-exceptions-3a22314b9aaa)\n\n      - *Create a Common-Sense Baseline First* by Ramakrishnan. Medium.\n        [Available\n        Online](https://towardsdatascience.com/first-create-a-common-sense-baseline-e66dbf8a8a47)\n    \n      - *Data Science for Business* by Provost and Fawcett. O’Reilly.\n        2013. Chapter 2: Business Problems and Data Science [Available\n        Online](https://learning.oreilly.com/library/view/data-science-for/9781449374273/ch02.html)\n\n    \n    On Wednesday, we will lead a tech session on using Python and SQL\n    together.\n      \n    On Thursday, we’ll describe the components of typical machine\n    learning pipelines. End-to-end ML Pipelines can quickly become\n    unwieldy with several moving pieces and well-structured, modular\n    code is often critical to detecting and fixing bugs in the process.\n    This session will provide an overview of the pipeline, each\n    underlying element, and some best practices for building them.  \n\n    **[Lecture Slides for Thursday](https://github.com/dssg/mlforpublicpolicylab/raw/refs/heads/master/Lectures/09-Machine-Learning-Pipelines.pptx)**\n      \n    Required Reading for Thursday:\n     - [Lecture Slides for Thursday](https://github.com/dssg/mlforpublicpolicylab/raw/refs/heads/master/Lectures/09-Machine-Learning-Pipelines.pptx)\n         \n    Optional Readings:\n    \n      - *Architecting a Machine Learning Pipeline* by Koen, S. (Medium)\n        [Available\n        Online](https://towardsdatascience.com/architecting-a-machine-learning-pipeline-a847f094d1c7)\n    \n      - *Meet Michelangelo: Uber’s Machine Learning Platform* by\n        Hermann, J and Del Balso, M. 
[Available\n        Online](https://eng.uber.com/michelangelo/)\n\n  - **Week 5 (Sep 23, Sep 25):**\n    ### Choosing Performance Metrics \u0026 Evaluating Classifiers, Part I\n    Pipeline development should continue in your project, with a\n    focus on producing the simplest possible version of the full\n    system.  \n      \n    **Due Tuesday, Sep 23:** Peer reviews of three project proposals.  \n      \n    In most cases, a vast array of methods, each with a number of\n    tunable hyperparameters, can be brought to bear on your modeling\n    question. How do you decide which models are better than others and\n    how can you be confident this decision will carry forward into the\n    future when the model is deployed? How should you balance\n    considerations of performance and fairness when making these\n    decisions? Are models that are performing similarly well giving\n    similar predictions? What should you do if they are not? This\n    week, we’ll begin to answer these questions, focusing on the choice\n    of performance metrics.\n\n    **[Lecture Slides for Tuesday](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/10-model-selection-and-validation-part-1.pptx?raw=true)**\n\n      \n    Required Readings for Tuesday:\n    \n      - *The Misuse of AUC: What High Impact Risk Assessment Gets Wrong*. [Available\n        Online](https://arxiv.org/abs/2305.18159)\n          \n    Optional Readings:\n    \n      - *Evaluating and Comparing Classifiers* by Stapor, K. 
CORES 2017.\n        [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/StaporEvaluating.pdf)\n    \n    During the Wednesday session, we’ll talk about using `triage`, the\n    machine learning pipeline toolkit we will use for the class\n    project.\n\n  - **Week 6 (Sep 30, Oct 2):**\n    ### Choosing Performance Metrics \u0026 Evaluating Classifiers, Part II\n    By this week, your group should have a very simple version of an\n    end-to-end pipeline with preliminary results for a single model\n    specification.  \n      \n    This week, we’ll continue our discussion from the previous week,\n    focusing specifically on validation strategies that reflect how you\n    want your model to generalize. In particular, we’ll focus on the\n    common case of modeling contexts with a strong temporal component\n    where predicting into the future is desired, exploring how your\n    choice of training and validation sets can reflect this context.\n\n    **[Lecture Slides for Tuesday](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/10-model-selection-and-validation-part-2.pptx?raw=true)**\n      \n    Required Readings for Tuesday:\n    \n      - *Cross-validation strategies for data with temporal, spatial,\n        hierarchical, or phylogenetic structure* by Roberts, DR, Bahn,\n        V, et al. Ecography 40:2017. [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/RobertsCV.pdf)\n    \n      - *Amazing Things Come From Having Many Good Models*. [Available\n        Online](https://arxiv.org/pdf/2407.04846)\n    \n    On Wednesday, we’ll start our regular group check-ins to provide\n    feedback on your project progress and **on Thursday, we’ll meet\n    together as a class** to do a deep dive on temporal validation\n    through a few class project examples.  \n      \n    Optional Readings:\n    \n      - *Time Series Nested Cross-Validation* by Cochrane, C. 
Medium.\n        [Available\n        Online](https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9)\n    \n      - *The Secrets of Machine Learning* by Rudin, C. and Carlson, D.\n        arXiv preprint: 1906.01998. 2019. [Available\n        Online](https://arxiv.org/abs/1906.01998)\n    \n      - *Big Data and Social Science (2nd edition)* edited by Foster,\n        Ghani, et al. Chapter 7: Machine Learning. [Available\n        Online](https://textbook.coleridgeinitiative.org/chap-ml.html)\n\n  - **Week 7 (Oct 7,9):**\n    ### Feature Engineering and Imputation \n    By this week, your group should have a very simple version of an\n    end-to-end pipeline with preliminary results for a single model\n    specification.  \n      \n    In many real-world contexts, expressing domain expertise through\n    thoughtful feature engineering can dramatically improve model\n    performance by understanding what underlying factors are likely to\n    be predictive and helping the model find these relationships.\n    Likewise, most data sets you’ll encounter in practice are littered\n    with outliers, inconsistencies, and missingness. Handling these data\n    issues in a smart way can be critical to a project’s success.  \n    **[Lecture Slides for Tuesday](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/12-features-and-imputation.pptx?raw=true)**  \n    Required Reading/Watching for Tuesday:\n    \n      - [Short Video\n        Lecture](https://www.youtube.com/watch?v=kluqz_1GN5c) and\n        corresponding\n        [slides](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/12-features-and-imputation.pptx?raw=true)\n    \n    On Wednesday, we’ll continue our group check-ins.  \n      \n    Optional Readings:\n    \n      - *Missing Data Conundrum* by Akinfaderin, W. Medium. 
[Available\n        Online](https://medium.com/ibm-data-science-experience/missing-data-conundrum-exploration-and-imputation-techniques-9f40abe0fd87)\n    \n      - *Feature Engineering for Machine Learning* by Zhang, A. and\n        Casari, A. O’Reilly. 2018. Chapter 2: Fancy Tricks with Simple\n        Numbers [Available\n        Online](https://learning.oreilly.com/library/view/feature-engineering-for/9781491953235/)\n    \n      - *Missing-data imputation* by Gelman, A. [Available\n        Online](http://www.stat.columbia.edu/~gelman/arm/missing.pdf)\n\n  - **Fall Break (Oct 14, 16): No Classes**  \n    No classes for fall break.\n\n  - **Week 8 (Oct 21,23):**\n    ### Feature Engineering in Triage\n    **[Lecture Slides](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/12-features-and-imputation.pptx?raw=true)**  \n    **[Triage Features Cookbook](https://dssg.github.io/triage_cookbook/configure/features.html)**\n\n  - **Week 9 (Oct 28, Oct 30):**\n    ### ML Modeling in Practice\n    During this week, your pipeline development and refinement should\n    continue with a widening set of model specifications and features to\n    explore.\n\n    **[Lecture Slides](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/14-ml-modeling-in-practice.pptx?raw=true)**\n\n    **Due Monday:** Technical modeling plan and detailed feature\n    list  \n      \n    On Tuesday, we’ll cover some practical tips about building machine\n    learning models for real-world projects: how should you think about\n    what types of models to build? What hyperparameters should you\n    explore and how do you design a hyperparameter grid?  \n      \n    On Wednesday, we’ll continue our group check-ins and on Thursday,\n    you’ll have time to work with your project group.  \n      \n    Required Readings:\n    \n      - *Three Pitfalls to Avoid in Machine Learning* by Riley, P.\n        Nature. 527. 
2019 (Comment) [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/RileyPitfalls.pdf)\n    \n      - *Top 10 ways your Machine Learning models may have leakage* by\n        Ghani, R. et al. DSSG Blog. [Available\n        Online](http://www.dssgfellowship.org/2020/01/23/top-10-ways-your-machine-learning-models-may-have-leakage/)\n    \n    Optional Readings:\n    \n      - *Data Science for Business* by Provost and Fawcett. O’Reilly.\n        2013. Chapter 5: Overfitting and Its Avoidance [Available\n        Online](https://learning.oreilly.com/library/view/data-science-for/9781449374273/ch05.html)\n    \n      - *Leakage in Data Mining* by Kaufman, S., Rosset, S., et al.\n        TKDD. 2011. [Available\n        Online](https://www.researchgate.net/profile/Claudia_Perlich/publication/221653692_Leakage_in_Data_Mining_Formulation_Detection_and_Avoidance/links/54418bb80cf2a6a049a5a0ca/Leakage-in-Data-Mining-Formulation-Detection-and-Avoidance.pdf)\n    \n      - *Why is Machine Learning Deployment Hard?* by Gonfalonieri, A.\n        Medium. [Available\n        Online](https://towardsdatascience.com/why-is-machine-learning-deployment-hard-443af67493cd)\n    \n      - *Overview of Different Approaches to Deploying Machine Learning\n        Models in Production* by Kervizic, J. KDnuggets. [Available\n        Online](https://www.kdnuggets.com/2019/06/approaches-deploying-machine-learning-production.html)\n\n  - **Week 10 (Nov 4,6):**\n    ### Choosing Performance Metrics \u0026 Evaluating Classifiers, Part III\n    At this point, your group should be continuing to refine and expand\n    on your preliminary modeling results.  \n      \n    **Due Monday:** Weekly project update with updated\n    validation splits, features, and “version 0” baseline results.  
\n      \n    This week, we’ll return to our discussion of model selection,\n    delving into the details of winnowing down a large number of model\n    specifications to one or a handful that perform \"best\" for some\n    definition of \"best\". In particular, we’ll focus on the common case\n    of machine learning problems with a strong time series component and\n    the desire to balance performance and stability in model\n    selection.\n\n    **[Lecture Slides](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/15-ml-in-practice-and-model-selection-part-3.pptx?raw=true)** \n    \n\n    On Wednesday, we’ll continue our group check-ins and on Thursday,\n    you’ll have time to work with your project group.\n\n  - **Week 11 (Nov 11,13):**\n    ### Model Interpretability and Ethics Workshop (Thursday) \n    By this week, project work should begin to focus more heavily\n    on evaluation, model selection, and interpretation.  \n      \n    **Due Monday, Nov 10:** Weekly project update.  \n      \n    Model interpretability can be thought of at two levels: global (how\n    the model works in aggregate) and local (why an individual\n    prediction came out as it did). This week, we’ll focus on some\n    practical aspects and applications of interpretability at the two\n    levels: understanding how a model is performing globally, what it\n    means to compare this performance across model specifications, how\n    these methods can help researchers debug and improve their models,\n    build trust among stakeholders (including a growing legal movement\n    towards a \"right to explanation\"), help those acting on model\n    predictions understand when they should override the model with\n    their judgement, and importantly help those actors decide not only\n    on whom to intervene but also what sort of intervention to\n    take.  
\n\n    **[Lecture Slides for Tuesday](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/16-interpretability-part-1.pptx?raw=true)**\n    \n    Required Readings for Tuesday:\n\n      - *Explainable Machine Learning for Public Policy: Use Cases, Gaps,\n        and Research Directions* by Amarasinghe, K, Rodolfa, K, Lamba, H,\n        and Ghani, R. Data \u0026 Policy, Volume 5. 2023. [Available\n        Online](https://arxiv.org/abs/2010.14374)\n    \n      - *Explainable machine-learning predictions for the prevention of\n        hypoxaemia during surgery* by Lundberg, SM, Nair, B, et al.\n        Nature Biomed. Eng. 2018. [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/LundbergHyboxaemia.pdf)\n    \n    On Wednesday, we’ll continue our group check-ins. \n      \n    Optional Readings:\n    \n      - *Intelligible Models for HealthCare: Predicting Pneumonia Risk\n        and Hospital 30-day Readmission* by Caruana, R, et al. KDD 2015.\n        [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/CaruanaGAM.pdf)\n    \n      - *Why Should I Trust You? Explaining the Predictions of any\n        Classifier* by Ribeiro, MT, Singh, S, and Guestrin, C. KDD\n        2016. [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/RibeiroLIME.pdf)\n    \n      - *Optimized Scoring Systems: Toward Trust in Machine Learning for\n        Healthcare and Criminal Justice* by Rudin, C, and Ustun, B.\n        INFORMS Journal on Applied Analytics. 2018. [Available\n        Online](https://pubsonline.informs.org/doi/pdf/10.1287/inte.2018.0957)\n    \n      - *Interpretable Classification Models for Recidivism Prediction*\n        by Zeng, J, Ustun, B, and Rudin, C. J. Royal Stat. Soc. A. 
2016.\n        [Available\n        Online](https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/rssa.12227)\n    \n      - *Model Agnostic Supervised Local Explanations* by Plumb, G,\n        Molitor, D, and Talwalkar, AS. NIPS 2018. [Available\n        Online](http://papers.nips.cc/paper/7518-model-agnostic-supervised-local-explanations)\n    \n      - *A Unified Approach to Interpreting Model Predictions* by\n        Lundberg, SM and Lee, S. NIPS 2017. [Available\n        Online](http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predicti)\n    \n      - *Explainable AI for Trees* by Lundberg, SM, Erion, G, et al.\n        arXiv preprint: arxiv/1905.04610. [Available\n        Online](https://arxiv.org/pdf/1905.04610.pdf)\n        \n    ### Ethics Workshop (Thursday)  \n    **[Lecture Slides for Thursday: Ethics Workshop](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/EthicsOverview.pptx?raw=true)**\n\n    On Thursday, we'll do an ethics workshop where we'll discuss ethical issues in designing ML systems.\n\n\n  - **Week 12 (Nov 18, 20):**\n    ### Bias and Fairness\n \n    By this week, you should be finalizing your modeling results and\n    beginning to look at bias and disparities in your models.  \n      \n    **Due Monday, Nov 17:** Weekly project update.  \n      \n    Just as important as assessing whether your model is making accurate\n    predictions is determining whether it is doing so in a fair manner.\n    But what do we mean by fairness? How can you measure it and what\n    can you do to mitigate any disparities you might find? Where in your\n    pipeline can bias be introduced? (Spoiler: everywhere.) 
This week\n    will provide a very brief introduction to the expansive field of\n    algorithmic fairness.\n\n    **Lecture Slides for Tuesday: [Dealing with Bias and Fairness in ML](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/bias-fairness-overview.pptx?raw=true)**\n    \n    Required Readings for Tuesday:\n    \n      - *Fairness Definitions Explained* by Verma, S and Rubin, J.\n        [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/VermaFairnessDefn.pdf)\n    \n      - *A Theory of Justice* by Rawls, J. 1971. Chapter 1: Justice as\n        Fairness, pp. 1-19. [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/RawlsJustice.pdf)\n    \n      - *Racial Equity in Algorithmic Criminal Justice* by Huq, A. Duke\n        Law Journal. 2018. [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/HuqRacialEquity.pdf)\n        \\[Focus on sections: I.B.2, all of section II, III introduction,\n        III.B, and III.D.3\\]\n    \n    Optional Readings:\n\n       - *A case study of algorithm-assisted decision making in child\n        maltreatment hotline screening decisions* by Chouldechova, A,\n        Putnam-Hornstein, E, et al. PMLR. 2018. [Available\n        Online](https://github.com/dssg/mlforpublicpolicylab/blob/master/Readings/PDF/ChouldechovaFosterCare.pdf)\n    \n      - *Case study: predictive fairness to reduce misdemeanor\n        recidivism through social service interventions* by Rodolfa,\n        K.T., et al. FAT\\* 2020. [Available\n        Online](https://arxiv.org/pdf/2001.09233)\n    \n      - *Is Algorithmic Affirmative Action Legal?* by Bent, JR.\n        Georgetown Law Journal. 2019. 
[Available\n        Online](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3372690)\n    \n      - *Does Mitigating ML’s Impact Disparity Require Treatment\n        Disparity?* by Lipton, Z, McAuley, J, and Chouldechova, A. NIPS\n        2018. [Available\n        Online](http://papers.nips.cc/paper/8035-does-mitigating-mls-impact-disparity-require-treatment-disparity)\n    \n      - *Equality of Opportunity* by Roemer, JE and Trannoy, A. 2013.\n        [Available\n        Online](http://cowles.yale.edu/sites/default/files/files/pub/d19/d1921.pdf)\n\n    - *Equality of Opportunity in Supervised Learning* by Hardt, M.\n        and Price, E. NIPS 2016. [Available\n        Online](http://papers.nips.cc/paper/6373-equality-of-opportunity-in-supervised-learning)\n    \n      - *Classification with fairness constraints: A meta-algorithm with\n        provable guarantees* by Celis, E, Huang, L, et al. FAT\\* 2019.\n        [Available\n        Online](https://dl.acm.org/citation.cfm?doid=3287560.3287586)\n    \n      - *Fairness Through Awareness* by Dwork, C, Hardt, M, et al. ITCS\n        2012. [Available\n        Online](https://dl.acm.org/citation.cfm?id=2090255)\n    \n      - *Fairness Constraints: Mechanisms for Fair Classification*\n        Zafar, M, Valera I, et al. PMLR 2017. [Available\n        Online](http://proceedings.mlr.press/v54/zafar17a.html)\n    \n      - *Fair Prediction with Disparate Impact: A Study of Bias in\n        Recidivism Prediction Instruments* by Chouldechova, A. Big Data.\n        2017. [Available\n        Online](https://www.liebertpub.com/doi/10.1089/big.2016.0047)\n\n  - **Week 13 (Nov 25):**\n    ###  Field Trials: Validating ML Models \n    During this week, your group should be continuing to investigate any\n    disparities in your model results, as well as performing any other\n    necessary post-modeling analyses.  
\n      \n    **Due Monday, Nov 24:** Weekly project update.  \n    **No Class on Wednesday or Thursday - Thanksgiving**\n\n    **Lecture Slides for Tuesday: [Field Trials: Validating ML Models](https://github.com/dssg/mlforpublicpolicylab/blob/master/Lectures/FieldValidation.pptx?raw=true)**\n  \n    Required Readings for Tuesday:\n\n    On Thursday, we'll give teams time to do final project work and get ready for presentations.\n\n  - **Week 14 (Dec 5,7): Wrap-Up and Final Presentations**  \n    On Tuesday, we’ll use a little time to wrap up the class and give teams time to finish project work.\n      \n    On Thursday, each group will give a presentation about their\n    applied ML project as described above.  \n   \n\n  - **Finals Week: Final Report Due (Dec 10)**  \n    Incorporating the results of your project work throughout the\n    semester as well as feedback from your final presentation, each\n    group will write a final project report as described\n    above.  \n\n## More Resources\n\nYou may find several books useful as general background reading, but\nthese are by no means required texts for the course:\n\n  - *Data Science for Business* by Provost and Fawcett\n\n  - *Big Data and Social Science* edited by Foster, Ghani, et al.\n    [Available Online](https://textbook.coleridgeinitiative.org/)\n\n  - *Practical Fairness: Achieving Fair and Secure Data Models* by\n    Nielsen\n\n  - *Fairness and Machine Learning* by Barocas, Hardt, and Narayanan\n\n  - *Exploratory Data Analysis* by Tukey  \n\nAdditionally, the Global Communication Center (GCC) can provide\nassistance with the written or oral communication assignments in this\nclass. The GCC is a free service, open to all students, and located in\nHunt Library. 
You can learn more on the GCC website:\n[cmu.edu/gcc](http://www.cmu.edu/gcc).\n\n## Your Responsibilities\n\n**Attendance:** Because much of this course is focused on discussion\nwith your classmates, attending each session is important both to your\nability to learn from the course and to what others get out of it.\nAs such, you’ll be expected to attend every session\nand your participation will factor into your grade as described above.\nShould anything come up that will require you to miss a class (illness,\nconferences, etc.), please let one of the course staff know in advance.  \n\n**Academic Integrity:** Violations of class and university academic\nintegrity policies will not be tolerated. Any instances of copying,\ncheating, plagiarism, or other academic integrity violations will be\nreported to your advisor and the dean of students in addition to\nresulting in an immediate failure of the course.  \n\n**Data Security:** As noted above, the data used for the project work in\nthis course should be considered sensitive and care must be taken to\nprotect the privacy of those in the dataset. The data must remain on the\ncomputing environment provided for the class and attempts to download it\nto any other machine will result in failure of the course.\n\nAdditionally, care must be taken to avoid accidentally committing any\nraw data, queries containing identifiable information, or secrets (key\nfiles, database passwords, etc.) to GitHub. Should this occur, or should\nyou have any reason to believe your personal computer or private key has\nbeen compromised, you must immediately notify the course staff of the\nissue. \n\n**AI Use Policy:**  We want this class to reflect what solving problems with ML in the real world looks like, which means different policies depending on 1) where you're working, 2) the data you’re using, and 3) the privacy and confidentiality requirements. 
For the data we are using in this class, please don't share or upload any confidential data or information to any web-based AI tool. Beyond that, you can use any tool you want. You're accountable for the output and the work you submit. Know that a lot of these models are trained on pretty bad ML code and practices :)\n\nWe also want this class to help you understand what the AI tools are good for, where they fall short, and how to use them best to solve real-world problems. So use them, but be skeptical, review and test the output, and be ready to share what you find with others in the class.\n\n**tl;dr**\n- You can use anything you want (without uploading confidential data or information), but you're accountable and responsible for what you submit\n- Be prepared to disclose and share what you used and how you used it so others can learn\n\n\n## Resources\n\n**Students with Disabilities:** We value inclusion and will work to\nensure that all students have the resources they need to fully\nparticipate in our course. Please use the Office of Disability\nResources’ online system to notify us of any necessary accommodations\nas early in the semester as possible. If you suspect that you have a\ndisability but are not yet registered with the Office of Disability\nResources, you can contact them at \u003caccess@andrew.cmu.edu\u003e  \n\n**Health and Wellness:** As a student, you may experience a range of\nchallenges that can interfere with learning, such as strained\nrelationships, increased anxiety, substance use, feeling down,\ndifficulty concentrating and/or lack of motivation. These mental health\nconcerns or stressful events may diminish your academic performance\nand/or reduce your ability to participate in daily activities. CMU\nservices are available, and treatment does work.\n\nAll of us benefit from support during times of struggle. 
There are many\nhelpful resources available on campus and an important part of the\ncollege experience is learning how to ask for help. Asking for support\nsooner rather than later is almost always helpful.  \n\nIf you or anyone you know experiences any academic stress, difficult\nlife events, or feelings like anxiety or depression, we strongly\nencourage you to seek support. Counseling and Psychological Services\n(CaPS) is here to help: call 412-268-2922 and visit their website at\n[cmu.edu/counseling/](http://www.cmu.edu/counseling/). Consider reaching\nout to a friend, faculty or family member you trust for help getting\nconnected to the support that can help.\n\nIf you or someone you know is feeling suicidal or in danger of\nself-harm, call someone immediately, day or night:\n\nCaPS: 412-268-2922  \nRe:solve Crisis Network: 888-796-8226  \nIf the situation is life threatening, call the police  \nOn campus: CMU Police: 412-268-2323  \nOff campus: 911  \n\n**Discrimination and Harassment:** Everyone has a right to feel safe and\nrespected on campus. If you or someone you know has been impacted by\nsexual harassment, assault, or discrimination, resources are available\nto help. You can make a report by contacting the University’s Office of\nTitle IX Initiatives by email (\u003ctix@andrew.cmu.edu\u003e) or phone\n(412-268-7125).\n\nConfidential reporting services are available through the [Counseling\nand Psychological Services](http://www.cmu.edu/counseling/) and\n[University Health\nCenter](http://www.cmu.edu/health-services/index.html), as well as the\nEthics Reporting Hotline at 877-700-7050 or\n[www.reportit.net](http://www.reportit.net/) (user name: tartans;\npassword: plaid).\n\nYou can learn more about these options, policies, and resources by\nvisiting the University’s Title IX Office webpage at\n\u003chttps://www.cmu.edu/title-ix/index.html\u003e  \n\nIn case of an emergency, contact University Police 412-268-2323 on\ncampus or call 911 off campus.  
\n\n**[Student Academic Success Center\n(SASC)](https://www.cmu.edu/student-success/)**\nSASC focuses on creating spaces for students to engage in their\ncoursework and approach learning through a variety of group and\nindividual tutoring options. They offer many opportunities for students\nto deepen their understanding of who they are as learners,\ncommunicators, and scholars. Their\n[workshops](https://www.cmu.edu/student-success/programs/workshops/index.html)\nare free to the CMU community and meet the needs of all disciplines and\nlevels of study. SASC programs to support student learning include the\nfollowing (program titles link to webpages):\n\n  - [Academic\n    Coaching](https://www.cmu.edu/student-success/programs/coaching.html)\n    – This program provides holistic, one-on-one peer support and group\n    workshops to help undergraduate and graduate students implement\n    habits for success. Academic Coaching assists students with time\n    management, productive learning and study habits, organization,\n    stress management, and other skills. Request an initial consultation\n    [here](https://docs.google.com/forms/d/e/1FAIpQLSfMAnCWkyPdXRb0zOsMar7nzpUau8hqN_gIFm3OISY5QMWwyw/viewform).\n\n  - [Peer\n    Tutoring](https://www.cmu.edu/student-success/programs/tutoring.html)\n    – Peer Tutoring is offered in two formats for students seeking\n    support related to their coursework. Drop-In tutoring targets our\n    highest demand courses through regularly scheduled open tutoring\n    sessions during the fall and spring semesters. Tutoring by\n    appointment consists of ongoing individualized and small group\n    sessions. You can use tutoring to discuss course-related content,\n    clarify and ask questions, and work through practice problems. 
Visit\n    the\n    [webpage](https://www.cmu.edu/student-success/programs/tutoring.html)\n    to see courses currently being supported by Peer Tutoring.\n\n  - [Communication\n    Support](https://www.cmu.edu/student-success/programs/communication-support/index.html)\n    – Communication Support offers free one-on-one communication\n    consulting as well as group workshops to support strong written,\n    oral, and visual communication in texts including IMRaD and\n    thesis-driven essays, data-driven reports, oral presentations,\n    posters and visual design, advanced research, application materials,\n    grant proposals, business and public policy documents, data\n    visualisation, and team projects. Appointments are available to\n    undergraduate and graduate students from any discipline at CMU.\n    Schedule an\n    [appointment](https://www.cmu.edu/student-success/programs/communication-support/make-an-appointment.html)\n    on their website (in-person, Zoom synchronous, or recorded video),\n    attend a\n    [workshop](https://www.cmu.edu/student-success/calendar.html), or\n    consult [handouts or\n    videos](https://www.cmu.edu/student-success/other-resources/index.html)\n    to strengthen communication skills.\n\n  - [Language and Cross-Cultural\n    Support](https://www.cmu.edu/student-success/programs/language-support/index.html)\n    – This program supports students seeking help with language and\n    cross-cultural skills for academic and professional success through\n    individual and group sessions. 
Students can get assistance with\n    writing academic emails, learning expectations and strategies for\n    clear academic writing, pronunciation, grammar, fluency, and more.\n    Make an\n    [appointment](https://docs.google.com/forms/d/e/1FAIpQLSfMAnCWkyPdXRb0zOsMar7nzpUau8hqN_gIFm3OISY5QMWwyw/viewform)\n    with a Language Development Specialist to get individualized\n    coaching.\n\n  - [Supplemental Instruction\n    (SI)](https://www.cmu.edu/student-success/programs/supp-inst.html) –\n    This program offers a non-remedial approach to learning in\n    historically difficult courses at CMU. It utilizes a peer-led\n    collaborative group study approach to help students succeed and is\n    facilitated by an SI leader, a CMU student who has successfully\n    completed the course. SI offers a way to connect with other students\n    studying the same course, a guaranteed weekly study time that\n    reinforces learning and retention of information, as well as a place\n    to learn and integrate study tools and exam techniques specific to a\n    course. Visit the website to see courses with SI available\n    [here](https://www.cmu.edu/student-success/programs/supp-inst.html).\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdssg%2Fmlforpublicpolicylab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdssg%2Fmlforpublicpolicylab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdssg%2Fmlforpublicpolicylab/lists"}