https://github.com/hackyourfuture/data-assignment-week-1
HackYourFuture data track week 1 assignment files
- Host: GitHub
- URL: https://github.com/hackyourfuture/data-assignment-week-1
- Owner: HackYourFuture
- Created: 2026-04-28T13:17:00.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-22T12:07:54.000Z (about 1 month ago)
- Last Synced: 2026-05-22T17:43:42.646Z (about 1 month ago)
- Language: Shell
- Size: 24.4 KB
- Stars: 0
- Watchers: 0
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Data Track: Week 1 Assignment (Template)
The HackYourFuture Data Track Week 1 assignment: **The Data Cleaning Pipeline**.
> 👩‍🎓 **Students:** you are in the wrong place. Do **not** fork or use this template.
> Go to your cohort's assignment repo under
> [`HackYourAssignment`](https://github.com/HackYourAssignment) (e.g. `c55-data-week1`,
> `c56-data-week1`, …). Your teacher posts the exact link in your cohort channel.
> Fork the cohort repo, create a branch, and open a PR back to it. Full instructions live in the
> [Week 1 Assignment on Notion](https://www.notion.so/hackyourfuture/Week-1-Assignment-The-Data-Cleaning-Pipeline-3cc37d4bf482470cbc6667bd1d1bb605).
## For instructors / track maintainers
This repo is the **upstream template** for the Week 1 assignment. At the start of each
cohort, generate a cohort-specific repo under the `HackYourAssignment` org from this
template (GitHub: **Use this template → Create a new repository**, owner =
`HackYourAssignment`, name = `c-data-week1`). Students then fork *that* cohort repo
and open PRs back to it; the auto-grader runs on every push.
Edits to the assignment, dataset, or grader belong here on the template, not on the
cohort copies.
## Tasks at a glance
| Task | Folder | Points | What you build |
|---|---|---|---|
| **Task 1:** Cleaner Pipeline | `task-1/` | 60 | A modular Python pipeline that reads `data/messy_users.csv`, cleans each field with helpers in `src/utils.py`, validates the rows, and writes the result as JSON to `output/clean_users.json`. |
| **Task 2:** AI Debug Report | `task-2/` | 20 | Document one debugging session in which you used an LLM to fix a bug. Fill in the four sections of `AI_DEBUG.md`. |
| **Task 3:** HYF Azure proof | `task-3/` | 20 | Accept the HYF Azure tenant invite, switch to that directory, and save a screenshot as proof at `task-3/azure_proof.png`. |
Total: 100 · Passing: 60.
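The clean → validate → write flow that Task 1 describes can be sketched as below. This is a minimal illustration under stated assumptions, not the template's solution: the columns `name`, `email`, and `age` are hypothetical (the real headers of `messy_users.csv` are not listed here), the helper names and validation rules are made up for the example, and the output is assumed to be a JSON array of objects. Read `.hyf/test.sh` for the shape the grader actually checks.

```python
from __future__ import annotations

import csv
import io
import json
from pathlib import Path
from typing import Iterable, Optional


# --- helpers (in the real repo these belong in src/utils.py) -------------
# Hypothetical columns "name", "email", "age"; adapt to messy_users.csv.
def clean_name(value: str) -> str:
    """Collapse repeated whitespace and title-case the name."""
    return " ".join(value.split()).title()


def clean_email(value: str) -> str:
    """Trim surrounding whitespace and lower-case the address."""
    return value.strip().lower()


def clean_age(value: str) -> Optional[int]:
    """Parse a non-negative integer age; None if blank or non-numeric."""
    value = value.strip()
    return int(value) if value.isdecimal() else None


# --- pipeline (in the real repo this belongs in src/cleaner.py) ----------
def is_valid(row: dict) -> bool:
    """Keep rows with a name, an '@' in the email, and a parsed age."""
    return bool(row["name"]) and "@" in row["email"] and row["age"] is not None


def clean_rows(rows: Iterable[dict]) -> list[dict]:
    """Clean each field, then drop rows that fail validation."""
    cleaned = []
    for raw in rows:
        row = {
            "name": clean_name(raw["name"]),
            "email": clean_email(raw["email"]),
            "age": clean_age(raw["age"]),
        }
        if is_valid(row):
            cleaned.append(row)
    return cleaned


def write_json(rows: list[dict], path: Path) -> None:
    """Write the cleaned rows as a JSON array, creating output/ if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows, indent=2), encoding="utf-8")


# Usage with an inline sample instead of the real CSV file:
sample = io.StringIO(
    "name,email,age\n"
    "  ada   lovelace ,ADA@Example.com , 36\n"
    "bob,not-an-email,40\n"
)
users = clean_rows(csv.DictReader(sample))
print(json.dumps(users, indent=2))
```

Keeping each field's cleaning in its own small function is what makes the pipeline "modular": each helper can be tested in isolation, and the entry point only orchestrates reading, cleaning, validating, and writing. In the real layout, `cleaner.py` would import the helpers from `utils.py` instead of defining them inline.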
## Repository layout
```text
.
├── task-1/
│   ├── data/
│   │   └── messy_users.csv      # the dataset (committed; do not edit)
│   ├── src/
│   │   ├── cleaner.py           # entry point; fill in the TODOs
│   │   └── utils.py             # field-cleaning helpers; fill in the TODOs
│   └── output/
│       └── clean_users.json     # your cleaner writes here
├── task-2/
│   └── AI_DEBUG.md              # fill in the four sections
├── task-3/
│   └── azure_proof.png          # add your screenshot here
├── .hyf/
│   └── test.sh                  # auto-grader (read it to see exactly what it checks)
└── .github/workflows/
    └── grade-assignment.yml     # runs .hyf/test.sh on every PR
```
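Because `cleaner.py` sits in `src/` next to the sibling folders `data/` and `output/`, one way to make it behave the same whether it is launched from the repository root or from inside `task-1/` is to resolve paths from the script's own location rather than the current working directory. A minimal sketch; `task_paths` is a hypothetical helper, not part of the template.

```python
from __future__ import annotations

from pathlib import Path


def task_paths(script_file: str) -> tuple[Path, Path]:
    """Resolve data/ and output/ relative to src/cleaner.py, not the cwd."""
    # src/cleaner.py -> src/ -> task-1/
    task_root = Path(script_file).resolve().parent.parent
    input_csv = task_root / "data" / "messy_users.csv"
    output_json = task_root / "output" / "clean_users.json"
    return input_csv, output_json


# Inside cleaner.py you would pass __file__; a literal path stands in here.
input_csv, output_json = task_paths("task-1/src/cleaner.py")
print(input_csv)
print(output_json)
```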
## Run the grader locally
Before opening a PR, run the same checks the auto-grader runs, from the repository root:
```bash
bash .hyf/test.sh
cat .hyf/score.json
```
This prints a per-task breakdown and writes `.hyf/score.json`. Iterate until the score file
reports `pass: true` (or until you've given it your best attempt), then push.
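To gate a local script or pre-push hook on that result, the check can be scripted. A minimal sketch, assuming `score.json` is a JSON object with a top-level boolean `pass` key (inferred from the `pass: true` wording above; read `.hyf/test.sh` for the real schema). `grader_passed` and `check_score_file` are hypothetical names.

```python
import json
from pathlib import Path

# Assumed location, written by `bash .hyf/test.sh` run from the repo root.
SCORE_FILE = Path(".hyf/score.json")


def grader_passed(score_text: str) -> bool:
    """True only if the JSON is an object whose "pass" key is exactly true."""
    score = json.loads(score_text)
    return isinstance(score, dict) and score.get("pass") is True


def check_score_file(path: Path = SCORE_FILE) -> bool:
    """Read the score file; treat a missing file as 'not passed yet'."""
    if not path.is_file():
        return False
    return grader_passed(path.read_text(encoding="utf-8"))
```

Checking `is True` rather than truthiness means a string such as `"false"` or a number in the `pass` field is not mistaken for a pass.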
## Submission
Students submit on the cohort repo (`HackYourAssignment/c-data-week1`), not here:
open a PR from the student's fork against the cohort repo's `main`, then share the PR
URL with the teacher.