{"id":51043703,"url":"https://github.com/hackyourfuture/data-assignment-week-2","last_synced_at":"2026-06-22T12:02:11.229Z","repository":{"id":356025287,"uuid":"1223590088","full_name":"HackYourFuture/data-assignment-week-2","owner":"HackYourFuture","description":"HackYourFuture data track week 2 assignment files","archived":false,"fork":false,"pushed_at":"2026-05-13T19:24:56.000Z","size":3707,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-13T21:26:13.902Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HackYourFuture.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-28T13:18:14.000Z","updated_at":"2026-05-13T19:24:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/HackYourFuture/data-assignment-week-2","commit_stats":null,"previous_names":["hackyourfuture/data-assignment-week-2"],"tags_count":0,"template":true,"template_full_name":"HackYourFuture/assignment-template","purl":"pkg:github/HackYourFuture/data-assignment-week-2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HackYourFuture%2Fdata-assignment-week-2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HackYourFuture%2Fdata-assignment-week-2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/HackYourFuture%2Fdata-assignment-week-2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HackYourFuture%2Fdata-assignment-week-2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HackYourFuture","download_url":"https://codeload.github.com/HackYourFuture/data-assignment-week-2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HackYourFuture%2Fdata-assignment-week-2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34647750,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-22T02:00:06.391Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-22T12:02:10.401Z","updated_at":"2026-06-22T12:02:11.220Z","avatar_url":"https://github.com/HackYourFuture.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Track — Week 2 Assignment (Template)\n\nThe HackYourFuture Data Track Week 2 assignment: **Refactoring to a Clean Pipeline**.\n\n\u003e 👩‍🎓 **Students:** you are in the wrong place. Do **not** fork or use this template.\n\u003e Go to your cohort's assignment repo under\n\u003e [`HackYourAssignment`](https://github.com/HackYourAssignment) (e.g. `c55-data-week2`,\n\u003e `c56-data-week2`, …). 
Your teacher posts the exact link in your cohort channel.\n\u003e Fork the cohort repo, branch, and open a PR back to it. Full instructions live in the\n\u003e [Week 2 Assignment on Notion](https://www.notion.so/hackyourfuture/Week-2-Assignment-Refactoring-to-a-Clean-Pipeline-f8c27aa88d144cb18f54c49d02f50b73).\n\n## For instructors / track maintainers\n\nThis repo is the **upstream template** for the Week 2 assignment. At the start of each\ncohort, generate a cohort-specific repo under the `HackYourAssignment` org from this\ntemplate (GitHub: **Use this template → Create a new repository**, owner =\n`HackYourAssignment`, name = `c\u003cNN\u003e-data-week2`). Students then fork *that* cohort repo\nand open PRs back to it; the auto-grader runs on every push.\n\nEdits to the assignment, dataset, or grader belong here on the template, not on the\ncohort copies.\n\n## Tasks at a glance\n\n| Task | Folder | Points | What you build |\n|---|---|---|---|\n| **Task 1** — Cleaner Pipeline | `task-1/` | 60 | A modular Python pipeline with `config.py` (env-var loading), `models.py` (`Transaction` dataclass with `__post_init__` validation), `transforms.py` (4+ pure composable functions, no mutation), `pipeline.py` (orchestrator), and `tests/test_transforms.py` (4+ pytest tests). Reads `data/messy_sales.csv`, writes `output/clean_sales.csv`. |\n| **Task 2** — AI Debug Report | `task-2/` | 20 | Document one debugging session where you used an LLM to fix a bug. Fill in the four sections of `AI_DEBUG.md`. |\n| **Task 3** — Azure Blob Upload | `task-3/` | 20 | Upload `task-1/output/clean_sales.csv` to a private Blob container in the HYF Azure storage account using the portal's Storage Browser. Save your screenshot as `task-3/assets/azure_blob_week2.png` (`.jpg`/`.jpeg` also accepted) and the blob URL in `task-3/assets/blob_url.txt`. Working in Codespaces? See [AZURE_LOGIN.md](AZURE_LOGIN.md) to authenticate first. 
|\n\nTotal: 100 · Passing: 60.\n\n## Repository layout\n\n```text\n.\n├── task-1/\n│   ├── data/\n│   │   └── messy_sales.csv      # the dataset (committed; do not edit)\n│   ├── src/\n│   │   ├── config.py            # env-var loader — fill in TODOs\n│   │   ├── models.py            # Transaction dataclass — fill in TODOs\n│   │   ├── transforms.py        # 4 pure transform functions — fill in TODOs\n│   │   └── pipeline.py          # orchestrator — fill in TODOs\n│   ├── tests/\n│   │   └── test_transforms.py   # 4 pytest tests — fill in TODOs\n│   ├── output/                  # your pipeline writes clean_sales.csv here (gitignored)\n│   ├── .env.example             # copy to .env (gitignored) before running\n│   └── requirements.txt         # python3 -m pip install -r requirements.txt\n├── task-2/\n│   └── AI_DEBUG.md              # fill in the four sections\n├── task-3/\n│   └── assets/\n│       ├── azure_blob_week2.png # add your screenshot here (jpg/jpeg also accepted)\n│       └── blob_url.txt         # paste your Azure Storage blob URL here\n├── .hyf/\n│   └── test.sh                  # auto-grader (read it to see exactly what it checks)\n└── .github/workflows/\n    └── grade-assignment.yml     # runs .hyf/test.sh on every PR\n```\n\n## Run the grader locally\n\nBefore opening a PR, run the same checks the auto-grader runs:\n\n```bash\ncd task-1\npython3 -m pip install -r requirements.txt\ncp .env.example .env\ncd ..\nbash .hyf/test.sh\ncat .hyf/score.json\n```\n\nThe grader prints a per-task breakdown so you can see exactly which check failed and\nwhy. 
The PR-time grader does the same — your local run and the CI run are identical.\n\n## Scoring ladder (Task 1)\n\nThe grader awards points incrementally so partial credit is meaningful:\n\n- **10/60** — required files exist (`config.py`, `models.py`, `transforms.py`, `pipeline.py`, `tests/test_transforms.py`, `.env.example`).\n- **20/60** — `python -m src.pipeline` runs from `task-1/` without crashing (the grader injects `INPUT_PATH` and `OUTPUT_PATH` inline; your local `.env` is not used during grading).\n- **40/60** — `output/clean_sales.csv` passes structural checks: 12 rows (15 input − 3 invalid/zero-quantity), lowercased emails, title-cased product names, \"Unknown\" filled in for missing categories, `revenue` and `vat` columns present and correctly calculated.\n- **60/60** — code looks engineered: `models.py` defines a `@dataclass` with `__post_init__`; `transforms.py` uses the `{**row, ...}` spread pattern (no mutation); `pytest tests/` reports all tests passing.\n\nThe 40-point cap exists so that a 5-line script that hardcodes the expected output cannot earn full marks. Real engineering patterns (dataclass + spread + tests) are required for the top 20 points.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhackyourfuture%2Fdata-assignment-week-2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhackyourfuture%2Fdata-assignment-week-2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhackyourfuture%2Fdata-assignment-week-2/lists"}