{"id":51043678,"url":"https://github.com/hackyourfuture/data-mid-project","last_synced_at":"2026-06-22T12:02:09.102Z","repository":{"id":363818104,"uuid":"1264848526","full_name":"HackYourFuture/data-mid-project","owner":"HackYourFuture","description":null,"archived":false,"fork":false,"pushed_at":"2026-06-18T07:47:48.000Z","size":133,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-18T09:21:58.591Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HackYourFuture.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-10T08:29:31.000Z","updated_at":"2026-06-18T07:47:50.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/HackYourFuture/data-mid-project","commit_stats":null,"previous_names":["hackyourfuture/data-mid-project"],"tags_count":0,"template":true,"template_full_name":null,"purl":"pkg:github/HackYourFuture/data-mid-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HackYourFuture%2Fdata-mid-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HackYourFuture%2Fdata-mid-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HackYourFuture%2Fdata-mid-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HackYourFuture%2Fdata-mid-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HackYourFuture","download_url":"https://codeload.github.com/HackYourFuture/data-mid-project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HackYourFuture%2Fdata-mid-project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34647750,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-22T02:00:06.391Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-22T12:02:07.648Z","updated_at":"2026-06-22T12:02:09.094Z","avatar_url":"https://github.com/HackYourFuture.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Week 7 Project: [Your Project Name]\n\n## What it does\n\n\u003c!-- Describe your pipeline in 1-2 sentences. What data does it fetch? Where does it store the results? --\u003e\n\n## Architecture\n\n```text\n[Your API] ──► pipeline.py ──► Pydantic validation ──► Postgres INSERT (your schema)\n                                                     ──► Blob Storage (raw JSON)\n```\n\n## Run locally\n\n```bash\n# 1. Populate .env from Azure Key Vault\ncp .env.example .env\necho \"POSTGRES_URL=$(az keyvault secret show --vault-name kv-hyf-data --name postgres-url --query value -o tsv)\" \u003e\u003e .env\necho \"AZURE_STORAGE_CONNECTION_STRING=$(az keyvault secret show --vault-name kv-hyf-data --name storage-connection-string --query value -o tsv)\" \u003e\u003e .env\n# Set your personal schema (replace alice with your GitHub handle):\necho \"DB_SCHEMA=dev_alice\" \u003e\u003e .env\n\n# 2. Install dependencies\nuv sync\n\n# 3. Run directly (without Docker)\nuv run python -m src.pipeline\n\n# 4. Or build and run with Docker\ndocker build -t my-pipeline .\ndocker run --env-file .env my-pipeline\n```\n\n## Run tests\n\n```bash\nuv run pytest tests/ -v\n```\n\n## Deploy to Azure\n\n```bash\n# Build for linux/amd64 (required by Azure Container Apps) and push to ACR\ndocker build --platform linux/amd64 -t hyfregistry.azurecr.io/my-pipeline:latest .\ndocker push hyfregistry.azurecr.io/my-pipeline:latest\n\n# Create Container App Job (runs daily at 06:00 UTC)\naz containerapp job create \\\n  --name my-pipeline-job \\\n  --resource-group rg-hyf-data \\\n  --environment env-hyf-data \\\n  --image hyfregistry.azurecr.io/my-pipeline:latest \\\n  --registry-server hyfregistry.azurecr.io \\\n  --trigger-type Schedule \\\n  --cron-expression \"0 6 * * *\" \\\n  --replica-timeout 300 \\\n  --replica-retry-limit 0 \\\n  --env-vars \\\n    POSTGRES_URL=\"$(az keyvault secret show --vault-name kv-hyf-data --name postgres-url --query value -o tsv)\" \\\n    AZURE_STORAGE_CONNECTION_STRING=\"$(az keyvault secret show --vault-name kv-hyf-data --name storage-connection-string --query value -o tsv)\" \\\n    DB_SCHEMA=dev_alice \\\n    LOG_LEVEL=INFO\n\n# Trigger a manual run for testing (without waiting for the schedule)\naz containerapp job start --name my-pipeline-job --resource-group rg-hyf-data\n```\n\n## Enable ACR push from CI (optional)\n\nThe `push-to-acr` job in `.github/workflows/ci.yml` is commented out by default.\nTo enable it, add two secrets in your repo's **Settings → Secrets and variables → Actions**:\n\n| Secret name | Value |\n|-------------|-------|\n| `ACR_USERNAME` | `hyfregistry` |\n| `ACR_PASSWORD` | Ask your teacher for the ACR password |\n\nThen uncomment the `push-to-acr` job in `ci.yml`. Every push to `main` will build\nand push the image automatically.\n\n## Install psql\n\n`psql` is the Postgres command-line client used to verify results. Install it once:\n\n**macOS**\n```bash\nbrew install libpq\necho 'export PATH=\"/opt/homebrew/opt/libpq/bin:$PATH\"' \u003e\u003e ~/.zshrc\nsource ~/.zshrc\n```\n\n**Linux (Debian/Ubuntu)**\n```bash\nsudo apt-get install -y postgresql-client\n```\n\n**Windows**\nDownload and run the installer from [postgresql.org/download/windows](https://www.postgresql.org/download/windows/). The installer includes `psql`. After installing, open a new terminal and verify with `psql --version`.\n\n## Verify results\n\n```bash\n# Check job execution\naz containerapp job execution list --name my-pipeline-job --resource-group rg-hyf-data --output table\n\n# Check Postgres (replace dev_alice with your schema, \u003cyour_table\u003e with your table name)\npsql \"$POSTGRES_URL\" -c \"SELECT COUNT(*) FROM dev_alice.\u003cyour_table\u003e;\"\n\n# Check Blob Storage\naz storage blob list --account-name hyfstoragedev --container-name raw --prefix pipeline/ --output table\n```\n\n## Clean up\n\n```bash\naz containerapp job delete --name my-pipeline-job --resource-group rg-hyf-data --yes\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhackyourfuture%2Fdata-mid-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhackyourfuture%2Fdata-mid-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhackyourfuture%2Fdata-mid-project/lists"}