{"id":50508860,"url":"https://github.com/denisecase/streaming-01-foundations","last_synced_at":"2026-06-02T18:31:11.939Z","repository":{"id":352456289,"uuid":"1214511060","full_name":"denisecase/streaming-01-foundations","owner":"denisecase","description":"Professional Python project: streaming foundations.","archived":false,"fork":false,"pushed_at":"2026-05-09T21:11:12.000Z","size":561,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-09T23:19:38.961Z","etag":null,"topics":["data-analytics","git","github-actions","professional-python","project-template","python","ruff","src-layout","streaming","uv","zensical"],"latest_commit_sha":null,"homepage":"https://denisecase.github.io/streaming-01-foundations/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/denisecase.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-18T17:16:34.000Z","updated_at":"2026-05-09T21:11:55.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/denisecase/streaming-01-foundations","commit_stats":null,"previous_names":["denisecase/streaming-01-foundations"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/denisecase/streaming-01-foundations","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisecase%2Fstreaming-01-foundations","tags_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisecase%2Fstreaming-01-foundations/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisecase%2Fstreaming-01-foundations/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisecase%2Fstreaming-01-foundations/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/denisecase","download_url":"https://codeload.github.com/denisecase/streaming-01-foundations/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisecase%2Fstreaming-01-foundations/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33833277,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-02T02:00:07.132Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analytics","git","github-actions","professional-python","project-template","python","ruff","src-layout","streaming","uv","zensical"],"created_at":"2026-06-02T18:31:11.289Z","updated_at":"2026-06-02T18:31:11.933Z","avatar_url":"https://github.com/denisecase.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# streaming-01-foundations\n\n[![Workflow 
Guide](https://img.shields.io/badge/Pro--Guide-pro--analytics--02-green)](https://denisecase.github.io/pro-analytics-02/workflow-b-apply-example-project/)\n[![Python 3.14](https://img.shields.io/badge/python-3.14%2B-blue?logo=python)](./pyproject.toml)\n[![MIT](https://img.shields.io/badge/license-see%20LICENSE-yellow.svg)](./LICENSE)\n\n\u003e Streaming data analytics: local streaming foundations.\n\nData analytics requires a variety of skills.\nThis course builds capabilities through working projects.\n\nIn the age of generative AI, **durable skills** are grounded in real work:\nsetting up a professional environment,\nreading and running code,\nunderstanding the logic,\nand pushing work to a shared repository.\nEach project follows the structure of professional Python projects.\nWe learn by doing.\n\n## This Project\n\nThis project introduces the workflow shape used throughout the course.\n\nThe project does not require Kafka to run,\nbut we start the install process\nand begin practicing with multiple terminals.\n\nInstallations are early to allow for issues.\n\nOur producer this week reads sales records from a local CSV file,\nprocesses each record one at a time,\nand writes consumed records to an output CSV (it's a proxy for a\nKafka topic that we will use for \"real\" streaming projects).\n\nKafka setup begins here,\nbut Kafka does not need to be running for this project to succeed.\nThe goal is to get the local project working first and\nbegin work with Kafka so we can use it in the next module.\n\nAsk lots of questions - we are here to help.\nIt's only really bad the very first time we use it.\nIt gets better.\n\n## Working Files\n\nYou'll work with just these areas:\n\n- **data/** - input data and generated output files\n- **docs/** - the project narrative and documentation\n- **src/streaming/** - producer, consumer, and supporting code\n- **pyproject.toml** - update authorship \u0026 links\n- **zensical.toml** - update authorship \u0026 links\n\n## 
Instructions\n\nFollow the\n[step-by-step workflow guide](https://denisecase.github.io/pro-analytics-02/workflow-b-apply-example-project/)\nto complete:\n\n1. Phase 1. **Start \u0026 Run**\n2. Phase 2. **Change Authorship**\n3. Phase 3. **Read \u0026 Understand**\n4. Phase 4. **Modify**\n5. Phase 5. **Apply**\n\n## Challenges\n\nChallenges are expected.\nSometimes instructions may not quite match your operating system.\nWhen issues occur, share screenshots, error messages, and details about what you tried.\nWorking through issues is part of implementing professional projects.\n\n## Success\n\nAfter completing Phase 1. **Start \u0026 Run**, you'll have your own GitHub project\nrunning with Kafka.\n\nUse four named terminals for practice:\n\n1. **kafka** - where kafka will run (if Win, use WSL)\n2. **topics** - manage topics (if Win, use WSL)\n3. **producer** - run the project and producer\n4. **consumer** - run the consumer\n\nAfter the producer and consumer run successfully, you should see:\n\n```shell\n========================\nConsumer executed successfully!\n========================\n```\n\nA new file `project.log` will appear in the root project folder and\nthe producer will stream messages to a new **data/output** file.\nThe consumer will read and process message events from that file.\n\n## Command Reference\n\nThe commands below are used in the workflow guide above.\nThey are provided here for convenience.\n\nFollow the guide for the **full instructions**.\n\n\u003cdetails\u003e\n\u003csummary\u003eShow command reference\u003c/summary\u003e\n\n### In a machine terminal (open in your `Repos` folder)\n\nAfter you get a copy of this repo in your own GitHub account,\nopen a machine terminal in your `Repos` folder:\n\n```shell\n# Replace username with YOUR GitHub username.\ngit clone https://github.com/username/streaming-01-foundations\n\ncd streaming-01-foundations\ncode .\n```\n\n### In VS Code Terminal 1: Start Kafka (kafka)\n\nFor full instructions 
see\n[**start kafka**](https://denisecase.github.io/pro-analytics-02/kafka/start-kafka/).\n\nIf any command fails,\nrepeat the steps at\n[**install kafka**](https://denisecase.github.io/pro-analytics-02/kafka/install-kafka/)\nuntil starting up is reliable.\n\nOpen a new VS Code terminal. Rename it `kafka`.\nIf running Windows, specify the terminal type as **wsl** or\ntype `wsl`.\nRun the commands one at a time.\n\nStep 1. Verify Java and PATH\n\n```bash\necho \"$JAVA_HOME\"\n\n\"$JAVA_HOME/bin/java\" --version\n```\n\nStep 2. Rebuild ClusterID (as needed)\n\n```bash\ncd ~/kafka\n\nrm -rf /tmp/kraft-combined-logs\n\nKAFKA_CLUSTER_ID=\"$(bin/kafka-storage.sh random-uuid)\"\n\necho \"Cluster ID: $KAFKA_CLUSTER_ID\"\n\nbin/kafka-storage.sh format --standalone -t \"$KAFKA_CLUSTER_ID\" -c config/server.properties\n```\n\nStep 3. Start kafka server (keep running)\n\n```bash\ncd ~/kafka\n\nbin/kafka-server-start.sh config/server.properties\n```\n\n### In VS Code terminal 2: Create Topic (topics)\n\nFor full instructions see\n[**create topic**](https://denisecase.github.io/pro-analytics-02/kafka/create-topic/).\n\nThe topic name must match the name defined in your\n`.env` file (copy `.env.example` to `.env`).\n\nOpen another VS Code terminal. Rename it `topics`.\nIf running Windows, specify the terminal type as **wsl** or\ntype `wsl`.\nRun the commands one at a time.\n\n```bash\ncd ~/kafka\n\nbin/kafka-topics.sh --create \\\n  --bootstrap-server localhost:9092 \\\n  --partitions 1 \\\n  --replication-factor 1 \\\n  --topic streaming-01-foundations-case\n```\n\n### In VS Code Terminal 3: Run Project and Producer (producer)\n\nOpen another VS Code terminal. 
Rename it `producer`.\nIf running Windows, use **PowerShell**.\nRun the commands one at a time.\n\n```shell\n# reset uv cache only after suspected cache corruption or strange dependency errors\n# uv cache clean\n\nuv self update\nuv python pin 3.14\nuv sync --extra dev --extra docs --upgrade\n\nuvx pre-commit install\n\ngit add -A\nuvx pre-commit run --all-files\n\n# repeat if changes were made by pre-commit tasks\ngit add -A\nuvx pre-commit run --all-files\n\n# run the producer (produces messages)\nuv run python -m streaming.producer_case\n\n# do chores\nuv run ruff format .\nuv run ruff check . --fix\nuv run python -m pyright\nuv run python -m pytest\nuv run python -m zensical build\n\n# save progress\ngit add -A\ngit commit -m \"your message here\"\n\n# repeat if changes were made (try the UP ARROW)\ngit add -A\ngit commit -m \"your message here\"\n\ngit push -u origin main\n```\n\n### In VS Code Terminal 4: Run Consumer (consumer)\n\nOpen another VS Code terminal. Rename it `consumer`.\nIf running Windows, use **PowerShell**.\nRun the commands one at a time.\nVerify Kafka is reachable, then start the consumer.\n\n```shell\nclear\nuv run python -m streaming.consumer_case\n```\n\n\u003c/details\u003e\n\n## Notes\n\n- Use the **UP ARROW** and **DOWN ARROW** in the terminal to scroll through past commands.\n- Use `CTRL+f` to find (and replace) text within a file.\n- You do not need to add to or modify `tests/`. They are provided for example only.\n- Many files are silent helpers. 
Explore as you like, but nothing is required.\n- You do NOT need to understand everything; understanding builds naturally over time.\n\n## Troubleshooting \u003e\u003e\u003e\n\nIf you see something like this in your terminal: `\u003e\u003e\u003e` or `...`,\nyou accidentally started Python interactive mode.\nIt happens.\nType `exit()` and press `Enter`, or press `Ctrl+D` (macOS/Linux/WSL) or `Ctrl+Z` then `Enter` (Windows PowerShell).\n\n## Missing .env?\n\nSee [create topic](https://denisecase.github.io/pro-analytics-02/kafka/create-topic/)\nfor why we must copy `.env.example` to `.env`.\n\n## Many Terminals\n\nSee [many terminals](https://denisecase.github.io/pro-analytics-02/kafka/many-terminals/)\nfor how we name our terminals (and if Windows, how we get the different types).\nYou can split terminals shown below, or just click between them as you like.\n\n## Example Producer Output\n\n```text\n| P01 | === RUN START ===\n| P01 | project=P01\n| P01 | repo_dir=streaming-01-foundations\n| P01 | python=3.14.0\n| P01 | os=Windows 11\n| P01 | shell=powershell\n| P01 | cwd=.\n| P01 | github_actions=False\n| P01 | ========================\n| P01 | START producer main()\n| P01 | ========================\n| P01 | ROOT_DIR = .\n| P01 | DATA_DIR = data\n| P01 | SALES_CSV = data\\sales.csv\n| P01 | TOPIC_CSV = data\\output\\streaming-01-foundations-case.csv\n| P01 | ========================\n| P01 | SECTION A. 
Acquire\n| P01 | ========================\n| P01 | Loading settings from .env...\n| P01 | KAFKA_TOPIC                       = streaming-01-foundations-case\n| P01 | KAFKA_CLEAR_TOPIC_ON_START        = True\n| P01 | PRODUCER_MESSAGE_COUNT            = 3\n| P01 | PRODUCER_MESSAGE_INTERVAL_SECONDS = 2.0\n| P01 | Verifying local source data...\n| P01 | Source file found: sales.csv\n| P01 | Preparing local simulated topic file...\n| P01 | Deleted existing topic file: streaming-01-foundations-case.csv\n| P01 | Topic file will be created: streaming-01-foundations-case.csv\n| P01 | ========================\n| P01 | SECTION P. Produce Messages\n| P01 | ========================\n| P01 | Sending messages...\n| P01 | Sending up to 3 local message(s).\n| P01 | Writing to simulated topic file: streaming-01-foundations-case.csv\n| P01 | Watch each sale arrive. Press CTRL+C to stop early.\n\n| P01 | {\n  order_id: e7324981-a9f0-419f-b708-d0a333451fff\n  datetime: 2026-05-04T08:11:00Z\n  region_id: US-TX\n  currency_code: USD\n  product_id: PY-STREAM-005\n  unit_price: 59.99\n  quantity: 3\n  is_online: true\n  customer_id: CUST-4150\n  is_new_customer: false\n  device_type: tablet\n  payment_method: paypal\n  referral_source: paid_search\n  discount_code:\n  customer_note: Gift for my team\n}\n| P01 |   Sending local message with key=US-TX\n| P01 |   MESSAGE SENT  sent=1\n2026-05-10 07:37:20 | P01 | {\n  order_id: d61943e0-f543-4b5f-9c9a-18605ea4cfe5\n  datetime: 2026-05-04T08:23:00Z\n  region_id: US-TX\n  currency_code: USD\n  product_id: PY-DATA-002\n  unit_price: 49.99\n  quantity: 1\n  is_online: true\n  customer_id: CUST-1106\n  is_new_customer: false\n  device_type: mobile\n  payment_method: paypal\n  referral_source: paid_search\n  discount_code:\n  customer_note: Gift for my team\n}\n2026-05-10 07:37:20 | P01 |   Sending local message with key=US-TX\n2026-05-10 07:37:20 | P01 |   MESSAGE SENT  sent=2\n| P01 | {\n  order_id: 14da1915-8e74-47be-9e10-f7275d31af46\n  datetime: 
2026-05-04T08:28:00Z\n  region_id: CA-QC\n  currency_code: CAD\n  product_id: PY-NLP-006\n  unit_price: 54.99\n  quantity: 1\n  is_online: true\n  customer_id: CUST-2133\n  is_new_customer: false\n  device_type: desktop\n  payment_method: paypal\n  referral_source: organic\n  discount_code:\n  customer_note: Learning at my own pace\n}\n| P01 |   Sending local message with key=CA-QC\n| P01 |   MESSAGE SENT  sent=3\n| P01 | ========================\n| P01 | SECTION E. Exit\n| P01 | ========================\n| P01 | Summary:\n| P01 | Sent 3 message(s).\n| P01 | WROTE TOPIC_CSV = data\\output\\streaming-01-foundations-case.csv\n| P01 | ========================\n| P01 | Producer executed successfully!\n| P01 | ========================\n```\n\n## Example Consumer Output\n\n```text\n| C01 | ========================\n| C01 | START consumer main()\n| C01 | ========================\n| C01 | ROOT_DIR = .\n| C01 | DATA_DIR = data\n| C01 | TOPIC_CSV = data\\output\\streaming-01-foundations-case.csv\n| C01 | OUTPUT_CSV = data\\output\\consumed_sales.csv\n| C01 | ========================\n| C01 | SECTION A. Acquire\n| C01 | ========================\n| C01 | Loading settings from .env...\n| C01 | KAFKA_TOPIC                    = streaming-01-foundations-case\n| C01 | CONSUMER_MAX_MESSAGES          = 1000\n| C01 | CONSUMER_POLL_INTERVAL_SECONDS = 0.5\n| C01 | CONSUMER_TIMEOUT_SECONDS       = 10.0\n| C01 | Verifying local simulated topic file...\n| C01 | Topic file found: streaming-01-foundations-case.csv\n| C01 | ========================\n| C01 | SECTION C. 
Consume and Process Messages\n| C01 | ========================\n| C01 | Initializing output...\n| C01 | Output CSV cleared: consumed_sales.csv\n| C01 | Consuming local messages...\n| C01 | Waiting for up to 1000 message(s).\n| C01 | Stopping after 10.0s with no new message.\n\n| C01 | {'order_id': 'e7324981-a9f0-419f-b708-d0a333451fff', 'datetime': '2026-05-04T08:11:00Z', 'region_id': 'US-TX', 'currency_code': 'USD', 'product_id': 'PY-STREAM-005', 'unit_price': '59.99', 'quantity': '3', 'is_online': 'true', 'customer_id': 'CUST-4150', 'is_new_customer': 'false', 'device_type': 'tablet', 'payment_method': 'paypal', 'referral_source': 'paid_search', 'discount_code': '', 'customer_note': 'Gift for my team'}\n| C01 | Processing raw local message.\n| C01 | MESSAGE CONSUMED\n| C01 | consumed=1\n| C01 | {'order_id': 'd61943e0-f543-4b5f-9c9a-18605ea4cfe5', 'datetime': '2026-05-04T08:23:00Z', 'region_id': 'US-TX', 'currency_code': 'USD', 'product_id': 'PY-DATA-002', 'unit_price': '49.99', 'quantity': '1', 'is_online': 'true', 'customer_id': 'CUST-1106', 'is_new_customer': 'false', 'device_type': 'mobile', 'payment_method': 'paypal', 'referral_source': 'paid_search', 'discount_code': '', 'customer_note': 'Gift for my team'}\n| C01 | Processing raw local message.\n| C01 | MESSAGE CONSUMED\n| C01 | consumed=2\n| C01 | {'order_id': '14da1915-8e74-47be-9e10-f7275d31af46', 'datetime': '2026-05-04T08:28:00Z', 'region_id': 'CA-QC', 'currency_code': 'CAD', 'product_id': 'PY-NLP-006', 'unit_price': '54.99', 'quantity': '1', 'is_online': 'true', 'customer_id': 'CUST-2133', 'is_new_customer': 'false', 'device_type': 'desktop', 'payment_method': 'paypal', 'referral_source': 'organic', 'discount_code': '', 'customer_note': 'Learning at my own pace'}\n| C01 | Processing raw local message.\n| C01 | MESSAGE CONSUMED\n| C01 | consumed=3\n| C01 | No new message received within 10.0s timeout.\n| C01 | Producer finished or paused. 
Stopping consumer.\n| C01 | Saving artifacts...\n| C01 | WROTE OUTPUT_CSV = data\\output\\consumed_sales.csv\n| C01 | ========================\n| C01 | SECTION E. Exit\n| C01 | ========================\n| C01 | Summary:\n| C01 | Consumed 3 message(s).\n| C01 | OUTPUT_CSV = data\\output\\consumed_sales.csv\n| C01 | ========================\n| C01 | Consumer executed successfully!\n| C01 | ========================\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdenisecase%2Fstreaming-01-foundations","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdenisecase%2Fstreaming-01-foundations","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdenisecase%2Fstreaming-01-foundations/lists"}