{"id":26658850,"url":"https://github.com/denisecase/buzzline-05-case","last_synced_at":"2025-04-11T14:09:32.154Z","repository":{"id":275266883,"uuid":"897593391","full_name":"denisecase/buzzline-05-case","owner":"denisecase","description":"Kafka pipelines with data storage","archived":false,"fork":false,"pushed_at":"2025-02-24T17:11:21.000Z","size":32,"stargazers_count":0,"open_issues_count":0,"forks_count":30,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T10:17:00.519Z","etag":null,"topics":["consumer","data","kafka","producer","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/denisecase.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-02T22:34:25.000Z","updated_at":"2025-02-24T17:11:25.000Z","dependencies_parsed_at":"2025-02-01T09:37:52.416Z","dependency_job_id":null,"html_url":"https://github.com/denisecase/buzzline-05-case","commit_stats":null,"previous_names":["denisecase/buzzline-05-case"],"tags_count":0,"template":true,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisecase%2Fbuzzline-05-case","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisecase%2Fbuzzline-05-case/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisecase%2Fbuzzline-05-case/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/denisecase%2Fbuzzline-05-case/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/denisecase",
"download_url":"https://codeload.github.com/denisecase/buzzline-05-case/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248413664,"owners_count":21099341,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["consumer","data","kafka","producer","python"],"created_at":"2025-03-25T10:17:05.010Z","updated_at":"2025-04-11T14:09:32.132Z","avatar_url":"https://github.com/denisecase.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# buzzline-05-case\n\nNearly every streaming analytics system stores processed data somewhere for further analysis, historical reference, or integration with BI tools.\n\nIn this example project, we incorporate a relational data store. \nWe use SQLite, but the example could be adapted to other relational databases such as MySQL or PostgreSQL, or to a document store such as MongoDB.\n\n## VS Code Extensions\n\n- Black Formatter by Microsoft\n- Markdown All in One by Yu Zhang\n- PowerShell by Microsoft (on Windows machines)\n- Pylance by Microsoft\n- Python by Microsoft\n- Python Debugger by Microsoft\n- Ruff by Astral Software (Linter)\n- **SQLite Viewer by Florian Klampfer**\n- WSL by Microsoft (on Windows machines)\n\n## Task 1. Use Tools from Modules 1 and 2\n\nBefore starting, ensure you have completed the setup tasks in \u003chttps://github.com/denisecase/buzzline-01-case\u003e and \u003chttps://github.com/denisecase/buzzline-02-case\u003e. \n\nVersions matter. Python 3.11 is required. See the instructions for the required Java JDK and more. \n\n## Task 2. 
Copy This Example Project and Rename\n\nOnce the tools are installed, copy/fork this project into your GitHub account\nand create your own version of this project to run and experiment with. \nFollow the instructions in [FORK-THIS-REPO.md](https://github.com/denisecase/buzzline-01-case/blob/main/docs/FORK-THIS-REPO.md).\n\nOR: For more practice, add these example scripts or features to your earlier project. \nYou'll want to check requirements.txt, .env, and the consumers, producers, and utils folders. \nUse your README.md to record your workflow and commands. \n\n## Task 3. Manage Local Project Virtual Environment\n\nFollow the instructions in [MANAGE-VENV.md](https://github.com/denisecase/buzzline-01-case/blob/main/docs/MANAGE-VENV.md) to:\n1. Create your .venv\n2. Activate .venv\n3. Install the required dependencies using requirements.txt.\n\n## Task 4. Start Zookeeper and Kafka (Takes 2 Terminals)\n\nIf Zookeeper and Kafka are not already running, you'll need to start them.\nSee the instructions in [SETUP-KAFKA.md](https://github.com/denisecase/buzzline-02-case/blob/main/docs/SETUP-KAFKA.md) to:\n\n1. Start Zookeeper Service ([link](https://github.com/denisecase/buzzline-02-case/blob/main/docs/SETUP-KAFKA.md#step-7-start-zookeeper-service-terminal-1))\n2. Start Kafka Service ([link](https://github.com/denisecase/buzzline-02-case/blob/main/docs/SETUP-KAFKA.md#step-8-start-kafka-terminal-2))\n\n---\n\n## Task 5. Start a New Streaming Application\n\nThis will take two more terminals:\n\n1. One to run the producer, which writes messages. \n2. Another to run the consumer, which reads messages, processes them, and writes them to a data store. \n\n### Producer (Terminal 3) \n\nStart the producer to generate the messages. \nThe existing producer writes messages to a live data file in the data folder.\nIf Zookeeper and Kafka services are running, it will also try to write them to a Kafka topic.\nFor configuration details, see the .env file. \n\nIn VS Code, open a NEW terminal.\nUse the commands below to activate .venv and start the producer. 
\n\nWindows:\n\n```shell\n.venv\\Scripts\\activate\npy -m producers.producer_case\n```\n\nMac/Linux:\n```zsh\nsource .venv/bin/activate\npython3 -m producers.producer_case\n```\n\nThe producer will still work if Kafka is not available.\n\n### Consumer (Terminal 4) - Two Options\n\nStart an associated consumer. \nYou have two options. \n1. Start the consumer that reads from the live data file.\n2. OR start the consumer that reads from the Kafka topic.\n\nIn VS Code, open a NEW terminal in your root project folder. \nUse the commands below to activate .venv and start the consumer. \n\nWindows:\n```shell\n.venv\\Scripts\\activate\npy -m consumers.kafka_consumer_case\nOR\npy -m consumers.file_consumer_case\n```\n\nMac/Linux:\n```zsh\nsource .venv/bin/activate\npython3 -m consumers.kafka_consumer_case\nOR\npython3 -m consumers.file_consumer_case\n```\n\n---\n\n## Review the Project Code\n\nReview the requirements.txt file. \n- What new requirements, if any, do we need for this project?\n- Note that requirements.txt now lists both kafka-python and six. 
\n- What are some common dependencies as we incorporate data stores into our streaming pipelines?\n\nReview the .env file with the environment variables.\n- Why is it helpful to put some settings in a text file?\n- As we add database access and passwords, we start to keep two versions: \n   - .env \n   - .env.example\n- Read the notes in those files: which one is typically NOT added to source control?\n- How do we keep a file from being published to GitHub? (Hint: .gitignore)\n\nReview the .gitignore file.\n- What new entry has been added?\n\nReview the code for the producer and the two consumers.\n- Understand how the information is generated by the producer.\n- Understand how the different consumers read, process, and store information in a data store.\n\nCompare the consumer that reads from a live data file with the consumer that reads from a Kafka topic.\n- Which functions are the same for both?\n- Which parts are different?\n\nWhat files are in the utils folder? \n- Why bother breaking functions out into utility modules?\n- Would similar streaming projects be likely to take advantage of any of these files?\n\nWhat files are in the producers folder?\n- How do these compare to earlier projects?\n- What has been changed?\n- What has stayed the same?\n\nWhat files are in the consumers folder?\n- This is where the processing and storage take place.\n- Why did we make a separate file for reading from the live data file vs reading from the Kafka topic?\n- What functions are in each? \n- Are any of the functions duplicated? \n- Can you refactor the project so we could write a duplicated function just once and reuse it? \n- What functions are in the sqlite script?\n- What functions might be needed to initialize a different kind of data store?\n- What functions might be needed to insert a message into a different kind of data store?\n\n---\n\n## Explorations\n\n- Did you run the Kafka consumer or the live file consumer? 
Why?\n- Can you use the examples to add a database to your own streaming applications? \n- What parts are most interesting to you?\n- What parts are most challenging? \n\n---\n\n## Later Work Sessions\nWhen resuming work on this project:\n1. Open the folder in VS Code. \n2. Open a terminal and start the Zookeeper service. On Windows, remember to start WSL first. \n3. Open a terminal and start the Kafka service. On Windows, remember to start WSL first. \n4. Open a terminal to start the producer. Remember to activate your local project virtual environment (.venv).\n5. Open a terminal to start the consumer. Remember to activate your local project virtual environment (.venv).\n\n## Save Space\nTo save disk space, you can delete the .venv folder when not actively working on this project.\nYou can always recreate it, activate it, and reinstall the necessary packages later. \nManaging Python virtual environments is a valuable skill. \n\n## License\nThis project is licensed under the MIT License as an example project. \nYou are encouraged to fork, copy, explore, and modify the code as you like. \nSee the [LICENSE](LICENSE.txt) file for more.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdenisecase%2Fbuzzline-05-case","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdenisecase%2Fbuzzline-05-case","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdenisecase%2Fbuzzline-05-case/lists"}