{"id":24021104,"url":"https://github.com/zipcodecore/kafka3-data","last_synced_at":"2025-04-15T21:15:32.206Z","repository":{"id":147837746,"uuid":"257346408","full_name":"ZipCodeCore/Kafka3-Data","owner":"ZipCodeCore","description":"build simple consumer (python)","archived":false,"fork":false,"pushed_at":"2022-04-22T20:22:37.000Z","size":12,"stargazers_count":0,"open_issues_count":0,"forks_count":30,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-15T21:15:23.034Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ZipCodeCore.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-20T16:50:51.000Z","updated_at":"2023-04-29T12:50:45.000Z","dependencies_parsed_at":"2023-04-10T00:02:22.382Z","dependency_job_id":null,"html_url":"https://github.com/ZipCodeCore/Kafka3-Data","commit_stats":null,"previous_names":[],"tags_count":0,"template":true,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZipCodeCore%2FKafka3-Data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZipCodeCore%2FKafka3-Data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZipCodeCore%2FKafka3-Data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZipCodeCore%2FKafka3-Data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ZipCodeCore","download_url":"https://codeload.github.com/ZipCodeCore/Kafka3
-Data/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249153949,"owners_count":21221330,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-08T12:38:46.833Z","updated_at":"2025-04-15T21:15:32.200Z","avatar_url":"https://github.com/ZipCodeCore.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kafka3-Data\n\nbuild simple consumer (python)\n\nFork this repo, then clone from your new fork. \n\nto get k\n```bash\nbrew install kafka\npip install kafka-python\n```\n\n__Nota Bene 2022:__ _If you're running on an M1 Mac, take a look at the bottom of the page._\n\n## Running Kafka\n\nThere are a couple shell scripts for running `zookeeper` and `kafka`.\nYou need to run `zookeeper` first in a terminal by itself.\nThen run `kafka` in a different terminal.\n\nThen run the `Producers` and `consumers` each in their own terminals. 
(as needed)\n\n## ZipBank Project\n\nWe have decided to use Kafka as our main event handling infrastructure for our new bank, ZipBank.\n\nthis lab requires you to have a running kafka/zookeeper pair on your machine.\n\nKafka will take in transactions from various applications, and your job is to create the consumers needed to save all those transactions into a database.\n\nTo help, we've provided a test producer that creates random messages and sends them into Kafka.\n\n(to do this lab, you will need to have stepped through both of these to get kafka and zookeeper running.\n    - [Kafka on Mac1](https://yoda.zipcode.rocks/2020/04/20/kafka-on-mac/)\n    - [Kafka on Mac2](https://yoda.zipcode.rocks/2020/04/20/kafka-on-mac-2/)\n)\n\n### Phase 1\n\nbuild a kafka consumer in python that uses SQLAlchemy to store incoming transactions into a SQL Database. \nyou may use any SQL DB you're comfortable with. (Postgres, MySQL, RDS ... ?)\n\nWe have supplied a simple generation producer that creates random banking transactions:\n`./Kafka3-Data/phase1/producer-random-xactions.py`\nIt produces 20 transactions each time it is run, sending them to the `bank-customer-events` topic.\n(Down below, there is a shell line that can be used to create that topic within your running kafka.)\ntransaction Producer\n    a simple user by user producer that generates random deposits and withdrawals on random accounts.\n\na transaction looks like this in pseudocode:\n\n``` json\n{ custid: int, type: Dep/Wth, date: now, amt: int }\n```\n\nso a couple samples as python Dicts.\n\n``` json\n{ \"custid\": 55, \"type\": \"Dep\", \"date\": 1587398219, \"amt\": 10000 }\n{ \"custid\": 55, \"type\": \"Wth\", \"date\": 1587398301, \"amt\": 2500 }\n```\n\nwhich means (amt is in cents):\ncustomer whose id is 55, Deposit, at time 1587398219, a total of $100.00\ncustomer whose id is 55, Withdraw, at time 1587398301, a total of $25.00\n\nthose dates are Unix Epoch second timestamps. 
[Unix Time](https://en.wikipedia.org/wiki/Unix_time)\n\nto create a new kafka topic (the one you need for the phase1 scripts to work. You only need to do this once.)\n\n``` bash\nkafka-topics --create \\\n--zookeeper localhost:2181 \\\n--replication-factor 1 \\\n--partitions 1 \\\n--topic bank-customer-events\n```\n\n(on Kafka 3.0 and later, `kafka-topics` no longer accepts `--zookeeper`; use `--bootstrap-server localhost:9092` instead.)\n\n### Your Phase 1 Mission\n\nthese scripts work. kinda. the problem is every time we re-start the consumer, we lose\nall the customer data. the reason is the coder doesn't know SQL like you do! so all the data gets\nput into in-memory data structures, but every time you restart the consumer script, they get emptied.\n\nyou need to use SQLAlchemy to add to the Consumer in phase1 what's needed to save that transaction information into a \"transaction\" table in the SQL DB of your choice.\n\nyou probably need to create the \"database\" and the \"table\" within your SQL database, and then\nconnect to it anytime someone creates a XactionConsumer() object. (so that means modifying the `__init__` method.)\n\nand that SQLAlchemy model might be something like\n\n``` python\n# assumes SQLAlchemy 1.4+ (older versions import declarative_base from sqlalchemy.ext.declarative)\nfrom sqlalchemy import Column, Integer, String\nfrom sqlalchemy.orm import declarative_base\n\nBase = declarative_base()\n\nclass Transaction(Base):\n    __tablename__ = 'transaction'\n    # Here we define columns for the transaction table.\n    # Notice that each column is also a normal Python instance attribute.\n    id = Column(Integer, primary_key=True)\n    custid = Column(Integer)\n    type = Column(String(250), nullable=False)\n    date = Column(Integer)  # Unix epoch seconds\n    amt = Column(Integer)\n```\n\nRead through the producer in phase1. See where it generates random transaction sizes, randomly picks a deposit or a withdrawal, and randomly picks the customer id used for the transaction.\n\n## Phase 2\n\nBuild two \"analytical\" consumers. One, build a consumer that every time it starts, produces an on-going statistical summary of all the transactions seen by the system. Two, build a \"limit\" watcher. 
This is a made-up idea, but the idea is to watch for accounts whose balance drops below a certain negative number, say -5000, and print an error message when that happens.\n\nUnless you need to, don't bother to store any of the output or state in the SQL DB, just keep it in memory.\n\n### SummaryConsumer\n\nSummaryConsumer should produce a running list of outputs: the\nmean (avg) deposit and mean withdrawal across all customers. You should also print the standard deviation of the distribution for both deposits and withdrawals.\nAs each transaction comes in, print a new status of the numerical summaries.\n\n### LimitConsumer\n\nLimitConsumer should keep track of the customer ids whose current balances are at or below the limit supplied to the constructor. The intro suggests -5000 for example, but you should be able to set that with a parameter to the class's constructor.\n\n## Phase 3\n\nAdd multiple bank branches (locations) for the production of transactions.\n\nEach branch has a branch id, and a different partition in kafka. The consumers for each partition need to handle their branch's customers' transactions.\n\nThe branches also create new customers. 
Every so often, a create-customer event happens, and the consumer hooked up to that stream has to create a new customer in the database before any transactions get posted to that customer's account.\n\nthe topic is `bank-customer-new`\nthe SQLAlchemy model might look like\n\n``` python\nclass Customer(Base):\n    __tablename__ = 'customer'\n    # Here we define columns for the customer table.\n    # Notice that each column is also a normal Python instance attribute.\n    custid = Column(Integer, primary_key=True)\n    createdate = Column(Integer)  # Unix epoch seconds\n    fname = Column(String(250), nullable=False)\n    lname = Column(String(250), nullable=False)\n```\n\na couple samples as python Dicts.\n\n``` json\n{ \"custid\": 55, \"createdate\": 1587398219, \"fname\": \"Lisa\", \"lname\": \"Loopner\" }\n{ \"custid\": 56, \"createdate\": 1587398301, \"fname\": \"Todd\", \"lname\": \"Cushman\" }\n```\n\n\n## For M1 Macs\n\nwhen using `brew`, you _might_ need to start zookeeper and kafka differently.\n\nFor `zookeeper`:\n\n```bash\n/opt/homebrew/opt/kafka/bin/zookeeper-server-start /opt/homebrew/etc/kafka/zookeeper.properties\n```\n\nAnd for the `kafka` process:\n\n```bash\n/opt/homebrew/opt/kafka/bin/kafka-server-start /opt/homebrew/etc/kafka/server.properties\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzipcodecore%2Fkafka3-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzipcodecore%2Fkafka3-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzipcodecore%2Fkafka3-data/lists"}