{"id":16597804,"url":"https://github.com/san089/optimizing-public-transportation","last_synced_at":"2025-06-11T17:04:21.114Z","repository":{"id":113013343,"uuid":"232393331","full_name":"san089/Optimizing-Public-Transportation","owner":"san089","description":"A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority.","archived":false,"fork":false,"pushed_at":"2023-08-14T22:08:50.000Z","size":508,"stargazers_count":30,"open_issues_count":3,"forks_count":14,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-03T21:24:00.309Z","etag":null,"topics":["faust","kafka-api","kafka-application","kafka-cluster","kafka-connect","kafka-consumer","kafka-producer","kafka-schema-registry","kafka-sql","kafka-streams","kafka-topic"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/san089.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-01-07T18:48:09.000Z","updated_at":"2025-03-23T13:37:26.000Z","dependencies_parsed_at":"2023-06-05T13:39:27.279Z","dependency_job_id":"8618df9a-a3f4-4cc4-836a-c15b2c834c95","html_url":"https://github.com/san089/Optimizing-Public-Transportation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/san089/Optimizing-Public-Transportation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/san089%2FOptimizing-Public-Transportation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/san089%2FOptimizing-Public-Transportation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/san089%2FOptimizing-Public-Transportation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/san089%2FOptimizing-Public-Transportation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/san089","download_url":"https://codeload.github.com/san089/Optimizing-Public-Transportation/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/san089%2FOptimizing-Public-Transportation/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259301787,"owners_count":22837003,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["faust","kafka-api","kafka-application","kafka-cluster","kafka-connect","kafka-consumer","kafka-producer","kafka-schema-registry","kafka-sql","kafka-streams","kafka-topic"],"created_at":"2024-10-12T00:06:39.080Z","updated_at":"2025-06-11T17:04:21.093Z","avatar_url":"https://github.com/san089.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Optimizing-Public-Transportation\n\n## Architecture \n![Architecture](https://github.com/san089/Optimizing-Public-Transportation/blob/master/docs/architecture.png)\n\n#### Overview\nIn this project we construct a streaming event pipeline around Apache Kafka and its ecosystem. 
Using a public dataset from the [Chicago Transit Authority](https://www.transitchicago.com/data/), we constructed an event pipeline around Kafka that simulates and displays the status of trains in real time.\n\n**Arrival and Turnstiles -\u003e** Producers that write train arrival and turnstile events to our Kafka cluster. An arrival event indicates that a train has arrived at a particular station, while a turnstile event indicates that a passenger has entered the station. \n\n**Weather -\u003e** A producer that periodically emits weather data to the Kafka cluster through REST Proxy.\n\n**PostgreSQL and Kafka Connect -\u003e** Extracts station data and pushes it to the Kafka cluster. \n\n**Kafka status server -\u003e** Consumes data from Kafka topics and displays it on the UI.\n\n![Results](https://github.com/san089/Optimizing-Public-Transportation/blob/master/docs/results.png)\n\n### Environment \n-   Docker (I used the Bitnami Kafka image available [here](https://hub.docker.com/r/bitnami/kafka))\n-   Python 3.7\n\n### Running and Testing\nFirst, make sure all the services are up and running. \nFor Docker, use:\n\n    docker-compose up\nDocker Compose will take 3-5 minutes to start, depending on your hardware.\nOnce Docker Compose is ready, make sure the services are running by connecting to them using the Docker URLs provided below:\n\n![](https://github.com/san089/Optimizing-Public-Transportation/blob/master/docs/services.png)\n\nYou also need to install the Python requirements. Use the commands below to create a virtual environment and install them:\n1.  `cd producers`\n2.  `virtualenv venv`\n3.  `. venv/bin/activate`\n4.  `pip install -r requirements.txt`\n\nSame for the consumers, setup environment as below:\n1.  `cd consumers`\n2.  `virtualenv venv`\n3.  `. venv/bin/activate`\n4.  
`pip install -r requirements.txt`\n\n#### Running Simulation\nRun the producers using simulation.py in producers folder:\n\n    python simulation.py\n\nRun the Faust Stream Processing Application:\n\n    cd consumers\n    faust -A faust_stream worker -l info\n\nRun KSQL consumer as below:\n\n    cd consumers\n    python ksql.py\n\nTo run consumer server: \n\n    cd consumers\n    python server.py\n\n\n\n### Resources\n[Confluent Python Client Documentation](https://docs.confluent.io/current/clients/confluent-kafka-python/#) \u003cbr/\u003e\n[Confluent Python Client Usage and Examples](https://github.com/confluentinc/confluent-kafka-python#usage) \u003cbr/\u003e\n[REST Proxy API Reference](https://docs.confluent.io/current/kafka-rest) \u003cbr/\u003e\n[Kafka Connect JDBC Source Connector Configuration Options](https://docs.confluent.io/current/connect/kafka-connect-jdbc/source-connector/source_config_options.html) \u003cbr/\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsan089%2Foptimizing-public-transportation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsan089%2Foptimizing-public-transportation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsan089%2Foptimizing-public-transportation/lists"}