{"id":13807570,"url":"https://github.com/quixio/quix-streams","last_synced_at":"2025-05-13T23:10:45.055Z","repository":{"id":65487960,"uuid":"567454609","full_name":"quixio/quix-streams","owner":"quixio","description":"Python Streaming DataFrames for Kafka","archived":false,"fork":false,"pushed_at":"2025-05-07T15:04:10.000Z","size":8900,"stargazers_count":1364,"open_issues_count":17,"forks_count":76,"subscribers_count":18,"default_branch":"main","last_synced_at":"2025-05-07T15:24:02.834Z","etag":null,"topics":["data-engineering","data-intensive-applications","data-science","event-driven-architecture","kafka","machine-learning","python","real-time-data-processing","stream-processing","stream-processor","streaming-data","streaming-data-pipelines","streaming-data-processing","time-series-data"],"latest_commit_sha":null,"homepage":"https://docs.quix.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/quixio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-11-17T20:36:03.000Z","updated_at":"2025-05-07T15:03:51.000Z","dependencies_parsed_at":"2023-09-23T08:26:32.617Z","dependency_job_id":"f0f996ba-4664-4a8d-bb53-8b8b58a1b9ea","html_url":"https://github.com/quixio/quix-streams","commit_stats":{"total_commits":78,"total_committers":9,"mean_commits":8.666666666666666,"dds":0.7948717948717949,"last_synced_commit":"feb98c9950e4a7a912e2b4408955b5bd076368a8"},"previous_names":[],"tags_count":69,"template":false,"template_full_name":null,"repository_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/quixio%2Fquix-streams","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quixio%2Fquix-streams/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quixio%2Fquix-streams/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quixio%2Fquix-streams/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/quixio","download_url":"https://codeload.github.com/quixio/quix-streams/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254042330,"owners_count":22004901,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","data-intensive-applications","data-science","event-driven-architecture","kafka","machine-learning","python","real-time-data-processing","stream-processing","stream-processor","streaming-data","streaming-data-pipelines","streaming-data-processing","time-series-data"],"created_at":"2024-08-04T01:01:27.013Z","updated_at":"2025-05-13T23:10:40.040Z","avatar_url":"https://github.com/quixio.png","language":"Python","funding_links":[],"categories":["Table of Contents","Python","Stream Processing"],"sub_categories":["Streaming Library","Python Libraries"],"readme":"![Quix - React to data, fast](./images/quixstreams-banner.png)\n\n [![GitHub Version](https://img.shields.io/github/tag-pre/quixio/quix-streams.svg?label=Version\u0026color=008dff)](https://github.com/quixio/quix-streams/releases)\n![PyPI 
License](https://img.shields.io/pypi/l/quixstreams?label=Licence\u0026color=008dff)\n[![Docs](https://img.shields.io/badge/docs-quix.io-0345b2?label=Docs\u0026color=008dff)](https://quix.io/docs/quix-streams/introduction.html) \\\n[![Community Slack](https://img.shields.io/badge/Community%20Slack-blueviolet?logo=slack)](https://quix.io/slack-invite)\n[![YouTube](https://img.shields.io/badge/-YouTube-FF0000?logo=youtube)](https://www.youtube.com/@QuixStreams)\n[![LinkedIn](https://img.shields.io/badge/LinkedIn-0A66C2.svg?logo=linkedin)](https://www.linkedin.com/company/70925173/)\n[![X](https://img.shields.io/twitter/url?label=X\u0026style=social\u0026url=https%3A%2F%2Ftwitter.com%2Fquix_io)](https://twitter.com/quix_io)\n\n# Open source Python framework for reliable data engineering\n\nQuix Streams is an end-to-end framework for real-time Python data engineering, operational analytics and machine learning on Apache Kafka data streams. Extract, transform and load data reliably in fewer lines of code using your favourite Python libraries.\n\nBuild data pipelines and event-driven microservice architectures leveraging Kafka's low-level scalability, resiliency and durability features in a lightweight library without server-side clusters to manage.\n\nQuix Streams provides the following features to make your life easier:\n- Pure Python, meaning no wrappers around Java and no cross-language debugging.\n- Sources \u0026 Sinks API for building custom connectors that integrate data with Kafka.\n- Streaming DataFrame API for building tabular data processing pipelines.\n- Serializers API supporting JSON, Avro, Protobuf \u0026 Schema Registry.\n- State API with built-in RocksDB state object for stateful processing.\n- Application API for managing the Kafka-related setup, teardown and message lifecycle.\n- Operators for common processing tasks like Windowing, Branching, Group By and Reduce.\n- Exactly-once processing guarantees via Kafka transactions.\n\nUse Quix Streams to 
build simple Kafka producer/consumer applications or leverage stream processing to build complex event-driven systems, real-time data pipelines and AI/ML products.\n\n## Getting Started 🏄\n\n### Install Quix Streams\n\n```shell\n# PyPI\npython -m pip install quixstreams\n\n# or conda\nconda install -c conda-forge quixio::quixstreams\n```\n\n#### Requirements\nPython 3.9+, Apache Kafka 0.10+\n\nSee [requirements.txt](https://github.com/quixio/quix-streams/blob/main/requirements.txt) for the full list of requirements\n\n### Documentation\n[Quix Streams Docs](https://quix.io/docs/quix-streams/introduction.html)\n\n### Example\n\nHere's an example of how to \u003cb\u003eprocess\u003c/b\u003e data from a Kafka Topic with Quix Streams:\n\n```python\nfrom quixstreams import Application\n\n# A minimal application reading temperature data in Celsius from the Kafka topic,\n# converting it to Fahrenheit and producing alerts to another topic.\n\n# Define an application that will connect to Kafka\napp = Application(\n    broker_address=\"localhost:9092\",  # Kafka broker address\n)\n\n# Define the Kafka topics\ntemperature_topic = app.topic(\"temperature-celsius\", value_deserializer=\"json\")\nalerts_topic = app.topic(\"temperature-alerts\", value_serializer=\"json\")\n\n# Create a Streaming DataFrame connected to the input Kafka topic\nsdf = app.dataframe(topic=temperature_topic)\n\n# Convert temperature to Fahrenheit by transforming the input message (with an anonymous or user-defined function)\nsdf = sdf.apply(lambda value: {\"temperature_F\": (value[\"temperature\"] * 9/5) + 32})\n\n# Filter values above the threshold\nsdf = sdf[sdf[\"temperature_F\"] \u003e 150]\n\n# Produce alerts to the output topic\nsdf = sdf.to_topic(alerts_topic)\n\n# Run the streaming application (app automatically tracks the sdf!)\napp.run()\n```\n\n### Tutorials\n\nTo see Quix Streams in action, check out the Quickstart and Tutorials in the docs: \n\n- 
[**Quickstart**](https://quix.io/docs/quix-streams/quickstart.html)\n- [**Tutorial - Word Count**](https://quix.io/docs/quix-streams/tutorials/word-count/tutorial.html)\n- [**Tutorial - Anomaly Detection**](https://quix.io/docs/quix-streams/tutorials/anomaly-detection/tutorial.html)\n- [**Tutorial - Purchase Filtering**](https://quix.io/docs/quix-streams/tutorials/purchase-filtering/tutorial.html)\n\n\n### Key Concepts\nThere are two primary objects:\n- `StreamingDataFrame` - a predefined declarative pipeline to process and transform incoming messages.\n- `Application` - manages the Kafka-related setup, teardown and message lifecycle (consuming, committing). It processes each message with the `StreamingDataFrame` you provide.\n\nUnder the hood, the `Application` will:\n- Consume and deserialize messages.\n- Process them with your `StreamingDataFrame`.\n- Produce the results to the output topic.\n- Automatically checkpoint processed messages and state for resiliency.\n- Scale using Kafka's built-in consumer group mechanism.\n\n\n### Deployment\nYou can run Quix Streams pipelines anywhere Python is installed.\n\nDeploy to your own infrastructure or to [Quix Cloud](https://quix.io/product) on AWS, Azure, GCP or on-premise for a fully managed platform.  
\nYou'll get self-service DevOps, CI/CD and monitoring, all built with best in class engineering practices learned from Formula 1 Racing.\n\nPlease see the [**Connecting to Quix Cloud**](https://quix.io/docs/quix-streams/quix-platform.html) page \nto learn how to use Quix Streams and Quix Cloud together.\n\n## Roadmap 📍\n\nThis library is being actively developed by a full-time team.\n\nHere are some of the planned improvements:\n\n- [x] [Windowed aggregations over Tumbling \u0026 Hopping windows](https://quix.io/docs/quix-streams/windowing.html)\n- [x] [Stateful operations and recovery based on Kafka changelog topics](https://quix.io/docs/quix-streams/advanced/stateful-processing.html)\n- [x] [Group-by operation](https://quix.io/docs/quix-streams/groupby.html)\n- [x] [\"Exactly Once\" delivery guarantees for Kafka message processing (AKA transactions)](https://quix.io/docs/quix-streams/configuration.html#processing-guarantees)\n- [x] Support for [Avro](https://quix.io/docs/quix-streams/advanced/serialization.html#avro) and [Protobuf](https://quix.io/docs/quix-streams/advanced/serialization.html#protobuf) formats\n- [x] [Schema Registry support](https://quix.io/docs/quix-streams/advanced/schema-registry.html)\n- [x] [Windowed aggregations over Sliding windows](https://quix.io/docs/quix-streams/windowing.html)\n- [ ] Joins\n\nFor a more detailed overview of the planned features, please look at [the Roadmap Board](https://github.com/orgs/quixio/projects/1).\n\n## Get Involved 🤝\n\n- Please use [GitHub issues](https://github.com/quixio/quix-streams/issues) to report bugs and suggest new features.\n- Join the [Quix Community on Slack](https://quix.io/slack-invite), a vibrant group of Kafka Python developers, data engineers and newcomers to Apache Kafka, who are learning and leveraging Quix Streams for real-time data processing.\n- Watch and subscribe to [@QuixStreams on YouTube](https://www.youtube.com/@QuixStreams) for code-along tutorials from scratch and interesting 
community highlights.\n- Follow us on [X](https://x.com/Quix_io) and [LinkedIn](https://www.linkedin.com/company/70925173) where we share our latest tutorials, forthcoming community events and the occasional meme.\n- If you have any questions or feedback - write to us at support@quix.io!\n\n\n## License 📗\n\nQuix Streams is licensed under the Apache 2.0 license.  \nView a copy of the License file [here](https://github.com/quixio/quix-streams/blob/main/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquixio%2Fquix-streams","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquixio%2Fquix-streams","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquixio%2Fquix-streams/lists"}