{"id":15415025,"url":"https://github.com/voutilad/redpanda-pytorch-demo","last_synced_at":"2026-02-09T09:02:17.299Z","repository":{"id":255653064,"uuid":"852914188","full_name":"voutilad/redpanda-pytorch-demo","owner":"voutilad","description":"Putting a PyTorch ML model into production with Redpanda Connect","archived":false,"fork":false,"pushed_at":"2024-09-25T18:55:49.000Z","size":660,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-12-15T07:31:22.969Z","etag":null,"topics":["huggingface","ml","python","pytorch","redpanda","redpanda-connect","sentiment-analysis"],"latest_commit_sha":null,"homepage":"","language":"Dockerfile","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/voutilad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-05T16:41:04.000Z","updated_at":"2024-10-03T16:25:08.000Z","dependencies_parsed_at":"2024-10-08T21:40:33.678Z","dependency_job_id":null,"html_url":"https://github.com/voutilad/redpanda-pytorch-demo","commit_stats":{"total_commits":23,"total_committers":1,"mean_commits":23.0,"dds":0.0,"last_synced_commit":"dba473483bbf5a204c8e5a9f527ecb0853451473"},"previous_names":["voutilad/redpanda-mlops"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voutilad%2Fredpanda-pytorch-demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voutilad%2Fredpanda-pytorch-demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/voutilad%2Fredpanda-pytorch-demo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voutilad%2Fredpanda-pytorch-demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/voutilad","download_url":"https://codeload.github.com/voutilad/redpanda-pytorch-demo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230497295,"owners_count":18235472,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["huggingface","ml","python","pytorch","redpanda","redpanda-connect","sentiment-analysis"],"created_at":"2024-10-01T17:05:38.895Z","updated_at":"2026-02-09T09:02:12.262Z","avatar_url":"https://github.com/voutilad.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Putting ML to Work with Redpanda Connect and PyTorch\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./img/banner.jpeg\" height=\"35%\" width=\"60%\"\n    alt=\"A redpanda \u0026 a python exploring a cave while carrying a torch.\"\n    style=\"padding: 20px\"\n  \u003e\n\u003c/div\u003e\n\n\u003e Who's that little guy in the background? _No idea!_\n\nThis is an example of rapidly deploying a tuned classification model by using\nRedpanda Connect with Python. 
It leverages Python modules from\n[Hugging Face](https://huggingface.co) and [PyTorch](https://pytorch.org) with\na pre-tuned sentiment classifier for financial news derived from Meta's\n[RoBERTa base model](https://huggingface.co/FacebookAI/roberta-base).\n\nTwo examples are provided:\n\n  - an **API service** that provides an HTTP API for scoring content\n    while also caching and persisting classifier output in-memory and,\n    optionally, to a Redpanda topic for others to consume\n\n  - a **stream analytics pipeline** that takes data from one Redpanda\n    topic, classifies it, and routes output to a destination topic\n    while _reusing the same pipeline_ from the API approach\n\nThe model used is originally from Hugging Face user `mrm8488` and provides\na fine-tuned financial news implementation of Meta's RoBERTa\ntransformer-based language model:\n\nhttps://huggingface.co/mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis\n\n\u003e It's included as a git submodule, but if you're viewing this README\n\u003e via GitHub's web UI and trying to click the submodule link, they\n\u003e sadly don't support links out to non-GitHub submodules!\n\n\n## Requirements\n\nYou must have:\n\n- Python 3.12 (see the [Docker](#docker) section if you only have 3.11)\n- [`git lfs`](https://git-lfs.com)\n- Go 1.22 or so\n- Redpanda or [Redpanda Serverless](https://cloud.redpanda.com/sign-up/)\n- [`rpk`](https://docs.redpanda.com/current/get-started/rpk-install/)\n\nOptionally, you may want:\n\n- `jq`\n- Docker\n\n\n## Installation\n\nTwo methods are provided: local (recommended so you can follow all the\nexamples) and Docker-based. If you don't have a local copy of Python 3.12,\nyou should use the Docker approach.\n\n\n### Local Installation\n\nOn macOS or Linux distros, you can copy and paste these commands to\nget up and running quickly:\n\n1. 
Clone the project and its submodules:\n\n```sh\ngit clone https://github.com/voutilad/redpanda-pytorch-demo\n\ncd redpanda-pytorch-demo\n\ngit submodule update --init --recursive\n```\n\n2. Install a Python virtualenv and dependencies:\n\n```sh\npython3 -m venv venv\n\nsource venv/bin/activate\n\npip install -U pip\n\npip install -r requirements.txt\n```\n\n3. Build the Redpanda Connect w/ embedded Python fork:\n\n```sh\nCGO_ENABLED=0 go build -C rpcp\n```\n\n### Docker\n\nA provided [Dockerfile](./Dockerfile) makes it easy to package up the model,\nRedpanda Connect, and the Python environment. This is great if you don't have\nPython 3.12 locally (like on Debian 12 distros) or want to actually deploy\nthis thing somewhere in the cloud.\n\nUse `docker` to build our image and tag it as `redpanda-torch`:\n\n```sh\ndocker build . -t redpanda-torch\n```\n\nThe built image is pre-set to run the [HTTP server](#the-http-api-server), so\nyou just need to expose the TCP port:\n\n```sh\ndocker run --rm -it -p 8080:8080 redpanda-torch\n```\n\n\u003e In the walkthrough below, you'll use environment variables to configure\n\u003e runtime settings in Redpanda Connect. Just use the Docker conventions,\n\u003e setting them via `--env` or `-e`.\n\nFor running the streaming enrichment [example](#the-enrichment-pipeline),\noverride the command line args:\n\n```sh\ndocker run --rm -it -p 8080:8080 redpanda-torch \\\n  run -r python.yaml enrichment.yaml\n```\n\n\u003e Those yaml files are inside the Docker image, by the way.\n\n\n## Preparing Redpanda\n\nWe need a few topics created for our examples. 
Assuming you've installed\n[rpk](https://docs.redpanda.com/current/get-started/rpk-install/) and have\na profile that's authenticated to your Redpanda cluster, you can run the\nfollowing command:\n\n```sh\nrpk topic create \\\n  news positive-news negative-news neutral-news unknown-news -p 5\n```\n\n\u003e If using Redpanda Serverless, you should be able to use `rpk auth login`\n\u003e to create your profile.\n\nIf using a Redpanda instance that requires authentication, such as Redpanda\nServerless, create a Kafka user and ACLs that allow the principal to both\nproduce and consume from the above topics as well as create a consumer group:\n\n```sh\nrpk security user create demo --password demo\n\nrpk security acl create \\\n  --allow-principal \"User:demo\" \\\n  --operation read,write,describe \\\n  --topic news,positive-news,negative-news,neutral-news,unknown-news \\\n  --group sentiment-analyzer\n```\n\n\u003e Feel free to use a different password!\n\n\n## The HTTP API Server\n\nThe HTTP API server example demonstrates some awesome features of Redpanda\nConnect:\n\n- Avoiding costly compute by **caching** results\n- Distributing data to multiple outputs via **fan out**\n- Providing **synchronous responses** to HTTP clients for an interactive API\n- Reusing Redpanda Components via **composable resources** to reduce code\n- Using runtime data inspection to **route based on ML output**\n\n\n### Running the HTTP Service\n\nThis example relies on environment variables for some runtime configuration.\nYou'll need to set a few depending on where you're running Redpanda:\n\n- `REDPANDA_BROKERS`: list of seed brokers (defaults to \"localhost:9092\")\n- `REDPANDA_TLS`: boolean flag for enabling TLS (defaults to \"false\")\n- `REDPANDA_SASL_USERNAME`: Redpanda Kafka API principal name (no default)\n- `REDPANDA_SASL_PASSWORD`: Redpanda Kafka API principal password (no default)\n- `REDPANDA_SASL_MECHANISM`: SASL mechanism to use (defaults to \"none\")\n- `REDPANDA_TOPIC`: Base 
name of the topics (defaults to \"news\")\n\nTo run in a mode that accepts HTTP POSTs of content to classify, use the\nprovided `http-server.yaml` and an HTTP client like `curl`.\n\n0. Set any of your environment variables to make things easier:\n\n```sh\nexport REDPANDA_BROKERS=tktktktktkt.any.us-east-1.mpx.prd.cloud.redpanda.com:9092\nexport REDPANDA_TLS=true\nexport REDPANDA_SASL_USERNAME=demo\nexport REDPANDA_SASL_PASSWORD=demo\nexport REDPANDA_SASL_MECHANISM=SCRAM-SHA-256\n```\n\n\u003e The above is a faux config for Redpanda Serverless and matches the details we\n\u003e created in [Preparing Redpanda](#preparing-redpanda) above.\n\n1. With your virtualenv active, start up the HTTP service:\n\n```sh\n./rpcp/rp-connect-python run -r python.yaml http-server.yaml\n```\n\n2. From another terminal, fire off a request with `curl` (and pipe to `jq`\n   if you have it):\n\n```sh\ncurl -s -X POST \\\n    -d \"The latest recall of Happy Fun Ball has sent ACME's stock plummeting.\" \\\n    'http://localhost:8080/sentiment' | jq\n```\n\nYou should get something like this in response:\n\n```json\n{\n  \"label\": \"negative\",\n  \"metadata\": {\n    \"cache_hit\": false,\n    \"sha1\": \"d7452c7cc882d1c690635cac92945e815947708d\"\n  },\n  \"score\": 0.9984525442123413,\n  \"text\": \"The latest recall of Happy Fun Ball has sent ACME's stock plummeting.\"\n}\n```\n\nOn the Redpanda side, you'll notice we don't get anything written to the\ntopics! The next section will go into more detail, but for now restart the\nservice with a new environment variable:\n\n```sh\nDEMO_OUTPUT_MODE=both ./rpcp/rp-connect-python \\\n  run -r python.yaml http-server.yaml\n```\n\nNow, submit the same data as before:\n\n```sh\ncurl -s -X POST \\\n    -d \"The latest recall of Happy Fun Ball has sent ACME's stock plummeting.\" \\\n    'http://localhost:8080/sentiment' | jq\n```\n\nYou should get the same JSON reply back. 
_So what's different?_\n\nUse `rpk` and consume from our topics:\n\n```sh\nrpk topic consume positive-news neutral-news negative-news --offset :end\n```\n\nYou should see a result from our `negative-news` topic:\n\n```json\n{\n  \"topic\": \"negative-news\",\n  \"key\": \"d7452c7cc882d1c690635cac92945e815947708d\",\n  \"value\": \"{\\\"label\\\":\\\"negative\\\",\\\"metadata\\\":{\\\"cache_hit\\\":false,\\\"sha1\\\":\\\"d7452c7cc882d1c690635cac92945e815947708d\\\"},\\\"score\\\":0.9984525442123413,\\\"text\\\":\\\"The latest recall of Happy Fun Ball has sent ACME's stock plummeting.\\\"}\",\n  \"timestamp\": 1725628216383,\n  \"partition\": 4,\n  \"offset\": 0\n}\n```\n\n\n### Under the Covers\n\nNow, for a guided walkthrough of how it works! This section breaks down how\nthe configuration in [http-server.yaml](./http-server.yaml) does what it\ndoes.\n\n\n#### Receiving HTTP POSTs\n\nThe pipeline starts off with an `http_server` input, which provides the API\nsurface area for interacting with clients:\n\n```yaml\ninput:\n  http_server:\n    address: ${HOST:127.0.0.1}:${PORT:8080}\n    path: /sentiment\n```\n\nThe `http_server` can do a lot more than this, including TLS support for\nsecure communication as well as WebSocket connections. In this case,\nwe keep it simple: clients need to POST a body of text to the `/sentiment`\npath on our local web server.\n\nYou'll also notice our first usage of _environment variable interpolation_.\nMore will be said about it in coming sections, but for now just view using\n`HOST` and `PORT` environment variables as a way to deviate from our default\nlisten address of `127.0.0.1` and port `8080`. (This is how the provided\n[Dockerfile](./Dockerfile) changes the default `HOST` to `0.0.0.0`.)\n\n\n#### Using Caching to Reduce Stress on the Model\n\nNext, we have a `memory_cache` resource. 
In some situations, you may want\n[other](https://docs.redpanda.com/redpanda-connect/components/caches/about/)\ncache backends, like Redis/Valkey, but this simply uses local memory.\n\nCaches are designed to be accessed from multiple components, so they start off\ndefined in a `cache_resources` list:\n\n```yaml\ncache_resources:\n  - label: memory_cache\n    memory:\n      default_ttl: 5m\n      compaction_interval: 60s\n```\n\nHere we're defining a single cache, called `memory_cache`. You can call it\n(almost) anything you want. We'll use the `label` to refer to the cache\ninstance.\n\n\n##### Cache Lookups\n\nIf we now look at the first stage in the pipeline, we'll see the first step\nis to utilize the cache for a lookup:\n\n```yaml\npipeline:\n  processors:\n    - cache:\n        resource: memory_cache\n        operator: get\n        key: '${!content().string().hash(\"sha1\").encode(\"hex\")}'\n```\n\nHere the `cache` `processor` uses the cache resource we defined, referenced\nby name/label.\n\nIt computes a key on the fly by decoding the content of the HTTP POST body\ninto a string and hashing it with the SHA-1 algorithm, all done via bloblang\n[interpolation](https://docs.redpanda.com/redpanda-connect/configuration/interpolation/#bloblang-queries).\nIf we have a hit, the message is replaced with the value from the cache.\n\nBut what about if we _don't_ have a cache hit?\n\n\n##### Cache Misses\n\nWe use a conditional `branch` stage to handle cache misses. It checks\nif the error flag is set by the previous stage (in this case, a cache miss\nresults in the error flag being set, so `errored()` evaluates to `true`). If\nwe've errored, we create a temporary message from the `content()` of the\nincoming message. 
Otherwise, we use `deleted()` to emit nothing.\n\n```yaml\n- branch:\n    request_map: |\n      # on error, we had a cache miss.\n      root = if errored() { content() } else { deleted() }\n    processors:\n      # these run only on the temporary messages from `request_map` evaluation\n      # ...\n```\n\n\u003e This can be a tad confusing at first. Essentially, you're defining/creating\n\u003e a temporary message to pass to a totally different pipeline of processors.\n\u003e In practice, this message will be based on the actual incoming message...\n\u003e but it doesn't have to be!\n\nThis temporary message is then passed into the inner `processors`.\n\nWe'll talk about updating the cache momentarily.\n\n\n#### Analyzing Sentiment with PyTorch / Hugging Face\n\nThe first inner processor is where our Python enrichment occurs. You'll notice\nit looks super boring!\n\n```yaml\n        processors:\n          - resource: python\n```\n\nHere, we're referencing a _processor resource_ that's defined\nelsewhere. In this case, it's the [python.yaml](./python.yaml) you\npassed with the `-r` argument to Redpanda Connect.\n\nIf you look in that file, you'll see a resource definition in a similar format\nto how our cache resource was defined. 
The important parts are repeated below:\n\n```yaml\npython:\n  script: |\n    from classifier import get_pipeline\n\n    device = environ.get(\"DEMO_PYTORCH_DEVICE\", \"cpu\")\n\n    text = content().decode()\n    pipeline = get_pipeline(device=device)\n    root.text = text\n\n    scores = pipeline(text)\n    if scores:\n      root.label = scores[0][\"label\"]\n      root.score = scores[0][\"score\"]\n    else:\n      root.label = \"unlabeled\"\n      root.score = 0.0\n```\n\nUsing the [Python integration](https://github.com/voutilad/rp-connect-python),\nwe can leverage PyTorch and Hugging Face tools in just a few lines of inline\ncode.\n\nThere's a runtime import of a local helper module [classifier](./classifier.py)\nthat wires up the pre-trained model and tokenizer.\n\n\u003e Q: What about GPUs? Does this work with GPUs?\n\u003e A: Yes. The code is defaulting right now to a \"cpu\" device, but you can\n\u003e    change the argument to `get_pipeline()` in the Python code and pass\n\u003e    an appropriate value that PyTorch can use. For instance, if you're\n\u003e    on macOS with Apple Silicon, use `\"mps\"`. See the\n\u003e    [torch.device](https://pytorch.org/docs/stable/tensor_attributes.html#torch.device)\n\u003e    docs for details on supported values. To do this in the demo, you\n\u003e    can set the environment variable `DEMO_PYTORCH_DEVICE` to the type\n\u003e    you want to use.\n\nFor more details on how Python integrates with Redpanda Connect, see the\nhttps://github.com/voutilad/rp-connect-python project on the nuances of\nthe bloblang-like features embedded in Python. 
It's beyond the scope of\nthis demonstration.\n\nAt this point, we've taken what was our boring message of just text and\ncreated a _structured message_ with multiple fields that looks like:\n\n```json\n{ \"text\": \"The original text!\", \"label\": \"positive\", \"score\": 0.999 }\n```\n\n#### Updating the Cache\nNow that we've done the computationally heavy part of applying the ML model,\nwe want to update the cache with the results so we don't have to repeat\nourselves for the same input.\n\nIn this case, we do it in a two-step process for reasons we'll see later:\n\n```yaml\n          - mutation: |\n              # compute a sha1 hash as a key\n              root.metadata.sha1 = this.text.hash(\"sha1\").encode(\"hex\")\n          - cache:\n              resource: memory_cache\n              operator: set\n              key: '${!this.metadata.sha1}'\n              value: '${!content()}'\n```\n\nThe first step above is computing the sha-1 hash of the text we saved from\nthe original message. We tuck this in a nested field.\n\nThen, we have another instance of a `cache` processor that references the\n_same cache resource_ as before. (See how handy resources are?) In this\ncase, however, we're using a `set` operation _and_ providing the new\nvalue to store. The key to use is a simple bloblang interpolation\nthat points to our just-computed sha-1 hash.\n\nThe tricky thing is the value: we use `content()` to store the full\npayload of the message. It's not intuitive! The `cache` processor doesn't\nuse the message itself...you need to interpolate the message content into\na value to insert into the cache. Confusing!\n\n#### Rejoining from our Branch\nIf we had a cache miss, we're now at the end of our branch operation and\nwe need to convert that temporary message to something permanent. Did you\nforget we've been working with a _temporary_ message? 
I bet you did.\n\nThe tail end of the `branch` config tells the processor how to convert that\ntemporary message, if it exists, into a real message to pass onwards:\n\n```yaml\n        result_map: |\n          root = this\n          root.metadata.cache_hit = false\n```\n\nIn this case it's simple: we're copying `this` (the temporary message) to the\nnew message (i.e. `root`) and also setting a new nested field at the same time.\nIn this case, we mention we had a cache miss. This way we can see if we're\nactually hitting the cache or not so all your work won't be for naught.\n\n#### Last Stop before Output\nLastly, there's a trivial `mutation` step to set the nested `cache_hit` field\nif it doesn't exist. Pretty simple. If it's non-existent, then we never went\ndown the branch path...which means we must have had a cache hit:\n\n```yaml\n    - mutation: |\n        root.metadata.cache_hit = this.metadata.cache_hit | true\n```\n\n#### Getting Data to its Final Destination\nHere we use more resource magic to make the outputs toggle-able via the\nenvironment variable `DEMO_OUTPUT_MODE`. We start off with a trivial\n`output` definition that just references our resource:\n\n```yaml\noutput:\n  resource: ${DEMO_OUTPUT_MODE:http}\n```\n\nUsing interpolation, we pull the value from the environment. If it's\nnot defined, we default to `\"http\"` as the value.\n\nNow we can define our `output_resources`. You could put these in their own\nfile, but that's an exercise left to the reader.\n\nLet's take a look at them individually.\n\n\n##### HTTP Response\n\nSince this is an HTTP API, it's following what some call a _request/reply_\nprotocol. The client sends some data (via a POST, in this case) and expects\na response back. 
To do this, we use the `sync_response` component which\nwill do this automatically:\n\n```yaml\noutput_resources:\n  # Send the HTTP response back to the client.\n  - label: http\n    sync_response: {}\n```\n\n##### Sinking Data into Multiple Redpanda Topics\n\nOther applications might benefit from our work enriching this data, so let's\nput the data in Redpanda. We can make everyone's lives easier by sorting the\ndata based on the sentiment label: `positive`, `negative`, `neutral`, or (in\nthe event of a failure) `unknown`. This is where our multiple topics come into play!\n\n```yaml\n  # Send the data to Redpanda.\n  - label: redpanda\n    kafka_franz:\n      seed_brokers:\n        - ${REDPANDA_BROKERS:localhost}\n      topic: \"${!this.label | unknown}-${REDPANDA_TOPIC:news}\"\n      key: ${!this.metadata.sha1}\n      batching:\n        count: 1000\n        period: 5s\n      tls:\n        enabled: ${REDPANDA_TLS:false}\n      sasl:\n        - mechanism: ${REDPANDA_SASL_MECHANISM:none}\n          username: ${REDPANDA_SASL_USERNAME:}\n          password: ${REDPANDA_SASL_PASSWORD:}\n```\n\nYou can read the details on configuring the `kafka_franz` connector in the\n[docs](https://docs.redpanda.com/redpanda-connect/components/outputs/kafka_franz/)\nso I won't go into detail here. The important part is the `topic` configuration.\n\nYou should notice this is a combination of _bloblang and environment variable_\ninterpolation. This lets the output component programmatically define the\ntarget topic and lets us route messages.\n\nLastly, we're reusing that sha-1 hash as the key to demonstrate how that, too,\ncan be programmatic via interpolation.\n\n\n##### Why Not Both? Using Fan Out.\n\nLet's say we want to **both** reply to the client (to be helpful and polite) as\nwell as save the data in Redpanda for others. 
We can use a `broker` output\nthat lets us define the pattern of routing messages across multiple outputs.\n\nIn this case, we use `fan_out` to duplicate messages to all defined outputs.\n\nSince we already defined our two outputs above as part of our\n`output_resources`, this is super simple! We can just use `resource` outputs\nthat take a named resource by label so we don't have to repeat ourselves.\n\n```yaml\n  # Do both: send to Redpanda and reply to the client.\n  - label: both\n    broker:\n      pattern: fan_out\n      outputs:\n        - resource: http\n        - resource: redpanda\n```\n\n\u003e In this current configuration, using `both` will cause an initial cold-start\n\u003e latency spike as the connection to the Redpanda cluster is made while\n\u003e processing the first request. This will appear as a delay to the http client\n\u003e calling the service, but subsequent requests won't have this penalty.\n\n\n## The Data Enrichment Approach\n\nUsing what you learned above, we can easily build a _data enrichment pipeline_\nsourcing data from an input Redpanda topic, performing the same sentiment\nanalysis we configured in [python.yaml](./python.yaml), and route the output\nto different topics just like before.\n\nIn this case, we use both `kafka_franz` `input` _and_ `output`. Most\nimportantly, we can _reuse_ the same Python pipeline component as it's already\ndefined in a separate resource file.\n\nRunning this example is similar to the previous. Just change the pipeline file:\n\n```sh\n./rpcp/rp-connect-python run -r python.yaml enrichment.yaml\n```\n\nFor testing, you can produce data to your input topic using `rpk`:\n\n```sh\necho 'The Dow closed at a record high today on news that aliens are real' \\\n  | rpk topic produce news\n```\n\nAnd consume the output:\n\n\n```sh\nrpk topic consume \\\n  positive-news negative-news neutral-news unknown-news \\\n  --offset :end\n```\n\n\n### Sourcing Data\n\nThis is pretty simple using a `kafka_franz` input. 
You'll notice that the real\ndifference here is the `consumer_group` setting. This will let us properly\nscale up if needed and help with tracking committed offsets in the stream.\n\n```yaml\ninput:\n  kafka_franz:\n    seed_brokers:\n      - ${REDPANDA_BROKERS:localhost:9092}\n    topics:\n      - ${REDPANDA_TOPIC:news}\n    consumer_group: ${REDPANDA_CONSUMER_GROUP:sentiment-analyzer}\n    batching:\n      count: 1000\n      period: 5s\n    tls:\n      enabled: ${REDPANDA_TLS:false}\n    sasl:\n      - mechanism: ${REDPANDA_SASL_MECHANISM:none}\n        username: ${REDPANDA_SASL_USERNAME:}\n        password: ${REDPANDA_SASL_PASSWORD:}\n```\n\nIt's worth pointing out the `batching` section. The `python` component can\nprocess batches more efficiently than single messages, so it's recommended to\nbatch when you can.\n\n\n### The Enrichment Pipeline\n\nOur pipeline logic becomes trivial thanks to resources:\n\n```yaml\npipeline:\n  processors:\n    - resource: python\n```\n\nThat's it! It's _that_ easy.\n\n\n### Sinking Data\n\nWe use the same interpolation approaches as before with one exception. See\nif you can spot it:\n\n```yaml\noutput:\n  kafka_franz:\n    seed_brokers:\n      - ${REDPANDA_BROKERS:localhost}\n    topic: \"${!this.label | unknown}-${REDPANDA_TOPIC:news}\"\n    key: ${!meta(\"kafka_key\")}\n    batching:\n      count: 1000\n      period: 5s\n    tls:\n      enabled: ${REDPANDA_TLS:false}\n    sasl:\n      - mechanism: ${REDPANDA_SASL_MECHANISM:none}\n        username: ${REDPANDA_SASL_USERNAME:}\n        password: ${REDPANDA_SASL_PASSWORD:}\n```\n\nInstead of a sha-1 hash, which we don't really need or care about, we re-use\nthe original key (if any) from the incoming message. 
If data is produced to\nour input topic with a key, we'll re-use that key.\n\n\n## Wrapping Up\n\nHopefully this is helpful in both explaining the intricacies of Redpanda\nConnect end-to-end as well as illustrating a useful example of using a\nlow-code approach to building enrichment services and pipelines!\n\n\n## About the Banner Image\n\nThe cute Redpanda and Python exploring a cave was created by\n[Bing Image Creator](https://www.bing.com/images/create).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoutilad%2Fredpanda-pytorch-demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvoutilad%2Fredpanda-pytorch-demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoutilad%2Fredpanda-pytorch-demo/lists"}