{"id":25705218,"url":"https://github.com/aporia-ai/inferencedb","last_synced_at":"2025-04-30T09:46:42.821Z","repository":{"id":38288667,"uuid":"457485122","full_name":"aporia-ai/inferencedb","owner":"aporia-ai","description":"🚀 Stream inferences of real-time ML models in production to any data lake (Experimental)","archived":false,"fork":false,"pushed_at":"2022-06-10T15:42:34.000Z","size":257,"stargazers_count":77,"open_issues_count":0,"forks_count":2,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-03-26T12:30:11.021Z","etag":null,"topics":["kafka","machine-learning","mlops","model-monitoring","model-serving","s3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aporia-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-02-09T18:41:55.000Z","updated_at":"2024-01-04T17:05:36.000Z","dependencies_parsed_at":"2022-08-18T14:01:26.317Z","dependency_job_id":null,"html_url":"https://github.com/aporia-ai/inferencedb","commit_stats":null,"previous_names":[],"tags_count":62,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aporia-ai%2Finferencedb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aporia-ai%2Finferencedb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aporia-ai%2Finferencedb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aporia-ai%2Finferencedb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aporia-ai","download_url":"https://codeload.github.
com/aporia-ai/inferencedb/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240619365,"owners_count":19830202,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["kafka","machine-learning","mlops","model-monitoring","model-serving","s3"],"created_at":"2025-02-25T06:38:41.696Z","updated_at":"2025-02-25T06:38:42.350Z","avatar_url":"https://github.com/aporia-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n    \u003cimg src=\"logo.svg\" width=\"400\" /\u003e\n\u003c/p\u003e\n\n---\n\n**InferenceDB** makes it easy to stream inferences of real-time ML models in production to a data lake, based on Kafka. 
This data can later be used for model retraining, data drift monitoring, performance degradation detection, AI incident investigation and more.\n\n### Quickstart\n\n* [Flask](https://github.com/aporia-ai/inferencedb/wiki/Flask-Quickstart) \n* [FastAPI](https://github.com/aporia-ai/inferencedb/wiki/FastAPI-Quickstart) \n* [KServe](https://github.com/aporia-ai/inferencedb/wiki/KServe-Quickstart) \n\n\n### Features\n\n* **Cloud Native** - Runs on top of Kubernetes and supports any cloud infrastructure\n* **Model Serving Integrations** - Connects to ML model serving tools like [KServe](https://kserve.github.io/website/)\n* **Extensible** - Add your own model serving frameworks and database destinations\n* **Horizontally Scalable** - Add more workers to support more models and more traffic \n* **Python Ecosystem** - Written in Python using [Faust](https://faust.readthedocs.io/en/latest/), so you can add your own data transformations using NumPy, Pandas, etc.\n\n\u003cp align=\"center\"\u003eMade with :heart: by \u003ca href=\"https://www.aporia.com?utm_source=github\u0026utm_medium=github\u0026utm_campaign=inferencedb\" target=\"_blank\"\u003eAporia\u003c/a\u003e\u003c/p\u003e\n\n**WARNING:** InferenceDB is still experimental; use at your own risk! 💀\n\n## Installation\n\nThe only requirement for InferenceDB is a Kafka cluster with [Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html) and [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html).\n\nTo install InferenceDB using Helm, run:\n\n```sh\nhelm install inferencedb inferencedb/inferencedb -n inferencedb --create-namespace \\\n  --set kafka.broker=kafka:9092 \\\n  --set kafka.schemaRegistryUrl=http://schema-registry:8081 \\\n  --set kafka.connectUrl=http://kafka-connect:8083\n```\n\n## Usage\n\nTo start logging your model inferences, create an **InferenceLogger** Kubernetes resource. 
This is a [Kubernetes Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) that is defined and controlled by InferenceDB.\n\n**Example:**\n\n```yaml\napiVersion: inferencedb.aporia.com/v1alpha1\nkind: InferenceLogger\nmetadata:\n  name: my-model-inference-logger\n  namespace: default\nspec:\n  topic: my-model\n  events:\n    type: kserve\n    config: {}\n  destination:\n    type: confluent-s3\n    config:\n      url: s3://my-bucket/inferencedb\n      format: parquet\n      awsRegion: us-east-2\n```\n\nThis InferenceLogger will watch the `my-model` Kafka topic for events in KServe format and log them to a Parquet file on S3. See the [KServe quickstart guide](https://github.com/aporia-ai/inferencedb/wiki/KServe-Quickstart) for more details.\n\n## Development\n\nInferenceDB development is done using [Skaffold](https://skaffold.dev/).\n\nMake sure you have a Kubernetes cluster with Kafka installed (local or remote), and edit [skaffold.yaml](skaffold.yaml) with the correct Kafka URLs and Docker image registry (for a local cluster, just use `local/inferencedb`).\n\nTo start development, run:\n\n    skaffold dev --trigger=manual\n    \nThis will build the Docker image, push it to the Docker registry you provided, and install the Helm chart on the cluster. 
Now you can make changes to the code and press \"Enter\" in the Skaffold CLI to update the cluster.\n\n## Roadmap\n\n### Core\n\n* [ ] Add support for Spark Streaming in addition to Faust\n* [ ] Add more input validations on the Kafka URLs\n\n### Event Processors \n\n* [x] JSON\n* [x] KServe\n* [ ] Seldon Core\n* [ ] BentoML\n* [ ] MLflow Deployments\n\n### Destinations\n\n* [x] Parquet on S3\n* [ ] HDF5 on S3\n* [ ] Azure Blob Storage\n* [ ] Google Cloud Storage\n* [ ] ADLS Gen2\n* [ ] AWS Glue\n* [ ] Delta Lake\n* [ ] PostgreSQL\n* [ ] Snowflake\n* [ ] Iceberg\n\n### Documentation\n\n* [ ] How to set up Kafka using AWS / Azure / GCP managed services\n* [ ] API Reference for the CRDs\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faporia-ai%2Finferencedb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faporia-ai%2Finferencedb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faporia-ai%2Finferencedb/lists"}