{"id":13627369,"url":"https://github.com/raptor-ml/raptor","last_synced_at":"2025-03-31T11:01:39.895Z","repository":{"id":43387220,"uuid":"478110812","full_name":"raptor-ml/raptor","owner":"raptor-ml","description":"Transform your pythonic research to an artifact that engineers can deploy easily.","archived":false,"fork":false,"pushed_at":"2025-03-21T08:16:23.000Z","size":4767,"stargazers_count":151,"open_issues_count":22,"forks_count":12,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-22T22:37:29.145Z","etag":null,"topics":["ai-infra","data-engineering","data-science","dataops","feature-engineering","feature-extraction","feature-platform","featurestore","kubeflow","kubernetes","machine-learning","ml","mlops","model-deployment","production","raptor","raptor-ml","reactive-ml"],"latest_commit_sha":null,"homepage":"https://raptor.ml","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raptor-ml.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-05T12:05:51.000Z","updated_at":"2025-03-04T05:32:37.000Z","dependencies_parsed_at":"2024-06-19T01:39:04.314Z","dependency_job_id":"c8507009-c261-4a62-9191-7b49bc92f29f","html_url":"https://github.com/raptor-ml/raptor","commit_stats":{"total_commits":286,"total_committers":6,"mean_commits":"47.666666666666664","dds":0.06293706293706292,"last_synced_commit":"3a7d93ff0768eb8272c6c35dfa79e30b425cd755"},"previous_names":["raptor-ml/natun","natun-ai/natun"],"tags_count":8,"template":false,"template_full_name":nul
l,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raptor-ml%2Fraptor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raptor-ml%2Fraptor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raptor-ml%2Fraptor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raptor-ml%2Fraptor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/raptor-ml","download_url":"https://codeload.github.com/raptor-ml/raptor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246457967,"owners_count":20780676,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-infra","data-engineering","data-science","dataops","feature-engineering","feature-extraction","feature-platform","featurestore","kubeflow","kubernetes","machine-learning","ml","mlops","model-deployment","production","raptor","raptor-ml","reactive-ml"],"created_at":"2024-08-01T22:00:33.354Z","updated_at":"2025-03-31T11:01:39.866Z","avatar_url":"https://github.com/raptor-ml.png","language":"Go","funding_links":[],"categories":["Projects by main language","Go"],"sub_categories":["go"],"readme":"\u003cdiv id=\"top\"\u003e\u003c/div\u003e\n\n[![Go Report Card][go-report-card-shield]][go-report-card-url]\n[![Go Reference][godoc-shield]][godoc-url]\n[![E2E Tests][e2e-tests-shield]][e2e-tests-url]\n[![CII Best 
Practices][best-practices-shield]][best-practices-url]\n[![Forks][forks-shield]][forks-url]\n[![Stargazers][stars-shield]][stars-url]\n[![Issues][issues-shield]][issues-url]\n[![Apache 2.0 License][license-shield]][license-url]\n[![LinkedIn][linkedin-shield]][linkedin-url]\n\u003c!-- [![Contributors][contributors-shield]][contributors-url] --\u003e\n\n\u003cbr /\u003e\n\u003cdiv align=\"center\"\u003e\n    \u003ca href=\"https://raptor.ml\"\u003e\n        \u003cimg src=\".github/logo.svg\" alt=\"RaptorML - Production-ready feature engineering\" width=\"300\"\u003e\n    \u003c/a\u003e\n\n\u003ch3 align=\"center\"\u003e\n    \u003cp\u003eFrom notebook to production\u003c/p\u003e\n    Transform your data science into production-ready artifacts\n\u003c/h3\u003e\n\u003cbr /\u003e\n\nRaptor frees data scientists and ML engineers to build and deploy operational models and ML-driven functionality,\n*without learning backend engineering*.\n\nIt **compiles** your Python research code into production artifacts, and takes care of engineering concerns such as\nscalability and reliability using Kubernetes best practices.\n\n[**Explore the docs »**][docs-url]\n\n[**Getting started in 5 minutes »**][colab-url] · [Report a Bug][issues-url] · [Request a Feature][issues-url]\n\n\u003c/div\u003e\n\n[![RaptorML Screenshot][product-screenshot]][colab-url]\n\n## 🧐 What is Raptor?\n\nRaptor frees data scientists and ML engineers to **focus on the data science and research work**, and build operational\nmodels and ML-driven functionality **without learning backend engineering**. Focus on what you're good at, increase your\nend-to-end velocity, and **close the gap between research and production**.\n\nWith Raptor, you can export your Python research code as **standard production artifacts**, and deploy them to\nKubernetes. 
Once they are deployed, Raptor optimizes data processing and feature calculation for production, deploys models\nto Sagemaker or Docker containers, connects to your production data sources, and handles scaling, high availability, caching,\nmonitoring, and all other backend concerns.\n\n[![Colab][colab-button]][colab-url]\n\n## 😍 Why do people *love* Raptor, and how does it change their lives?\n\n**Raptor is made by and for data scientists and ML engineers**. We know how hard it is to build and deploy models as\nan integral part of your products, and we want to make it easier.\n\nBefore Raptor, data scientists had to work closely with backend engineers to build a \"production version\" of their work:\nconnect to data sources, transform their data with Flink/Spark or even Java, create APIs, dockerize the model, handle\nscaling and high availability, and more.\n\n![High-level view of Raptor](.github/simplified-high-level.png)\n\nWith Raptor, data scientists can focus *only* on their research and model development, then export their work to\nproduction. Raptor takes care of the rest, including connecting to data sources, transforming the data, and deploying and\nconnecting the model. This lets data scientists focus on what they do best.\n\n### ⭐️ Key Features\n\n* **Focus on _your_ work**: Raptor frees data scientists and ML engineers to focus on the model, without\n  learning backend engineering. Stop worrying about engineering concerns, and focus on what you're good at.\n* **Eliminate training/serving skew**: You can use the same code for training and production to avoid training/serving\n  skew.\n* **Real-time/on-demand**: Raptor optimizes feature calculations and predictions to be performed at the time of the\n  request.\n* **Seamless caching and storage**: Raptor uses an integrated caching system and stores your historical data for\n  training purposes. 
As a result, you don't need a separate data storage system such as a \"Feature Store\".\n* **Turns data science work into production artifacts**: Raptor implements best-practice Kubernetes functionality,\n  such as scaling, health checks, auto-recovery, monitoring, logging, and more.\n* **Integrates with your R\u0026D team**: Raptor extends existing DevOps tools and infrastructure and allows you to connect your\n  ML research to the rest of your organization's R\u0026D ecosystem, utilizing tools such as CI/CD and monitoring.\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n## 🚀 Getting Started\n\nTo start, install [Raptor LabSDK](https://pypi.org/project/raptor-labsdk/). The LabSDK is a Python package that helps\nyou develop models and features in notebooks or IDEs.\n\n```console\npip install raptor-labsdk\n```\n\n### ⚡ Quick Example\n\n```python\nimport pandas as pd\nfrom raptor import *\nfrom typing_extensions import TypedDict\n\n\n@data_source(\n    training_data=pd.read_csv(\n        'https://gist.githubusercontent.com/AlmogBaku/8be77c2236836177b8e54fa8217411f2/raw/hello_world_transactions.csv'),\n    production_config=StreamingConfig()\n)\nclass BankTransaction(TypedDict):\n    customer_id: str\n    amount: float\n    timestamp: str\n\n\n# Define features 🧪\n@feature(keys='customer_id', data_source=BankTransaction)\n@aggregation(function=AggregationFunction.Sum, over='10h', granularity='1h')\ndef total_spend(this_row: BankTransaction, ctx: Context) -\u003e float:\n    \"\"\"total spend by a customer over the last 10 hours\"\"\"\n    return this_row['amount']\n\n\n@feature(keys='customer_id', data_source=BankTransaction)\n@freshness(max_age='5h', max_stale='1d')\ndef amount(this_row: BankTransaction, ctx: Context) -\u003e float:\n    \"\"\"amount of the customer's most recent transaction\"\"\"\n    return this_row['amount']\n\n\n# Train the model 🤓\n@model(\n    keys='customer_id',\n    input_features=['total_spend+sum'],\n    
input_labels=[amount],\n    model_framework='sklearn',\n    model_server='sagemaker-ack',\n)\n@freshness(max_age='1h', max_stale='100h')\ndef amount_prediction(ctx: TrainingContext):\n    from sklearn.linear_model import LinearRegression\n    df = ctx.features_and_labels()\n    trainer = LinearRegression()\n    trainer.fit(df[ctx.input_features], df[ctx.input_labels])\n    return trainer\n\n\namount_prediction.export()  # Export to production 🎉\n```\n\nThis generates a set of artifacts in the `out` directory. The `out` directory also includes a `Makefile` that can\nbe used for integration in any CI/CD pipeline, or invoked manually.\n\n[![Colab][colab-button-expand]][colab-url]\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n## 🥊 How is Raptor different from ___?\n\n### MLOps platforms (MLflow, Kubeflow, Metaflow, Sagemaker, VertexAI, etc.)\n\nTraditional MLOps platforms focus on managing the lifecycle of ML resources and are not designed for building\noperational models and features. Raptor is designed for building operational models and features, and can be integrated with MLOps\nplatforms.\n\n### Feature Stores (Hopsworks, Feast, etc.)\n\nA feature store is a data storage system that stores pre-computed features for training and online serving. That means\nyou need to orchestrate the pre-computation of the features, store them, connect them to your model, and write ad-hoc\nbackend code.\n\nRaptor takes a radically different approach. You focus on the model, and Raptor takes care of the rest. Raptor has a\nbuilt-in caching system that achieves results similar to a feature store, without requiring you to\norchestrate the data pipeline and the model deployment yourself.\n\n### Model Servers (Sagemaker, BentoML, KServe, etc.)\n\nModel servers are designed for serving models in production. They are not designed for building models and features for\nproduction. 
In fact, Raptor integrates seamlessly with model servers (such as Sagemaker, BentoML, etc.) to serve your\nmodels.\n\n## 💡 How does it work?\n\nWorking with Raptor starts in the research phase, in your notebook or IDE. Raptor lets you write your ML work in a\nway that can be translated for production.\n\nModels and features in Raptor are composed of a declarative part (via Python decorators) and function code. This\nway, Raptor can handle the heavy-lifting engineering concerns (such as aggregations or caching) by implementing the\n\"declarative part\" and optimizing the implementation for production.\n\n![Features are composed of a declarative part and function code][feature-py-def]\n\nAfter you are satisfied with your research results, \"export\" these definitions and deploy them to Kubernetes using\nstandard tools. Once deployed, Raptor Core (the server-side part) extends Kubernetes with the ability to implement\nthem. It takes care of the engineering concerns by managing and controlling Kubernetes-native resources, such as\ndeployments, to connect your production data sources and run your business logic at scale.\n\nYou can read more about Raptor's architecture in [the docs][docs-url].\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n## ⎈ Production Installation\n\n**Raptor installation is not required for training purposes**.\nYou only need to install Raptor *when deploying to production* (or staging).\n\nLearn more about production installation in [the docs][docs-url].\n\n### 🏗️ Prerequisites\n\n1. Kubernetes cluster (including EKS, GKE, etc.)\n2. Redis server (\u003e 2.8.9)\n3. 
Optional: Snowflake or S3 bucket (to record historical data for retraining purposes)\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\n\n\u003c!-- ROADMAP --\u003e\n\n## 🏔 Roadmap\n\n- [ ] S3 historical storage plugins\n    - [x] S3 storing\n    - [ ] S3 fetching data - Spark\n- [ ] Deploy models to model servers\n    - [x] Sagemaker ACK\n    - [ ] VertexAI\n    - [ ] Seldon\n    - [ ] Kubeflow\n    - [ ] KFServing\n    - [ ] Standalone\n- [ ] Large-scale training\n- [ ] Support more data sources\n    - [x] Kafka\n    - [x] GCP Pub/Sub\n    - [x] REST\n    - [ ] Snowflake\n    - [ ] BigQuery\n    - [ ] gRPC\n    - [ ] Redis\n    - [ ] Postgres\n    - [ ] GraphQL\n\nSee the [open issues][issues-url] for a full list of proposed features (and known issues).\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\n\n\u003c!-- CONTRIBUTING --\u003e\n\n## 👷 Contributing\n\nContributions make the open-source community a fantastic place to learn, inspire, and create. Any contributions you make\nare **greatly appreciated** (not only code, but also documentation, blog posts, or feedback) 😍.\n\nIf you have a suggestion, please fork the repo and create a pull request. You can also simply open an issue and choose\n\"Feature Request\" to give us feedback.\n\n**Don't forget to give the project [a star](#top)! ⭐️**\n\nFor more information about contributing code to the project, read the [`CONTRIBUTING.md`](./CONTRIBUTING.md) file.\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\n\n\u003c!-- LICENSE --\u003e\n\n## 📃 License\n\nDistributed under the Apache License 2.0. 
Read the `LICENSE` file for more information.\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n## 👫 Joining the community\n\nYou can join the Raptor community on [Slack][community], follow us\non [Twitter][twitter], and participate in the issues and pull requests.\n\n**Don't forget to give the project [a star](#top)! ⭐️**\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n[godoc-shield]: https://pkg.go.dev/badge/github.com/raptor-ml/raptor.svg\n\n[godoc-url]: https://pkg.go.dev/github.com/raptor-ml/raptor\n\n[contributors-shield]: https://img.shields.io/github/contributors/raptor-ml/raptor.svg?style=flat\n\n[contributors-url]: https://github.com/raptor-ml/raptor/graphs/contributors\n\n[forks-shield]: https://img.shields.io/github/forks/raptor-ml/raptor.svg?style=flat\n\n[forks-url]: https://github.com/raptor-ml/raptor/network/members\n\n[stars-shield]: https://img.shields.io/github/stars/raptor-ml/raptor.svg?style=flat\n\n[stars-url]: https://github.com/raptor-ml/raptor/stargazers\n\n[issues-shield]: https://img.shields.io/github/issues/raptor-ml/raptor.svg?style=flat\n\n[issues-url]: https://github.com/raptor-ml/raptor/issues\n\n[e2e-tests-shield]: https://img.shields.io/github/actions/workflow/status/raptor-ml/raptor/test-e2e.yml?label=tests\n\n[e2e-tests-url]: https://github.com/raptor-ml/raptor/actions/workflows/test-e2e.yml\n\n[license-shield]: https://img.shields.io/github/license/raptor-ml/raptor.svg?style=flat\n\n[license-url]: https://github.com/raptor-ml/raptor/blob/master/LICENSE.txt\n\n[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=flat\u0026logo=linkedin\u0026colorB=555\n\n[linkedin-url]: https://linkedin.com/company/raptor-ml\n\n[go-report-card-shield]: https://goreportcard.com/badge/github.com/raptor-ml/raptor\n\n[go-report-card-url]: 
https://goreportcard.com/report/github.com/raptor-ml/raptor\n\n[best-practices-shield]: https://bestpractices.coreinfrastructure.org/projects/6406/badge\n\n[best-practices-url]: https://bestpractices.coreinfrastructure.org/projects/6406\n\n[colab-button]: https://img.shields.io/badge/-Getting_started_with_Colab-blue?style=for-the-badge\u0026logo=googlecolab\n\n[colab-button-expand]: https://img.shields.io/badge/-see_advanced_example_notebook-blue?style=for-the-badge\u0026logo=googlecolab\n\n[colab-url]: https://colab.research.google.com/github/raptor-ml/docs/blob/master/docs/docs/getting-started.ipynb\n\n[docs-url]: https://raptor.ml/\n\n[product-screenshot]: .github/demo.gif\n\n[feature-py-def]: .github/feature-py-def.png\n\n[community]: https://raptor.ml/docs/community\n\n[twitter]: https://twitter.com/RaptorML\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraptor-ml%2Fraptor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraptor-ml%2Fraptor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraptor-ml%2Fraptor/lists"}