https://github.com/vmware/versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.

analytics data data-engineer data-engineering data-engineering-pipeline data-lineage data-pipelines data-science data-structures data-warehouse database dataops elt etl pipeline python snowflake sql trino warehouse


        

![Versatile Data Kit](./support/images/versatile-data-kit.svg#gh-light-mode-only)
![Versatile Data Kit](./support/images/versatile-data-kit.svg#gh-dark-mode-only)






---


One framework to 🧑‍💻 Develop ▶️ Deploy and 📊 Operate data workflows with Python and SQL

---


🎯 Write shorter, more readable code.

🔄 Ready-to-use data ETL/ELT patterns.

🧩 Lego-like extensibility.

🚀 Single click deployment.

🛠 Operate and monitor.

---


Intro to VDK SDK
Ingestion
Transformation
Job Deployment
Job Operations
Extensibility
Support and Contributing


# Introduction to the VDK SDK




  • Framework to simplify data ingestion and data processing.

  • Write any code using Python or SQL.

  • A toolset enabling you to run data jobs.




Get started with the VDK SDK:

➡ Install Quickstart VDK. The only requirement is Python 3.7+.

    pip install quickstart-vdk
    vdk --help

➡ Develop your First Data Job to get started quickly.
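As a rough illustration of what a data job step looks like, here is a minimal sketch of a Python step. VDK calls a `run(job_input)` function in each Python step file. The file name, table name, and record are made up for illustration, and `FakeJobInput` is a hypothetical stand-in (not part of VDK) so the sketch runs without VDK installed.

```python
# 10_hello.py: a Python step inside a data job directory (file name is illustrative).


def run(job_input):
    """Entry point that VDK calls for a Python step.

    In a real job, job_input is the IJobInput object provided by VDK.
    """
    # Run SQL against the database configured for the job.
    job_input.execute_query(
        "CREATE TABLE IF NOT EXISTS hello_world (id INTEGER, message TEXT)"
    )
    # Queue one record (a dict) for ingestion into the destination table.
    job_input.send_object_for_ingestion(
        payload={"id": 1, "message": "Hello, VDK!"},
        destination_table="hello_world",
    )


class FakeJobInput:
    """Hypothetical stand-in for IJobInput; it only records the calls it receives."""

    def __init__(self):
        self.queries = []
        self.ingested = []

    def execute_query(self, sql):
        self.queries.append(sql)
        return []

    def send_object_for_ingestion(self, payload, destination_table=None):
        self.ingested.append((destination_table, payload))


if __name__ == "__main__":
    fake = FakeJobInput()
    run(fake)
    print(fake.ingested)
```

With VDK installed, only the `run` function belongs in the step file; the job is then executed with `vdk run` pointed at the job directory.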












# Data Ingestion




  • Extract data from various sources (HTTP APIs, Databases, CSV, etc.).

  • Ensure data fidelity with minimal transformations.

  • Load data to your preferred destination (database, cloud storage).




Ingestion examples:


➡ Ingesting data from REST API into Database

➡ Ingesting data from DB into Database

➡ Ingesting local CSV file into Database

➡ Incremental ingestion using Job Properties
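The incremental pattern from the last example can be sketched roughly as follows: a job property stores the highest ID ingested so far, and each run sends only newer rows. The property name `last_ingested_id`, the `orders` table, and `fetch_source_rows` are assumptions made for illustration, and `FakeJobInput` is a hypothetical stand-in (not part of VDK) that mimics the properties and ingestion calls so the sketch runs on its own.

```python
def fetch_source_rows():
    """Hypothetical source; a real job would query an API or a database here."""
    return [
        {"id": 1, "item": "apple"},
        {"id": 2, "item": "pear"},
        {"id": 3, "item": "plum"},
    ]


def run(job_input):
    # Read the high-water mark saved by the previous run (0 on the first run).
    last_id = int(job_input.get_property("last_ingested_id", 0))

    # Send only rows that are newer than the high-water mark.
    new_rows = [row for row in fetch_source_rows() if row["id"] > last_id]
    for row in new_rows:
        job_input.send_object_for_ingestion(payload=row, destination_table="orders")

    # Persist the new high-water mark so the next run skips these rows.
    if new_rows:
        props = job_input.get_all_properties()
        props["last_ingested_id"] = max(row["id"] for row in new_rows)
        job_input.set_all_properties(props)


class FakeJobInput:
    """Hypothetical stand-in for IJobInput that keeps state in memory."""

    def __init__(self, properties=None):
        self._properties = dict(properties or {})
        self.ingested = []

    def get_property(self, name, default_value=None):
        return self._properties.get(name, default_value)

    def get_all_properties(self):
        return dict(self._properties)

    def set_all_properties(self, properties):
        self._properties = dict(properties)

    def send_object_for_ingestion(self, payload, destination_table=None):
        self.ingested.append((destination_table, payload))


if __name__ == "__main__":
    fake = FakeJobInput()
    run(fake)  # first run ingests every row
    run(fake)  # second run finds nothing new
    print(len(fake.ingested), fake.get_property("last_ingested_id"))
```

Storing the marker only after the rows are sent means a failed run is retried from the previous position, at the cost of possible duplicates if the failure happens midway.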











# Data Transformation




  • SQL and Python parameterized transformations.

  • Extensible templates for data modeling.

  • Creates a dataset or table as a product.




Get started with transforming data:


➡ Data Modeling: Treating Data as a Product

➡ Processing data using SQL and local database

➡ Processing data using Kimball warehousing templates
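To make the idea of a parameterized SQL transformation concrete, here is a small sketch that uses only Python's built-in `sqlite3` module rather than VDK itself. The `{source_table}` and `{target_table}` placeholders, the table names, and the `render` helper are assumptions for illustration; in a VDK job the substitution and the database connection would be handled by the framework.

```python
import sqlite3

# A SQL step with placeholders that are filled in from job parameters.
SQL_STEP = """
CREATE TABLE {target_table} AS
SELECT customer, SUM(amount) AS total_amount
FROM {source_table}
GROUP BY customer
"""


def render(sql, params):
    """Substitute {placeholders}; only safe for trusted parameter values."""
    return sql.format(**params)


def build_product(conn, params):
    """Run the transformation and return the resulting dataset."""
    conn.execute(render(SQL_STEP, params))
    query = "SELECT customer, total_amount FROM {target_table} ORDER BY customer"
    return conn.execute(render(query, params)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [("alice", 10), ("bob", 5), ("alice", 20)],
    )
    params = {"source_table": "raw_orders", "target_table": "customer_totals"}
    return build_product(conn, params)


if __name__ == "__main__":
    print(demo())
```

The output table `customer_totals` is the "product" here: a stable, documented dataset that downstream consumers read instead of the raw orders.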











# Data Job Deployment (build, deploy, release)



VDK Control Service provides a REST API for users to create, deploy, manage, and execute data jobs in a Kubernetes runtime environment.


  • Scheduling, packaging, dependency management, deployment.

  • Execution management and monitoring.

  • Source code versioning and tracking. Fast rollback.

  • Manage state and credentials using Properties and Secrets.




Get started with deploying jobs in the Control Service:


➡ Install Local Control Service with `vdk server --install`

➡ Scheduling a Data Job for automatic execution

➡ Using VDK DAGs to orchestrate Data Jobs
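A job's schedule is declared in its `config.ini` file. The sketch below reads such a file with Python's standard `configparser` and does a basic sanity check on the cron expression. The team name and schedule are made up, the five-field check is a simplification added for illustration, and `read_schedule` is a hypothetical helper, not a VDK function.

```python
import configparser

# Illustrative config.ini content; the team name and schedule are made up.
CONFIG_INI = """
[owner]
team = my-team

[job]
schedule_cron = 0 6 * * *
"""


def read_schedule(config_text):
    """Return (team, cron expression) from config.ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    team = parser.get("owner", "team")
    cron = parser.get("job", "schedule_cron")
    # Simplified check: a standard cron expression has five fields.
    if len(cron.split()) != 5:
        raise ValueError(f"expected 5 cron fields, got: {cron!r}")
    return team, cron


if __name__ == "__main__":
    print(read_schedule(CONFIG_INI))
```

Here `0 6 * * *` means every day at 06:00; the schedule takes effect once the job is deployed to the Control Service.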











# Operations and Monitoring




  • Use the Operations UI to monitor and troubleshoot data workloads in production.

  • Notifications for errors during Data Job deployment or execution.

  • Route errors to the right people by classifying them as User or Platform errors.
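The routing idea can be sketched as a lookup from error class to notification contacts. The example assumes the contacts live in the `[contacts]` section of a job's `config.ini`; the addresses, the semicolon separator handling, and the `recipients_for` helper are assumptions for illustration and not VDK's own implementation.

```python
import configparser

# Illustrative contacts; the addresses are made up.
CONFIG_INI = """
[contacts]
notified_on_job_failure_user_error = data-team@example.com
notified_on_job_failure_platform_error = platform@example.com; sre@example.com
"""

# Map each error class to the config key that lists who should be notified.
CONTACT_KEYS = {
    "user": "notified_on_job_failure_user_error",
    "platform": "notified_on_job_failure_platform_error",
}


def recipients_for(error_kind, config_text):
    """Return the list of addresses to notify for 'user' or 'platform' errors."""
    if error_kind not in CONTACT_KEYS:
        raise ValueError(f"unknown error kind: {error_kind!r}")
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("contacts", CONTACT_KEYS[error_kind], fallback="")
    # Assumption: multiple addresses are separated by semicolons.
    return [address.strip() for address in raw.split(";") if address.strip()]


if __name__ == "__main__":
    print(recipients_for("user", CONFIG_INI))
    print(recipients_for("platform", CONFIG_INI))
```

The point of the split is that a bug in job code goes to the job's owners, while an infrastructure failure goes to whoever runs the platform.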




Get started with operating and monitoring data jobs:


➑ Versatile Data Kit UI - Installation and Getting Started

➑ VDK Operations User Interface - Versatile Data Kit












# Lego-like extensibility




  • Modular: use only what you need. Extensible: build what you miss.

  • Install any plugin as a Python package using pip.

  • Plugins can enhance data processing, ingestion, job execution, and the command-line lifecycle.




Get started with using some VDK plugins:


➡ Browse available plugins

➡ Interesting plugins to check out:

  • Track the lineage of your jobs using vdk-lineage

  • Import/Ingest or Export CSV files using vdk-csv

➡ Write your own plugin











# Support and Contributing
For support, join our Slack channel, or create an [issue](https://github.com/vmware/versatile-data-kit/issues) or [pull request](https://github.com/vmware/versatile-data-kit/pulls) on GitHub to submit suggestions or changes.

If you are interested in contributing as a developer, visit the [contributing](https://github.com/vmware/versatile-data-kit/blob/main/CONTRIBUTING.md) page.

# Contacts
- Message us on Slack:

☝️ Join the [CNCF Slack workspace](https://communityinviter.com/apps/cloud-native/cncf).

✌️ Join the [#versatile-data-kit](https://cloud-native.slack.com/archives/C033PSLKCPR) channel.
- Join the [next Community Meeting](https://github.com/vmware/versatile-data-kit/wiki/Community-Meetings)
- Follow us on [Twitter](https://twitter.com/VDKProject).
- Subscribe to the [Versatile Data Kit YouTube Channel](https://www.youtube.com/channel/UCasf2Q7X8nF7S4VEmcTHJ0Q).
- Join our [development mailing list](mailto:[email protected]), used by developers and maintainers of VDK.

# Code of Conduct
Everyone involved in working on the project's source code, or engaging in any issue trackers, Slack channels,
and mailing lists is expected to be familiar with and follow the [Code of Conduct](https://github.com/vmware/versatile-data-kit/blob/main/CODE_OF_CONDUCT.md).