{"id":15792505,"url":"https://github.com/guiferviz/tuberia","last_synced_at":"2026-03-08T22:32:18.980Z","repository":{"id":62388517,"uuid":"504828271","full_name":"guiferviz/tuberia","owner":"guiferviz","description":"Data engineering meets software engineering","archived":false,"fork":false,"pushed_at":"2023-01-12T23:31:54.000Z","size":1137,"stargazers_count":3,"open_issues_count":4,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-09-24T23:28:17.953Z","etag":null,"topics":["data","data-engineering","expectations","pipeline","python","spark"],"latest_commit_sha":null,"homepage":"https://guiferviz.com/tuberia/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/guiferviz.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-18T11:44:53.000Z","updated_at":"2022-12-15T00:22:51.000Z","dependencies_parsed_at":"2023-02-09T13:46:27.571Z","dependency_job_id":null,"html_url":"https://github.com/guiferviz/tuberia","commit_stats":null,"previous_names":["aidictive/tuberia"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/guiferviz/tuberia","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiferviz%2Ftuberia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiferviz%2Ftuberia/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiferviz%2Ftuberia/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiferviz%2Ftuberia/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/guiferviz","download_url":"https
://codeload.github.com/guiferviz/tuberia/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiferviz%2Ftuberia/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30275540,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-08T20:45:49.896Z","status":"ssl_error","status_checked_at":"2026-03-08T20:45:49.525Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-engineering","expectations","pipeline","python","spark"],"created_at":"2024-10-04T23:01:32.715Z","updated_at":"2026-03-08T22:32:18.962Z","avatar_url":"https://github.com/guiferviz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://aidictive.github.io/tuberia\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://aidictive.github.io/tuberia/images/logo.png\"\n             alt=\"Tuberia logo\"\n             width=\"800\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/AIdictive/tuberia/actions/workflows/cicd.yaml\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://github.com/aidictive/tuberia/actions/workflows/cicd.yaml/badge.svg\"\n             alt=\"Tuberia CI pipeline status\"\u003e\n    
\u003c/a\u003e\n    \u003ca href=\"https://app.codecov.io/gh/AIdictive/tuberia/\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://img.shields.io/codecov/c/github/aidictive/tuberia\"\n             alt=\"Tuberia coverage status\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/AIdictive/tuberia/issues\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://img.shields.io/github/issues/AIdictive/tuberia\"\n             alt=\"Tuberia issues\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/aidictive/tuberia/graphs/contributors\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://img.shields.io/github/contributors/AIdictive/tuberia\"\n             alt=\"Tuberia contributors\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pypi.org/project/tuberia/\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://pepy.tech/badge/tuberia\"\n             alt=\"Tuberia total downloads\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pypi.org/project/tuberia/\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://pepy.tech/badge/tuberia/month\"\n             alt=\"Tuberia downloads per month\"\u003e\n    \u003c/a\u003e\n    \u003cbr /\u003e\n    Data engineering meets software engineering\n\u003c/p\u003e\n\n---\n\n:books: **Documentation**:\n\u003ca href=\"https://aidictive.github.io/tuberia\" target=\"_blank\"\u003e\n    https://aidictive.github.io/tuberia\n\u003c/a\u003e\n\n:keyboard: **Source Code**:\n\u003ca href=\"https://github.com/aidictive/tuberia\" target=\"_blank\"\u003e\n    https://github.com/aidictive/tuberia\n\u003c/a\u003e\n\n---\n\n\n## 🤔 What is this?\n\nTuberia is born from the need to bring the worlds of data and software\nengineering closer together. 
Here is a list of common problems in data\nprojects:\n\n* Loooooong SQL queries impossible to understand/test.\n* A lot of duplicate code due to the difficulty of reusing it in SQL queries.\n* Lack of tests, sometimes because the framework in use does not make testing\neasy.\n* Lack of documentation.\n* Discrepancies between the existing documentation and the latest deployed code.\n* A set of notebooks deployed under the Databricks Share folder.\n* A generic notebook with utility functions.\n* Use of drag-and-drop frameworks that limit the developer's creativity.\n* Months of intense work to migrate existing pipelines from one orchestrator to\nanother (e.g. from Airflow to Prefect, from Databricks Jobs to Data\nFactory...).\n\nTuberia aims to solve all these problems and many others.\n\n\n## 🤓 How does it work?\n\nYou can view Tuberia as if it were a compiler. Instead of compiling a\nprogramming language, it compiles the steps necessary for your data pipeline to\nrun successfully.\n\nTuberia is not an orchestrator, but it allows you to run the code you write in\nPython in any existing orchestrator: Airflow, Prefect, Databricks Jobs, Data\nFactory...\n\nTuberia abstracts away where the code is executed, but defines precisely which\nsteps are needed to execute it. For example, the following code creates a\nPySpark DataFrame with the `range` function and creates a Delta table from it.\n\n```python\nimport pyspark.sql.functions as F\n\nfrom tuberia import PySparkTable, run\n\n\nclass Range(PySparkTable):\n    \"\"\"Table with numbers from 0 to `n - 1`.\n\n    Attributes:\n        n: Exclusive upper bound of the range.\n\n    \"\"\"\n    n: int = 10\n\n    def df(self):\n        return self.spark.range(self.n).withColumn(\"id\", F.col(self.schema.id))\n\n\nclass DoubleRange(PySparkTable):\n    range: Range = Range()\n\n    def df(self):\n        # Read the upstream table and double every value in its `id` column.\n        return self.range.read().withColumn(\"id\", F.col(\"id\") * 2)\n\n\nrun(DoubleRange())\n```\n\n!!! 
warning\n\n    The code above may not work yet and may change. Please note that this\n    project is at an early stage of development.\n\nAll docstrings included in the code will be used to generate documentation\nabout your data pipeline. That information, together with the results of data\nexpectations/data quality rules, will help you to always have complete and\nup-to-date documentation.\n\nBesides that, as you have seen, Tuberia is pure Python, so writing unit\ntests/data tests is very easy. Programming gurus will enjoy data engineering again!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguiferviz%2Ftuberia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fguiferviz%2Ftuberia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguiferviz%2Ftuberia/lists"}