{"id":21093845,"url":"https://github.com/tosh2230/stairlight","last_synced_at":"2025-05-16T14:32:26.903Z","repository":{"id":37038651,"uuid":"420395805","full_name":"tosh2230/stairlight","owner":"tosh2230","description":"A data lineage tool detects table dependencies from rendered SQL statements.","archived":false,"fork":false,"pushed_at":"2024-09-04T00:23:53.000Z","size":2538,"stargazers_count":27,"open_issues_count":6,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-09-23T23:18:00.952Z","etag":null,"topics":["bigquery","data-catalog","data-discovery","data-engineering","data-governance","data-lineage","data-management","data-ops","dbt","gcs","lineage","redash","s3","sql"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/stairlight/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tosh2230.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-23T11:40:19.000Z","updated_at":"2024-06-22T08:14:20.000Z","dependencies_parsed_at":"2023-09-23T13:09:09.448Z","dependency_job_id":"8688c7ef-d59e-4635-a93b-8852d0affaa8","html_url":"https://github.com/tosh2230/stairlight","commit_stats":{"total_commits":616,"total_committers":2,"mean_commits":308.0,"dds":"0.025974025974025983","last_synced_commit":"c3b7e36d5094fe778aff410de2a2cd93ab1015f3"},"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tosh2230%2Fstairlight","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/tosh2230%2Fstairlight/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tosh2230%2Fstairlight/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tosh2230%2Fstairlight/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tosh2230","download_url":"https://codeload.github.com/tosh2230/stairlight/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225265009,"owners_count":17446757,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","data-catalog","data-discovery","data-engineering","data-governance","data-lineage","data-management","data-ops","dbt","gcs","lineage","redash","s3","sql"],"created_at":"2024-11-19T22:13:01.392Z","updated_at":"2024-11-19T22:13:02.000Z","avatar_url":"https://github.com/tosh2230.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/tosh2230/stairlight/main/img/stairlight_white.png\" width=\"400\" alt=\"Stairlight\"\u003e\n\u003c/div\u003e\n\n-----------------\n\n# Stairlight\n\n[![PyPi Version](https://img.shields.io/pypi/v/stairlight.svg?style=flat-square\u0026logo=PyPi)](https://pypi.org/project/stairlight/)\n[![PyPi License](https://img.shields.io/pypi/l/stairlight.svg?style=flat-square)](https://pypi.org/project/stairlight/)\n[![PyPi Python 
Versions](https://img.shields.io/pypi/pyversions/stairlight.svg?style=flat-square)](https://pypi.org/project/stairlight/)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/psf/black)\n[![CI](https://github.com/tosh2230/stairlight/actions/workflows/ci.yml/badge.svg)](https://github.com/tosh2230/stairlight/actions/workflows/ci.yml)\n\nStairlight is a data lineage tool that detects table dependencies from rendered SQL statements.\n\n\u003cdiv align=\"left\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/tosh2230/stairlight/main/img/drawio/concepts.drawio.png\" width=\"1080\" alt=\"concepts\"\u003e\n\u003c/div\u003e\n\n## Supported Data Sources\n\n| Data Source | Remarks |\n| --- | --- |\n| Local file system | Python `pathlib` module |\n| [Amazon S3](https://aws.amazon.com/s3/) | Available for [Amazon Managed Workflows for Apache Airflow (MWAA)](https://aws.amazon.com/managed-workflows-for-apache-airflow/) |\n| [Google Cloud Storage](https://cloud.google.com/storage) | Available for [Google Cloud Composer](https://cloud.google.com/composer) |\n| [dbt](https://www.getdbt.com/) - [Google BigQuery](https://cloud.google.com/bigquery) | Uses the `dbt compile` command internally |\n| [Redash](https://redash.io/) | |\n\n## Installation\n\nThis package is distributed on [PyPI](https://pypi.org/project/stairlight/).\n\n```sh\n# The base package supports the local file system only.\n$ pip install stairlight\n\n# Set extras when detecting from other data sources.\n# e.g. 
Amazon S3 and Google Cloud Storage\n$ pip install \"stairlight[s3, gcs]\"\n```\n\n| Data Source | TemplateSourceType | Extra |\n| --- | --- | --- |\n| Local file system | File | - |\n| Amazon S3 | S3 | s3 |\n| Google Cloud Storage | GCS | gcs |\n| dbt - Google BigQuery | dbt | dbt-bigquery |\n| Redash | Redash | redash |\n\n## Getting Started\n\nUsing Stairlight takes three steps.\n\n```sh\n# 1: Initialize and set your data source settings\n$ stairlight init\n\n# 2: Map your SQL statements and tables\n$ stairlight map\n\n# 3: Get table dependencies\n$ stairlight\n```\n\n## Description\n\n### Input\n\n- SQL statements\n- Configuration YAML files\n    - stairlight.yaml: Locations of SQL statements and include/exclude conditions.\n    - mapping.yaml: Mappings between SQL statements and tables.\n\n### Output\n\nStairlight outputs table dependencies in JSON format.\n\nTop-level keys are table names, and each value lists the tables that serve as data sources for that key's table.\n\n\u003cdetails\u003e\n\n\u003csummary\u003eExample\u003c/summary\u003e\n\n```json\n{\n  \"test_project.beam_streaming.taxirides_aggregation\": {\n    \"test_project.beam_streaming.taxirides_realtime\": {\n      \"TemplateSourceType\": \"File\",\n      \"Key\": \"tests/sql/union_same_table.sql\",\n      \"Uri\": \"/foo/bar/stairlight/tests/sql/union_same_table.sql\",\n      \"Lines\": [\n        {\n          \"LineNumber\": 6,\n          \"LineString\": \"    test_project.beam_streaming.taxirides_realtime\"\n        },\n        {\n          \"LineNumber\": 15,\n          \"LineString\": \"    test_project.beam_streaming.taxirides_realtime\"\n        }\n      ]\n    }\n  },\n  \"PROJECT_a.DATASET_b.TABLE_c\": {\n    \"PROJECT_A.DATASET_A.TABLE_A\": {\n      \"TemplateSourceType\": \"GCS\",\n      \"Key\": \"sql/one_line/one_line.sql\",\n      \"Uri\": \"gs://stairlight/sql/one_line/one_line.sql\",\n      \"Lines\": [\n        {\n          \"LineNumber\": 1,\n          \"LineString\": \"SELECT * FROM 
PROJECT_A.DATASET_A.TABLE_A WHERE 1 = 1\"\n        }\n      ],\n      \"BucketName\": \"stairlight\",\n      \"Labels\": {\n        \"Source\": null,\n        \"Test\": \"a\"\n      }\n    }\n  },\n  \"AggregateSales\": {\n    \"PROJECT_e.DATASET_e.TABLE_e\": {\n      \"TemplateSourceType\": \"Redash\",\n      \"Key\": 5,\n      \"Uri\": \"AggregateSales\",\n      \"Lines\": [\n        {\n          \"LineNumber\": 1,\n          \"LineString\": \"SELECT service, SUM(total_amount) FROM PROJECT_e.DATASET_e.TABLE_e GROUP BY service\"\n        }\n      ],\n      \"DataSourceName\": \"BigQuery\",\n      \"Labels\": {\n        \"Category\": \"Sales\"\n      }\n    }\n  },\n  \"dummy.dummy.example_b\": {\n    \"PROJECT_t.DATASET_t.TABLE_t\": {\n      \"TemplateSourceType\": \"dbt\",\n      \"Key\": \"tests/dbt/project_01/target/compiled/project_01/models/b/example_b.sql\",\n      \"Uri\": \"/foo/bar/stairlight/tests/dbt/project_01/target/compiled/project_01/models/b/example_b.sql\",\n      \"Lines\": [\n        {\n          \"LineNumber\": 1,\n          \"LineString\": \"select * from PROJECT_t.DATASET_t.TABLE_t where value_a = 0 and value_b = 0\"\n        }\n      ]\n    }\n  },\n  \"PROJECT_as.DATASET_bs.TABLE_cs\": {\n    \"PROJECT_A.DATASET_A.TABLE_A\": {\n      \"TemplateSourceType\": \"S3\",\n      \"Key\": \"sql/one_line/one_line.sql\",\n      \"Uri\": \"s3://stairlight/sql/one_line/one_line.sql\",\n      \"Lines\": [\n        {\n          \"LineNumber\": 1,\n          \"LineString\": \"SELECT * FROM PROJECT_A.DATASET_A.TABLE_A WHERE 1 = 1\"\n        }\n      ],\n      \"BucketName\": \"stairlight\",\n      \"Labels\": {\n        \"Source\": null,\n        \"Test\": \"a\"\n      }\n    }\n  }\n}\n```\n\n\u003c/details\u003e\n\n### Collecting patterns\n\n#### Centralization\n\n\u003cdiv align=\"left\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/tosh2230/stairlight/main/img/drawio/centralization.drawio.png\" width=\"800\" 
alt=\"centralization\"\u003e\n\u003c/div\u003e\n\n#### Agents\n\n\u003cdiv align=\"left\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/tosh2230/stairlight/main/img/drawio/agents.drawio.png\" width=\"800\" alt=\"agents\"\u003e\n\u003c/div\u003e\n\n## Configuration\n\nExamples can be found [here](https://github.com/tosh2230/stairlight/tree/main/tests/config), used for unit testing in CI.\n\n### stairlight.yaml\n\n'stairlight.yaml' is for setting up Stairlight itself. It is responsible for specifying SQL statements to be read.\n\n`stairlight init` creates a template of stairlight.yaml.\n\n\u003cdetails\u003e\n\n\u003csummary\u003eExample\u003c/summary\u003e\n\n```yaml\nInclude:\n  - TemplateSourceType: File\n    FileSystemPath: ./tests/sql\n    Regex: .*/*\\.sql$\n    DefaultTablePrefix: \"PROJECT_A\"\n  - TemplateSourceType: GCS\n    ProjectId: null\n    BucketName: stairlight\n    Regex: ^sql/.*/*\\.sql$\n    DefaultTablePrefix: \"PROJECT_A\"\n  - TemplateSourceType: Redash\n    DatabaseUrlEnvironmentVariable: REDASH_DATABASE_URL\n    DataSourceName: BigQuery\n    QueryIds:\n      - 1\n      - 3\n      - 5\n  - TemplateSourceType: dbt\n    ProjectDir: tests/dbt/project_01\n    ProfilesDir: tests/dbt\n    Vars:\n      key_a: value_a\n      key_b: value_b\n  - TemplateSourceType: S3\n    BucketName: stairlight\n    Regex: ^sql/.*/*\\.sql$\n    DefaultTablePrefix: \"PROJECT_A\"\nExclude:\n  - TemplateSourceType: File\n    Regex: main/exclude\\.sql$\nSettings:\n  MappingFilesRegex:\n    - .*/mapping\\_file\\.yaml$\n    - .*/mapping\\_gcs\\.yaml$\n    - .*/mapping\\_dbt\\.yaml$\n    - .*/mapping\\_s3\\.yaml$\n  # Deprecated from v0.7.2\n  MappingPrefix: \"mapping\"\n```\n\n\u003c/details\u003e\n\n### mapping.yaml\n\n'mapping.yaml' is used to define relationships between input SELECT statements and tables.\n\n`stairlight map` creates a template of mapping.yaml and attempts to read from data sources specified in stairlight.yaml.\nIf successfully read, it 
outputs settings that have not yet been configured in an existing 'mapping.yaml' file.\n\n\u003cdetails\u003e\n\n\u003csummary\u003eExample\u003c/summary\u003e\n\n```yaml\nGlobal:\n  Parameters:\n    DESTINATION_PROJECT: stairlight\n    params:\n      PROJECT: 1234567890\n      DATASET: public\n      TABLE: taxirides\nMapping:\n  - TemplateSourceType: File\n    FileSuffix: \"tests/sql/union_same_table.sql\"\n    Tables:\n      - TableName: \"test_project.beam_streaming.taxirides_aggregation\"\n        Parameters:\n          params:\n            source_table: source\n            destination_table: destination\n        IgnoreParameters:\n          - execution_date.add(days=1).isoformat()\n  - TemplateSourceType: GCS\n    Uri: \"gs://stairlight/sql/one_line/one_line.sql\"\n    Tables:\n      - TableName: \"PROJECT_a.DATASET_b.TABLE_c\"\n  - TemplateSourceType: Redash\n    QueryId: 5\n    DataSourceName: metadata\n    Tables:\n      - TableName: New Query\n        Parameters:\n          table: dashboards\n        Labels:\n          Category: Redash test\n  - TemplateSourceType: dbt\n    ProjectName: project_01\n    FileSuffix: tests/dbt/project_01/target/compiled/project_01/models/example/my_first_dbt_model.sql\n    Tables:\n      - TableName: dummy.dummy.my_first_dbt_model\n  - TemplateSourceType: S3\n    Uri: \"s3://stairlight/sql/one_line/one_line.sql\"\n    Tables:\n      - TableName: \"PROJECT_as.DATASET_bs.TABLE_cs\"\nExtraLabels:\n  - TableName: \"PROJECT_A.DATASET_A.TABLE_A\"\n    Labels:\n      Source: Null\n      Test: a\n```\n\n\u003c/details\u003e\n\n#### Global Section\n\nThis section is for global configurations.\n\n`Parameters` is used to set common parameters. 
If they conflict with `Parameters` in the mapping section, the mapping section's parameters take precedence over the global ones.\n\n#### Mapping Section\n\nThe mapping section defines relationships between input SELECT statements and the tables that are created as a result of query execution.\n\n`Parameters` sets values for the [Jinja](https://jinja.palletsprojects.com/) template variables embedded in statements. If multiple parameter sets are applied to a statement that uses a Jinja template, the statement is read as if there were one query per parameter set.\n\nIn contrast, `IgnoreParameters` takes a list of parameters to ignore when rendering queries.\n\n#### Extra labels Section\n\nThis section sets labels on tables that appear only in queries.\n\n## Arguments and Options\n\n```txt\n$ stairlight --help\nusage: stairlight [-h] [-c CONFIG] [--save SAVE] [--load LOAD] {init,check,up,down} ...\n\nAn end-to-end data lineage tool, detects table dependencies by SQL SELECT statements.\nWithout positional arguments, return a table dependency map as JSON format.\n\npositional arguments:\n  {init,map,check,list,up,down}\n    init                create a new Stairlight configuration file\n    map (check)         create a new configuration file about undefined mappings\n    list                return all ( tables | URIs )\n    up                  return upstairs ( tables | URIs )\n    down                return downstairs ( tables | URIs )\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -c CONFIG, --config CONFIG\n                        set a Stairlight configuration directory\n  -q, --quiet           keep silence\n  --save SAVE           A file path where map results will be saved.\n                        You can choose from local file system, GCS, S3.\n  --load LOAD           A file path where map results are saved.\n                        You can choose from local file system, GCS, S3.\n                        It 
can be specified multiple times.\n```\n\n### init\n\n`stairlight init` creates a new Stairlight configuration file.\n\n```txt\n$ stairlight init --help\nusage: stairlight init [-h] [-c CONFIG]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -c CONFIG, --config CONFIG\n                        set a Stairlight configuration directory\n  -q, --quiet           keep silence\n```\n\n### map(check)\n\n`stairlight map` creates a new configuration file for mappings that are not yet defined. `stairlight check` is an alias.\nOptions are the same as `stairlight init`.\n\n### list\n\n`stairlight list` outputs all tables or SQL URIs.\n\n- The output option (`-o`, `--output`) determines the output type: tables or URIs.\n\n### up\n\n`stairlight up` outputs tables or SQL URIs located upstream (upstairs) from the specified table.\n\n- Use the table (`-t`, `--table`) or label (`-l`, `--label`) option to specify the tables to search for.\n- The output option (`-o`, `--output`) works the same as in `stairlight list`.\n- When the recursive option (`-r`, `--recursive`) is set, Stairlight finds dependencies recursively and outputs them as a list.\n- When the verbose option (`-v`, `--verbose`) is set, Stairlight adds detailed information and outputs the result as a dict.\n\n```txt\n$ stairlight up --help\nusage: stairlight up [-h] [-c CONFIG] [--save SAVE] [--load LOAD] (-t TABLE | -l LABEL) [-o {table,uri}]\n                     [-v] [-r]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -c CONFIG, --config CONFIG\n                        set a Stairlight configuration directory\n  -q, --quiet           keep silence\n  --save SAVE           A file path where mapped results will be saved.\n                        You can choose from local file system, GCS, S3.\n  --load LOAD           A file path where mapped results are saved.\n                        You can choose from local file system, GCS, S3.\n                        It can be specified multiple times.\n  -t TABLE, --table TABLE\n       
                 table names that Stairlight searches for, can be specified\n                        multiple times. e.g. -t PROJECT_a.DATASET_b.TABLE_c -t\n                        PROJECT_d.DATASET_e.TABLE_f\n  -l LABEL, --label LABEL\n                        labels set for the table in mapping configuration, can be specified multiple times.\n                        The separator between key and value should be a colon(:).\n                        e.g. -l key_1:value_1 -l key_2:value_2\n  -o {table,uri}, --output {table,uri}\n                        output type\n  -v, --verbose         return verbose results\n  -r, --recursive       search recursively\n```\n\n### down\n\n`stairlight down` outputs tables or SQL URIs located downstream (downstairs) from the specified table.\nOptions are the same as `stairlight up`.\n\n## Use as a library\n\nStairlight can also be used as a library.\n\n[tosh2230/stairlight-app](https://github.com/tosh2230/stairlight-app) is a sample web application that renders a table dependency graph with Stairlight, using Graphviz, Streamlit, and Google Cloud Run.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftosh2230%2Fstairlight","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftosh2230%2Fstairlight","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftosh2230%2Fstairlight/lists"}