{"id":19427778,"url":"https://github.com/starrocks/dbt-starrocks","last_synced_at":"2025-04-24T17:31:46.659Z","repository":{"id":40469311,"uuid":"485725371","full_name":"StarRocks/dbt-starrocks","owner":"StarRocks","description":"dbt-starrocks contains all of the code enabling dbt to work with StarRocks","archived":false,"fork":false,"pushed_at":"2025-04-22T02:20:34.000Z","size":125,"stargazers_count":32,"open_issues_count":20,"forks_count":13,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-22T04:12:15.591Z","etag":null,"topics":["dbt","starrocks"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StarRocks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-26T09:49:17.000Z","updated_at":"2025-04-22T02:20:38.000Z","dependencies_parsed_at":"2024-04-29T08:28:30.715Z","dependency_job_id":"bf920434-5cb0-46a5-9f16-78d55698dab9","html_url":"https://github.com/StarRocks/dbt-starrocks","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StarRocks%2Fdbt-starrocks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StarRocks%2Fdbt-starrocks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StarRocks%2Fdbt-starrocks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StarRocks%2Fdbt-starrocks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/StarRocks","download_url":"https://codeload.github.com/StarRocks/dbt-starrocks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250674375,"owners_count":21469214,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dbt","starrocks"],"created_at":"2024-11-10T14:12:50.710Z","updated_at":"2025-04-24T17:31:46.648Z","avatar_url":"https://github.com/StarRocks.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dbt-starrocks\n\n![PyPI](https://img.shields.io/pypi/v/dbt-starrocks)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dbt-starrocks)\n![PyPI - Downloads](https://img.shields.io/pypi/dw/dbt-starrocks)\n\nThis project is **under development**.\n\n\nThe `dbt-starrocks` package contains all the code to enable [dbt](https://getdbt.com) to work with [StarRocks](https://www.starrocks.io).\n\nThis is an experimental plugin:\n- We have not tested it extensively\n- Requires StarRocks version 2.5.0 or higher  \n  - version 3.1.x is recommended\n  - StarRocks versions 2.4 and below are no longer supported\n\n\n## Installation\n\nThis plugin can be installed via pip:\n\n```shell\n$ pip install dbt-starrocks\n```\n\n## Supported features\n| StarRocks \u003c= 2.5 | StarRocks 2.5 ~ 3.1 | StarRocks \u003e= 3.1 | StarRocks \u003e= 3.4 |              Feature              |\n|:----------------:|:-------------------:|:----------------:|:----------------:|:---------------------------------:|\n|        ✅         |          ✅          |        ✅         |        ✅         |  
     Table materialization       |\n|        ✅         |          ✅          |        ✅         |        ✅         |       View materialization        |\n|        ❌         |          ❌          |        ✅         |        ✅         | Materialized View materialization |\n|        ❌         |          ✅          |        ✅         |        ✅         |    Incremental materialization    |\n|        ❌         |          ✅          |        ✅         |        ✅         |         Primary Key Model         |\n|        ✅         |          ✅          |        ✅         |        ✅         |              Sources              |\n|        ✅         |          ✅          |        ✅         |        ✅         |         Custom data tests         |\n|        ✅         |          ✅          |        ✅         |        ✅         |           Docs generate           |\n|        ❌         |          ❌          |        ✅         |        ✅         |       Expression Partition        |\n|        ❌         |          ❌          |        ❌         |        ❌         |               Kafka               |\n|        ❌         |          ❌          |        ❌         |        ✅         |         Dynamic Overwrite         |\n|        ❌         |         *4          |        *4        |        ✅         |            Submit task            |\n|        ❌         |          ✅          |        ✅         |        ✅         |  Microbatch (Insert Overwrite)   |\n|        ❌         |          ❌          |        ❌         |        ✅         | Microbatch (Dynamic Overwrite)   |\n\n### Notice\n1. When StarRocks Version \u003c 2.5, `Create table as` can only set engine='OLAP' and table_type='DUPLICATE'\n2. When StarRocks Version \u003e= 2.5, `Create table as` supports table_type='PRIMARY'\n3. When StarRocks Version \u003c 3.1 distributed_by is required\n4. 
Verify the specific `submit task` support for your version; see [SUBMIT TASK](https://docs.starrocks.io/docs/sql-reference/sql-statements/loading_unloading/ETL/SUBMIT_TASK/). \n\n## Profile Configuration\n\n**Example entry for profiles.yml:**\n\n```\nstarrocks:\n  target: dev\n  outputs:\n    dev:\n      type: starrocks\n      host: localhost\n      port: 9030\n      schema: analytics\n      username: your_starrocks_username\n      password: your_starrocks_password\n```\n\n| Option              | Description                                                        | Required? | Example                        |\n|---------------------|--------------------------------------------------------------------|-----------|--------------------------------|\n| type                | The specific adapter to use                                        | Required  | `starrocks`                    |\n| host                | The hostname to connect to                                         | Required  | `192.168.100.28`               |\n| port                | The port to use                                                    | Required  | `9030`                         |\n| schema              | Specify the schema (database) to build models into                 | Required  | `analytics`                    |\n| username            | The username to use to connect to the server                       | Required  | `dbt_admin`                    |\n| password            | The password to use for authenticating to the server               | Required  | `correct-horse-battery-staple` |\n| version             | StarRocks version the plugin should assume for compatibility       | Optional  | `3.1.0`                        |\n| use_pure            | set to \"true\" to use C extensions                                  | Optional  | `true`                         |\n| is_async            | \"true\" to submit suitable tasks as etl tasks.                      
| Optional  | `true`                         |\n| async_query_timeout | Sets the `query_timeout` value when submitting a task to StarRocks | Optional  | `300`                          |\n\nFor more details about `use_pure` and other connection arguments, see the [MySQL Connector/Python connection arguments](https://dev.mysql.com/doc/connector-python/en/connector-python-connectargs.html).\n\n\n## Example\n\n### dbt seed properties(yml):\n#### Complete configuration:\n```\nmodels:\n  materialized: table                   // table, view, materialized_view or incremental\n  engine: 'OLAP'\n  keys: ['id', 'name', 'some_date']\n  table_type: 'PRIMARY'                 // PRIMARY or DUPLICATE or UNIQUE\n  distributed_by: ['id']\n  buckets: 3                            // leave empty for auto bucketing\n  indexs: [{ 'columns': 'idx_column' }]\n  partition_by: ['some_date']\n  partition_by_init: [\"PARTITION p1 VALUES [('1971-01-01 00:00:00'), ('1991-01-01 00:00:00')),PARTITION p1972 VALUES [('1991-01-01 00:00:00'), ('1999-01-01 00:00:00'))\"]\n  // RANGE, LIST, or Expr partition types should be used in conjunction with partition_by configuration\n  // Expr partition type requires an expression (e.g., date_trunc) specified in partition_by\n  order_by: ['some_column']             // only for PRIMARY table_type\n  partition_type: 'RANGE'               // RANGE, LIST or Expr; must be used together with partition_by\n  properties: {\"replication_num\":\"1\", \"in_memory\": \"true\"}\n  refresh_method: 'async'               // only for materialized views; defaults to manual\n  \n  // For 'materialized=incremental' in version \u003e= 3.4\n  incremental_strategy: 'dynamic_overwrite' // Supported values: ['default', 'insert_overwrite', 'dynamic_overwrite']\n\n  // For 'materialized=incremental' and 'incremental_strategy=microbatch'\n  event_time: 'some_timestamp_column'     // The column name of the event time\n  begin: '2025-01-01'                     // The start time of the incremental data\n  lookback: 
1                             // The lookback time of each incremental run\n  batch_size: 'day'                       // The batch size. Supported values ['year', 'month', 'day', 'hour']\n  microbatch_use_dynamic_overwrite: true  // Whether to use dynamic_overwrite in version \u003e= 3.4\n```\n  \n### dbt run config:\n#### Example configuration:\n```\n{{ config(materialized='view') }}\n{{ config(materialized='table', engine='OLAP', buckets=32, distributed_by=['id']) }}\n{{ config(materialized='table', indexs=[{ 'columns': 'idx_column' }]) }}\n{{ config(materialized='table', partition_by=['date_trunc(\"day\", first_order)'], partition_type='Expr') }}\n{{ config(materialized='table', table_type='PRIMARY', keys=['customer_id'], order_by=['first_name', 'last_name']) }}\n{{ config(materialized='incremental', table_type='PRIMARY', engine='OLAP', buckets=32, distributed_by=['id']) }}\n{{ config(materialized='incremental', partition_by=['my_partition_key'], partition_type='Expr', incremental_strategy='dynamic_overwrite') }}\n{{ config(materialized='incremental', partition_by=['my_partition_key'], partition_type='Expr', incremental_strategy='microbatch', event_time='report_day', begin='2025-01-01', lookback=1, batch_size='day') }}\n{{ config(materialized='incremental', partition_by=['my_partition_key'], partition_type='Expr', incremental_strategy='microbatch', event_time='report_day', begin='2025-01-01', lookback=1, batch_size='day', microbatch_use_dynamic_overwrite=true) }}\n{{ config(materialized='materialized_view') }}\n{{ config(materialized='materialized_view', properties={\"storage_medium\":\"SSD\"}) }}\n{{ config(materialized='materialized_view', refresh_method=\"ASYNC START('2022-09-01 10:00:00') EVERY (interval 1 day)\") }}\n```\nMaterialized views support only the `partition_by`, `buckets`, `distributed_by`, `properties`, and `refresh_method` configurations.\n\n## Read From Catalog\nFirst, add the catalog to StarRocks. 
The following example creates a Hive catalog.\n```mysql\nCREATE EXTERNAL CATALOG `hive_catalog`\nPROPERTIES (\n    \"hive.metastore.uris\"  =  \"thrift://127.0.0.1:8087\",\n    \"type\"=\"hive\"\n);\n```\nFor other catalog types, see the documentation:\nhttps://docs.starrocks.io/en-us/latest/data_source/catalog/catalog_overview\nThen define the source in a `sources.yaml` file.\n```yaml\nsources:\n  - name: external_example\n    schema: hive_catalog.hive_db\n    tables:\n      - name: hive_table_name\n```\nFinally, reference the table with the `source` macro:\n```\n{{ source('external_example', 'hive_table_name') }}\n```\n\n## Dynamic Overwrite (StarRocks \u003e= 3.4)\nThe `incremental_strategy` property supports the following values:\n- `default` (or omitted): Standard inserts without `overwrite`.\n- `insert_overwrite`: Will apply `overwrite` with `dynamic_overwrite = false` to the inserts.\n- `dynamic_overwrite`: Will apply `overwrite` with `dynamic_overwrite = true` to the inserts.\n\nFor more details on the different behaviors, see [StarRocks' documentation for INSERT](https://docs.starrocks.io/docs/sql-reference/sql-statements/loading_unloading/INSERT).\n\n## Submittable ETL tasks\n\n\u003e The implementation of submittable ETL tasks is located in the `impl.py` file.\n\nSetting `is_async: true` in your `profiles.yml` will enable submitting suitable ETL tasks using the `submit task` feature of StarRocks.\n\nThe `submit task` wrapper is applied automatically to any statement that supports submission. Applying it manually is currently not supported by the adapter.\n\nThe following statements will be submitted automatically:\n\n- `CREATE AS ... 
SELECT`\n- `INSERT INTO|OVERWRITE`\n- `CACHE SELECT ...`\n\n\u003e See [StarRocks' documentation on SUBMIT TASK](https://docs.starrocks.io/docs/sql-reference/sql-statements/loading_unloading/ETL/SUBMIT_TASK/)\n\n### Task Polling\n\nOnce the task has been submitted, the adapter will periodically poll StarRocks' `information_schema.task_runs` to retrieve the task status. \n\nThe polling is implemented using an exponential backoff, with a maximum delay of 10 minutes. The adapter's connection to the StarRocks cluster will not be maintained during the waiting period. It will be re-opened right before the next status polling phase.\n\n### Controlling the task timeout\n\nThe `async_query_timeout` property in `profiles.yml` controls the value of `query_timeout` when submitting a task.\n\nThe value is injected into the SQL query submitted to StarRocks:\n\n```sql\nsubmit /*+set_var(query_timeout={async_query_timeout})*/ task ...\n```\n\n### Example `profiles.yml` configuration\n\n```yml\nmy_profile:\n  target: dev\n  outputs:\n    dev:\n      type: starrocks\n      host: host\n      port: 9030\n      schema: schema\n      username: username\n      password: password\n      is_async: true\n      async_query_timeout: 3600 # 1 hour\n```\n\n## Test Adapter\nRun the following:\n```\npython3 -m pytest tests/functional\n```\nFor more information, consult [the dbt-adapter-tests project](https://github.com/dbt-labs/dbt-adapter-tests).\n\n## Contributing\nContributions to dbt-starrocks are welcome. Please see the [Contributing Guide](https://github.com/StarRocks/starrocks/blob/main/CONTRIBUTING.md) for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstarrocks%2Fdbt-starrocks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstarrocks%2Fdbt-starrocks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstarrocks%2Fdbt-starrocks/lists"}