{"id":20941091,"url":"https://github.com/lmc-eu/luft","last_synced_at":"2026-04-13T11:31:33.593Z","repository":{"id":81826447,"uuid":"201061542","full_name":"lmc-eu/luft","owner":"lmc-eu","description":"💨 Luft is standard operators replacement for Airflow with declarative DAGs via Yaml file.","archived":false,"fork":false,"pushed_at":"2020-03-27T11:12:40.000Z","size":137,"stargazers_count":1,"open_issues_count":2,"forks_count":2,"subscribers_count":5,"default_branch":"master","last_synced_at":"2026-01-01T09:08:47.703Z","etag":null,"topics":["airflow","dag","declarative","etl","luft","workflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lmc-eu.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELIST","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-07T13:59:08.000Z","updated_at":"2020-03-20T17:51:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"51be32f6-0fdf-4e99-91ad-fa2d6e8b73d2","html_url":"https://github.com/lmc-eu/luft","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lmc-eu/luft","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmc-eu%2Fluft","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmc-eu%2Fluft/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmc-eu%2Fluft/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmc-eu%2Fluft/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lmc-eu","dow
nload_url":"https://codeload.github.com/lmc-eu/luft/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lmc-eu%2Fluft/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31751232,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T09:16:15.125Z","status":"ssl_error","status_checked_at":"2026-04-13T09:16:05.023Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","dag","declarative","etl","luft","workflow"],"created_at":"2024-11-18T23:12:59.259Z","updated_at":"2026-04-13T11:31:33.574Z","avatar_url":"https://github.com/lmc-eu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Luft\n\nLuft is a replacement for Airflow's standard operators, with declarative DAGs defined in YAML files. It is basically a client that helps you with everyday BI tasks.\n\nAirflow comes with batteries included - a couple of operators that make your BI work less painful. But after years of using it we ran into some issues with the standard operators.\n\n* Operators are closely tied to Airflow. For example, if a data scientist wants to download one table ad hoc from a MySQL database and\nsave it to BigQuery, they have to create a new DAG, operator and JDBC credentials and run the whole Airflow ecosystem on localhost. 
Which is usually overkill.\n* Standard loading operators (e.g. MySqlToGoogleCloudStorageOperator) don't work in Kubernetes.\n* Standard loading operators are slow because they are implemented in Python, and are not usable for big loads.\n* It is really hard to debug and test operators.\n* Airflow doesn't include standard principles for solving DWH problems.\n* The data schema is usually not versioned in standard loading operators.\n* Airflow may be replaced by an alternative in the near future, e.g. Prefect or Dagster.\n\nLuft solves most of those problems.\n\n## Basics\n\nLuft is meant to run inside a Docker container (but of course it can run without one). It is just a simple Python library that wraps multiple services.\n_For example, for parallel and fast bulk loading of data from any JDBC database to BigQuery it uses Embulk, for executing BigQuery commands it uses the standard Python BQ library, etc._\n\n### Task\n\nAll work is done by tasks, each defined in a **YAML file** (examples are in `example/tasks`).\n_For example, loading table Test from a MySQL database into S3 is one task, loading data from GA into S3 is another task, a historization script in BQ is another task, etc._\n\nMandatory parameters of every task are:\n\n* _name_: name of the task. For tables it is usually the table name.\n* _source_system_: name of the source system. Usually the name of the database for JDBC databases. Used for better organization, especially on blob storage. E.g. jobs, prace, pzr. If not specified, source_system is taken from the folder hierarchy: for a task in `example/tasks/world`, `world` will be the source system unless you specify it in your yaml file.\n* _source_subsystem_: name of the source subsystem. Usually the name of the schema for JDBC databases. Used for better organization, especially on blob storage. E.g. public, b2b. If not specified, the source_subsystem is taken from the folder hierarchy. 
If you look into `example/tasks/world/public`, `public` will be the source subsystem unless you specify it in your yaml file.\n* _task_type_: type of task. E.g. embulk-jdbc-load, mongo-load, etc. Luft automatically decides which task type to use based on your CLI command, so you do not have to specify it manually. It can be useful when you want to enforce a certain task type regardless of the CLI command (e.g. you want to run a BigQuery task even if all other tasks in the folder load data from MySQL to S3).\n\n### Task List\n\nTasks are organized into Task Lists. A Task List is an array of work to be done for a certain period of time.\n_E.g. you want to download tables T1, T2 and T3 from a MySQL database into S3 from 2018-01-01 to 2019-05-02 (and you have a where condition on some date)._\n\n## Task Types\n\nLuft currently supports the following task types:\n\n### embulk-jdbc-load\n\nRun Embulk and load data from a JDBC database into S3 or GCS. Data are stored as CSV. Other output formats will be added later.\n\n#### Command\n\n```bash\nluft jdbc load\n```\n\n#### Command parameters\n\n* `-y`, `--yml-path` (mandatory): folder or single yml file inside the default tasks folder (see luft.cfg).\n* `-s`, `--start-date`: Start date in format YYYY-MM-DD for executing the task in a loop. If not specified, yesterday's date is used.\n* `-e`, `--end-date`: End date in format YYYY-MM-DD for executing the task in a loop. This day is not included. If not specified, today's date is used.\n* `-sys`, `--source-system`: override the source_system parameter. See the description in the _Task_ section. Has to match the name in jdbc.cfg to get the right credentials for the JDBC database.\n* `-sub`, `--source-subsystem`: override the source_subsystem parameter. See the description in the _Task_ section.\n* `-b`, `--blacklist`: Names of tables/objects to be ignored during processing. E.g. with --yml-path gis and -b TEST, all objects in the gis folder except object TEST are processed.\n* `-w`, `--whitelist`: Names of tables/objects to be processed. 
E.g. with --yml-path gis and -w TEST, only object TEST is processed.\n\n#### Requirements\n\n* Embulk in your Docker image (see Dockerfile) or on your local machine.\n* Appropriate Embulk plugins:\n  * Output - [embulk-output-gcs](https://github.com/embulk/embulk-output-gcs) or [embulk-output-s3](https://github.com/llibra/embulk-output-s3)\n  * Input - whichever [embulk-input-jdbc](https://github.com/embulk/embulk-input-jdbc) plugins you need\n* Luft installed :).\n* `jdbc.cfg` file with the right configuration.\n\n### jdbc.cfg\n\nThis file contains the basic JDBC configuration for all of your databases. Every database has to have a `[DATABASE_NAME]` header, which has to match *source_system*. Supported parameters are:\n\n* *type* - type of database according to [embulk-input-jdbc](https://github.com/embulk/embulk-input-jdbc)\n* *uri* - URI of the database\n* *port* - database port\n* *database* - database name\n* *user* - username used to log into the database\n* *password* - database password. Storing it here is not recommended because the password can be stolen. It is fine for DEV but not for PROD.\n* *password_env* - name of the environment variable that stores the password. With this variant you can pass the password in the docker run command, e.g. if password_env is set to `MY_DB_PASS` then `docker run -e MY_DB_PASS=Password123 luft jdbc load -y \u003cpath_to_yml\u003e` should work.\n\n#### Yaml file parameters\n\nInside the yaml file, the following parameters are supported:\n\n* *name* - Table name.\n* *source_system* - usually the name of the database - used for organizational purposes and the blob storage path. Has to match the name in jdbc.cfg to get the right credentials for the JDBC database.\n* *source_subsystem* - usually the name of the schema - used for organizational purposes and the blob storage path.\n* *task_type* - `embulk-jdbc-load` by default but can be overridden. 
When overridden it becomes a different kind of task :).\n* *thread_name* - applicable only when used with Airflow. The thread name is automatically generated based on the number of threads. If you need this task to run in a completely separate thread, you can specify a custom thread name.\nE.g. I have tasks T1, T2, T3, T4 and T5 in my task list and the thread count set to 3. By default (if no task has _thread_name_ specified) it will look like this in Airflow:\n\n```text\n|T1| -\u003e |T4|\n|T2| -\u003e |T5|\n|T3|\n```\n\nWhen I specify a custom _thread_name_ in task T4:\n\n```text\n|T1| -\u003e |T5|\n|T2|\n|T3|\n|T4|\n```\n\n* *color* - applicable only when used with Airflow. Hex color of the task in Airflow. If not specified, `#A3E9DA` is used.\n* *path_prefix* - Path prefix (path) on blob storage. You can use the following templated fields:\n  * {env} - your environment (DEV/PROD...)\n  * {source_system} - name of the source system (whatever you like) - for JDBC it is usually a friendly name of the database\n  * {source_subsystem} - name of the source subsystem (whatever you like) - for JDBC it is the schema name\n  * {name} - name of the table\n  * {date_valid} - valid date of the export\n  * {time_valid} - valid time of the export\n* *embulk_template* - path to your custom Embulk template. Otherwise the default from `luft.cfg` is used.\n* *fetch_rows* - number of rows to fetch at a time. Default 10000.\n* *source_table* - name of the table in the source database, for when you need a different name on blob storage. E.g. the table is named Test1 but you want to rename it to Test in your DWH and on your blob storage. In this case, put Test in the _name_ parameter in the yaml file and Test1 in the _source_table_ parameter.\n* *where_clause* - Where condition in your SQL command. You can use the `{date_valid}` parameter inside it to insert the current valid date. E.g. `where_clause: date_of_change \u003e= '{date_valid}'`. 
If you then execute `luft jdbc load -y \u003cpath_to_task\u003e -s 2019-01-01 -e 2019-05-01`, for every date between `2019-01-01` and `2019-05-01` it will generate a clause with that date, e.g. `WHERE date_of_change \u003e= '2019-01-01'` for the first one.\n* *columns* - list of columns to download. Column parameters:\n  * *name* - column name.\n  * *type* - column type.\n  * *mandatory* - whether the column is mandatory. Default false.\n  * *pk* - whether the column is a primary key. Default false.\n  * *escape* - escape the column name with `. Some databases require it.\n  * *value* - fixed column value. You should never delete any of your columns from the yaml file. Instead you should set `value: 'Null'`.\n\n### bq-load\n\nLoad data into BigQuery from Google Cloud Storage and historize them. Currently only CSV is supported.\n\n#### Command\n\n```bash\nluft bq load\n```\n\n#### Command parameters\n\n* `-y`, `--yml-path` (mandatory): folder or single yml file inside the default tasks folder (see luft.cfg).\n* `-s`, `--start-date`: Start date in format YYYY-MM-DD for executing the task in a loop. If not specified, yesterday's date is used.\n* `-e`, `--end-date`: End date in format YYYY-MM-DD for executing the task in a loop. This day is not included. If not specified, today's date is used.\n* `-sys`, `--source-system`: override the source_system parameter. See the description in the _Task_ section. Has to match the name in jdbc.cfg to get the right credentials for the JDBC database.\n* `-sub`, `--source-subsystem`: override the source_subsystem parameter. See the description in the _Task_ section.\n* `-b`, `--blacklist`: Names of tables/objects to be ignored during processing. E.g. with --yml-path gis and -b TEST, all objects in the gis folder except object TEST are processed.\n* `-w`, `--whitelist`: Names of tables/objects to be processed. E.g. --yml-path gis and -w TEST. 
It will process only object TEST.\n\n#### Requirements\n\n* Luft installed :) with BigQuery support - `pip install luft[bq]`.\n* Credentials file (usually `service_account.json`) mapped into Docker and configured in `luft.cfg`.\n\n#### Yaml file parameters\n\nInside the yaml file, the following parameters are supported:\n\n* *name* - Any name you want. Used mainly as the name in the Airflow UI.\n* *source_system* - only for organizational purposes. It has no special role in execution.\n* *source_subsystem* - only for organizational purposes. It has no special role in execution.\n* *task_type* - `bq-load` by default but can be overridden. When overridden it becomes a different kind of task :).\n* *thread_name* - applicable only when used with Airflow. The thread name is automatically generated based on the number of threads. If you need this task to run in a completely separate thread, you can specify a custom thread name.\n    E.g. I have tasks T1, T2, T3, T4 and T5 in my task list and the thread count set to 3. By default (if no task has _thread_name_ specified) it will look like this in Airflow:\n\n    ```text\n    |T1| -\u003e |T4|\n    |T2| -\u003e |T5|\n    |T3|\n    ```\n\n    When I specify a custom _thread_name_ in task T4:\n\n    ```text\n    |T1| -\u003e |T5|\n    |T2|\n    |T3|\n    |T4|\n    ```\n\n* *color* - applicable only when used with Airflow. Hex color of the task in Airflow. If not specified, `#03A0F3` is used.\n* *project_id* - BigQuery project id. Default from `luft.cfg`.\n* *location* - BigQuery location. Default from `location.cfg`.\n* *columns* - list of columns to download. Column parameters:\n  * *name* - column name.\n  * *type* - column type.\n  * *mandatory* - whether the column is mandatory. Default false.\n  * *pk* - whether the column is a primary key. Default false.\n  * *escape* - escape the column name with `. Some databases require it.\n  * *value* - fixed column value. You should never delete any of your columns from the yaml file. 
Instead you should set `value: 'Null'`.\n* *dataset_id* - Google BigQuery dataset name. If not specified, the source_system name is used. The dataset is created if it does not exist.\n* *path_prefix* - Path prefix (path) on blob storage. You can use the following templated fields:\n  * {env} - your environment (DEV/PROD...)\n  * {source_system} - name of the source system (whatever you like) - for JDBC it is usually a friendly name of the database\n  * {source_subsystem} - name of the source subsystem (whatever you like) - for JDBC it is the schema name\n  * {name} - name of the table\n  * {date_valid} - valid date of the export\n  * {time_valid} - valid time of the export\n* *skip_leading_rows* - whether the first row of the CSV should be considered a header and not loaded. Default True.\n* *allow_quoted_newlines* - whether quoted data sections that contain newline characters are allowed in a CSV file. Default True.\n* *field_delimiter* - how the fields are delimited. Default '\\t' (tab).\n* *disable_check* - by default, the check of the number of rows loaded into the stage schema is enabled. If no data are loaded, an error is raised.\nIf you need to disable this check, set this flag to True. Default False.\n\n\n---------\n\n### bq-exec\n\nRun BigQuery SQL commands from a file.\n\n#### Command\n\n```bash\nluft bq exec\n```\n\n#### Command parameters\n\n* `-y`, `--yml-path` (mandatory): folder or single yml file inside the default tasks folder (see luft.cfg).\n* `-s`, `--start-date`: Start date in format YYYY-MM-DD for executing the task in a loop. If not specified, yesterday's date is used.\n* `-e`, `--end-date`: End date in format YYYY-MM-DD for executing the task in a loop. This day is not included. If not specified, today's date is used.\n* `-sys`, `--source-system`: override the source_system parameter. See the description in the _Task_ section. Has to match the name in jdbc.cfg to get the right credentials for the JDBC database.\n* `-sub`, `--source-subsystem`: override the source_subsystem parameter. 
See the description in the _Task_ section.\n* `-b`, `--blacklist`: Names of tables/objects to be ignored during processing. E.g. with --yml-path gis and -b TEST, all objects in the gis folder except object TEST are processed.\n* `-w`, `--whitelist`: Names of tables/objects to be processed. E.g. with --yml-path gis and -w TEST, only object TEST is processed.\n\n#### Requirements\n\n* Luft installed :) with BigQuery support - `pip install luft[bq]`.\n* Credentials file (usually `service_account.json`) mapped into Docker and configured in `luft.cfg`.\n\n#### Yaml file parameters\n\nInside the yaml file, the following parameters are supported:\n\n* *name* - Any name you want. Used mainly as the name in the Airflow UI.\n* *source_system* - only for organizational purposes. It has no special role in execution.\n* *source_subsystem* - only for organizational purposes. It has no special role in execution.\n* *task_type* - `bq-exec` by default but can be overridden. When overridden it becomes a different kind of task :).\n* *thread_name* - applicable only when used with Airflow. The thread name is automatically generated based on the number of threads. If you need this task to run in a completely separate thread, you can specify a custom thread name.\n    E.g. I have tasks T1, T2, T3, T4 and T5 in my task list and the thread count set to 3. By default (if no task has _thread_name_ specified) it will look like this in Airflow:\n\n    ```text\n    |T1| -\u003e |T4|\n    |T2| -\u003e |T5|\n    |T3|\n    ```\n\n    When I specify a custom _thread_name_ in task T4:\n\n    ```text\n    |T1| -\u003e |T5|\n    |T2|\n    |T3|\n    |T4|\n    ```\n\n* *color* - applicable only when used with Airflow. Hex color of the task in Airflow. If not specified, `#73DBF5` is used.\n* *sql_folder* - path of the folder where your SQL files are located.\n* *sql_files* - list of SQL files to be executed.\n* *project_id* - BigQuery project id. Default from `luft.cfg`.\n* *location* - BigQuery location. 
Default from `location.cfg`.\n\n#### Templating in SQL\n\nInside your SQL you can use shortcuts for some useful variables:\n\n* *ENV*: Environment. E.g. PROD.\n* *TASK_TYPE*: Task type. `bq-exec`.\n* *NAME*: Name from the yaml parameter.\n* *SOURCE_SYSTEM*: Source system.\n* *SOURCE_SUBSYSTEM*: Source subsystem.\n* *DATE_VALID*: Valid date of the current run.\n* *TIME_VALID*: Valid time of the current run.\n* *TASK_ID*: Id of the task.\n* *THREAD_NAME*: Thread name of the task.\n* *YAML_FILE*: Yaml file location.\n* *BQ_PROJECT_ID*: BigQuery project id.\n* *BQ_LOCATION*: BigQuery location.\n\nExample:\n\n```sql\n-- Example of templating\nSELECT '{{ BQ_LOCATION }}';\nSELECT '{{ BQ_PROJECT_ID }}';\nSELECT '{{ DATE_VALID }}';\nSELECT '{{ SOURCE_SYSTEM }}';\nSELECT '{{ ENV }}';\n```\n\n### qlik-cloud-upload\n\nExport an application from Qlik Sense Enterprise, upload it to Qlik Sense Cloud and publish it into a certain stream.\n\n#### Command\n\n```bash\nluft qlik-cloud upload\n```\n\n#### Command parameters\n\n* `-y`, `--yml-path` (mandatory): folder or single yml file inside the default tasks folder (see luft.cfg).\n* `-s`, `--start-date`: Start date in format YYYY-MM-DD for executing the task in a loop. If not specified, yesterday's date is used.\n* `-e`, `--end-date`: End date in format YYYY-MM-DD for executing the task in a loop. This day is not included. If not specified, today's date is used.\n* `-sys`, `--source-system`: override the source_system parameter. See the description in the _Task_ section. Has to match the name in jdbc.cfg to get the right credentials for the JDBC database.\n* `-sub`, `--source-subsystem`: override the source_subsystem parameter. See the description in the _Task_ section.\n* `-b`, `--blacklist`: Names of tables/objects to be ignored during processing. E.g. with --yml-path gis and -b TEST, all objects in the gis folder except object TEST are processed.\n* `-w`, `--whitelist`: Names of tables/objects to be processed. E.g. --yml-path gis and -w TEST. 
It will process only object TEST.\n\n#### Requirements\n\n* Luft installed :) with Qlik Sense Cloud support - `pip install luft[qlik-cloud]`.\n* `google-chrome` and `chromedriver` installed in your Docker image or on localhost - see [Python Selenium Installation](https://selenium-python.readthedocs.io/installation.html).\n* Credentials files (`client_key.pem`, `client.pem` and `root.pem`) mapped into Docker and configured in `luft.cfg` in the `[qlik_enterprise]` section.\n* All other configs set in `luft.cfg` in sections `[qlik_enterprise]` and `[qlik_cloud]`.\n\n#### Yaml file parameters\n\nInside the yaml file, the following parameters are supported:\n\n* *name* - Any name you want. Used mainly as the name in the Airflow UI.\n* *group_id* - Qlik Cloud group ID.\n* *source_system* - only for organizational purposes. It has no special role in execution.\n* *source_subsystem* - only for organizational purposes. It has no special role in execution.\n* *task_type* - `qlik-cloud-upload` by default but can be overridden. When overridden it becomes a different kind of task :).\n* *thread_name* - applicable only when used with Airflow. The thread name is automatically generated based on the number of threads. If you need this task to run in a completely separate thread, you can specify a custom thread name.\n    E.g. I have tasks T1, T2, T3, T4 and T5 in my task list and the thread count set to 3. By default (if no task has _thread_name_ specified) it will look like this in Airflow:\n\n    ```text\n    |T1| -\u003e |T4|\n    |T2| -\u003e |T5|\n    |T3|\n    ```\n\n    When I specify a custom _thread_name_ in task T4:\n\n    ```text\n    |T1| -\u003e |T5|\n    |T2|\n    |T3|\n    |T4|\n    ```\n\n* *color* - applicable only when used with Airflow. Hex color of the task in Airflow. If not specified, `#009845` is used.\n* *apps* - list of applications to load from QSE into a certain account on Qlik Sense Cloud. 
Each item has the following fields:\n  * *name*: name to show in Airflow.\n  * *filename*: name of the file on the filesystem.\n  * *qse_id*: Qlik Sense Enterprise application id.\n  * *qsc_stream*: Qlik Sense Cloud stream name.\n\n### qlik-metric-load\n\nLoad data from Qlik metrics, convert them to JSON and upload them to blob storage.\n\n#### Command\n\n```bash\nluft qlik-metric load\n```\n\n#### Command parameters\n\n* `-y`, `--yml-path` (mandatory): folder or single yml file inside the default tasks folder (see luft.cfg).\n* `-s`, `--start-date`: Start date in format YYYY-MM-DD for executing the task in a loop. If not specified, yesterday's date is used.\n* `-e`, `--end-date`: End date in format YYYY-MM-DD for executing the task in a loop. This day is not included. If not specified, today's date is used.\n* `-sys`, `--source-system`: override the source_system parameter. See the description in the _Task_ section. Has to match the name in jdbc.cfg to get the right credentials for the JDBC database.\n* `-sub`, `--source-subsystem`: override the source_subsystem parameter. See the description in the _Task_ section.\n* `-b`, `--blacklist`: Names of tables/objects to be ignored during processing. E.g. with --yml-path gis and -b TEST, all objects in the gis folder except object TEST are processed.\n* `-w`, `--whitelist`: Names of tables/objects to be processed. E.g. with --yml-path gis and -w TEST, only object TEST is processed.\n\n#### Requirements\n\n* Luft installed :) with Qlik metric support - `pip install luft[qlik-metric]`.\n* Credentials files (`client_key.pem`, `client.pem` and `root.pem`) mapped into Docker and configured in `luft.cfg` in the `[qlik_enterprise]` section.\n* All other configs set in `luft.cfg` in the `[qlik_enterprise]` section.\n\n#### Yaml file parameters\n\nInside the yaml file, the following parameters are supported:\n\n* *name* - Any name you want. Used mainly as the name in the Airflow UI.\n* *source_system* - only for organizational purposes. It has no special role in execution.\n* *source_subsystem* - only for organizational purposes. 
It has no special role in execution.\n* *task_type* - `qlik-metric-load` by default but can be overridden. When overridden it becomes a different kind of task :).\n* *thread_name* - applicable only when used with Airflow. The thread name is automatically generated based on the number of threads. If you need this task to run in a completely separate thread, you can specify a custom thread name.\n    E.g. I have tasks T1, T2, T3, T4 and T5 in my task list and the thread count set to 3. By default (if no task has _thread_name_ specified) it will look like this in Airflow:\n\n    ```text\n    |T1| -\u003e |T4|\n    |T2| -\u003e |T5|\n    |T3|\n    ```\n\n    When I specify a custom _thread_name_ in task T4:\n\n    ```text\n    |T1| -\u003e |T5|\n    |T2|\n    |T3|\n    |T4|\n    ```\n\n* *color* - applicable only when used with Airflow. Hex color of the task in Airflow. If not specified, `#009845` is used.\n* *app_id* - Qlik application id.\n* *dimensions* - list of field names.\n* *measures* - list of Master Measure names.\n* *selections* - list of selection dictionaries used to filter data.\n\n## Running example\n\n### 1. Creating `luft.cfg`\n\nFirst you need to create the config file `luft.cfg` based on the example in `example/config/luft.cfg` and place it in the root folder. If you want to use BigQuery and Google Cloud Storage you of course need credentials for them - [GC authentication](https://cloud.google.com/docs/authentication/getting-started). For AWS S3 you need an `AWS Access Key ID` and an `AWS Secret Access Key`.\n\nCredentials (GCS, AWS, BigQuery) can be specified in three ways:\n\n#### 1) Directly in `luft.cfg` file\n\n| WARNING: this option is recommended only for local development. If you publish the image to a public repository, everybody will know your secrets. |\n| --- |\n\n#### 2) In .env file. 
You can create a .env file\n\n```env\nEMBULK_COMMAND=embulk\nLUFT_CONFIG=example/config/luft.cfg\nJDBC_CONFIG=example/config/jdbc.cfg\nTASKS_FOLDER=example/tasks\nBLOB_STORAGE=gcs\nGCS_BUCKET=\nGCS_AUTH_METHOD=json_key\nGCS_JSON_KEYFILE=\nBQ_PROJECT_ID=\nBQ_CREDENTIALS_FILE=\nBQ_LOCATION=US\nAWS_BUCKET=\nAWS_ENDPOINT=\nAWS_ACCESS_KEY_ID=\nAWS_SECRET_ACCESS_KEY=\n```\n\nAnd then run your Docker command with this environment file:\n\n```bash\ndocker run -it --rm --env-file .env luft\n```\n\n#### 3) Directly specifying env in command\n\nThis variant is preferred.\n\n```bash\ndocker run -it --rm -e BLOB_STORAGE=gcs luft\n```\n\n### 2. Creating `jdbc.cfg`\n\nFor example purposes just copy `jdbc.cfg` from `example/config/` into the root folder, or set `JDBC_CONFIG` in your `.env` file or via the `-e` parameter.\n\n### 3. Build Docker image\n\nJust run:\n\n```bash\ndocker build -t luft .\n```\n\n### 4. Run example postgres database\n\n```bash\ndocker run -d -p 5432:5432 aa8y/postgres-dataset:world\n```\n\n### 5. Run Luft to download data\n\nStore example data from the postgres database in S3 or GCS.\n\n```bash\ndocker run --rm luft jdbc load -y world\n```\n\n### Run BQ exec example\n\nOptionally, if you have configured BigQuery in your `luft.cfg`, you can run:\n\n```bash\ndocker run --rm luft bq exec -y bq\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flmc-eu%2Fluft","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flmc-eu%2Fluft","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flmc-eu%2Fluft/lists"}