{"id":27627188,"url":"https://github.com/treasure-data/embulk-output-td","last_synced_at":"2025-04-23T13:53:27.385Z","repository":{"id":28839426,"uuid":"32363156","full_name":"treasure-data/embulk-output-td","owner":"treasure-data","description":"Embulk output plugin for Treasure Data","archived":false,"fork":false,"pushed_at":"2024-11-08T06:44:07.000Z","size":562,"stargazers_count":10,"open_issues_count":5,"forks_count":7,"subscribers_count":85,"default_branch":"master","last_synced_at":"2025-04-12T11:55:59.126Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://www.embulk.org/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"agragregra/agragregra.github.com","license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/treasure-data.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-03-17T01:11:39.000Z","updated_at":"2024-11-08T06:44:08.000Z","dependencies_parsed_at":"2024-07-26T08:33:37.047Z","dependency_job_id":null,"html_url":"https://github.com/treasure-data/embulk-output-td","commit_stats":{"total_commits":256,"total_committers":13,"mean_commits":"19.692307692307693","dds":0.44921875,"last_synced_commit":"bd52377204eca8d2acbebdd41986bde27e895b76"},"previous_names":[],"tags_count":43,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treasure-data%2Fembulk-output-td","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treasure-data%2Fembulk-output-td/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treasure-data%2Fembulk-output-td/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treasure-data%2Fembulk-output-td/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/treasure-data","download_url":"https://codeload.github.com/treasure-data/embulk-output-td/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250444188,"owners_count":21431598,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-23T13:53:26.461Z","updated_at":"2025-04-23T13:53:27.378Z","avatar_url":"https://github.com/treasure-data.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TD output plugin for Embulk\n\n[Treasure Data Service](https://www.treasuredata.com/) output plugin for [Embulk](https://github.com/embulk/embulk)\n\n**NOTICE**:\n  * embulk-output-td v0.8.0+ only supports **Embulk v0.9.24**. Embulk v0.9.23 does not work.\n  * embulk-output-td v0.5.0+ requires Java 1.8 or higher.\n  * embulk-output-td v0.4.0+ only supports **Embulk v0.8.22+**.\n\n## Overview\n\n* **Plugin type**: output\n* **Load all or nothing**: yes\n* **Resume supported**: no\n\n## Configuration\n\n- **apikey**: Treasure Data API key (string, required)\n- **endpoint**: API endpoint hostname (string, default='api.treasuredata.com')\n- **http_proxy**: HTTP proxy configuration (a map with host, port, use_ssl, user, and password; 
default is null)\n- **use_ssl**: whether to use SSL (HTTPS) for requests to the endpoint (boolean, default=true)\n- **auto_create_table**: whether to create the database and/or the table if they don't exist (boolean, default=true)\n- **mode**: one of 'append', 'replace', or 'truncate' (string, default='append')\n- **database**: database name (string, required)\n- **table**: table name (string, required)\n- **session**: bulk_import session name (string, optional)\n- **pool_name**: bulk_import session pool name (string, optional)\n- **time_column**: user-defined time column (string, optional)\n- **unix_timestamp_unit**: if the type of the \"time\" column or of **time_column** is long, its values are treated as UNIX timestamps. This option specifies their unit: sec, milli, micro, or nano (enum, default: `sec`)\n- **tmpdir**: temporary directory (string, optional). If not set, the plugin uses the default temporary directory taken from the Java system properties.\n- **upload_concurrency**: upload concurrency (int, default=2). The maximum is 8.\n- **file_split_size**: file split size in KB (long, default=16384, i.e. 16MB).\n- **stop_on_invalid_record**: stop the bulk load transaction if a file includes an invalid record, such as an invalid timestamp (boolean, default=false).\n- **displayed_error_records_count_limit**: maximum number of shown error records that were skipped by the perform job (int, default=10).\n- **default_timestamp_type_convert_to**: output type of timestamp columns. Available options are \"sec\" (convert timestamp to UNIX timestamp in seconds) and \"string\" (convert timestamp to string). (string, default: `\"string\"`)\n- **default_timezone**: default timezone (string, default='UTC')\n- **default_timestamp_format**: default timestamp format (string, default=`%Y-%m-%d %H:%M:%S.%6N`)\n- **column_options**: advanced: a key-value map where each key is a column name and each value holds the options for that column.\n  - **type**: The column type used when this plugin adds a new column to a TD table (e.g. `array\u003cstring\u003e`). 
Available options are: `int`, `long`, `float`, `double`, `string`, `array\u003cint\u003e`, `array\u003clong\u003e`, `array\u003cdouble\u003e`, `array\u003cstring\u003e`, `array\u003carray\u003cint\u003e\u003e`. For more information, see https://tddocs.atlassian.net/wiki/spaces/PD/pages/1083743/Schema+Management. (string, optional)\n  - **value_type**: This plugin converts Embulk input data to the msgpack data that is uploaded to TD. This option controls which msgpack data type the Embulk data in the column is converted to. Available options are: `boolean`, `long`, `double`, `string`, `timestamp`, `array`, `map`. (string, optional)\n  - **timezone**: If the input column type (Embulk type) is timestamp, this plugin formats the timestamp value into a SQL string. In this case, this option controls the timezone. (string, value of the default_timezone option is used by default)\n  - **format**: If the input column type (Embulk type) is timestamp, this plugin formats the timestamp value into a string. This option controls the format of the timestamp. (string, value of the default_timestamp_format option is used by default)\n- **retry_limit**: maximum number of retries (int, default: 20)\n- **retry_initial_interval_millis**: initial retry interval in milliseconds (int, default: 1000)\n- **retry_max_interval_millis**: maximum retry interval in milliseconds. The interval doubles on every retry until it reaches retry_max_interval_millis. (int, default: 90000)\n- **additional_http_headers**: additional headers to add to HTTP requests (a key \u0026 value map, default: null)\n- **port**: port for HTTP requests. By default, the plugin connects to port 443, or to port 80 if `use_ssl: false` (int, optional)\n- **ignore_alternative_time_if_time_exists**: ignore `time_column` and `time_value` in the configuration if a `time` column exists in the input schema. 
(boolean, default: false)\n- **default_boolean_type_convert_to**: TD output type for Embulk BOOLEAN columns. Available options are \"long\" (convert Embulk's BOOLEAN to TD's long) and \"string\" (convert Embulk's BOOLEAN to TD's string). (string, default: `\"long\"`)\n\n## Modes\n* **append**:\n  - Uploads data directly to the existing table.\n* **replace**:\n  - Creates a new temporary table and uploads data to it first.\n  - After the upload finishes, the table specified by the 'table' option is replaced with the temporary table.\n  - The schema of the existing table is not carried over to the new table.\n* **truncate**:\n  - Creates a new temporary table and uploads data to it first.\n  - After the upload finishes, the table specified by the 'table' option is replaced with the temporary table.\n  - The schema of the existing table is added to the new table.\n\n## Example\nHere is a sample configuration for the TD output plugin.\n```yaml\nout:\n  type: td\n  apikey: \u003cyour apikey\u003e\n  endpoint: api.treasuredata.com\n  database: my_db\n  table: my_table\n  time_column: created_at\n  auto_create_table: true\n  mode: append\n```\n\n### HTTP Proxy Configuration\nTo configure an HTTP proxy, use the `http_proxy` parameter:\n```yaml\nout:\n  type: td\n  apikey: \u003cyour apikey\u003e\n  endpoint: api.treasuredata.com\n  http_proxy: {host: localhost, port: 8080, use_ssl: false, user: \"proxyuser\", password: \"PASSWORD\"}\n  database: my_db\n  table: my_table\n  time_column: created_at\n  auto_create_table: true\n  mode: append\n```\n\n### Additional HTTP headers\n```yaml\nout:\n  type: td\n  apikey: \u003cyour apikey\u003e\n  endpoint: api.treasuredata.com\n  database: my_db\n  table: my_table\n  time_column: created_at\n  auto_create_table: true\n  mode: append\n  additional_http_headers:\n    Content_Type: 'application/json'\n    foo: bar\n```\n\n### Column options\n```yaml\nout:\n  type: td\n  apikey: \u003cyour apikey\u003e\n  endpoint: 
api.treasuredata.com\n  database: my_db\n  table: my_table\n  time_column: created_at\n  auto_create_table: true\n  mode: append\n  column_options:\n    col_array:\n      type: array\u003cstring\u003e\n      value_type: array\n    col_long:\n      type: string\n      value_type: long\n    col_timestamp:\n      type: string\n      value_type: timestamp\n      format: '%Y-%m-%d %H:%M:%S %z'\n      timezone: '-0700'\n```\n\n## Install\n\n```\n$ embulk gem install embulk-output-td\n```\n\n## Build\n\n### Build with Gradle\n```\n$ git clone https://github.com/treasure-data/embulk-output-td.git\n$ cd embulk-output-td\n$ ./gradlew gem classpath\n```\n\n### Run on Embulk\n```\n$ bin/embulk run -I embulk-output-td/lib/ config.yml\n```\n\n## Release\n\n### Upload gem to Rubygems.org\n\n```\n$ ./gradlew gem     # create .gem file under pkg/ directory\n$ ./gradlew gemPush # create and publish .gem file\n```\n\nRepo URL: https://rubygems.org/gems/embulk-output-td\n\n### Upload jars to Bintray.com\n\n```\n$ ./gradlew bintrayUpload\n```\n\nRepo URL: https://bintray.com/embulk-output-td/maven/embulk-output-td\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftreasure-data%2Fembulk-output-td","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftreasure-data%2Fembulk-output-td","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftreasure-data%2Fembulk-output-td/lists"}