{"id":19651332,"url":"https://github.com/embulk/embulk-input-script","last_synced_at":"2025-04-28T16:31:24.486Z","repository":{"id":56588811,"uuid":"170834115","full_name":"embulk/embulk-input-script","owner":"embulk","description":null,"archived":false,"fork":false,"pushed_at":"2020-10-29T23:34:14.000Z","size":128,"stargazers_count":7,"open_issues_count":4,"forks_count":1,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-05T09:51:05.372Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/embulk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-15T09:02:52.000Z","updated_at":"2022-10-13T07:26:31.000Z","dependencies_parsed_at":"2022-08-15T21:31:06.182Z","dependency_job_id":null,"html_url":"https://github.com/embulk/embulk-input-script","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-script","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-script/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-script/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-script/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/embulk","download_url":"https://codeload.github.com/embulk/embulk-input-script/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositori
es_count":251345925,"owners_count":21574807,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T15:06:05.067Z","updated_at":"2025-04-28T16:31:23.984Z","avatar_url":"https://github.com/embulk.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Script input plugin for Embulk\n\nEnables any developer to build Embulk input plugins in any language.\n\nYou don't have to learn the Embulk API. Your script writes a CSV file and Embulk reads it.\n\n## Configuration\n\n- **run**: a shell command to run (string, optional)\n- **config**: contents of the `config.yml` file passed to the `setup` step (config, optional)\n- **cwd**: run the command in this directory if set (string, optional)\n- **env**: environment variables for the command (key-value pairs, default: `{}`)\n- **try_named_pipe**: set `false` to disable named-pipe optimization (string, default: `true`)\n\n## Developing a script\n\nThis plugin runs a given command and reads the CSV file that the command writes.\n\nFirst, write an Embulk configuration file as follows:\n\n```yaml\nin:\n  type: script\n  run: python my_script.py\n  config:\n    my_config_1: value_1\nout:\n  type: stdout\n```\n\nWith this configuration, this plugin executes your command (`python my_script.py`) as follows:\n\n1. python my_script.py **setup** config.yml _setup.yml_\n2. python my_script.py **run** setup.yml _output.csv_ **N**\n3. python my_script.py **finish** setup.yml _next.yml_\n\nAs you can see, your script runs 3 times (paths in _italics_ are the ones your script writes to. 
The others are for reading).\n\nAt step 1, your script is called with **setup** as the first argument. Your script should read a config file (`config.yml`) from the path given as the 2nd argument, and write a YAML file (`setup.yml`) to the path given as the 3rd argument. The config file (`config.yml`) contains whatever you put in the `config:` section of the Embulk config file (`my_config_1: value_1`). The setup file (`setup.yml`) must include at least `tasks: N` (N is an integer) and `columns: SCHEMA`. See \"The setup file\" section below for details.\n\nAt step 2, your script is called with **run** as the first argument, and the YAML file written by step 1 (`setup.yml`) as the 2nd argument. Your script should write a CSV file to the path given as the 3rd argument (`output.csv`). This step runs multiple times, with a sequence number starting from 0 as the 4th argument (`N`). You set the number of repetitions in the `tasks` field of the setup file.\n\nAt step 3, your script is called with **finish** as the first argument, and the YAML file written by step 1 (`setup.yml`) as the 2nd argument. Your script can optionally write a YAML file for the next execution to the path given as the 3rd argument.\n\n### The setup file\n\nAt step 1, the \"setup\" step, your script writes a setup file (`setup.yml`) as follows:\n\n```\ntasks: 1\ncolumns:\n  - {name: my_col_1, type: string}\n  - {name: foo_bar, type: double}\n  - {name: my_time, type: timestamp, format: \"%Y-%m-%d %H:%M:%S\"}\nsome_other_fields: anything_here\n```\n\n`tasks` gives the number of tasks to run at step 2, the \"run\" step. If it's 3, for example, step 2 runs your script with 0, 1, and 2 as the 4th argument.\n\n`columns` gives the schema of the data. Embulk needs it to read the CSV file. The syntax of this field is the same as the `columns` field of embulk-input-csv. 
You can find more details in the [embulk-input-csv documentation](https://www.embulk.org/docs/built-in.html#id4).\n\nAs long as the setup file includes the `tasks` and `columns` fields, it can include any other fields for your convenience.\n\n### CSV format\n\nThe CSV file written by your script must follow the RFC 4180 CSV format **without a header line**.\n\n### Example scripts\n\nYou can find script examples at [embulk/embulk-input-script/examples](https://github.com/embulk/embulk-input-script/tree/master/examples).\n\n## Overview\n\n* **Plugin type**: input\n* **Resume supported**: no\n* **Cleanup supported**: no\n* **Guess supported**: no\n\n\n## Build\n\n```\n$ ./gradlew gem  # -t to watch for file changes and rebuild continuously\n```\n\n## Development\n\n### TODOs\n\n* Script packaging is wanted, that is, something that makes the following command possible:\n\n```\n$ embulk-input-script-packaging --files=./ --run=\"./take_data.py\" --output=~/embulk-input-take_data.gem\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-input-script","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fembulk%2Fembulk-input-script","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-input-script/lists"}