{"id":19651304,"url":"https://github.com/embulk/embulk-input-sftp","last_synced_at":"2025-10-06T22:28:27.583Z","repository":{"id":50945619,"uuid":"53126314","full_name":"embulk/embulk-input-sftp","owner":"embulk","description":"Reads files stored on remote server using SFTP","archived":false,"fork":false,"pushed_at":"2021-05-27T04:21:41.000Z","size":273,"stargazers_count":4,"open_issues_count":1,"forks_count":3,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-15T14:51:08.185Z","etag":null,"topics":["embulk","embulk-input-plugin","embulk-plugin","sftp"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/embulk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-03-04T10:06:46.000Z","updated_at":"2024-01-13T23:55:00.000Z","dependencies_parsed_at":"2022-08-28T15:33:50.203Z","dependency_job_id":null,"html_url":"https://github.com/embulk/embulk-input-sftp","commit_stats":null,"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-sftp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-sftp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-sftp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-sftp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/embulk","download_url":"https://codeload.github.com/embulk/embulk-input-sftp/tar.gz/refs/
heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251345929,"owners_count":21574807,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embulk","embulk-input-plugin","embulk-plugin","sftp"],"created_at":"2024-11-11T15:06:00.576Z","updated_at":"2025-10-06T22:28:22.525Z","avatar_url":"https://github.com/embulk.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SFTP file input plugin for Embulk\n[![Build Status](https://travis-ci.org/embulk/embulk-input-sftp.svg?branch=master)](https://travis-ci.org/embulk/embulk-input-sftp)\n\nReads files stored on a remote server using SFTP.\n\nembulk-input-sftp v0.3.0+ requires Embulk v0.9.12+\n\n## Overview\n\n* **Plugin type**: file input\n* **Resume supported**: yes\n* **Cleanup supported**: yes\n\n## Configuration\n\n- **host**: (string, required)\n- **port**: (string, default: `22`)\n- **user**: (string, required)\n- **password**: (string, default: `null`)\n- **secret_key_file**: (string, default: `null`). **OpenSSH** format is required.\n- **secret_key_passphrase**: (string, default: `\"\"`)\n- **user_directory_is_root**: (boolean, default: `true`)\n- **timeout**: SFTP connection timeout in seconds (integer, default: `600`)\n- **path_prefix**: Prefix of the paths of the files to read (string, required)\n- **incremental**: enables incremental loading (boolean, optional, default: `true`). If incremental loading is enabled, the config diff for the next execution will include the `last_path` parameter so that the next execution skips files before that path. 
Otherwise, `last_path` will not be included.\n- **path_match_pattern**: regexp to match file paths. If a file path doesn't match this pattern, the file will be skipped (regexp string, optional)\n- **total_file_count_limit**: maximum number of files to read (integer, optional)\n- **min_task_size (experimental)**: minimum size of a task. If this is larger than 0, one task includes multiple input files. This is useful when a large number of tasks degrades the performance of output or executor plugins. (integer, optional)\n\n### Proxy configuration\n\n- **proxy**:\n    - **type**: (string(http | socks | stream), required, default: `null`)\n        - **http**: use HTTP Proxy\n        - **socks**: use SOCKS Proxy\n        - **stream**: Connects to the SFTP server through a remote host reached by SSH\n    - **host**: (string, required)\n    - **port**: (int, default: `22`)\n    - **user**: (string, optional)\n    - **password**: (string, optional, default: `null`)\n    - **command**: (string, optional)\n\n### Example\n\n```yaml\nin:\n  type: sftp\n  host: 127.0.0.1\n  port: 22\n  user: embulk\n  secret_key_file: /Users/embulk/.ssh/id_rsa\n  secret_key_passphrase: secret_pass\n  user_directory_is_root: false\n  timeout: 600\n  path_prefix: /data/sftp\n```\n\nTo filter files using regexp:\n\n```yaml\nin:\n  type: sftp\n  path_prefix: logs/csv-\n  ...\n  path_match_pattern: \\.csv$   # a file will be skipped if its path doesn't match this pattern\n\n  ## some examples of regexp:\n  #path_match_pattern: /archive/         # match files in .../archive/... directory\n  #path_match_pattern: /data1/|/data2/   # match files in .../data1/... or .../data2/... 
directory\n  #path_match_pattern: \\.csv$|\\.csv\\.gz$  # match files whose suffix is .csv or .csv.gz\n```\n\nWith proxy:\n```yaml\nin:\n  type: sftp\n  host: 127.0.0.1\n  port: 22\n  user: embulk\n  secret_key_file: /Users/embulk/.ssh/id_rsa\n  secret_key_passphrase: secret_pass\n  user_directory_is_root: false\n  timeout: 600\n  path_prefix: /data/sftp\n  proxy:\n    type: http\n    host: proxy_host\n    port: 8080\n    user: proxy_user\n    password: proxy_secret_pass\n    command:\n```\n\n## Proxy settings\n\n### Example\n```yaml\nin:\n  type: sftp\n  host: 127.0.0.1\n  port: 22\n  user: embulk\n  secret_key_file: /Users/embulk/.ssh/id_rsa\n  secret_key_passphrase: secret_pass\n  user_directory_is_root: false\n  timeout: 600\n  path_prefix: /data/sftp\n```\n\n### Secret Keyfile configuration\n\nSet the path of `secret_key_file` as follows.\n```yaml\nin:\n  type: sftp\n  ...\n  secret_key_file: /path/to/id_rsa\n  ...\n```\n\nYou can also embed the contents of the secret key file in config.yml.\n```yaml\nin:\n  type: sftp\n  ...\n  secret_key_file:\n    content: |\n      -----BEGIN RSA PRIVATE KEY-----\n      ABCDEFG...\n      HIJKLMN...\n      OPQRSTU...\n      -----END RSA PRIVATE KEY-----\n  ...\n```\n\n## Build\n\n```\n$ ./gradlew gem  # -t to watch for file changes and rebuild continuously\n$ ./gradlew bintrayUpload # release embulk-input-sftp to the Bintray Maven repo\n```\n\n## Test\n\n```\n$ ./gradlew test  # -t to watch for file changes and rebuild continuously\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-input-sftp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fembulk%2Fembulk-input-sftp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-input-sftp/lists"}