{"id":19651269,"url":"https://github.com/embulk/embulk-input-azure_blob_storage","last_synced_at":"2025-04-28T16:31:20.505Z","repository":{"id":41081517,"uuid":"43898053","full_name":"embulk/embulk-input-azure_blob_storage","owner":"embulk","description":"Microsoft Azure Blob Storage file input plugin for Embulk","archived":false,"fork":false,"pushed_at":"2023-02-21T02:53:03.000Z","size":219,"stargazers_count":2,"open_issues_count":1,"forks_count":3,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-18T01:37:48.563Z","etag":null,"topics":["azure","azure-storage","embulk","embulk-input-plugin","embulk-plugin"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/embulk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-10-08T15:21:44.000Z","updated_at":"2023-04-11T15:18:17.000Z","dependencies_parsed_at":"2022-09-03T04:42:17.819Z","dependency_job_id":null,"html_url":"https://github.com/embulk/embulk-input-azure_blob_storage","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-azure_blob_storage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-azure_blob_storage/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-azure_blob_storage/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-input-azure_blob_storage/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/embulk","download_url":"https://codeload.github.com/embulk/embulk-input-azure_blob_storage/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251345893,"owners_count":21574804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","azure-storage","embulk","embulk-input-plugin","embulk-plugin"],"created_at":"2024-11-11T15:05:49.887Z","updated_at":"2025-04-28T16:31:18.633Z","avatar_url":"https://github.com/embulk.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Azure Blob Storage file input plugin for Embulk\n[![Build Status](https://travis-ci.org/embulk/embulk-input-azure_blob_storage.svg?branch=master)](https://travis-ci.org/embulk/embulk-input-azure_blob_storage)\n\n[Embulk](http://www.embulk.org/) file input plugin that reads files stored on [Microsoft Azure](https://azure.microsoft.com/) [Blob Storage](https://azure.microsoft.com/en-us/documentation/articles/storage-introduction/#blob-storage).\n\nembulk-input-azure_blob_storage v0.2.0+ requires Embulk v0.9.12+.\n\n## Overview\n\n* **Plugin type**: file input\n* **Resume supported**: no\n* **Cleanup supported**: yes\n\n## Configuration\n\nFirst, create an Azure [Storage Account](https://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/).\n\n- **account_name**: storage account name (string, required)\n- **account_key**: primary access key (string, required)\n- **container**: name of the container where the data is stored (string, required)\n- **path_prefix**: prefix of target keys (string, 
required)\n- **incremental**: enables incremental loading (boolean, optional, default: true). If incremental loading is enabled, the config diff for the next execution includes the `last_path` parameter so that the next execution skips files up to that path. Otherwise, `last_path` is not included.\n- **path_match_pattern**: regexp to match file paths. If a file path doesn't match this pattern, the file is skipped (regexp string, optional)\n- **total_file_count_limit**: maximum number of files to read (integer, optional)\n\n### Proxy configuration\n\n- **proxy**:\n    - **type**: (string, required, default: `null`)\n        - **http**: use an HTTP proxy\n    - **host**: (string, required)\n    - **port**: (int, required, default: `8080`)\n    - **user**: (string, optional)\n    - **password**: (string, optional)\n\n## Example\n\n```yaml\nin:\n  type: azure_blob_storage\n  account_name: myaccount\n  account_key: myaccount_key\n  container: my-container\n  path_prefix: logs/csv-\n```\n\nExample for \"sample_01.csv.gz\", generated by [embulk example](https://github.com/embulk/embulk#trying-examples):\n\n```yaml\nin:\n  type: azure_blob_storage\n  account_name: myaccount\n  account_key: myaccount_key\n  container: my-container\n  path_prefix: logs/csv-\n  decoders:\n  - {type: gzip}\n  parser:\n    charset: UTF-8\n    newline: CRLF\n    type: csv\n    delimiter: ','\n    quote: '\"'\n    header_line: true\n    columns:\n    - {name: id, type: long}\n    - {name: account, type: long}\n    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}\n    - {name: purchase, type: timestamp, format: '%Y%m%d'}\n    - {name: comment, type: string}\nout: {type: stdout}\n```\n\nTo filter files using a regexp:\n\n```yaml\nin:\n  type: azure_blob_storage\n  path_prefix: logs/csv-\n  ...\n  path_match_pattern: \\.csv$   # a file will be skipped if its path doesn't match this pattern\n\n  ## some examples of regexp:\n  #path_match_pattern: /archive/         # match files 
in .../archive/... directory\n  #path_match_pattern: /data1/|/data2/   # match files in .../data1/... or .../data2/... directory\n  #path_match_pattern: \\.csv$|\\.csv\\.gz$ # match files whose suffix is .csv or .csv.gz\n```\n\nWith a proxy:\n\n```yaml\nin:\n  type: azure_blob_storage\n  ...\n  proxy:\n      type: http\n      host: proxy_host\n      port: 8080\n      user: proxy_user\n      password: proxy_secret_pass\n```\n\n## Build\n\n```\n$ ./gradlew gem  # add -t to watch for file changes and rebuild continuously\n```\n\n## Test\n\n```\n$ ./gradlew test  # add -t to watch for file changes and re-run continuously\n```\n\nTo run the unit tests, set the following environment variables. When they are not set, some test cases are skipped.\n\n```\nAZURE_ACCOUNT_NAME\nAZURE_ACCOUNT_KEY\nAZURE_CONTAINER\nAZURE_CONTAINER_IMPORT_DIRECTORY (optional, if needed)\n```\n\nAdditionally, the following files need to be uploaded to an existing Azure Blob Storage container:\n\n* [sample_01.csv](src/test/resources/sample_01.csv)\n* [sample_02.csv](src/test/resources/sample_02.csv)\n* [missing_02.csv](src/test/resources/missing_02.csv)\n* [missing_03.csv](src/test/resources/missing_03.csv)\n\nIf you're using Mac OS X El Capitan and launch GUI applications (such as an IDE), set the variables with a launch agent as follows:\n\n```xml\n$ vi ~/Library/LaunchAgents/environment.plist\n\u003c?xml version=\"1.0\" encoding=\"UTF-8\"?\u003e\n\u003c!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\"\u003e\n\u003cplist version=\"1.0\"\u003e\n\u003cdict\u003e\n  \u003ckey\u003eLabel\u003c/key\u003e\n  \u003cstring\u003emy.startup\u003c/string\u003e\n  \u003ckey\u003eProgramArguments\u003c/key\u003e\n  \u003carray\u003e\n    \u003cstring\u003esh\u003c/string\u003e\n    \u003cstring\u003e-c\u003c/string\u003e\n    \u003cstring\u003e\n      launchctl setenv AZURE_ACCOUNT_NAME my-account-name\n      launchctl setenv AZURE_ACCOUNT_KEY my-account-key\n      launchctl setenv AZURE_CONTAINER my-container\n      
launchctl setenv AZURE_CONTAINER_IMPORT_DIRECTORY unittests\n    \u003c/string\u003e\n  \u003c/array\u003e\n  \u003ckey\u003eRunAtLoad\u003c/key\u003e\n  \u003ctrue/\u003e\n\u003c/dict\u003e\n\u003c/plist\u003e\n\n$ launchctl load ~/Library/LaunchAgents/environment.plist\n$ launchctl getenv AZURE_ACCOUNT_NAME  # check that the value is set\n```\n\nThen start your applications.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-input-azure_blob_storage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fembulk%2Fembulk-input-azure_blob_storage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-input-azure_blob_storage/lists"}