Microsoft Azure Blob Storage file input plugin for Embulk
- Host: GitHub
- URL: https://github.com/embulk/embulk-input-azure_blob_storage
- Owner: embulk
- Created: 2015-10-08T15:21:44.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2023-02-21T02:53:03.000Z (almost 2 years ago)
- Last Synced: 2024-10-05T08:17:51.955Z (3 months ago)
- Topics: azure, azure-storage, embulk, embulk-input-plugin, embulk-plugin
- Language: Java
- Homepage:
- Size: 214 KB
- Stars: 2
- Watchers: 11
- Forks: 3
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
README
# Azure Blob Storage file input plugin for Embulk
[![Build Status](https://travis-ci.org/embulk/embulk-input-azure_blob_storage.svg?branch=master)](https://travis-ci.org/embulk/embulk-input-azure_blob_storage)

An [Embulk](http://www.embulk.org/) file input plugin that reads files stored on [Microsoft Azure](https://azure.microsoft.com/) [Blob Storage](https://azure.microsoft.com/en-us/documentation/articles/storage-introduction/#blob-storage).
embulk-input-azure_blob_storage v0.2.0+ requires Embulk v0.9.12+.
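To install a released version of the plugin into Embulk, the usual Embulk gem command should work; the gem name below assumes the plugin is published under the same name as this repository:

```
$ embulk gem install embulk-input-azure_blob_storage
```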
## Overview
* **Plugin type**: file input
* **Resume supported**: no
* **Cleanup supported**: yes

## Configuration
First, create an Azure [Storage Account](https://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/).
- **account_name**: storage account name (string, required)
- **account_key**: primary access key (string, required)
- **container**: name of the container where data is stored (string, required)
- **path_prefix**: prefix of target keys (string, required)
- **incremental**: enables incremental loading (boolean, optional, default: true). If incremental loading is enabled, the config diff for the next execution will include a `last_path` parameter so that the next execution skips files up to that path (see the sketch after this list). Otherwise, `last_path` will not be included.
- **path_match_pattern**: regexp to match file paths. If a file path doesn't match with this pattern, the file will be skipped (regexp string, optional)
- **total_file_count_limit**: maximum number of files to read (integer, optional)
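For illustration, when `incremental: true` is set, the config diff that Embulk emits for the next execution carries the last processed path. A minimal sketch of such a diff (the path value here is only an example):

```yaml
in:
  last_path: logs/csv-20150101.csv.gz
```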
### Proxy configuration

- **proxy**:
    - **type**: (string, required, default: `null`)
        - **http**: use HTTP Proxy
    - **host**: (string, required)
    - **port**: (int, required, default: `8080`)
    - **user**: (string, optional)
    - **password**: (string, optional)

## Example
```yaml
in:
  type: azure_blob_storage
  account_name: myaccount
  account_key: myaccount_key
  container: my-container
  path_prefix: logs/csv-
```

Example for "sample_01.csv.gz", generated by [embulk example](https://github.com/embulk/embulk#trying-examples):
```yaml
in:
  type: azure_blob_storage
  account_name: myaccount
  account_key: myaccount_key
  container: my-container
  path_prefix: logs/csv-
  decoders:
  - {type: gzip}
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    header_line: true
    columns:
    - {name: id, type: long}
    - {name: account, type: long}
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: purchase, type: timestamp, format: '%Y%m%d'}
    - {name: comment, type: string}
out: {type: stdout}
```

To filter files using regexp:
```yaml
in:
  type: azure_blob_storage
  path_prefix: logs/csv-
  ...
  path_match_pattern: \.csv$ # a file will be skipped if its path doesn't match with this pattern

  ## some examples of regexp:
  #path_match_pattern: /archive/ # match files in .../archive/... directory
  #path_match_pattern: /data1/|/data2/ # match files in .../data1/... or .../data2/... directory
  #path_match_pattern: .csv$|.csv.gz$ # match files whose suffix is .csv or .csv.gz
```

With proxy:
```yaml
in:
  type: azure_blob_storage
  ...
  proxy:
    type: http
    host: proxy_host
    port: 8080
    user: proxy_user
    password: proxy_secret_pass
```
## Build

```
$ ./gradlew gem # -t to watch change of files and rebuild continuously
```
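To try the locally built gem with Embulk, one option is `embulk gem install` pointed at the generated gem file; the `pkg/` directory and the `<version>` placeholder below are assumptions, so adjust them to wherever `./gradlew gem` actually writes the file:

```
$ embulk gem install pkg/embulk-input-azure_blob_storage-<version>.gem
```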
## Test

```
$ ./gradlew test # -t to watch change of files and rebuild continuously
```

To run unit tests, we need to configure the following environment variables.
Additionally, the following files need to be uploaded to your existing Azure Blob Storage container.
* [sample_01.csv](src/test/resources/sample_01.csv)
* [sample_02.csv](src/test/resources/sample_02.csv)
* [missing_02.csv](src/test/resources/missing_02.csv)
* [missing_03.csv](src/test/resources/missing_03.csv)

When the environment variables are not set, some test cases are skipped.
```
AZURE_ACCOUNT_NAME
AZURE_ACCOUNT_KEY
AZURE_CONTAINER
AZURE_CONTAINER_IMPORT_DIRECTORY (optional, if needed)
```

If you're using Mac OS X El Capitan and GUI applications (such as an IDE), set the environment variables via `launchctl` as follows.
```xml
$ vi ~/Library/LaunchAgents/environment.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>my.startup</string>
  <key>ProgramArguments</key>
  <array>
    <string>sh</string>
    <string>-c</string>
    <string>
      launchctl setenv AZURE_ACCOUNT_NAME my-account-name
      launchctl setenv AZURE_ACCOUNT_KEY my-account-key
      launchctl setenv AZURE_CONTAINER my-container
      launchctl setenv AZURE_CONTAINER_IMPORT_DIRECTORY unittests
    </string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>

$ launchctl load ~/Library/LaunchAgents/environment.plist
$ launchctl getenv AZURE_ACCOUNT_NAME # try to get the value
```

Then start your applications.
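When running the tests from a plain terminal session instead of a GUI application, a minimal sketch is to export the same variables in the shell before invoking Gradle (the values below are placeholders):

```
$ export AZURE_ACCOUNT_NAME=my-account-name
$ export AZURE_ACCOUNT_KEY=my-account-key
$ export AZURE_CONTAINER=my-container
$ export AZURE_CONTAINER_IMPORT_DIRECTORY=unittests
$ ./gradlew test
```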