https://github.com/mooso/azure-tables-hadoop

# azure-tables-hadoop

## Description

A Hadoop input format and a Hive storage handler so that you can access data stored in Windows Azure Storage tables from within a Hadoop (or HdInsight) cluster.

## Usage

### From Hive

Make sure the jar is in `hive.aux.jars.path` (add it in your `hive-site.xml`). Example:

    <property>
      <name>hive.aux.jars.path</name>
      <value>file:///c:/azure-tables-hadoop/target/microsoft-hadoop-azure-0.0.1.jar</value>
    </property>

Create the table like this:

    CREATE EXTERNAL TABLE az_table(intField int, stringField string)
    STORED BY 'com.microsoft.hadoop.azure.hive.AzureTableHiveStorageHandler'
    TBLPROPERTIES(
    "azure.table.name" = "<table_name>",
    "azure.table.account.uri" = "http://<account_name>.table.core.windows.net",
    "azure.table.storage.key" = "<storage_key>"
    );
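
If you create several such tables, it can help to generate the statement rather than hand-edit it. The following is a minimal sketch, not part of this project: `AzureTableDdl` and `createExternalTable` are hypothetical names, and the sketch assumes the account URI follows the `http://<account>.table.core.windows.net` pattern from the example. The storage handler class and the three property keys are copied from the statement above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of azure-tables-hadoop) that renders the DDL. */
final class AzureTableDdl {
    // Storage handler class name as given in the README example.
    static final String HANDLER =
        "com.microsoft.hadoop.azure.hive.AzureTableHiveStorageHandler";

    /** columns maps Hive column name to Hive type, in declaration order. */
    static String createExternalTable(String hiveTable, Map<String, String> columns,
                                      String azureTable, String account, String storageKey) {
        StringJoiner cols = new StringJoiner(", ");
        columns.forEach((name, type) -> cols.add(name + " " + type));
        // Assumption: the account name is the first label of the table endpoint host.
        return "CREATE EXTERNAL TABLE " + hiveTable + "(" + cols + ")\n"
            + "STORED BY '" + HANDLER + "'\n"
            + "TBLPROPERTIES(\n"
            + "\"azure.table.name\" = \"" + azureTable + "\",\n"
            + "\"azure.table.account.uri\" = \"http://" + account + ".table.core.windows.net\",\n"
            + "\"azure.table.storage.key\" = \"" + storageKey + "\"\n"
            + ");";
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("intField", "int");
        columns.put("stringField", "string");
        // The table, account, and key values here are illustrative placeholders.
        System.out.println(createExternalTable(
            "az_table", columns, "mytable", "myaccount", "mykey"));
    }
}
```

The sketch does no quoting or escaping, so it is only suitable for trusted, simple identifiers.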

#### Caveats

* Hive column names should match the names used in the Azure table (case doesn't matter)
* Only the {string, int, bigint, double, boolean} types are supported
* The table is read-only: storing data into it is not supported
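
The first two caveats can be checked before you issue the `CREATE EXTERNAL TABLE` statement. The sketch below is illustrative only and not part of this project: `AzureTableSchemaCheck` and `check` are hypothetical names, and it assumes that "match" means each Hive column name equals, ignoring case, a property name you know exists on the Azure table entities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical pre-flight check (not part of azure-tables-hadoop). */
final class AzureTableSchemaCheck {
    // The five Hive types listed as supported in the caveats.
    static final Set<String> SUPPORTED_TYPES =
        new HashSet<>(Arrays.asList("string", "int", "bigint", "double", "boolean"));

    /**
     * hiveColumns maps Hive column name to Hive type.
     * Returns one message per problem; an empty list means both checks pass.
     */
    static List<String> check(Map<String, String> hiveColumns, Set<String> azureProperties) {
        // Lower-case the Azure property names so the comparison ignores case.
        Set<String> known = new HashSet<>();
        for (String property : azureProperties) {
            known.add(property.toLowerCase(Locale.ROOT));
        }
        List<String> problems = new ArrayList<>();
        hiveColumns.forEach((name, type) -> {
            if (!SUPPORTED_TYPES.contains(type.toLowerCase(Locale.ROOT))) {
                problems.add("unsupported type for " + name + ": " + type);
            }
            if (!known.contains(name.toLowerCase(Locale.ROOT))) {
                problems.add("no Azure property named " + name);
            }
        });
        return problems;
    }
}
```

For example, a column declared as `INTFIELD int` passes against an Azure property called `intField`, while a `timestamp` column is reported as unsupported.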

### From Map-Reduce

The code includes an example, `SixCounter`, that you can use as a guide. In short: call `AzureTableConfiguration.configureInputTable()` to configure the input table, and set your job's input format to `AzureTableInputFormat`.