https://github.com/prateek/wasb-parcel
- Host: GitHub
- URL: https://github.com/prateek/wasb-parcel
- Owner: prateek
- Created: 2014-08-08T07:06:18.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2014-08-16T21:13:39.000Z (almost 12 years ago)
- Last Synced: 2025-01-09T08:38:21.720Z (over 1 year ago)
- Language: Shell
- Size: 1.31 MB
- Stars: 0
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
HADOOP WASB Parcel
==================
This repository provides a [parcel](https://github.com/cloudera/cm_ext) that installs the jars required for the Azure Blob Storage (`wasb`) bindings for Hadoop, for use with Cloudera Manager.
# Install Steps
0. Install Prerequisites: `cloudera/cm_ext`
```sh
cd /tmp
git clone https://github.com/cloudera/cm_ext
cd cm_ext/validator
mvn install
```
1. Create parcel:
```sh
cd /tmp
git clone https://github.com/prateek/wasb-parcel
cd wasb-parcel
POINT_VERSION=5 VALIDATOR_DIR=/tmp/cm_ext ./build-parcel.sh
cd build
# Python 2 syntax; with Python 3, run: python3 -m http.server 14641
python -m SimpleHTTPServer 14641
```
2. The commands above build the parcel into a local directory and start a web server on port 14641 that serves it as a parcel repository. Follow these [detailed instructions](http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Installation-Guide/cm5ig_create_local_parcel_repo.html) to add the repository to Cloudera Manager.
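Before adding the repository to Cloudera Manager, it may help to confirm that the served directory looks like a parcel repository. The sketch below is not part of this repository: `check_parcel_repo` is a hypothetical helper, and it assumes that `build-parcel.sh` leaves at least one `*.parcel` file and a `manifest.json` in `build/`.

```sh
# Hypothetical helper (not shipped with this repository).
# Assumption: a usable parcel repository directory contains a manifest.json
# and at least one *.parcel file.
check_parcel_repo() {
  dir="${1:-build}"
  if [ ! -f "$dir/manifest.json" ]; then
    echo "missing $dir/manifest.json" >&2
    return 1
  fi
  # Succeed as soon as one regular file matches *.parcel
  for f in "$dir"/*.parcel; do
    if [ -f "$f" ]; then
      echo "ok: $f"
      return 0
    fi
  done
  echo "no .parcel files in $dir" >&2
  return 1
}
```

Run it from the `wasb-parcel` checkout as `check_parcel_repo build`; a non-zero exit status suggests the build step did not complete.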
3. Download, Distribute, and Activate the HADOOP_WASB parcel.
4. Add the following to the `core-site.xml` safety-valve for the HDFS service:
```xml
<property>
  <name>fs.azure.account.key.[STORAGE_ACCOUNT].blob.core.windows.net</name>
  <value>[SECRET_KEY]</value>
</property>
```
*Note:* Multiple such entries may be added, one per storage account.
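When several storage accounts are involved, the safety-valve entries can be generated rather than typed by hand. The following is a minimal sketch, not part of this repository: `wasb_property` is a hypothetical helper, and the account names and keys are placeholders. It prints one Hadoop `<property>` element per account, ready to paste into the safety-valve.

```sh
# Hypothetical helper (not shipped with this repository): print one
# core-site.xml property for a storage account and its access key.
wasb_property() {
  account="$1"
  key="$2"
  cat <<EOF
<property>
  <name>fs.azure.account.key.${account}.blob.core.windows.net</name>
  <value>${key}</value>
</property>
EOF
}

# Placeholder values; substitute real storage account names and keys.
wasb_property "account1" "KEY1"
wasb_property "account2" "KEY2"
```

Because the output contains the secret keys, avoid leaving it in shell history or world-readable files.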
5. Deploy the client configuration and restart the cluster.
6. Example usage:
```sh
# HDFS access for CLI or M/R
$ hdfs dfs -ls wasb://[CONTAINER]@[STORAGE_ACCOUNT].blob.core.windows.net/
# Create a sample file (8 comma-separated columns) and upload it to a new directory
$ seq 1 10000 | paste -d',' - - - - - - - - > sample.txt
$ hdfs dfs -mkdir wasb://[CONTAINER]@[STORAGE_ACCOUNT].blob.core.windows.net/sample-dir
$ hdfs dfs -put sample.txt wasb://[CONTAINER]@[STORAGE_ACCOUNT].blob.core.windows.net/sample-dir
# Access using Hive
$ hive << EOF
create external table wasb_sample
( a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int, a8 int )
row format delimited
fields terminated by ','
location 'wasb://[CONTAINER]@[STORAGE_ACCOUNT].blob.core.windows.net/sample-dir';
EOF
$ hive -e 'select count(*) from wasb_sample'
# Access using Spark
$ spark-shell <