Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jacobstanley/hadoop-tools
Tools for working with Hadoop, written with performance in mind.
- Host: GitHub
- URL: https://github.com/jacobstanley/hadoop-tools
- Owner: jacobstanley
- License: other
- Created: 2014-09-01T05:20:06.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2017-12-08T16:48:51.000Z (about 7 years ago)
- Last Synced: 2024-04-26T00:42:01.721Z (8 months ago)
- Topics: hadoop, haskell, hdfs
- Language: Haskell
- Size: 265 KB
- Stars: 37
- Watchers: 9
- Forks: 15
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Hadoop Tools [![Hackage][hackage-shield]][hackage] [![Travis][travis-shield]][travis] [![Circle CI][circleci-shield]][circleci]
Tools for working with Hadoop written with performance in mind.
*This has been tested with the HDFS protocol used by CDH 5.x*
## Where can I get it?
See our latest release [v1.0.1](https://github.com/jystic/hadoop-tools/releases/tag/v1.0.1)!
## Configuration
By default, `hh` will behave the same as `hdfs dfs` or `hadoop fs` in
terms of which user name and which namenodes to use for HDFS.

### User
The default is to use your current unix username when accessing HDFS.
This can be overridden either by using the `HADOOP_USER_NAME`
environment variable:

```bash
# This trick also works with `hdfs dfs` and `hadoop fs`
export HADOOP_USER_NAME=amber
```

or by adding the following to your `~/.hh` configuration file:
```config
hdfs {
  user = "amber"
}
```

### Namenode
The default is to lookup the namenode configuration from
`/etc/hadoop/conf/core-site.xml` and `/etc/hadoop/conf/hdfs-site.xml`.

This can be overridden by adding the following to your `~/.hh`
configuration file:

```config
namenode {
  host = "hostname or ip address"
}
```

or if you're using a non-standard namenode port:
```config
namenode {
  host = "hostname or ip address"
  port = 7020 # defaults to 8020
}
```

*NOTE: You cannot currently specify multiple namenodes using the `~/.hh`
config file, but this would be easy to add. If you would like this
feature, please open an
[issue](https://github.com/jystic/hadoop-tools/issues).*

### SOCKS Proxy
Sometimes it can be convenient to access HDFS over a SOCKS proxy. The
easiest way to get this to work is to connect to a server which can
access the namenode using `ssh -D1080`. This sets up a SOCKS
proxy locally on port `1080` which can access everything that the
remote server can access.

To get `hh` to make use of this proxy, add the following to your `~/.hh`
configuration file:

```config
proxy {
  host = "127.0.0.1"
  port = 1080
}
```

### Kerberos / SASL
In order to use Kerberos authentication you must supply information about
the `principal` for both your user and your namenode. These are looked up
in `/etc/hadoop/conf/core-site.xml` and `/etc/hadoop/conf/hdfs-site.xml` by
default.

```config
namenode {
  principal = "hdfs/[email protected]"
}

auth {
  user = "[email protected]"
}
```

If you don't provide an `auth.user` field it will assume it is
`[email protected]`, where `REALM.COM` comes from the principal of
one of the namenodes.

[hackage]: http://hackage.haskell.org/package/hadoop-tools
[hackage-shield]: http://img.shields.io/hackage/v/hadoop-tools.svg?style=flat
[travis]: https://travis-ci.org/jystic/hadoop-tools
[travis-shield]: https://travis-ci.org/jystic/hadoop-tools.svg?branch=master
[circleci]: https://circleci.com/gh/jystic/hadoop-tools
[circleci-shield]: http://img.shields.io/circleci/project/jystic/hadoop-tools.svg?style=flat
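The realm-defaulting rule for `auth.user` described above can be sketched as a small Haskell function. This is a hypothetical illustration, not the actual hadoop-tools implementation: `defaultAuthUser`, its arguments, and the example principal are made up here to show how a realm might be split off a `service/host@REALM` principal.

```haskell
-- Hypothetical sketch of the rule above: when auth.user is absent,
-- combine the HDFS user name with the realm taken from a namenode
-- principal of the form "service/host@REALM".
defaultAuthUser :: String -> String -> String
defaultAuthUser hdfsUser namenodePrincipal =
  case break (== '@') namenodePrincipal of
    -- break splits at the first '@'; everything after it is the realm
    (_, '@' : realm) -> hdfsUser ++ "@" ++ realm
    -- no '@' in the principal: fall back to the bare user name
    _                -> hdfsUser
```

Under these assumptions, `defaultAuthUser "amber" "hdfs/namenode@REALM.COM"` yields `"amber@REALM.COM"`, matching the `[email protected]` / `REALM.COM` behaviour the paragraph above describes.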