https://github.com/implydata/druid-hadoop-inputformat
Hadoop InputFormat for http://druid.io/
https://github.com/implydata/druid-hadoop-inputformat
Last synced: 4 months ago
JSON representation
Hadoop InputFormat for http://druid.io/
- Host: GitHub
- URL: https://github.com/implydata/druid-hadoop-inputformat
- Owner: implydata
- License: apache-2.0
- Created: 2016-10-26T14:56:49.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-10-26T14:57:15.000Z (over 9 years ago)
- Last Synced: 2024-04-14T20:22:57.915Z (about 2 years ago)
- Language: Java
- Size: 11.7 KB
- Stars: 10
- Watchers: 6
- Forks: 7
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## Druid Hadoop InputFormat
This is a Hadoop InputFormat that can be used to load Druid data from deep storage.
### Installation
To install this library, run `mvn install`. You can then include it in projects with Maven by using the dependency:
```xml
io.imply
druid-hadoop-inputformat
0.1-SNAPSHOT
```
### Example
Here's an example of creating an RDD in Spark:
```java
final JobConf jobConf = new JobConf();
final String coordinatorHost = "localhost:8081";
final String dataSource = "wikiticker";
final List intervals = null; // null to include all time
final DimFilter filter = null; // null to include all rows
final List columns = null; // null to include all columns
DruidInputFormat.setInputs(
jobConf,
coordinatorHost,
dataSource,
intervals,
filter,
columns
);
final JavaPairRDD rdd = jsc.newAPIHadoopRDD(
jobConf,
DruidInputFormat.class,
NullWritable.class,
InputRow.class
);
```