https://github.com/anskarl/druid-docker-cluster
Dockerized Apache Druid for testing and development
docker docker-compose docker-image druid
- Host: GitHub
- URL: https://github.com/anskarl/druid-docker-cluster
- Owner: anskarl
- License: apache-2.0
- Created: 2019-05-25T13:13:32.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2020-01-24T15:39:15.000Z (over 6 years ago)
- Last Synced: 2025-04-12T14:22:45.648Z (about 1 year ago)
- Topics: docker, docker-compose, docker-image, druid
- Language: Shell
- Size: 1.82 MB
- Stars: 23
- Watchers: 2
- Forks: 9
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.TXT
# Dockerized Apache Druid cluster
This project demonstrates how to set up a Dockerized example/development [Apache Druid](http://druid.io/) cluster.
The cluster is composed of the following components:
- [**MinIO**](https://min.io), an S3-compatible object store, for deep storage
- [**PostgreSQL**](https://www.postgresql.org/) for metadata storage
- [**Zookeeper**](https://zookeeper.apache.org/) for internal service discovery, coordination, and leader election
- [**Apache Druid**](http://druid.io/) platform:
* **Middle Manager** to handle the ingestion of data into the cluster
    * **Historical** to handle the storage and querying of “historical” data
* **Broker** to receive queries from external clients
* **Coordinator** to assign segments to Historical nodes
* **Overlord** to assign ingestion tasks to Middle Managers and to coordinate segment publishing
    * **Router** to provide a unified API gateway in front of Brokers, Overlords and Coordinators
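To check that each of the Druid processes listed above is actually running, the sketch below probes the `/status/health` endpoint that Druid processes expose. It assumes the processes listen on Druid's stock default ports and that those ports are published on the host; only the Router port (8888) is confirmed by this README, so the hypothetical helper `druid_default_port` may need adjusting to match `docker-compose.yml`.

```shell
#!/usr/bin/env bash
# Sketch: probe each Druid process's /status/health endpoint.
# ASSUMPTION: the ports below are Druid's stock defaults; this repository's
# docker-compose.yml may publish different ports (only 8888 is documented).

# Hypothetical helper: map a Druid process name to its default plaintext port.
druid_default_port() {
  case "$1" in
    coordinator)   echo 8081 ;;
    broker)        echo 8082 ;;
    historical)    echo 8083 ;;
    overlord)      echo 8090 ;;
    middlemanager) echo 8091 ;;
    router)        echo 8888 ;;
    *) return 1 ;;
  esac
}

# Print UP/DOWN for every process; return non-zero if any probe fails.
druid_health_check() {
  local host="${1:-localhost}" failed=0 proc port
  for proc in coordinator broker historical overlord middlemanager router; do
    port="$(druid_default_port "$proc")"
    # -f makes curl fail on HTTP errors; --max-time bounds each probe.
    if curl -sf --max-time 5 "http://${host}:${port}/status/health" >/dev/null 2>&1; then
      echo "UP   ${proc} (${host}:${port})"
    else
      echo "DOWN ${proc} (${host}:${port})"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `druid_health_check localhost` prints one UP/DOWN line per process and returns non-zero if any probe fails.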
### Build the Druid image
```
make image
```
or, using docker-compose:
```
docker-compose build
```
You can also specify the version of Druid to build, for example:
```
make DRUID_VERSION=0.14.1-incubating image
```
or, using docker-compose:
```
docker-compose build --build-arg ARG_DRUID_VERSION=0.14.1-incubating
```
### Run the cluster
```
docker-compose up
```
or, to run it in the background:
```
docker-compose up -d
```
After a while, the Druid console should be available at [http://localhost:8888](http://localhost:8888).
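Because the services take some time to start, a script that depends on the console can poll the Router instead of sleeping for a fixed time. A minimal sketch, assuming the Router on port 8888 answers Druid's standard `/status/health` endpoint; `wait_for_druid` is a hypothetical helper, not part of this repository:

```shell
#!/usr/bin/env bash
# Sketch: block until the Router (Druid console, port 8888) reports healthy.
# ASSUMPTION: the Router exposes Druid's standard /status/health endpoint.

# Hypothetical helper. Usage: wait_for_druid [url] [timeout_seconds]
wait_for_druid() {
  local url="${1:-http://localhost:8888/status/health}"
  local timeout="${2:-300}"
  local deadline=$(( $(date +%s) + timeout ))
  # Retry until the endpoint returns a 2xx response or the deadline passes.
  until curl -sf --max-time 5 "$url" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "Druid not healthy after ${timeout}s: ${url}" >&2
      return 1
    fi
    sleep 5
  done
  echo "Druid is up: ${url}"
}
```

For example: `docker-compose up -d && wait_for_druid`, optionally passing a different URL and a timeout in seconds.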
### Load example data
As example data, we use a subset of the [NYC Taxi & Limousine Commission - Trip Record Data](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page), specifically the months 2015-01 to 2015-03.
```
cd dataset
./03-load_to_druid.sh
```
Note that you can download data for different months and change the sample size by adjusting the parameters of `./dataset/01-download.sh` and `./dataset/02-create_sample_tripdata.sh`.
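One way to control the sample size is reservoir sampling, which draws a uniform random sample of N rows in a single pass while holding only N rows in memory. The sketch below is illustrative only: it is not necessarily what `./dataset/02-create_sample_tripdata.sh` does, and `sample_csv` is a hypothetical helper. It keeps the CSV header and uses a fixed seed so runs are reproducible.

```shell
#!/usr/bin/env bash
# Sketch: uniform random sample of N data rows from a CSV, header preserved.
# Hypothetical helper; NOT necessarily how 02-create_sample_tripdata.sh works.

# Usage: sample_csv <input.csv> <num_rows> [seed]
sample_csv() {
  local input="$1" k="$2" seed="${3:-42}"
  awk -v k="$k" -v seed="$seed" '
    BEGIN { k += 0; srand(seed) }        # force k numeric; seed the RNG
    NR == 1 { print; next }              # header line is always kept
    {
      n++                                # n = data rows seen so far
      if (n <= k) {
        sample[n] = $0                   # fill the reservoir first
      } else {
        j = int(rand() * n) + 1          # uniform index in 1..n
        if (j <= k) sample[j] = $0       # keep row with probability k/n
      }
    }
    END { for (i = 1; i <= (n < k ? n : k); i++) print sample[i] }
  ' "$input"
}
```

For example, `sample_csv input.csv 100000 > sample.csv` (file names hypothetical). The sampled rows come out in reservoir order rather than file order, which should not matter for batch ingestion.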
The dataset schema and the indexing task are defined in `./dataset/yellow_tripdata-index.json`.
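To submit a spec like this one by hand, Druid's Overlord accepts indexing tasks at `POST /druid/indexer/v1/task` and reports progress at `GET /druid/indexer/v1/task/<taskId>/status`. The sketch below assumes the requests go through the Router on port 8888 with its management proxy enabled; otherwise set `DRUID_URL` to the Overlord (default port 8090) if it is published. It is not necessarily what `./dataset/03-load_to_druid.sh` does, and all helper names are hypothetical. JSON is parsed with `sed` to avoid a `jq` dependency.

```shell
#!/usr/bin/env bash
# Sketch: submit an indexing task spec and poll until it finishes.
# ASSUMPTION: DRUID_URL reaches the Overlord API, either via the Router's
# management proxy (port 8888) or the Overlord itself (default port 8090).
DRUID_URL="${DRUID_URL:-http://localhost:8888}"

# Hypothetical helper: extract the task id from {"task":"<id>", ...}.
json_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: extract the state (RUNNING, SUCCESS, FAILED) from a
# status response; "statusCode"/"runnerStatusCode" keys do not match.
json_task_state() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z]*\)".*/\1/p'
}

# submit_task <spec.json>: POST the spec and print the returned task id.
submit_task() {
  local response
  response="$(curl -sf -X POST -H 'Content-Type: application/json' \
    --data-binary @"$1" "${DRUID_URL}/druid/indexer/v1/task")" || return 1
  printf '%s\n' "$response" | json_task_id
}

# wait_for_task <task-id> [max_polls]: poll every 10s until SUCCESS or FAILED.
wait_for_task() {
  local id="$1" tries="${2:-360}" state
  while [ "$tries" -gt 0 ]; do
    state="$(curl -sf "${DRUID_URL}/druid/indexer/v1/task/${id}/status" | json_task_state)"
    case "$state" in
      SUCCESS) echo "task ${id} succeeded"; return 0 ;;
      FAILED)  echo "task ${id} failed" >&2; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep 10
  done
  echo "gave up waiting for task ${id}" >&2
  return 1
}
```

For example: `id="$(submit_task dataset/yellow_tripdata-index.json)" && wait_for_task "$id"`.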
...enjoy :)