Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dremio/iceland
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/dremio/iceland
- Owner: dremio
- License: apache-2.0
- Created: 2023-10-19T17:22:03.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-05T13:30:06.000Z (11 months ago)
- Last Synced: 2024-04-16T17:21:28.317Z (8 months ago)
- Size: 9.77 KB
- Stars: 1
- Watchers: 6
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Iceland, Apache Iceberg playground
Iceland is a playground project for Apache Iceberg.
It contains several components to easily run, test, and benchmark Apache Iceberg with different flavors (different catalogs, different engines, ...).
The objectives are:
1. Provide resources (Docker images, Helm charts, notebooks, ...) to easily get started with Apache Iceberg using different flavors
2. Easily run use cases on concrete datasets representing real-world usage
3. Compare the performance of the use cases across the different flavors

The components are:
* `datasets` contains concrete data used in the samples and tests
* `usecases` contains samples and examples using the datasets
* `benchmark` contains benchmarks of the use cases
* `icekube` contains Docker images, Helm charts, and so on; basically everything needed to get started with Apache Iceberg

## Datasets
### GDELT
The GDELT Project stores all news articles as "events": http://data.gdeltproject.org/events/index.html
Every day, a ZIP file is created containing a CSV file with all of the events, in the following format:
```
545037848 20150530 201505 2015 2015.4110 JPN TOKYO JPN 1 046 046 04 1 7.0 15 1 15 -1.06163552535792 0 4 Tokyo, Tokyo, Japan JA JA40 35.685 139.751 -246227 4 Tokyo, Tokyo, Japan JA JA40 35.685 139.751 -246227 20160529 http://deadline.com/print-article/1201764227/
```

The format is described in the GDELT Data Format Codebook: http://data.gdeltproject.org/documentation/GDELT-Data_Format_Codebook.pdf
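As a minimal sketch in Python, the helpers below read one daily ZIP archive and parse each event line. They assume the fields are tab-separated: the sample above appears to have had its separators collapsed into spaces, since "Tokyo, Tokyo, Japan" is clearly a single field. Only positions visible in the sample are relied on (GLOBALEVENTID and SQLDATE first, DATEADDED and SOURCEURL last); every other column is kept as a raw string. The names `GdeltEvent`, `parse_event`, and `iter_events` are hypothetical and not part of Iceland.

```python
import io
import zipfile
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterator


@dataclass(frozen=True)
class GdeltEvent:
    """One GDELT event row with a few decoded fields (hypothetical type)."""
    global_event_id: int  # GLOBALEVENTID, first field
    event_date: date      # SQLDATE (YYYYMMDD), second field
    date_added: date      # DATEADDED (YYYYMMDD), second-to-last field
    source_url: str       # SOURCEURL, last field
    fields: tuple         # every raw field, for columns not decoded here


def _yyyymmdd(value: str) -> date:
    return datetime.strptime(value, "%Y%m%d").date()


def parse_event(line: str) -> GdeltEvent:
    # Assumption: fields are tab-separated, so place names containing
    # commas and spaces ("Tokyo, Tokyo, Japan") stay in one field.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"not a GDELT event row: {line!r}")
    return GdeltEvent(
        global_event_id=int(fields[0]),
        event_date=_yyyymmdd(fields[1]),
        date_added=_yyyymmdd(fields[-2]),
        source_url=fields[-1],
        fields=tuple(fields),
    )


def iter_events(zip_path) -> Iterator[GdeltEvent]:
    """Yield events from every file inside one daily GDELT ZIP archive.

    `zip_path` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
                for line in text:
                    if line.strip():  # skip blank lines
                        yield parse_event(line)
```

Decoding only the stable leading and trailing fields keeps the sketch independent of the exact column count in the middle of the row; the codebook linked above documents the remaining positions.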
### TPCDS
## Use cases
### Data Ingestion to Iceberg using Spark
* `gdelt/spark/di`
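The ingestion job in `gdelt/spark/di` is not reproduced here. As a hedged sketch of what such a job involves, the Python below builds the Spark properties that register an Iceberg catalog, creates a table partitioned by day, and appends a GDELT file with PySpark (Spark 3, for `writeTo`). Assumptions not taken from Iceland: a Hadoop-type catalog with a caller-supplied warehouse path, the column names `global_event_id`, `event_date`, and `source_url`, and the iceberg-spark-runtime jar already on Spark's classpath. `ingest` expects the unzipped, tab-separated file, because Spark does not decompress ZIP archives on read.

```python
def iceberg_spark_conf(catalog: str, warehouse: str) -> dict:
    """Spark properties that register an Iceberg catalog named `catalog`."""
    return {
        # Enables Iceberg's SQL extensions (procedures, extra ALTER TABLE forms).
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        # "hadoop" keeps table metadata under the warehouse path; no metastore.
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


def create_events_ddl(table: str) -> str:
    """DDL for a minimal (hypothetical) events table, partitioned by day."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "global_event_id BIGINT, event_date DATE, source_url STRING) "
        "USING iceberg PARTITIONED BY (days(event_date))"
    )


def ingest(spark, path: str, table: str) -> None:
    """Append one unzipped, tab-separated GDELT export to an Iceberg table."""
    from pyspark.sql import functions as F  # imported lazily; needs pyspark

    raw = spark.read.option("sep", "\t").csv(path)  # columns: _c0, _c1, ...
    events = raw.select(
        F.col("_c0").cast("bigint").alias("global_event_id"),   # GLOBALEVENTID
        F.to_date(F.col("_c1"), "yyyyMMdd").alias("event_date"),  # SQLDATE
        F.col(raw.columns[-1]).alias("source_url"),              # SOURCEURL
    )
    namespace = table.rsplit(".", 1)[0]  # e.g. "local.gdelt"
    spark.sql(f"CREATE NAMESPACE IF NOT EXISTS {namespace}")
    spark.sql(create_events_ddl(table))
    events.writeTo(table).append()
```

To use it, pass each pair from `iceberg_spark_conf("local", warehouse_path)` to `SparkSession.builder.config(key, value)` before `getOrCreate()`, then call `ingest(spark, csv_path, "local.gdelt.events")`; the catalog and table names here are only examples.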
### Q1: Events By Location
* `gdelt/spark/q1`
This query uses the Spark engine to extract all events for a specific location.
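The Spark implementation in `gdelt/spark/q1` is not reproduced here. The selection it describes can be sketched engine-agnostically in Python: keep the events whose location matches the requested one and return them in date order. The `Event` type, its `location` field (a stand-in for one of GDELT's place-name columns), the case-insensitive match, and the ordering are all illustrative assumptions; in Spark, the same filter would be a `where` on the corresponding column.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    """Hypothetical, already-parsed event; not a type from the Iceland repo."""
    global_event_id: int
    event_date: date
    location: str  # stand-in for a GDELT place-name column


def events_by_location(events: Iterable[Event], location: str) -> List[Event]:
    """Return the events at `location` (case-insensitive), oldest first."""
    wanted = location.strip().casefold()
    matches = [e for e in events if e.location.strip().casefold() == wanted]
    # Sort by date, then by id, so the output order is deterministic.
    return sorted(matches, key=lambda e: (e.event_date, e.global_event_id))
```

For example, `events_by_location(events, "Tokyo, Tokyo, Japan")` returns the Tokyo events from the sample format above, earliest first.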
## Performance Benchmark
## Icekube
* `icekube`
### Flavored Docker Images
### Notebook
### Helm Charts