https://github.com/almogtavor/iceberg-applications
Out-of-the-box applications for implementing common Apache Iceberg tasks
- Host: GitHub
- URL: https://github.com/almogtavor/iceberg-applications
- Owner: almogtavor
- Created: 2022-11-19T22:41:14.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-08-13T19:11:24.000Z (about 1 year ago)
- Last Synced: 2025-05-25T08:03:59.698Z (5 months ago)
- Language: Java
- Size: 160 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Iceberg Applications 🧊
A collection of out-of-the-box Spring Boot based Apache Spark applications that perform common tasks regarding Apache Iceberg.
Currently, the existing applications are:
* `kafka2iceberg` - A pipeline that reads data from Kafka and writes to Iceberg.
* `iceberg-maintainer` - A program that executes Iceberg maintenance tasks.
## Local Usage & Development
### Step 1: Set Up the Environment Using Docker Compose
To run the iceberg-applications services locally, first set up the required environment using Docker Compose.
#### General Environment:
Use the Docker Compose file located at `environment/compose/environment-docker-compose.yaml`, e.g. with `docker compose -f environment/compose/environment-docker-compose.yaml up -d`.
This setup includes MinIO S3, Kafka, and Zookeeper (with Kafka UI).
#### Iceberg Catalog Setup:
Depending on your Iceberg catalog configuration, bring up one of the following Docker Compose files:
* `environment/compose/nessie-docker-compose.yaml` (for Nessie catalog)
* `environment/compose/postgres-docker-compose.yaml` (for Postgres-based catalog)
* If you are using an S3-based catalog (e.g., Hadoop catalog), no additional containers are required.
#### Configuration:
Configure each application in the Spring `application.yaml` file. Set the catalog type using `spring.iceberg.catalog-type={hadoop/hive/jdbc}`.
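A minimal sketch of such an `application.yaml`, assuming a MinIO-backed warehouse (only `spring.iceberg.catalog-type` comes from this README; the other property names and values are illustrative assumptions, not the repo's actual keys):

```yaml
spring:
  iceberg:
    catalog-type: hadoop        # one of: hadoop / hive / jdbc (from this README)
    # Illustrative assumption: a warehouse location pointing at the MinIO bucket
    warehouse: s3a://warehouse/
```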
### Step 2: Produce Data to Kafka
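A producer pointed at the Kafka broker from the compose environment needs configuration roughly like the following sketch (the broker address `localhost:9092` and the String serializers are assumptions for illustration, not taken from the repo):

```java
import java.util.Properties;

public class ProducerConfigSketch {
    /** Minimal producer settings for the local compose environment (values assumed). */
    public static Properties devProducerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // Kafka from the compose file; port is an assumption
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(devProducerProps().getProperty("bootstrap.servers")); // localhost:9092
    }
}
```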
Run the [DevSamplePojoKafkaProducer.java](kafka2iceberg%2Fsrc%2Fmain%2Fjava%2Fio%2Fgithub%2Falmogtavor%2FDevSamplePojoKafkaProducer.java) script to produce sample data to Kafka.
### Step 3: Execute the Kafka2Iceberg Service
#### 1. Hadoop Setup:
Download the [Hadoop Binaries](https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin) (winutils for Hadoop 2.7.1, needed on Windows) and place them locally so that they end up under `C:/hadoop/hadoop-2.7.1`.
Environment Variables: in your IntelliJ run configurations, set the following environment variables:
`HADOOP_HOME=C:\hadoop\hadoop-2.7.1;PATH=C:\hadoop\hadoop-2.7.1\bin`
#### 2. Spring Boot Profile:
Set the Spring Boot profile to either `jdbc` or `nessie`, depending on your catalog type.
#### 3. VM Options:
Set the VM options to: `--add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --enable-preview`.
### Step 4: View Your Iceberg Table in the MinIO Console
Open `localhost:9001` and check your bucket to verify that Kafka2Iceberg has successfully created an Iceberg table.
### Step 5: Run the Iceberg Maintainer
* Run the iceberg-maintainer application in the same manner as Kafka2Iceberg.
* After the files have been merged, check your MinIO bucket again to see the changes.
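In Iceberg, maintenance typically means expiring old snapshots and compacting small data files. As a rough illustration of the policy arithmetic such a job applies (a sketch under assumed retention and file-size parameters, not this repo's implementation):

```java
import java.time.Duration;
import java.time.Instant;

public class MaintenancePolicySketch {
    /** Snapshots created before this cutoff are eligible for expiry. */
    public static Instant expiryCutoff(Instant now, Duration retention) {
        return now.minus(retention);
    }

    /** Data files well below the target size are candidates for compaction. */
    public static boolean isSmallFile(long fileSizeBytes, long targetFileSizeBytes) {
        return fileSizeBytes < targetFileSizeBytes / 2;
    }

    public static void main(String[] args) {
        // With an assumed 7-day retention window:
        Instant cutoff = expiryCutoff(Instant.parse("2024-01-08T00:00:00Z"), Duration.ofDays(7));
        System.out.println(cutoff); // 2024-01-01T00:00:00Z

        // A 16 MiB file against an assumed 128 MiB target file size:
        System.out.println(isSmallFile(16L << 20, 128L << 20)); // true
    }
}
```

The real job presumably drives Iceberg's table APIs for snapshot expiration and data-file rewriting rather than raw timestamps; the sketch only shows the decision logic behind those actions.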