https://github.com/treeverse/lakefs-iceberg
A custom Iceberg catalog implementation for lakeFS
- Host: GitHub
- URL: https://github.com/treeverse/lakefs-iceberg
- Owner: treeverse
- License: apache-2.0
- Created: 2023-05-26T13:57:55.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-11-07T14:27:41.000Z (over 2 years ago)
- Last Synced: 2025-02-27T17:31:36.210Z (over 1 year ago)
- Language: Java
- Size: 807 KB
- Stars: 1
- Watchers: 8
- Forks: 1
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

## lakeFS Iceberg Catalog
lakeFS enriches your Iceberg tables with Git capabilities: create a branch and make your changes in isolation, without affecting other team members.
See the instructions below on how to use it, and check out the integration in action in the [lakeFS samples repository](https://github.com/treeverse/lakeFS-samples/).
### Install
Use the following Maven dependency to install the lakeFS custom catalog:
```xml
<dependency>
  <groupId>io.lakefs</groupId>
  <artifactId>lakefs-iceberg</artifactId>
  <version>0.1.4</version>
</dependency>
```
### Configure
Here is how to configure the lakeFS custom catalog in Spark:
```scala
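// Register an Iceberg catalog named "lakefs" that uses the lakeFS catalog implementation.
conf.set("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog")
conf.set("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog")
// The warehouse points at the lakeFS repository that holds the tables.
conf.set("spark.sql.catalog.lakefs.warehouse", "lakefs://example-repo")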
```
You will also need to configure the S3A Hadoop FileSystem to interact with lakeFS:
```scala
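// Hadoop properties set on a SparkConf need the "spark.hadoop." prefix;
// drop the prefix if you set them directly on a Hadoop Configuration.
conf.set("spark.hadoop.fs.s3a.access.key", "AKIAlakefs12345EXAMPLE")
conf.set("spark.hadoop.fs.s3a.secret.key", "abc/lakefs/1234567bPxRfiCYEXAMPLEKEY")
conf.set("spark.hadoop.fs.s3a.endpoint", "https://example-org.us-east-1.lakefscloud.io")
conf.set("spark.hadoop.fs.s3a.path.style.access", "true")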
```
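The catalog and S3A settings above can also be passed to `spark-shell` or `spark-submit` as `--conf key=value` pairs. The following is a minimal sketch of that idea, not part of the lakeFS project: the class `LakeFSSparkConf` and its methods are hypothetical names, the values are the placeholders used in this README, and the `spark.hadoop.` prefix assumes the properties are handed to Spark itself, which forwards properties with that prefix to the Hadoop configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LakeFSSparkConf {

    // Collects the catalog and S3A settings from this README into one ordered map.
    public static Map<String, String> properties(String repo, String endpoint,
                                                 String accessKey, String secretKey) {
        Map<String, String> props = new LinkedHashMap<>();
        // Iceberg catalog named "lakefs", backed by the lakeFS catalog implementation.
        props.put("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog");
        props.put("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog");
        props.put("spark.sql.catalog.lakefs.warehouse", "lakefs://" + repo);
        // S3A settings; the "spark.hadoop." prefix is an assumption for --conf usage.
        props.put("spark.hadoop.fs.s3a.access.key", accessKey);
        props.put("spark.hadoop.fs.s3a.secret.key", secretKey);
        props.put("spark.hadoop.fs.s3a.endpoint", endpoint);
        props.put("spark.hadoop.fs.s3a.path.style.access", "true");
        return props;
    }

    // Turns each property into a "--conf", "key=value" pair of arguments.
    public static List<String> toSparkArgs(Map<String, String> props) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, String> props = properties(
                "example-repo",
                "https://example-org.us-east-1.lakefscloud.io",
                "AKIAlakefs12345EXAMPLE",
                "abc/lakefs/1234567bPxRfiCYEXAMPLEKEY");
        // Prints the arguments on one line, ready to append to a spark-shell command.
        System.out.println(String.join(" ", toSparkArgs(props)));
    }
}
```

Keeping the settings in one ordered map makes it easy to swap the repository or endpoint per environment without touching the rest of the job.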
### Create a table
To create a table on your main branch and add some initial rows to it, use the following syntax:
```sql
CREATE TABLE lakefs.main.table1 (id int, data string);
INSERT INTO lakefs.main.table1 VALUES (1, 'data1'), (2, 'data2');
```
### Create a branch
We can now commit the creation of the table to the main branch:
```
lakectl commit lakefs://example-repo/main -m "my first iceberg commit"
```
Then, create a branch:
```
lakectl branch create lakefs://example-repo/dev -s lakefs://example-repo/main
```
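If you want to script the commit and branch steps rather than type them, they can be sketched as argument lists built in Java. This is only an illustration: `LakectlCommands` and its methods are hypothetical names, and the only `lakectl` syntax used is what appears in the two commands above. By default the program just prints the commands; passing `--run` executes them and assumes `lakectl` is on the `PATH` and already configured.

```java
import java.util.List;

public class LakectlCommands {

    // Builds the URI form used above: lakefs://<repo>/<branch>.
    public static String branchUri(String repo, String branch) {
        return "lakefs://" + repo + "/" + branch;
    }

    // Argument list for: lakectl commit <branch-uri> -m <message>
    public static List<String> commit(String repo, String branch, String message) {
        return List.of("lakectl", "commit", branchUri(repo, branch), "-m", message);
    }

    // Argument list for: lakectl branch create <new-branch-uri> -s <source-branch-uri>
    public static List<String> createBranch(String repo, String newBranch, String sourceBranch) {
        return List.of("lakectl", "branch", "create",
                branchUri(repo, newBranch), "-s", branchUri(repo, sourceBranch));
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> steps = List.of(
                commit("example-repo", "main", "my first iceberg commit"),
                createBranch("example-repo", "dev", "main"));
        boolean run = args.length > 0 && args[0].equals("--run");
        for (List<String> step : steps) {
            System.out.println(String.join(" ", step));
            if (run) {
                // Runs the command directly (no shell), so the message needs no quoting.
                int exit = new ProcessBuilder(step).inheritIO().start().waitFor();
                if (exit != 0) {
                    throw new IllegalStateException("command failed with exit code " + exit);
                }
            }
        }
    }
}
```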
### Make changes on the branch
We can now make changes on the branch:
```sql
INSERT INTO lakefs.dev.table1 VALUES (3, 'data3');
```
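In the statements in this README, the table identifier has the shape `<catalog>.<branch>.<table>`, so switching branches only changes the middle part. A small helper can make that explicit when statements are generated in code. The sketch below is an illustration under assumptions: `LakeFSTableName` is a hypothetical name, and the backtick quoting of parts that contain characters other than letters, digits, and underscores (for example a branch called `feature-x`) follows general Spark SQL identifier quoting rather than anything stated in this README.

```java
public class LakeFSTableName {

    // Quotes a name part with backticks when it contains characters
    // other than letters, digits, and underscores.
    static String quoteIfNeeded(String part) {
        if (part.matches("[A-Za-z0-9_]+")) {
            return part;
        }
        // A backtick inside a quoted identifier is escaped by doubling it.
        return "`" + part.replace("`", "``") + "`";
    }

    // Builds "<catalog>.<branch>.<table>", the identifier shape used in this README.
    public static String of(String catalog, String branch, String table) {
        return String.join(".",
                quoteIfNeeded(catalog), quoteIfNeeded(branch), quoteIfNeeded(table));
    }

    public static void main(String[] args) {
        // Branch defaults to "dev", matching the example above.
        String branch = args.length > 0 ? args[0] : "dev";
        System.out.println("INSERT INTO " + of("lakefs", branch, "table1")
                + " VALUES (3, 'data3');");
    }
}
```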
### Query the table
If we query the table on the branch, we will see the data we inserted:
```sql
SELECT * FROM lakefs.dev.table1;
```
Results in:
```
+----+-------+
| id | data  |
+----+-------+
| 1  | data1 |
| 2  | data2 |
| 3  | data3 |
+----+-------+
```
However, if we query the table on the main branch, we will not see the new changes:
```sql
SELECT * FROM lakefs.main.table1;
```
Results in:
```
+----+-------+
| id | data  |
+----+-------+
| 1  | data1 |
| 2  | data2 |
+----+-------+
```