https://github.com/treeverse/lakefs-iceberg

A custom Iceberg catalog implementation for lakeFS

Host: GitHub
URL: https://github.com/treeverse/lakefs-iceberg
Owner: treeverse
License: apache-2.0
Created: 2023-05-26T13:57:55.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-11-07T14:27:41.000Z (over 2 years ago)
Last Synced: 2025-02-27T17:31:36.210Z (over 1 year ago)
Language: Java
Homepage:
Size: 807 KB
Stars: 1
Watchers: 8
Forks: 1
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

## lakeFS Iceberg Catalog

lakeFS enriches your Iceberg tables with Git capabilities: create a branch and make your changes in isolation, without affecting other team members.

See the instructions below on how to use it, and check out the integration in action in the [lakeFS samples repository](https://github.com/treeverse/lakeFS-samples/).

### Install

Use the following Maven dependency to install the lakeFS custom catalog:

```xml
<dependency>
  <groupId>io.lakefs</groupId>
  <artifactId>lakefs-iceberg</artifactId>
  <version>0.1.4</version>
</dependency>
```

### Configure

Here is how to configure the lakeFS custom catalog in Spark:

```scala
conf.set("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog");
conf.set("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog");
conf.set("spark.sql.catalog.lakefs.warehouse", "lakefs://example-repo");
```
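
The three properties above follow Spark's `spark.sql.catalog.<name>.*` naming, so they can be generated for any catalog name and repository. The following is a minimal sketch, not part of the lakeFS library: `LakeFSCatalogProps` and `forRepo` are hypothetical names, and the code uses only the Java standard library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of lakefs-iceberg) that builds the catalog properties shown above. */
final class LakeFSCatalogProps {
    private LakeFSCatalogProps() {}

    /**
     * @param catalogName name used in SQL identifiers, e.g. "lakefs"
     * @param repo        lakeFS repository name, e.g. "example-repo"
     */
    static Map<String, String> forRepo(String catalogName, String repo) {
        String prefix = "spark.sql.catalog." + catalogName;
        // LinkedHashMap keeps the properties in the order they appear in the README.
        Map<String, String> props = new LinkedHashMap<>();
        props.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
        props.put(prefix + ".catalog-impl", "io.lakefs.iceberg.LakeFSCatalog");
        props.put(prefix + ".warehouse", "lakefs://" + repo);
        return props;
    }
}
```

Assuming `conf` is a `SparkConf`, the map could be applied with `LakeFSCatalogProps.forRepo("lakefs", "example-repo").forEach((k, v) -> conf.set(k, v));`.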

You will also need to configure the S3A Hadoop FileSystem to interact with lakeFS:

```scala
conf.set("fs.s3a.access.key", "AKIAlakefs12345EXAMPLE")
conf.set("fs.s3a.secret.key", "abc/lakefs/1234567bPxRfiCYEXAMPLEKEY")
conf.set("fs.s3a.endpoint", "https://example-org.us-east-1.lakefscloud.io")
conf.set("fs.s3a.path.style.access", "true")
```
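
The keys above are Hadoop S3A settings. If `conf` is a Hadoop `Configuration`, they apply as written. If it is a `SparkConf`, note that Spark copies properties prefixed with `spark.hadoop.` into the Hadoop configuration. The README does not say which object `conf` is, so the sketch below takes the prefix as a parameter. `S3AProps` and `build` are hypothetical names, not part of lakeFS or Spark.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds the S3A settings shown above under an optional key prefix. */
final class S3AProps {
    private S3AProps() {}

    /**
     * @param keyPrefix "" for a Hadoop Configuration, "spark.hadoop." for a SparkConf
     * @param accessKey lakeFS access key ID
     * @param secretKey lakeFS secret access key
     * @param endpoint  lakeFS server URL
     */
    static Map<String, String> build(String keyPrefix, String accessKey,
                                     String secretKey, String endpoint) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(keyPrefix + "fs.s3a.access.key", accessKey);
        props.put(keyPrefix + "fs.s3a.secret.key", secretKey);
        props.put(keyPrefix + "fs.s3a.endpoint", endpoint);
        // Path-style access is enabled in the README's configuration.
        props.put(keyPrefix + "fs.s3a.path.style.access", "true");
        return props;
    }
}
```

Passing the credentials in as parameters also makes it easy to read them from environment variables instead of hard-coding them.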

### Create a table

To create a table on your main branch, use the following syntax:

```sql
CREATE TABLE lakefs.main.table1 (id int, data string);
```

### Create a branch

We can now commit the creation of the table to the main branch:

```
lakectl commit lakefs://example-repo/main -m "my first iceberg commit"
```

Then, create a branch:

```
lakectl branch create lakefs://example-repo/dev -s lakefs://example-repo/main
```

### Make changes on the branch

We can now make changes on the branch:

```sql
INSERT INTO lakefs.dev.table1 VALUES (3, 'data3');
```
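
The statements in this walkthrough address the same table on different branches by changing the middle part of the identifier (`lakefs.main.table1` versus `lakefs.dev.table1`). The sketch below builds such identifiers programmatically. `BranchTable` and its methods are hypothetical names. The backtick quoting is standard Spark SQL identifier quoting, added here as an assumption for branch names such as `feature-1`. The README does not say which branch names the catalog accepts.

```java
/** Hypothetical helper that builds <catalog>.<branch>.<table> identifiers for Spark SQL. */
final class BranchTable {
    private BranchTable() {}

    /** Backtick-quotes a name part unless it is a plain identifier. */
    static String quote(String part) {
        if (part.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            return part;
        }
        // Spark SQL escapes a literal backtick inside a quoted identifier by doubling it.
        return "`" + part.replace("`", "``") + "`";
    }

    /** Returns e.g. "lakefs.dev.table1" for ("lakefs", "dev", "table1"). */
    static String ref(String catalog, String branch, String table) {
        return String.join(".", quote(catalog), quote(branch), quote(table));
    }
}
```

For example, `"SELECT * FROM " + BranchTable.ref("lakefs", "dev", "table1")` yields the query used in the next section.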

### Query the table

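If we query the table on the branch, we see the row inserted there together with the rows already on main when the branch was created (rows 1 and 2, whose insertion is not shown above):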

```sql
SELECT * FROM lakefs.dev.table1;
```

Results in:

```
+----+-------+
| id | data  |
+----+-------+
| 1  | data1 |
| 2  | data2 |
| 3  | data3 |
+----+-------+
```

However, if we query the table on the main branch, we will not see the new changes:

```sql
SELECT * FROM lakefs.main.table1;
```

Results in:

```
+----+-------+
| id | data  |
+----+-------+
| 1  | data1 |
| 2  | data2 |
+----+-------+
```
