https://github.com/yanivzalach/icegraph
Interactive metadata visualizer for Apache Iceberg. Trace snapshots, manifests, and file lineages through a real-time, graph-based UI.
iceberg iceberg-visualization python python3 spark spark-connect spark-sql visualization
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/yanivzalach/icegraph
- Owner: YanivZalach
- License: mit
- Created: 2026-02-21T16:45:12.000Z (3 months ago)
- Default Branch: master
- Last Pushed: 2026-04-10T19:45:38.000Z (about 1 month ago)
- Last Synced: 2026-04-10T21:35:49.806Z (about 1 month ago)
- Topics: iceberg, iceberg-visualization, python, python3, spark, spark-connect, spark-sql, visualization
- Language: JavaScript
- Homepage:
- Size: 5.73 MB
- Stars: 3
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# IceGraph
**IceGraph** provides an interactive, hierarchical view of **Apache Iceberg** metadata. It maps the DNA of your tables—from root metadata down to individual data and delete files.
Try the live demo: [https://yanivzalach.github.io/IceGraph/](https://yanivzalach.github.io/IceGraph/)
> **Opinionated Design**: IceGraph is built exclusively for **Spark Connect** backends.
> **Table Version**: Currently IceGraph officially supports Table Version 2.
## 🛠 Features
* **Read-Only**: The application is read-only and does not modify the table.
* **Time-Travel**: View the physical state of your table as of any `datetime`.
* **Metadata Inspector**: Displays record counts, stats, and file paths.
* **Table History**: Trace every metadata evolution, from schema changes to snapshot writes, across the full lifetime of the table.
* **Table File Browser**: See your table's files grouped by partition, just like you are used to.
* **Branches**: View all the branches of the table, even when detached from the main branch.
> **Recommended**: In production, use a user with read-only permissions for the Spark Connect server, for extra peace of mind.
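To make the kind of metadata behind these features concrete, here is a minimal sketch of the read-only Spark SQL statements that expose it. It relies on Iceberg's standard metadata tables (`history`, `snapshots`, `refs`, `manifests`, `files`) and on Spark's `TIMESTAMP AS OF` clause. The helper names `metadata_queries` and `time_travel_query` are hypothetical, and this is not necessarily how IceGraph itself reads the table.

```python
from datetime import datetime

# Iceberg metadata tables exposed through the Spark integration.
METADATA_TABLES = ("history", "snapshots", "refs", "manifests", "files")


def metadata_queries(table: str) -> dict:
    """Build one read-only SELECT per Iceberg metadata table (hypothetical helper)."""
    return {name: f"SELECT * FROM {table}.{name}" for name in METADATA_TABLES}


def time_travel_query(table: str, as_of: datetime) -> str:
    """Build a SELECT that reads the table as of a given datetime (hypothetical helper)."""
    ts = as_of.strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{ts}'"


if __name__ == "__main__":
    # `default.events` is the demo table name used elsewhere in this README.
    for name, sql in metadata_queries("default.events").items():
        print(f"{name}: {sql}")
    print(time_travel_query("default.events", datetime(2026, 3, 1, 12, 0, 0)))
```

Each statement can be run against a Spark Connect session, for example with `SparkSession.builder.remote("sc://localhost:15002").getOrCreate().sql(sql).show()` from `pyspark`. All of them are plain `SELECT`s, so they work with a read-only user.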
## Mock Data Example Using Docker
Clone the repo, then change into the demo directory:
```bash
cd docker_demo
```
Start the demo stack with Docker Compose:
```bash
docker compose up
```
Go to `http://localhost:5000` and explore the tables `default.events` and `default.logging`.
Recommended: Change the `TIMEZONE` variable in the Docker Compose file to your timezone name.
## Quick Start Using Docker
The easiest way to run IceGraph is via [Docker Hub](https://hub.docker.com/r/yanivzalach/icegraph).
### Spark Connect 3.5.4
```bash
docker run -e SPARK_REMOTE=sc://:15002 -e TIMEZONE=my/timezone -p 5000:5000 yanivzalach/icegraph:latest
```
### Other Spark Connect versions
Clone the repo, update the Spark Connect version in `backend/pyproject.toml`, then build from the project root:
```bash
docker build -t icegraph .
```
Then run it the same way, using the locally built image:
```bash
docker run -e SPARK_REMOTE=sc://:15002 -e TIMEZONE=my/timezone -p 5000:5000 icegraph
```
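Before starting the container, it can help to check that the `SPARK_REMOTE` value is well formed and that the server answers. The sketch below is an optional pre-flight check and not part of IceGraph; `parse_spark_remote` and `is_reachable` are hypothetical helper names. It assumes the usual `sc://host:port` form and falls back to 15002, the default Spark Connect port, when no port is given. A value with no host, such as `sc://:15002`, is rejected, so put your own Spark Connect host there.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_PORT = 15002  # default Spark Connect port


def parse_spark_remote(remote: str):
    """Split an sc://host:port string into (host, port); hypothetical helper."""
    parts = urlsplit(remote)
    if parts.scheme != "sc":
        raise ValueError(f"expected an sc:// URL, got {remote!r}")
    if not parts.hostname:
        raise ValueError(f"no host in {remote!r}")
    return parts.hostname, parts.port or DEFAULT_PORT


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = parse_spark_remote("sc://localhost:15002")
    print(host, port, "reachable" if is_reachable(host, port) else "unreachable")
```

An open TCP port only shows that something is listening there; it does not prove that the listener is a Spark Connect server of a compatible version.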
## Start Using Source Code
### Prerequisites
- npm
- uv (Python package manager)
- Python 3.9
### 1. Setup
Install the backend and frontend dependencies, running each block from the project root:
```bash
cd backend
uv sync
```
```bash
cd frontend
npm i
```
### 2. Set up your environment variables
Create a `.env` file in the root of the backend directory:
```bash
TIMEZONE=my/timezone # Replace with your local timezone name
SPARK_REMOTE=sc://localhost:15002 # Local Spark Connect server for testing. If you use Docker, change it to your IP.
```
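The `my/timezone` value above is a placeholder and has to be replaced. As a quick sanity check, the sketch below reads the file's `KEY=VALUE` lines and tests the timezone against the system timezone database using Python's standard `zoneinfo` module. It assumes `TIMEZONE` expects an IANA name such as `Europe/Berlin`, which the README does not state explicitly. `parse_env` and `is_valid_timezone` are hypothetical helpers, and the simplified parsing here may differ from how IceGraph actually loads the file.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, dropping ' #' inline comments (hypothetical helper)."""
    values = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def is_valid_timezone(name: str) -> bool:
    """Return True if `name` resolves in the local timezone database."""
    try:
        ZoneInfo(name)
        return True
    except (ZoneInfoNotFoundError, ValueError, OSError):
        return False


if __name__ == "__main__":
    sample = (
        "TIMEZONE=my/timezone # Replace with your local timezone name\n"
        "SPARK_REMOTE=sc://localhost:15002\n"
    )
    env = parse_env(sample)
    print(env["TIMEZONE"], "valid" if is_valid_timezone(env["TIMEZONE"]) else "invalid")
```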
### 3. Run
Open one terminal in the backend directory and run:
```bash
uv run python main.py
```
Open a second terminal in the frontend directory and run:
```bash
npm run dev
```
Go to `http://localhost:3000` and explore your tables.