Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/polaris-catalog/polaris
The interoperable, open source catalog for Apache Iceberg
apache2 catalog iceberg
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/polaris-catalog/polaris
- Owner: polaris-catalog
- License: apache-2.0
- Created: 2024-05-29T18:44:27.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-06T07:51:00.000Z (3 months ago)
- Last Synced: 2024-08-06T08:06:40.972Z (3 months ago)
- Topics: apache2, catalog, iceberg
- Language: Python
- Homepage: http://polaris.io/
- Size: 2.74 MB
- Stars: 606
- Watchers: 114
- Forks: 58
- Open Issues: 40
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
Awesome Lists containing this project
- awesome-datalake - Polaris Catalog - Polaris Catalog is an open source catalog for Apache Iceberg. Polaris Catalog implements Iceberg’s open REST API for multi-engine interoperability with Apache Doris, Apache Flink, Apache Spark, PyIceberg, StarRocks and Trino. (Data Catalog)
README
![Polaris Catalog Header](docs/img/logos/Polaris-Catalog-BLOG-symmetrical-subhead.png)
Polaris is an open-source, fully-featured catalog for Apache Iceberg™. It implements Iceberg's
[REST API](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml),
enabling seamless multi-engine interoperability across a wide range of platforms, including Apache Doris™, Apache Flink®,
Apache Spark™, StarRocks, and Trino.

Documentation is available at https://polaris.io, including
[Polaris management API doc](https://polaris.io/index.html#tag/polaris-management-service_other)
and [Apache Iceberg REST API doc](https://polaris.io/index.html#tag/Configuration-API).

## Status

Polaris Catalog is open source under an Apache 2.0 license.

- ⭐ Star this repo if you’d like to bookmark and come back to it!
- 📖 Read the announcement blog post for more details!

## Building and Running

Polaris is organized into the following modules:
- `polaris-core` - The main Polaris entity definitions and core business logic
- `polaris-server` - The Polaris REST API server
- `polaris-eclipselink` - The Eclipselink implementation of the MetaStoreManager interface
Polaris is built using Gradle with Java 21+ and Docker 27+.
- `./gradlew build` - To build and run tests. Make sure Docker is running, as the integration tests depend on it.
- `./gradlew assemble` - To skip tests.
- `./gradlew test` - To run unit tests and integration tests.
- `./gradlew runApp` - To run the Polaris server locally on localhost:8181.
  - The server starts in in-memory mode and prints the auto-generated credentials to STDOUT in a message like this: `realm: default-realm root principal credentials: <client-id>:<client-secret>`
  - These credentials can be used as "Client ID" and "Client Secret" in OAuth2 requests (e.g. the `curl` command below).
- `./regtests/run.sh` - To run regression tests or end-to-end tests in another terminal.

### Running in Docker

- `docker build -t localhost:5001/polaris:latest .` - To build the image.
- `docker run -p 8181:8181 localhost:5001/polaris:latest` - To run the image in standalone mode.
- `docker compose up --build --exit-code-from regtest` - To run regression tests in a Docker environment.

### Running in Kubernetes

- `./setup.sh` - To run Polaris as a mini-deployment locally. This will create two pods that bind themselves to port `8181`.
- `kubectl get pods` - To check the status of the pods.
- `kubectl get deployment` - To check the status of the deployment.
- `kubectl describe deployment polaris-deployment` - To troubleshoot if things aren't working as expected.

### Building docs

- Docs are generated using [Redocly](https://redocly.com/docs/cli/installation). To regenerate them, run the following
commands from the project root directory.
```bash
docker run -p 8080:80 -v ${PWD}:/spec docker.io/redocly/cli join spec/docs.yaml spec/polaris-management-service.yml spec/rest-catalog-open-api.yaml -o spec/index.yaml --prefix-components-with-info-prop title
docker run -p 8080:80 -v ${PWD}:/spec docker.io/redocly/cli build-docs spec/index.yaml --output=docs/index.html --config=spec/redocly.yaml
```

## Connecting from an Engine

To connect from an engine like Spark, first create a catalog with these steps:
```bash
# Generate a token for the root principal, replacing <CLIENT_ID> and <CLIENT_SECRET> with
# the values from the Polaris server output.
export PRINCIPAL_TOKEN=$(curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d 'grant_type=client_credentials&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&scope=PRINCIPAL_ROLE:ALL' \
  | jq -r '.access_token')
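# Optional guard, a sketch that is not part of the original steps: warn if no token
# came back (for example, mistyped credentials or a server that is not running).
# `token_ok` is a hypothetical helper name; `jq -r` prints the literal string "null"
# when the `access_token` field is missing from the response.
token_ok() { [ -n "$1" ] && [ "$1" != "null" ]; }
token_ok "${PRINCIPAL_TOKEN:-}" || echo "No access token returned; check the client ID and secret." >&2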
# Create a catalog named `polaris`
curl -i -X POST -H "Authorization: Bearer $PRINCIPAL_TOKEN" -H 'Accept: application/json' -H 'Content-Type: application/json' \
http://localhost:8181/api/management/v1/catalogs \
-d '{"name": "polaris", "id": 100, "type": "INTERNAL", "readOnly": false, "storageConfigInfo": {"storageType": "FILE"}, "properties": {"default-base-location": "file:///tmp/polaris"}}'
```

From here, you can use Spark to create namespaces, tables, etc. More details can be found in the
[Quick Start Guide](https://polaris.io/#section/Quick-Start/Using-Iceberg-and-Polarise).

### Trademark Attribution

_Apache Iceberg, Iceberg, Apache Spark, Spark, Apache Flink, Flink, Apache Doris, Doris, Apache, the Apache feather logo, the Apache Iceberg project logo, the Apache Spark project logo, the Apache Flink project logo, and the Apache Doris project logo are either registered trademarks or trademarks of The Apache Software Foundation. Copyright © 2024 The Apache Software Foundation._