https://github.com/apache/Gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://github.com/apache/Gravitino
ai-catalog data-catalog datalake federated-query lakehouse metadata metalake model-catalog opendatacatalog skycomputing stratosphere
Last synced: 6 days ago
JSON representation
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
- Host: GitHub
- URL: https://github.com/apache/Gravitino
- Owner: apache
- License: apache-2.0
- Created: 2023-04-23T02:09:00.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-29T08:05:40.000Z (11 months ago)
- Last Synced: 2024-10-29T09:23:17.639Z (11 months ago)
- Topics: ai-catalog, data-catalog, datalake, federated-query, lakehouse, metadata, metalake, model-catalog, opendatacatalog, skycomputing, stratosphere
- Language: Java
- Homepage: https://gravitino.apache.org
- Size: 35.2 MB
- Stars: 1,022
- Watchers: 29
- Forks: 317
- Open Issues: 504
-
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
- Governance: GOVERNANCE.md
- Roadmap: ROADMAP.md
Awesome Lists containing this project
- awesome-datalake - Apache Gravitino - Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages the metadata directly in different sources, types, and regions. It also provides users with unified metadata access for data and AI assets. (Data Catalog)
- awesome-datalake - Apache Gravitino - Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages the metadata directly in different sources, types, and regions. It also provides users with unified metadata access for data and AI assets. (Data Catalog)
README
# Apache Gravitino™
[](https://github.com/apache/gravitino/actions/workflows/build.yml)
[](https://github.com/apache/gravitino/actions/workflows/integration-test.yml)
[](https://github.com/apache/gravitino/blob/main/LICENSE)
[](https://github.com/apache/gravitino/graphs/contributors)
[](https://github.com/apache/gravitino/releases)
[](https://github.com/apache/gravitino/issues)
[](https://github.com/apache/gravitino/commits/main/)
[](https://www.bestpractices.dev/projects/8358)## Introduction
Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.

## 🚀 Key Features
- **Unified Metadata Management**: Manage diverse metadata sources through a single model and API (e.g., Hive, MySQL, HDFS, S3).
- **End-to-End Data Governance**: Features like access control, auditing, and discovery across all metadata assets.
- **Direct Metadata Integration**: Changes in underlying systems are immediately reflected via Gravitino’s connectors.
- **Geo-Distribution Support**: Share metadata across regions and clouds to support global architectures.
- **Multi-Engine Compatibility**: Seamlessly integrates with query engines without modifying SQL dialects.
- **AI Asset Management (WIP)**: Support for AI model and feature tracking.## 🌐 Common Use Cases
- Federated metadata discovery across data lakes and data warehouses
- Multi-region metadata synchronization for hybrid or multi-cloud setups
- Data and AI asset governance with unified audit and access control
- Plug-and-play access for engines like Trino or Spark
- Support for evolving metadata standards, including AI model lineage## 📚 Documentation
The latest Gravitino documentation is available at [gravitino.apache.org/docs/latest](https://gravitino.apache.org/docs/latest/).
This README provides a basic overview; visit the site for full installation, configuration, and development documentation.
## 🧪 Quick Start
### Use Gravitino Playground (Recommended)
Gravitino provides a Docker Compose–based playground for a full-stack experience.
Clone or download the [Gravitino Playground repository](https://github.com/apache/gravitino-playground) and follow its [README](https://github.com/apache/gravitino-playground/blob/main/README.md).### Run Gravitino Locally
1. [Download](https://gravitino.apache.org/downloads) and extract a binary release.
2. Edit `conf/gravitino.conf` to configure settings.
3. Start the server:```bash
./bin/gravitino.sh start
```4. To stop:
```bash
./bin/gravitino.sh stop
```Press `CTRL+C` to stop.
## 🧊 Iceberg REST Catalog
Gravitino provides a native Iceberg REST catalog service.
See: [Iceberg REST catalog service](https://gravitino.apache.org/docs/latest/iceberg-rest-service/)## 🔌 Trino Integration
Gravitino includes a Trino connector for federated metadata access.
See: [Using Trino with Gravitino](https://gravitino.apache.org/docs/latest/trino-connector/index/)## 🛠️ Building from Source
Gravitino uses Gradle. Windows is not currently supported.
Clean build without tests:
```bash
./gradlew clean build -x test
```Build a distribution:
```bash
./gradlew compileDistribution -x test
```Or compressed package:
```bash
./gradlew assembleDistribution -x test
```Artifacts are output to the `distribution/` directory.
More build options: [How to build Gravitino](https://gravitino.apache.org/docs/latest/how-to-build/)
## 👨💻 Developer Resources
- [How to build Gravitino](https://gravitino.apache.org/docs/latest/how-to-build/)
- [How to test Gravitino](https://gravitino.apache.org/docs/latest/how-to-test/)
- [Publish Docker images](https://gravitino.apache.org/docs/latest/publish-docker-images)## 🤝 Contributing
We welcome all kinds of contributions—code, documentation, testing, connectors, and more!
To get started, please read our [CONTRIBUTING.md](CONTRIBUTING.md) guide.
## 🔗 ASF Resources
- 📬 Mailing List: [dev@gravitino.apache.org](mailto:dev@gravitino.apache.org) ([subscribe](mailto:dev-subscribe@gravitino.apache.org))
- 🐞 Issue Tracker: [GitHub Issues](https://github.com/apache/gravitino/issues)## 🪪 License
Apache Gravitino is licensed under the Apache License, Version 2.0.
See the [LICENSE](LICENSE) file for details.Apache®, Apache Gravitino™, Apache Hadoop®, Apache Hive™, Apache Iceberg™, Apache Kafka®, Apache Spark™, Apache Submarine™, Apache Thrift™, and Apache Zeppelin™ are trademarks of the Apache Software Foundation in the United States and/or other countries.
![]()