https://github.com/recap-build/recap
Work with your web service, database, and streaming schemas in a single format.
data-catalog data-discovery data-engineering data-integration data-pipelines etl metadata recap
- Host: GitHub
- URL: https://github.com/recap-build/recap
- Owner: gabledata
- License: mit
- Created: 2022-12-07T17:50:44.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-28T21:23:41.000Z (about 1 year ago)
- Last Synced: 2024-12-09T21:35:34.169Z (5 months ago)
- Topics: data-catalog, data-discovery, data-engineering, data-integration, data-pipelines, etl, metadata, recap
- Language: Python
- Homepage: https://recap.build
- Size: 1.41 MB
- Stars: 334
- Watchers: 10
- Forks: 24
- Open Issues: 21
Metadata Files:
- Readme: README.md
- License: LICENSE
## What is Recap?
Recap reads and writes schemas from web services, databases, and schema registries in a standard format.
⭐️ _If you like this project, please give it a star! It helps the project get more visibility._
## Table of Contents
* [What is Recap?](#what-is-recap)
* [Supported Formats](#supported-formats)
* [Install](#install)
* [Usage](#usage)
  * [CLI](#cli)
  * [Gateway](#gateway)
  * [Registry](#registry)
  * [API](#api)
  * [Docker](#docker)
* [Schema](#schema)
* [Documentation](#documentation)

## Supported Formats
| Format | Read | Write |
| :---------- | :-: | :-: |
| [Avro](https://recap.build/docs/integrations/avro/) | ✅ | ✅ |
| [BigQuery](https://recap.build/docs/integrations/bigquery/) | ✅ | |
| [Confluent Schema Registry](https://recap.build/docs/integrations/confluent-schema-registry/) | ✅ | |
| [Hive Metastore](https://recap.build/docs/integrations/hive-metastore/) | ✅ | |
| [JSON Schema](https://recap.build/docs/integrations/json-schema/) | ✅ | ✅ |
| [MySQL](https://recap.build/docs/integrations/mysql/) | ✅ | |
| [PostgreSQL](https://recap.build/docs/integrations/postgresql/) | ✅ | |
| [Protobuf](https://recap.build/docs/integrations/protobuf/) | ✅ | ✅ |
| [Snowflake](https://recap.build/docs/integrations/snowflake/) | ✅ | |
| [SQLite](https://recap.build/docs/integrations/sqlite/) | ✅ | |

## Install
Install Recap and all of its optional dependencies:
```bash
pip install 'recap-core[all]'
```

You can also select specific dependencies:
```bash
pip install 'recap-core[avro,kafka]'
```

See `pyproject.toml` for a list of optional dependencies.
## Usage
### CLI
Recap comes with a command line interface that can list and read schemas from external systems.
List the children of a URL:
```bash
recap ls postgresql://user:pass@host:port/testdb
```

```json
[
  "pg_toast",
  "pg_catalog",
  "public",
  "information_schema"
]
```

Keep drilling down:
```bash
recap ls postgresql://user:pass@host:port/testdb/public
```

```json
[
  "test_types"
]
```

Read the schema for the `test_types` table as a Recap struct:
```bash
recap schema postgresql://user:pass@host:port/testdb/public/test_types
```

```json
{
  "type": "struct",
  "fields": [
    {
      "type": "int64",
      "name": "test_bigint",
      "optional": true
    }
  ]
}
```

### Gateway
Recap comes with a stateless HTTP/JSON gateway that can list and read schemas from data catalogs and databases.
Start the server at [http://localhost:8000](http://localhost:8000):
```bash
recap serve
```

List the schemas in a PostgreSQL database:
```bash
curl http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
```

```json
["pg_toast","pg_catalog","public","information_schema"]
```

And read a schema:
```bash
curl http://localhost:8000/gateway/schema/postgresql://user:pass@host:port/testdb/public/test_types
```

```json
{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}
```

The gateway fetches schemas from external systems in realtime and returns them as Recap schemas.
An OpenAPI schema is available at [http://localhost:8000/docs](http://localhost:8000/docs).
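For callers that prefer Python to curl, the two gateway routes shown above can be wrapped in a small client. This is a minimal sketch using only the standard library. The helper names `gateway_url` and `gateway_get` are hypothetical, and the code assumes the server started by `recap serve` is listening on `http://localhost:8000` and that the system URL is appended to the path unencoded, as in the curl examples.

```python
import json
from urllib.request import urlopen

# Assumed: the address used by the curl examples.
GATEWAY = "http://localhost:8000/gateway"


def gateway_url(action: str, system_url: str, base: str = GATEWAY) -> str:
    """Build a gateway URL; `action` is "ls" or "schema", as in the curl examples."""
    if action not in ("ls", "schema"):
        raise ValueError(f"unsupported gateway action: {action!r}")
    # The system URL is appended as-is, mirroring the curl commands.
    return f"{base}/{action}/{system_url}"


def gateway_get(action: str, system_url: str, base: str = GATEWAY):
    """Call the gateway and decode the JSON response body."""
    with urlopen(gateway_url(action, system_url, base)) as resp:
        return json.load(resp)


# Usage (requires a running gateway and a reachable database):
#   gateway_get("ls", "postgresql://user:pass@host:port/testdb")
print(gateway_url("ls", "postgresql://user:pass@host:port/testdb"))
```

Keeping URL construction separate from the network call makes the routing logic easy to check without a server.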
### Registry
You can store schemas in Recap's schema registry.
Start the server at [http://localhost:8000](http://localhost:8000):
```bash
recap serve
```

Put a schema in the registry:
```bash
curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}' \
    http://localhost:8000/registry/some_schema
```

Get the schema (and version) from the registry:
```bash
curl http://localhost:8000/registry/some_schema
```

```json
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
```

Put a new version of the schema in the registry:
```bash
curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int32","name":"test_int","optional":true}]}' \
    http://localhost:8000/registry/some_schema
```

List schema versions:
```bash
curl http://localhost:8000/registry/some_schema/versions
```

```json
[1,2]
```

Get a specific version of the schema:
```bash
curl http://localhost:8000/registry/some_schema/versions/1
```

```json
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
```

The registry uses [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) to store schemas in a variety of filesystems like S3, GCS, ABS, and the local filesystem. See the [registry](https://recap.build/docs/registry/) docs for more details.
An OpenAPI schema is available at [http://localhost:8000/docs](http://localhost:8000/docs).
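The registry calls above can also be scripted. The sketch below builds the same POST request with the standard library and unpacks a GET response, which the examples show as a JSON `[schema, version]` pair. The helper names `put_request` and `unpack` are hypothetical, and the base URL assumes the server from `recap serve` on `http://localhost:8000`.

```python
import json
from urllib.request import Request

# Assumed: the address used by the curl examples.
REGISTRY = "http://localhost:8000/registry"


def put_request(name: str, schema: dict, base: str = REGISTRY) -> Request:
    """Build (but do not send) the POST that stores a schema under `name`."""
    return Request(
        f"{base}/{name}",
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/x-recap+json"},
        method="POST",
    )


def unpack(body: str) -> tuple:
    """Split a registry GET response, a JSON [schema, version] pair."""
    schema, version = json.loads(body)
    return schema, version


schema = {
    "type": "struct",
    "fields": [{"type": "int64", "name": "test_bigint", "optional": True}],
}
req = put_request("some_schema", schema)
# Send it with urllib.request.urlopen(req) once the server is running.

# Response body copied from the GET example above.
body = '[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]'
stored, version = unpack(body)
print(version, stored["fields"][0]["name"])
```

Building the request without sending it lets you inspect the method, URL, and headers before pointing the script at a live registry.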
### API
Recap has `recap.converters` and `recap.clients` packages.
- Converters convert schemas to and from Recap schemas.
- Clients read schemas from external systems (databases, schema registries, and so on) and use converters to return Recap schemas.

Read a schema from PostgreSQL:
```python
from recap.clients import create_client

# Open a client for the database and read one table's schema as a Recap struct.
with create_client("postgresql://user:pass@host:port/testdb") as c:
    struct = c.schema("testdb", "public", "test_types")
```

Convert the schema to Avro, Protobuf, and JSON schemas:
```python
from recap.converters.avro import AvroConverter
from recap.converters.protobuf import ProtobufConverter
from recap.converters.json_schema import JSONSchemaConverter

avro_schema = AvroConverter().from_recap(struct)
protobuf_schema = ProtobufConverter().from_recap(struct)
json_schema = JSONSchemaConverter().from_recap(struct)
```

Transpile schemas from one format to another:
```python
from recap.converters.json_schema import JSONSchemaConverter
from recap.converters.avro import AvroConverter

json_schema = """
{
    "type": "object",
    "$id": "https://recap.build/person.schema.json",
    "properties": {
        "name": {"type": "string"}
    }
}
"""

# Use Recap as an intermediate format to convert JSON schema to Avro
struct = JSONSchemaConverter().to_recap(json_schema)
avro_schema = AvroConverter().from_recap(struct)
```

Store schemas in Recap's schema registry:
```python
from recap.storage.registry import RegistryStorage
from recap.types import StructType, IntType

storage = RegistryStorage("file:///tmp/recap-registry-storage")
version = storage.put(
    "postgresql://localhost:5432/testdb/public/test_table",
    StructType(fields=[IntType(32)])
)
storage.get("postgresql://localhost:5432/testdb/public/test_table")

# Get all versions of a schema
versions = storage.versions("postgresql://localhost:5432/testdb/public/test_table")

# List all schemas in the registry
schemas = storage.ls()
```

### Docker
Recap's gateway and registry are also available as a Docker image:
```bash
docker run \
    -p 8000:8000 \
    -e 'RECAP_URLS=["postgresql://user:pass@localhost:5432/testdb"]' \
    ghcr.io/recap-build/recap:latest
```

See [Recap's Docker documentation](https://recap.build/docs/gateway/docker) for more details.
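The `RECAP_URLS` value in the command above is a JSON list, so it contains brackets and double quotes that a shell would otherwise interpret. One way to avoid quoting mistakes is to generate the command. This is a minimal sketch using only the standard library; it assumes the variable accepts a JSON array exactly as shown in the `docker run` example.

```python
import json
import shlex

# Systems the container should expose; taken from the example above.
urls = ["postgresql://user:pass@localhost:5432/testdb"]

command = [
    "docker", "run",
    "-p", "8000:8000",
    # json.dumps produces the bracketed, double-quoted list form.
    "-e", "RECAP_URLS=" + json.dumps(urls),
    "ghcr.io/recap-build/recap:latest",
]

# shlex.join quotes each argument for a POSIX shell.
print(shlex.join(command))
```

Passing `command` as a list to `subprocess.run` would skip the shell entirely, which removes the need for quoting.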
## Schema
See [Recap's type spec](https://recap.build/specs/type) for details on Recap's type system.
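The struct JSON printed by `recap schema` can be inspected with ordinary JSON tooling. The sketch below uses only the keys visible in the example output in this README (`type`, `fields`, `name`, `optional`). The helper `describe` is hypothetical, and treating a missing `optional` key as "required" is an assumption made for illustration, not a statement about the type spec.

```python
import json

# Struct copied from the `recap schema` example output.
raw = """
{
  "type": "struct",
  "fields": [
    {"type": "int64", "name": "test_bigint", "optional": true}
  ]
}
"""


def describe(struct: dict) -> list:
    """Return one "name: type" line per field of a struct."""
    if struct.get("type") != "struct":
        raise ValueError("expected a struct")
    lines = []
    for field in struct.get("fields", []):
        # Assumption: a missing "optional" key means the field is required.
        suffix = " (optional)" if field.get("optional", False) else ""
        lines.append(f'{field.get("name", "<unnamed>")}: {field["type"]}{suffix}')
    return lines


print("\n".join(describe(json.loads(raw))))
```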
## Documentation
Recap's documentation is available at [recap.build](https://recap.build).