Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nanit/j2g
convert json schema to aws glue schema for terraform
- Host: GitHub
- URL: https://github.com/nanit/j2g
- Owner: nanit
- License: mit
- Created: 2022-05-24T07:39:02.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-10-26T10:47:54.000Z (about 1 year ago)
- Last Synced: 2024-08-13T07:08:38.599Z (4 months ago)
- Language: Python
- Size: 5.86 KB
- Stars: 4
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - nanit/j2g - convert json schema to aws glue schema for terraform (Python)
README
# JSON Schema to AWS Glue schema converter
## Installation
```bash
pip install git+https://github.com/nanit/j2g.git
```

## What?
Converts `pydantic` schemas to `JSON Schema` and then to an `AWS Glue` schema, so in theory anything that can be converted to JSON Schema *could* also work.
## Why?
When using `AWS Kinesis Firehose` in a configuration that receives JSON records and writes `parquet` files to S3, one needs to define an `AWS Glue` table so Firehose knows what schema to use when creating the parquet files.
AWS Glue lets you define a schema using `Avro` or `JSON Schema` and then create a table from that schema, but as of *May 2022* there is a limitation on AWS: tables created that way can't be used with Kinesis Firehose.
https://stackoverflow.com/questions/68125501/invalid-schema-error-in-aws-glue-created-via-terraform
This is also confirmed by AWS support.
What one could do is create a table and set the columns manually, but this means you now have two sources of truth to maintain.
This tool allows you to define a table in `pydantic` and generate a JSON with column types that can be used with `terraform` to create a Glue table.
## Example
Take the following pydantic classes
```python
from pydantic import BaseModel
from typing import List

class Bar(BaseModel):
    name: str
    age: int

class Foo(BaseModel):
    nums: List[int]
    bars: List[Bar]
    other: str
```

Running `j2g`
```bash
python j2g example.py Foo
```

you get this JSON
```json
{
  "//": "Generated by j2g at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "nums": "array<int>",
    "bars": "array<struct<name:string,age:int>>",
    "other": "string"
  }
}
```

and can be used in terraform like this
```terraform
locals {
  columns = jsondecode(file("${path.module}/glue_schema.json")).columns
}

resource "aws_glue_catalog_table" "table" {
  name          = "table_name"
  database_name = "db_name"

  storage_descriptor {
    dynamic "columns" {
      for_each = local.columns

      content {
        name = columns.key
        type = columns.value
      }
    }
  }
}
```

## How it works?
* `pydantic` gets converted to JSON Schema
* the JSON Schema types get mapped to Glue types recursively

## Future work
* Not all types are supported. I just add types as I need them, but adding types is very easy, so feel free to open an issue or send a PR if you stumble upon an unsupported use case.
* The tool could easily be extended to work with JSON Schema directly, so anything that can be converted to JSON Schema should also work.
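
To make the recursive mapping described under "How it works?" more concrete, here is a minimal sketch that works on a JSON Schema dict directly. It is not j2g's actual code: the names `PRIMITIVES`, `to_glue_type`, `glue_columns` and `FOO_SCHEMA` are hypothetical, the primitive type table (for example `integer` to `int`, `number` to `double`) is an assumption, and the schema is a hand-written approximation of what `pydantic` might emit for `Foo`.

```python
from typing import Any, Dict

# Assumed primitive mapping; the table j2g really uses may differ.
PRIMITIVES = {
    "string": "string",
    "integer": "int",
    "number": "double",
    "boolean": "boolean",
}


def to_glue_type(node: Dict[str, Any], root: Dict[str, Any]) -> str:
    """Recursively map one JSON Schema node to a Glue type string."""
    if "$ref" in node:
        # Follow a local reference such as "#/definitions/Bar".
        target: Any = root
        for part in node["$ref"].lstrip("#/").split("/"):
            target = target[part]
        return to_glue_type(target, root)
    kind = node.get("type")
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    if kind == "array":
        return f"array<{to_glue_type(node['items'], root)}>"
    if kind == "object":
        fields = ",".join(
            f"{name}:{to_glue_type(sub, root)}"
            for name, sub in node.get("properties", {}).items()
        )
        return f"struct<{fields}>"
    raise ValueError(f"unsupported JSON Schema type: {kind!r}")


def glue_columns(schema: Dict[str, Any]) -> Dict[str, str]:
    """Map every top-level property of the schema to a Glue column type."""
    return {
        name: to_glue_type(sub, schema)
        for name, sub in schema["properties"].items()
    }


# Hand-written stand-in for the JSON Schema of the Foo model (an assumption).
FOO_SCHEMA = {
    "type": "object",
    "properties": {
        "nums": {"type": "array", "items": {"type": "integer"}},
        "bars": {"type": "array", "items": {"$ref": "#/definitions/Bar"}},
        "other": {"type": "string"},
    },
    "definitions": {
        "Bar": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
        }
    },
}

print(glue_columns(FOO_SCHEMA))
```

Under these assumptions, `bars` comes out as `array<struct<name:string,age:int>>`. Raising on unknown types keeps unsupported cases visible instead of silently producing a wrong column type.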