{"id":20325454,"url":"https://github.com/getyourguide/parquet-json","last_synced_at":"2026-06-07T17:32:06.571Z","repository":{"id":49043087,"uuid":"274942276","full_name":"getyourguide/parquet-json","owner":"getyourguide","description":"Apache Parquet JSON integration","archived":false,"fork":false,"pushed_at":"2023-11-14T09:59:20.000Z","size":71,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":60,"default_branch":"master","last_synced_at":"2025-01-14T14:26:39.016Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/getyourguide.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-25T14:48:28.000Z","updated_at":"2023-11-14T11:59:37.000Z","dependencies_parsed_at":"2023-11-14T10:51:40.846Z","dependency_job_id":null,"html_url":"https://github.com/getyourguide/parquet-json","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getyourguide%2Fparquet-json","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getyourguide%2Fparquet-json/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getyourguide%2Fparquet-json/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getyourguide%2Fparquet-json/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/getyourguide","download_url":"https://codeloa
d.github.com/getyourguide/parquet-json/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241828990,"owners_count":20027002,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T19:39:52.224Z","updated_at":"2026-06-07T17:32:06.425Z","avatar_url":"https://github.com/getyourguide.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# parquet-json\nApache Parquet JSON integration\n\n## Description\n\nThis project is a spin-off of the [parquet-mr](https://github.com/apache/parquet-mr) project.\nIt provides a converter that writes `JsonNode` objects to Parquet directly, without an\nintermediate format.\n
To do so, this project implements the `WriteSupport` interface for Jackson\n`JsonNode` objects, and relies on an OpenAPI-based schema definition.\n\nThis project is mostly based on the Protocol Buffers and Avro converter implementations.\n\n## Data mapping\n\n| OpenAPI Type | OpenAPI format | Parquet   | Comment                    |\n|--------------|----------------|-----------|----------------------------|\n| integer      | int16          | int16     | not a valid OAPI type      |\n| integer      | int32          | int32     |                            |\n| integer      | int64          | int64     |                            |\n| integer      | -              | int32     | default format int32       |\n| number       | float          | float     |                            |\n| number       | double         | double    |                            |\n| number       | -              | float     | default format float       |\n| string       | -              | String    | logical type               |\n| string       | password       | String    | logical type               |\n| string       | email          | String    | logical type               |\n| string       | UUID           | String    | to be improved             |\n| string       | byte           | String    | base64 encoded bytes string|\n| string       | binary         | binary    | not supported              |\n| string       | date           | date      | logical type               |\n| string       | date-time      | timestamp | MILLIS precision           |\n| boolean      | -              | boolean   |                            |\n| arrays       | -              | list      | logical type, array of maps not implemented|\n| object       | -              | GroupType |                            |\n| oneOf        | -              | Union     | not implemented            |\n| allOf        | -              |           | not supported              |\n| map          | -              | map       | keys as 
string only, \"free form\" objects and \"Fixed Keys\" not supported         |\n| enum         | -              | enum      | only string type supported |\n\n## How to use the converter\n\nGiven for example a schema definition in a file `openapi.yaml` as:\n\n```yaml\nopenapi: 3.0.1\ninfo:\n  title: Some schemas\n  description: Some schemas for parquet-json usage example\n  version: 1.0.0\nservers:\n  - url: 'https://getyourguide.com'\npaths: {}\ncomponents:\n  schemas:\n    MyObject:\n      title: MyObject\n      type: object\n      properties:\n        key_string:\n          type: string\n          nullable: false\n          default: 'a string'\n        key_int32:\n          type: integer\n          format: int32\n          nullable: true\n          default: 1\n        is_true:\n          type: boolean\n          nullable: true\n          default: true\n```\n\nthe converter can be used to write a parquet file on the local FS with:\n\n```java\n    Configuration conf = new Configuration();\n    conf.set(\"fs.file.impl\", org.apache.hadoop.fs.LocalFileSystem.class.getName());\n\n    OpenAPI openAPI = new OpenAPIV3Parser().read(\"openapi.yaml\");\n\n    ObjectSchema schema = (ObjectSchema) openAPI.getComponents().getSchemas().get(\"MyObject\");\n\n    ObjectMapper mapper = new ObjectMapper();\n\n    String output = \"./example.parquet\";\n    Path path = new Path(output);\n\n    ParquetWriter\u003cJsonNode\u003e writer =\n        JsonParquetWriter.Builder(path)\n            .withSchema(schema)\n            .withConf(conf)\n            .withCompressionCodec(CompressionCodecName.SNAPPY)\n            .withDictionaryEncoding(true)\n            .withPageSize(1024 * 1024)\n            .build();\n\n    String json =\n        \"{\\\"key_string\\\":\\\"hello\\\",\\\"key_int32\\\":32,\\\"is_true\\\":true}\";\n    JsonNode payload = mapper.readTree(json);\n    \n    writer.write(payload);\n    \n    writer.close();\n```\n\n## Known limitations\n\n- Currently works only with 
schemas of type `OpenAPI` (https://github.com/swagger-api/swagger-parser/) and data payload of type `JsonNode` (Jackson library).\n- The schema must be fully resolved (no internal or external `ref`)\n- Union types (`oneOf`) not implemented yet\n- Readers (from Parquet to JsonNode/OpenAPI) are not implemented (we don't need this part here at GetYourGuide)\n\n## Contributing\n\nWe welcome pull requests; if you are planning to perform bigger changes then it makes sense to file an issue first.\n\n## Security\nFor sensitive security matters please contact [security@getyourguide.com](mailto:security@getyourguide.com).\n\n## Legal\nCopyright 2020 GetYourGuide GmbH.\n\n`parquet-json` is licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for the full text.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetyourguide%2Fparquet-json","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgetyourguide%2Fparquet-json","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetyourguide%2Fparquet-json/lists"}