{"id":19651271,"url":"https://github.com/embulk/embulk-filter-expand_json","last_synced_at":"2025-04-28T16:31:22.742Z","repository":{"id":1759924,"uuid":"44046486","full_name":"embulk/embulk-filter-expand_json","owner":"embulk","description":null,"archived":false,"fork":false,"pushed_at":"2022-10-19T10:23:16.000Z","size":358,"stargazers_count":13,"open_issues_count":0,"forks_count":12,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-16T03:17:42.921Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/embulk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null}},"created_at":"2015-10-11T09:36:40.000Z","updated_at":"2022-07-25T09:00:08.000Z","dependencies_parsed_at":"2022-09-14T14:03:39.601Z","dependency_job_id":null,"html_url":"https://github.com/embulk/embulk-filter-expand_json","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-filter-expand_json","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-filter-expand_json/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-filter-expand_json/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-filter-expand_json/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/embulk","download_url":"https://codeload.github.com/embulk/embulk-filter-expand_json/tar.gz/refs/heads/main","host"
:{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251345916,"owners_count":21574806,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T15:05:54.572Z","updated_at":"2025-04-28T16:31:22.051Z","avatar_url":"https://github.com/embulk.png","language":"Java","readme":"# Expand Json filter plugin for Embulk\n\n![Release Status](https://github.com/embulk/embulk-filter-expand_json/actions/workflows/publish.yml/badge.svg?branch=main)\n![Build Status](https://github.com/embulk/embulk-filter-expand_json/actions/workflows/check.yml/badge.svg?branch=main)\n\nExpands a column containing JSON into multiple columns.\n\n## Overview\n\n* **Plugin type**: filter\n\n## Configuration\n\n- **json_column_name**: name of the column containing the JSON to be expanded (string, required)\n- **root**: root property from which entries are fetched, specified in [JsonPath](http://goessner.net/articles/JsonPath/) style (string, default: `\"$.\"`)\n- **expanded_columns**: columns into which the JSON is expanded (array of hash, required)\n  - **name**: name of the column; it can be written as a [JsonPath](http://goessner.net/articles/JsonPath/)-style path.\n  - **type**: type of the column (see below)\n  - **format**: format of the timestamp if type is timestamp\n  - **timezone**: time zone of this timestamp column if its values don’t include a time zone (`UTC` by default)\n- **keep_expanding_json_column**: if true, keep the JSON column being expanded in the schema instead of removing it (false by default)\n- **default_timezone**: time zone of timestamp columns whose values don’t include a time zone (`UTC` by default)\n- **stop_on_invalid_record**: stop the bulk load transaction if an invalid record is found (false by default)\n- **cache_provider**: cache provider for JsonPath. `\"LRU\"` and `\"NOOP\"` are built in; a user-defined class can also be specified. (string, default: `\"LRU\"`)\n  - `\"NOOP\"` will become the default in the future.\n\n---\n\n**type of the column**\n\n|name|description|\n|:---|:---|\n|boolean|true or false|\n|long|64-bit signed integers|\n|timestamp|Date and time with nanosecond precision|\n|double|64-bit floating point numbers|\n|string|Strings|\n\n## Example\n\n```yaml\nfilters:\n  - type: expand_json\n    json_column_name: json_payload\n    root: \"$.\"\n    expanded_columns:\n      - {name: \"phone_numbers\", type: string}\n      - {name: \"app_id\", type: long}\n      - {name: \"point\", type: double}\n      - {name: \"created_at\", type: timestamp, format: \"%Y-%m-%d\", timezone: \"UTC\"}\n      - {name: \"profile.anniversary.et\", type: string}\n      - {name: \"profile.anniversary.voluptatem\", type: string}\n      - {name: \"profile.like_words[1]\", type: string}\n      - {name: \"profile.like_words[2]\", type: string}\n      - {name: \"profile.like_words[0]\", type: string}\n```\n\n## Note\n- If a JsonPath expression evaluates to an array or a hash, the value is written as JSON.\n\n## Dependencies\n- https://github.com/jayway/JsonPath\n  - used to evaluate [JsonPath](http://goessner.net/articles/JsonPath/)\n  - [Apache License Version 2.0](https://github.com/jayway/JsonPath/blob/master/LICENSE)\n\n## Development\n\n### Run Example\n\n```\n$ ./gradlew gem\n$ embulk run -Ibuild/gemContents/lib ./example/config.yml\n```\n\n### Build\n\n```\n$ ./gradlew gem  # add -t to watch for file changes and rebuild continuously\n```\n\n## Benchmark for `cache_provider` option\n\nIn some cases, `cache_provider: NOOP` makes this plugin about 3 times faster (https://github.com/civitaspo/embulk-filter-expand_json/pull/41/).\nWe therefore benchmarked the `cache_provider` option. In our case, `cache_provider: NOOP` was about 1.5 times faster than `\"LRU\"`.\n\n|use `expand_json` filter|cache_provider|Time taken|records/s|\n|:---|:---|:---|:---|\n|`false`|none|7.62s|1,325,459/s|\n|`true`|`\"LRU\"`|2m9s|78,025/s|\n|`true`|`\"NOOP\"`|1m25s|118,476/s|\n\n\nYou can reproduce the benchmark as follows.\n\n```\n./gradlew gem\n./bench/run.sh\n```\n\nFor Maintainers\n----------------\n\n### Release\n\nModify `version` in `build.gradle` at a detached commit, and then tag the commit with an annotation.\n\n```\ngit checkout --detach main\n\n(Edit: Remove \"-SNAPSHOT\" in \"version\" in build.gradle.)\n\ngit add build.gradle\n\ngit commit -m \"Release vX.Y.Z\"\n\ngit tag -a vX.Y.Z\n\n(Edit: Write a tag annotation in the changelog format.)\n```\n\nSee [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) for the changelog format. We adopt part of it for Git tag annotations, as shown below.\n\n```\n## [X.Y.Z] - YYYY-MM-DD\n\n### Added\n- Added a feature.\n\n### Changed\n- Changed something.\n\n### Fixed\n- Fixed a bug.\n```\n\nThen push the annotated tag. This triggers a release on GitHub Actions after approval.\n\n```\ngit push -u origin vX.Y.Z\n```\n\n## Contributors\n- @Civitaspo\n- @muga\n- @sakama\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-filter-expand_json","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fembulk%2Fembulk-filter-expand_json","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-filter-expand_json/lists"}