{"id":19651272,"url":"https://github.com/embulk/embulk-filter-column","last_synced_at":"2025-05-07T10:41:13.745Z","repository":{"id":47122891,"uuid":"37898603","full_name":"embulk/embulk-filter-column","owner":"embulk","description":"A filter plugin for Embulk to filter out columns","archived":false,"fork":false,"pushed_at":"2023-08-22T04:59:41.000Z","size":475,"stargazers_count":44,"open_issues_count":1,"forks_count":10,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-05-06T11:10:20.492Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/embulk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null}},"created_at":"2015-06-23T05:12:45.000Z","updated_at":"2023-01-08T15:44:28.000Z","dependencies_parsed_at":"2022-09-04T09:22:39.261Z","dependency_job_id":"5e038beb-a8c5-4b30-aa1e-9082bbdb8395","html_url":"https://github.com/embulk/embulk-filter-column","commit_stats":{"total_commits":139,"total_committers":8,"mean_commits":17.375,"dds":0.2230215827338129,"last_synced_commit":"0ac1d5e8804e1e0ab1312d4e072879cc17ffbd0e"},"previous_names":["sonots/embulk-filter-column"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-filter-column","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-filter-column/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embulk%2Fembulk-filter-column/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/embulk%2Fembulk-filter-column/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/embulk","download_url":"https://codeload.github.com/embulk/embulk-filter-column/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252861703,"owners_count":21815732,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T15:05:54.703Z","updated_at":"2025-05-07T10:41:13.724Z","avatar_url":"https://github.com/embulk.png","language":"Java","funding_links":[],"categories":["Java"],"sub_categories":[],"readme":"# Column filter plugin for Embulk\n\n![Build Status](https://github.com/embulk/embulk-filter-column/actions/workflows/check.yml/badge.svg?branch=master)\n\nA filter plugin for Embulk to filter out columns\n\n## Configuration\n\n- **columns**: columns to retain (array of hash)\n  - **name**: name of column (required)\n  - **src**: src column name to be copied (optional, default is `name`)\n  - **default**: default value used if input is null (optional)\n  - **type**: type of the default value (required for `default`)\n  - **format**: special option for timestamp column, specify the format of the default timestamp (string, default is `default_timestamp_format`)\n  - **timezone**: special option for timestamp column, specify the timezone of the default timestamp (string, default is `default_timezone`)\n- **add_columns**: columns to add (array of hash)\n  - **name**: name of column (required)\n  - **src**: src column name to be copied (either of `src` or `default` is 
required)\n  - **default**: value of column (either of `src` or `default` is required)\n  - **type**: type of the default value (required for `default`)\n  - **format**: special option for timestamp column, specify the format of the default timestamp (string, default is `default_timestamp_format`)\n  - **timezone**: special option for timestamp column, specify the timezone of the default timestamp (string, default is `default_timezone`)\n- **drop_columns**: columns to drop (array of hash)\n  - **name**: name of column (required)\n- **default_timestamp_format**: default timestamp format for timestamp columns (string, default is `%Y-%m-%d %H:%M:%S.%N %z`)\n- **default_timezone**: default timezone for timestamp columns (string, default is `UTC`)\n\n## Example - columns\n\nSay input.csv is as follows:\n\n```\ntime,id,key,score\n2015-07-13,0,Vqjht6YE,1370\n2015-07-13,1,VmjbjAA0,3962\n2015-07-13,2,C40P5H1W,7323\n```\n\n```yaml\nfilters:\n  - type: column\n    columns:\n      - {name: time, default: \"2015-07-13\", format: \"%Y-%m-%d\"}\n      - {name: id}\n      - {name: key, default: \"foo\"}\n```\n\nreduces columns to only `time`, `id`, and `key` columns as:\n\n```\ntime,id,key\n2015-07-13,0,Vqjht6YE\n2015-07-13,1,VmjbjAA0\n2015-07-13,2,C40P5H1W\n```\n\nNote that column types are automatically retrieved from input data (inputSchema).\n\n## Example - add_columns\n\nSay input.csv is as follows:\n\n```\ntime,id,key,score\n2015-07-13,0,Vqjht6YE,1370\n2015-07-13,1,VmjbjAA0,3962\n2015-07-13,2,C40P5H1W,7323\n```\n\n```yaml\nfilters:\n  - type: column\n    add_columns:\n      - {name: d, type: timestamp, default: \"2015-07-13\", format: \"%Y-%m-%d\"}\n      - {name: copy_id, src: id}\n```\n\nadds a `d` column and a `copy_id` column, which is a copy of the `id` column, as:\n\n```\ntime,id,key,score,d,copy_id\n2015-07-13,0,Vqjht6YE,1370,2015-07-13,0\n2015-07-13,1,VmjbjAA0,3962,2015-07-13,1\n2015-07-13,2,C40P5H1W,7323,2015-07-13,2\n```\n\n## Example - drop_columns\n\nSay input.csv is as follows:\n\n```\ntime,id,key,score\n2015-07-13,0,Vqjht6YE,1370\n2015-07-13,1,VmjbjAA0,3962\n2015-07-13,2,C40P5H1W,7323\n```\n\n```yaml\nfilters:\n  - type: column\n    drop_columns:\n      - {name: time}\n      - {name: id}\n```\n\ndrops the `time` and `id` columns as:\n\n```\nkey,score\nVqjht6YE,1370\nVmjbjAA0,3962\nC40P5H1W,7323\n```\n\n## JSONPath\n\nFor a `type: json` column, you can specify a [JSONPath](http://goessner.net/articles/JsonPath/) as the column's name:\n\n```\n- {name: $.payload.key1}\n- {name: \"$.payload.array[0]\"}\n- {name: \"$.payload.array[*]\"}\n- {name: $['payload']['key1.key2']}\n```\n\nEXAMPLE:\n\n* [example/columns.yml](example/columns.yml)\n* [example/add_columns.yml](example/add_columns.yml)\n* [example/drop_columns.yml](example/drop_columns.yml)\n\nThe following JSONPath operators are not supported:\n\n* Multiple properties such as `['name','name']`\n* Multiple array indexes such as `[1,2]`\n* Array slice such as `[1:2]`\n* Filter expression such as `[?(\u003cexpression\u003e)]`\n\nNote that `type: timestamp` for `add_columns` or `columns` is not available because Embulk's `type: json` cannot contain a timestamp column.\n\nAlso note that renaming or copying JSON paths with the `src` option is only partially supported so far. The parent JSON path must be the same, as in:\n\n```\n- {name: $.payload.foo.dest, src: $.payload.foo.src}\n```\n\nThat is, the example below does not work yet because the parent paths differ (`$.payload.foo` and `$.payload.bar`):\n\n```\n- {name: $.payload.foo.dest, src: $.payload.bar.src}\n```\n\n## Development\n\nRun example:\n\n```\n$ ./gradlew gem\n$ embulk preview -I build/gemContents/lib example/example.yml\n```\n\nRun test:\n\n```\n$ ./gradlew test\n```\n\nRun test with coverage reports:\n\n```\n$ ./gradlew test jacocoTestReport\n```\n\nThen open `build/reports/jacoco/test/html/index.html`.\n\nRun checkstyle:\n\n```\n$ ./gradlew check\n```\n\nRun only checkstyle:\n\n```\n$ ./gradlew checkstyleMain\n$ ./gradlew checkstyleTest\n```\n\nFor Maintainers\n----------------\n\n### Release\n\nModify `version` in `build.gradle` at a detached commit, and then tag the commit with an annotation.\n\n```\ngit checkout --detach master\n\n(Edit: Remove \"-SNAPSHOT\" in \"version\" in build.gradle.)\n\ngit add build.gradle\n\ngit commit -m \"Release vX.Y.Z\"\n\ngit tag -a vX.Y.Z\n\n(Edit: Write a tag annotation in the changelog format.)\n```\n\nSee [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) for the changelog format. We adopt part of it for Git tag annotations, as shown below.\n\n```\n## [X.Y.Z] - YYYY-MM-DD\n\n### Added\n- Added a feature.\n\n### Changed\n- Changed something.\n\n### Fixed\n- Fixed a bug.\n```\n\nThen push the annotated tag. It triggers a release operation on GitHub Actions after approval.\n\n```\ngit push -u origin vX.Y.Z\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-filter-column","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fembulk%2Fembulk-filter-column","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembulk%2Fembulk-filter-column/lists"}