{"id":19439319,"url":"https://github.com/hishidama/embulk-parser-poi_excel","last_synced_at":"2025-04-24T22:32:27.993Z","repository":{"id":56844575,"uuid":"43692685","full_name":"hishidama/embulk-parser-poi_excel","owner":"hishidama","description":"Apache POI Excel parser plugin for Embulk","archived":false,"fork":false,"pushed_at":"2023-08-11T00:24:56.000Z","size":278,"stargazers_count":10,"open_issues_count":1,"forks_count":5,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-19T03:53:52.839Z","etag":null,"topics":["embulk","embulk-parser-plugin","poi"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hishidama.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-10-05T15:20:58.000Z","updated_at":"2023-08-20T12:15:47.000Z","dependencies_parsed_at":"2024-11-10T15:28:00.474Z","dependency_job_id":"da23a006-956b-4abc-8a47-9ef5f468b8df","html_url":"https://github.com/hishidama/embulk-parser-poi_excel","commit_stats":{"total_commits":63,"total_committers":1,"mean_commits":63.0,"dds":0.0,"last_synced_commit":"fea7ed4b31223b874881088eca2f8f984a58e1f3"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hishidama%2Fembulk-parser-poi_excel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hishidama%2Fembulk-parser-poi_excel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hishidama%2Fembulk-parser-poi
_excel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hishidama%2Fembulk-parser-poi_excel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hishidama","download_url":"https://codeload.github.com/hishidama/embulk-parser-poi_excel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250719797,"owners_count":21476136,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embulk","embulk-parser-plugin","poi"],"created_at":"2024-11-10T15:22:33.036Z","updated_at":"2025-04-24T22:32:27.726Z","avatar_url":"https://github.com/hishidama.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Apache POI Excel parser plugin for Embulk\n\nParses Microsoft Excel files(xls, xlsx) read by other file input plugins.  
\nThis plugin uses Apache POI.\n\n## Overview\n\n* **Plugin type**: parser\n* **Guess supported**: no\n* Embulk 0.9 or earlier (refer to https://github.com/hishidama/embulk-parser-excel-poi for 0.10 and later)\n\n\n## Example\n\n```yaml\nin:\n  type: any file input plugin type\n  parser:\n    type: poi_excel\n    sheets: [\"DQ10-orb\"]\n    skip_header_lines: 1\t# first row is header.\n    columns:\n    - {name: row, type: long, value: row_number}\n    - {name: get_date, type: timestamp, cell_column: A, value: cell_value}\n    - {name: orb_type, type: string}\n    - {name: orb_name, type: string}\n    - {name: orb_shape, type: long}\n    - {name: drop_monster_name, type: string}\n```\n\nIf **value** is omitted, `cell_value` is used.  \nIf **cell_column** is omitted when **value** is `cell_value`, the next column is used.  \n\n\n## Configuration\n\n* **sheets**: sheet names. wildcards `*` and `?` can be used. (list of string, required)\n* **record_type**: record type. (`row`, `column` or `sheet`. default: `row`)\n* **skip_header_lines**: number of rows to skip when **record_type**=`row` (number of columns to skip when **record_type**=`column`). ignored when **record_type**=`sheet`. (integer, default: `0`)\n* **columns**: column definitions. see below. (hash, required)\n* **sheet_options**: per-sheet options. see below. (hash, default: null)\n\n### columns\n\n* **name**: Embulk column name. (string, required)\n* **type**: Embulk column type. (string, required)\n* **value**: value type. see below. (string, default: `cell_value`)\n* **column_number**: same as **cell_column**.\n* **cell_column**: Excel column number. see below. (string, default: next column when **record_type**=`row`)\n* **cell_row**: Excel row number. see below. (integer, default: next row when **record_type**=`column`)\n* **cell_address**: Excel cell address such as `A1`, `Sheet1!B3`. (string, not required)\n* **numeric_format**: format for converting numeric (double) values to string, such as `%4.2f`. 
(default: Java's Double.toString())\n* **attribute_name**: used with value `cell_style`, `cell_font`, etc. see below. (list of string)\n* **on_cell_error**: how to handle a cell error. see below. (string, default: `constant`)\n* **formula_handling**: how to handle a formula. see below. (`evaluate` or `cashed_value`. default: `evaluate`)\n* **on_evaluate_error**: how to handle a formula evaluation error. see below. (string, default: `exception`)\n* **formula_replace**: replace the formula before evaluation. see below.\n* **on_convert_error**: how to handle a conversion error. see below. (string, default: `exception`)\n* **search_merged_cell**: search for a merged cell when the cell is BLANK. (`none`, `linear_search`, `tree_search` or `hash_search`, default: `hash_search`)\n\n### value\n\n* `cell_value`: value in cell.\n* `cell_formula`: formula in cell. (if the cell is not a formula, same as `cell_value`.)\n* `cell_style`: all cell style attributes, returned as a JSON string. see **attribute_name**. (**type** must be `string`)\n* `cell_font`: all cell font attributes, returned as a JSON string. see **attribute_name**. (**type** must be `string`)\n* `cell_comment`: all cell comment attributes, returned as a JSON string. see **attribute_name**. (**type** must be `string`)\n* `cell_type`: cell type. returns the result of POI's Cell.getCellType().\n* `cell_cached_type`: cell cached formula result type. 
returns the result of POI's Cell.getCachedFormulaResultType() when CellType==FORMULA, otherwise same as `cell_type` (the result of Cell.getCellType()).\n* `sheet_name`: sheet name.\n* `row_number`: row number (1-origin).\n* `column_number`: column number (1-origin).\n* `constant`: constant value.\n\n  * `constant.`*value*: specified value.\n  * `constant`: null.\n\n### cell_column\n\nBasically used for **record_type**=`row`.\n\n* `A`,`B`,`C`,...: column number in \"A1 format\".\n* *number*: column number (1-origin).\n* `+`: next column.\n* `+`*name*: next column after the column of *name*.\n* `+`*number*: *number* columns after.\n* `-`: previous column.\n* `-`*name*: previous column before the column of *name*.\n* `-`*number*: *number* columns before.\n* `=`: same column.\n* `=`*name*: same column as the column of *name*.\n\n### cell_row\n\nBasically used for **record_type**=`column`.\n\n* *number*: row number (1-origin).\n\n### attribute_name\n\nWhen **value** is `cell_style`, `cell_font`, or `cell_comment`, all attributes are retrieved and converted to a JSON string by default.  \n(Because a JSON string is returned, **type** must be `string`.)\n\n```yaml\n    columns:\n    - {name: foo, type: string, cell_column: A, value: cell_style}\n```\n\n\nWhen **attribute_name** is specified, only the specified attributes are retrieved and converted to a JSON string.\n\n* **attribute_name**: attribute names. 
(list of string)\n\n```yaml\n    columns:\n    - {name: foo, type: string, cell_column: A, value: cell_style, attribute_name: [border_top, border_bottom, border_left, border_right]}\n```\n\n\nYou can also retrieve a single attribute by appending a period and the attribute name directly after `cell_style` or `cell_font`.  \nIn this case the result is not a JSON string, so **type** must match the type of the attribute.\n\n```yaml\n    columns:\n    - {name: foo, type: long, value: cell_style.border}\n    - {name: bar, type: long, value: cell_font.color}\n```\n\nNote that for `cell_style` and `cell_font`, omitting **cell_column** targets the same column as the preceding definition.  \n(For `cell_value`, omitting **cell_column** moves to the next column.)\n\n\n### on_cell_error\n\nHow to handle a cell error (`#DIV/0!`, `#REF!`, etc).\n\n```yaml\n    columns:\n    - {name: foo, type: string, cell_column: A, value: cell_value, on_cell_error: error_code}\n```\n\n* `constant`: set null. (default)\n* `constant.`*value*: set specified value.\n* `error_code`: set error code.\n* `exception`: throw exception.\n\n\n### formula_handling\n\nHow to handle a formula.\n\n```yaml\n    columns:\n    - {name: foo, type: string, cell_column: A, value: cell_value, formula_handling: cashed_value}\n```\n\n* `evaluate`: evaluate formula. (default)\n* `cashed_value`: use the cached value stored in the cell.\n\n\n### on_evaluate_error\n\nHow to handle a formula evaluation error.\n\n```yaml\n    columns:\n    - {name: foo, type: string, cell_column: A, value: cell_value, on_evaluate_error: constant}\n```\n\n* `constant`: set null.\n* `constant.`*value*: set specified value.\n* `exception`: throw exception. (default)\n\n\n### formula_replace\n\nReplace the formula before evaluation.\n\n```yaml\n    columns:\n    - {name: foo, type: string, cell_column: A, value: cell_value, formula_replace: [{regex: aaa, to: \"A${row}\"}, {regex: bbb, to: \"B${row}\"}]}\n```\n\n`${row}` is replaced with the current row number.\n`${column}` is replaced with the current column string.\n\n\n### on_convert_error\n\nProcessing method of convert error. 
e.g. Excel boolean to Embulk timestamp\n\n```yaml\n    columns:\n    - {name: foo, type: timestamp, format: \"%Y/%m/%d\", cell_column: A, value: cell_value, on_convert_error: constant.9999/12/31}\n```\n\n* `constant`: set null.\n* `constant.`*value*: set specified value.\n* `exception`: throw exception. (default)\n\n\n### sheet_options\n\nOptions for individual sheets.\n\n```yaml\n  parser:\n    type: poi_excel\n    sheets: [Sheet1, Sheet2]\n    columns:\n    - {name: date, type: timestamp, cell_column: A}\n    - {name: foo, type: string}\n    - {name: bar, type: long}\n    sheet_options:\n      Sheet1:\n        skip_header_lines: 1\n        columns:\n          foo: {cell_column: B}\n          bar: {cell_column: C}\n      Sheet2:\n        skip_header_lines: 0\n        columns:\n          foo: {cell_column: D}\n          bar: {value: constant.0}\n```\n\n**sheet_options** is a map keyed by sheet name.  \nMap values are **skip_header_lines**, **columns**.\n\n**columns** is a map keyed by column name.  \nMap values are the same as **columns** in **parser** (excluding `name`, `type`).\n\n\n## Install\n\n```\n$ embulk gem install embulk-parser-poi_excel\n```\n\n\n## Build\n\n```\n$ ./gradlew test\n$ ./gradlew package\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhishidama%2Fembulk-parser-poi_excel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhishidama%2Fembulk-parser-poi_excel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhishidama%2Fembulk-parser-poi_excel/lists"}