{"id":19439316,"url":"https://github.com/hishidama/embulk-parser-hadoop-seqfile","last_synced_at":"2025-02-25T07:22:37.537Z","repository":{"id":197515937,"uuid":"698788842","full_name":"hishidama/embulk-parser-hadoop-seqfile","owner":"hishidama","description":"Hadoop SequenceFile parser plugin for Embulk","archived":false,"fork":false,"pushed_at":"2023-10-08T01:05:52.000Z","size":93,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-07T21:37:44.334Z","etag":null,"topics":["embulk-parser-plugin","embulk-plugin","hadoop","java-8","sequencefile"],"latest_commit_sha":null,"homepage":"https://www.ne.jp/asahi/hishidama/home/tech/embulk/parser-sequencefile.html","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hishidama.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-10-01T01:13:10.000Z","updated_at":"2023-10-19T23:22:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"85e01ee2-a07f-4c70-aaf6-170309488c3e","html_url":"https://github.com/hishidama/embulk-parser-hadoop-seqfile","commit_stats":null,"previous_names":["hishidama/embulk-parser-hadoop-seqfile"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hishidama%2Fembulk-parser-hadoop-seqfile","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hishidama%2Fembulk-parser-hadoop-seqfile/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hishidama%2Fembulk-parser-hadoop-seqfile/releases","manifests_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/hishidama%2Fembulk-parser-hadoop-seqfile/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hishidama","download_url":"https://codeload.github.com/hishidama/embulk-parser-hadoop-seqfile/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240619804,"owners_count":19830270,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embulk-parser-plugin","embulk-plugin","hadoop","java-8","sequencefile"],"created_at":"2024-11-10T15:22:32.854Z","updated_at":"2025-02-25T07:22:37.470Z","avatar_url":"https://github.com/hishidama.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hadoop SequenceFile parser plugin for Embulk\n\nParses Hadoop SequenceFiles read by other file input plugins.\n\n## Overview\n\n* **Plugin type**: parser\n* **Guess supported**: no\n* Embulk 0.10 or later\n* JDK 1.8 (JRE 1.8 is not supported), or Java 9 or later\n\n\n## Example\n\n### SequenceFile(key: Text, value: IntWritable)\n\n```yaml\nin:\n  type: any file input plugin type\n  parser:\n    type: hadoop_seqfile\n    key_class:   org.apache.hadoop.io.Text\n    value_class: org.apache.hadoop.io.IntWritable\n    columns:\n    - {name: word, type: string, key: true, wtype: text}\n    - {name: count, type: long, key: false, wtype: int}\n```\n\n### SequenceFile(key: NullWritable, value: Asakusa Framework DataModel)\n\n```yaml\nin:\n  type: any file input plugin type\n  parser:\n    type: hadoop_seqfile\n    value_class: 
com.example.asakusafw.dmdl.model.WordCount\n    columns:\n    - {name: word, type: string, wtype: stringOption}\n    - {name: count, type: long, wtype: intOption}\n```\n\nSee [asakusafw-helper.xlsx](asakusafw-helper.xlsx) for a tool that helps generate the columns from DMDL.\n\n\n## Configuration\n\n* **key_class**: key class name. (string, default: `org.apache.hadoop.io.NullWritable`)\n* **value_class**: value class name. (string, default: `org.apache.hadoop.io.NullWritable`)\n* **columns**: column definitions. See below. (hash, required)\n* **default_timezone**: default time zone. (string, default: `UTC`)\n* **default_timestamp_format**: default timestamp format. (string, default: `%Y-%m-%d %H:%M:%S.%N %z`)\n* **flush_count**: flush count. (int, default: `100`)\n\n### columns\n\n* **name**: Embulk column name. (string, required)\n* **type**: Embulk column type. (string, required)\n* **key**: key or value (`true` for key, `false` for value). (boolean, default: `false`)\n* **wtype**: Writable type. (string, required)\n* **timezone**: time zone. (string, default: **default_timezone**)\n* **format**: timestamp format. 
(string, default: **default_timestamp_format**)\n\n#### wtype (Writable type)\n\n| wtype            | software          | Writable class                             |\n|------------------|-------------------|--------------------------------------------|\n| `null`           | Hadoop            | org.apache.hadoop.io.NullWritable          |\n| `boolean`        | Hadoop            | org.apache.hadoop.io.BooleanWritable       |\n| `byte`           | Hadoop            | org.apache.hadoop.io.ByteWritable          |\n| `short`          | Hadoop            | org.apache.hadoop.io.ShortWritable         |\n| `int`            | Hadoop            | org.apache.hadoop.io.IntWritable           |\n| `long`           | Hadoop            | org.apache.hadoop.io.LongWritable          |\n| `float`          | Hadoop            | org.apache.hadoop.io.FloatWritable         |\n| `double`         | Hadoop            | org.apache.hadoop.io.DoubleWritable        |\n| `vint`           | Hadoop            | org.apache.hadoop.io.VIntWritable          |\n| `vlong`          | Hadoop            | org.apache.hadoop.io.VLongWritable         |\n| `text`           | Hadoop            | org.apache.hadoop.io.Text                  |\n| `booleanOption`  | Asakusa Framework | com.asakusafw.runtime.value.BooleanOption  |\n| `byteOption`     | Asakusa Framework | com.asakusafw.runtime.value.ByteOption     |\n| `shortOption`    | Asakusa Framework | com.asakusafw.runtime.value.ShortOption    |\n| `intOption`      | Asakusa Framework | com.asakusafw.runtime.value.IntOption      |\n| `longOption`     | Asakusa Framework | com.asakusafw.runtime.value.LongOption     |\n| `floatOption`    | Asakusa Framework | com.asakusafw.runtime.value.FloatOption    |\n| `doubleOption`   | Asakusa Framework | com.asakusafw.runtime.value.DoubleOption   |\n| `decimalOption`  | Asakusa Framework | com.asakusafw.runtime.value.DecimalOption  |\n| `stringOption`   | Asakusa Framework | com.asakusafw.runtime.value.StringOption   |\n| 
`dateOption`     | Asakusa Framework | com.asakusafw.runtime.value.DateOption     |\n| `datetimeOption` | Asakusa Framework | com.asakusafw.runtime.value.DateTimeOption |\n\n\n## Install\n\n1. Install the plugin:\n   ```\n   $ mvn dependency:get -Dartifact=io.github.hishidama.embulk:embulk-parser-hadoop-seqfile:0.1.0\n   ```\n\n2. Add the following setting to `$HOME/.embulk/embulk.properties`:\n   ```\n   plugins.parser.hadoop_seqfile=maven:io.github.hishidama.embulk:hadoop-seqfile:0.1.0\n   ```\n\n\n## Build\n\n```\n$ ./gradlew test\n```\n\n### Build to local Maven repository\n\n```\n./gradlew generatePomFileForMavenJavaPublication\nmvn install -f build/publications/mavenJava/pom-default.xml\n./gradlew publishToMavenLocal\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhishidama%2Fembulk-parser-hadoop-seqfile","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhishidama%2Fembulk-parser-hadoop-seqfile","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhishidama%2Fembulk-parser-hadoop-seqfile/lists"}