https://github.com/yahoojapan/embulk-parser-xml2
- Host: GitHub
- URL: https://github.com/yahoojapan/embulk-parser-xml2
- Owner: yahoojapan
- License: other
- Created: 2016-08-09T02:54:34.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2016-08-12T06:35:55.000Z (about 9 years ago)
- Last Synced: 2025-02-04T17:18:02.403Z (8 months ago)
- Language: Java
- Size: 67.4 KB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Xml2 parser plugin for Embulk
Embulk parser plugin for parsing XML data. This plugin uses a SAX parser, so it can parse very large XML files. It also supports parsing sub-elements under the root element you specify, so you can parse and expand data more flexibly.
## Overview
* **Plugin type**: parser
* **Guess supported**: no

## Configuration
- **type**: specify this plugin as `"xml2"` (string, required)
- **root**: path of the element from which each entry is fetched, e.g. `mediawiki/page` (string, required)
- **schema**: columns of the output table, each with a name (an element path relative to `root`) and a data type (required)

## Example
```yaml
parser:
  type: xml2
  root: mediawiki/page
  schema:
    - { name: id, type: long }
    - { name: title, type: string }
    - { name: revision/timestamp, type: timestamp, format: '%Y-%m-%dT%H:%M:%SZ' }
    - { name: revision/text, type: string }
```

Then you can fetch entries from the following XML (Wikipedia archive XML format):
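```xml
<mediawiki>
  <page>
    <id>1</id>
    <title>title 1</title>
    <revision>
      <timestamp>2004-04-30T14:46:00Z</timestamp>
      <text>body text</text>
    </revision>
  </page>
  <page>
    <id>2</id>
    <title>title 2</title>
    <revision>
      <timestamp>2004-04-30T14:46:00Z</timestamp>
      <text>body text</text>
    </revision>
  </page>
</mediawiki>
```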
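
To illustrate how a SAX-based parser can map a `root` path and relative column names onto streamed XML, here is a minimal sketch using only the JDK's `javax.xml.parsers` API. It is not the plugin's actual implementation: the class name `Xml2Sketch`, the `parse` method, and the embedded `SAMPLE` document are hypothetical, and every value is kept as a string with no type conversion or timestamp parsing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class Xml2Sketch {
    // Hypothetical sample input, assumed to follow the MediaWiki-style layout of the example.
    public static final String SAMPLE =
        "<mediawiki>"
        + "<page><id>1</id><title>title 1</title><revision>"
        + "<timestamp>2004-04-30T14:46:00Z</timestamp><text>body text</text>"
        + "</revision></page>"
        + "<page><id>2</id><title>title 2</title><revision>"
        + "<timestamp>2004-04-30T14:46:00Z</timestamp><text>body text</text>"
        + "</revision></page>"
        + "</mediawiki>";

    /** Streams the XML and returns one row per element whose path equals rootPath. */
    public static List<Map<String, String>> parse(String xml, String rootPath) throws Exception {
        List<Map<String, String>> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> stack = new ArrayDeque<>(); // open elements, outermost first
            private final StringBuilder text = new StringBuilder(); // text of the current element
            private Map<String, String> row;                        // non-null while inside a root element

            private String path() {
                return String.join("/", stack);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                stack.addLast(qName);
                text.setLength(0);
                if (row == null && path().equals(rootPath)) {
                    row = new LinkedHashMap<>(); // a new entry starts here
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String current = path();
                if (row != null) {
                    if (current.equals(rootPath)) {
                        rows.add(row); // entry complete
                        row = null;
                    } else {
                        // Column name is the element path relative to the root, e.g. "revision/text".
                        String name = current.substring(rootPath.length() + 1);
                        String value = text.toString().trim();
                        if (!value.isEmpty()) {
                            row.put(name, value);
                        }
                    }
                }
                text.setLength(0);
                stack.removeLast();
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (Map<String, String> row : parse(SAMPLE, "mediawiki/page")) {
            System.out.println(row);
        }
    }
}
```

Because the handler keeps only the current element stack and one row in memory, the approach scales to large inputs, which is the property the SAX design is meant to provide. A real parser would additionally convert each value to the type declared in `schema`.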
## Build
```
$ ./gradlew gem
```

## How to send Pull Request
If you would like to send a patch or Pull Request to this repository, please agree to our CLA first. The process is as follows:
1. You send a Pull Request to this Yahoo! JAPAN OSS repository.
2. We send you the CLA for your agreement.
   - Yahoo! JAPAN CLA: https://gist.github.com/ydnjp/3095832f100d5c3d2592
3. You agree to the CLA.
4. We review your Pull Request and merge it.