Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/takumakanari/embulk-parser-xml
Embulk parser plugin for xml
embulk embulk-parser-plugin ruby xml xpath
- Host: GitHub
- URL: https://github.com/takumakanari/embulk-parser-xml
- Owner: takumakanari
- License: mit
- Created: 2015-03-14T11:52:54.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2019-11-19T00:26:34.000Z (about 5 years ago)
- Last Synced: 2024-04-27T01:20:56.504Z (10 months ago)
- Topics: embulk, embulk-parser-plugin, ruby, xml, xpath
- Language: Ruby
- Homepage:
- Size: 23.4 KB
- Stars: 11
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# XML parser plugin for Embulk
Parser plugin for [Embulk](https://github.com/embulk/embulk).
Reads input data as XML and emits each matching entry as an output record.
## Overview
* **Plugin type**: parser
* **Load all or nothing**: yes
* **Resume supported**: no

## Types
- **xml**: Finds rows with a SAX parser.
- **xpath**: Finds rows with XPath, so you can select rows by more complex conditions than the *xml* type allows.

## Configuration
### XML
```yaml
parser:
  type: xml
  root: data/students/student
  schema:
    - {name: name, type: string}
    - {name: age, type: long}
```

- **type**: specify this plugin as `xml`.
- **root**: path of the node at which to start fetching entries, written in *path/to/node* style, required.
- **schema**: column names and data types of the output table, required.

If you need to parse a column as a timestamp, *schema* supports two additional parameters:
```yaml
schema:
  - {name: timestamp_column, type: timestamp, format: "%Y-%m-%d", timezone: "+0000"}
```

- **format**: timestamp format used for parsing, required.
- **timezone**: time zone in which the timestamp is parsed; `"+0900"` is used by default.

### Xpath
```yaml
parser:
  type: xpath
  root: //data/students/student
  schema:
    - {path: name, type: string, name: name}
    - {path: age, type: long, name: age}
    - {path: hobbies/hobby, type: json, name: hobbies}
```

- **type**: specify this plugin as `xpath`.
- **root**: XPath expression selecting the nodes at which to start fetching entries; `'/'` is used by default.
- **schema**: column names and data types of the output table, required.
- **namespaces**: XML namespaces.

If you need to parse a column as a timestamp, *schema* supports two additional parameters:
```yaml
schema:
  - {name: timestamp_column, type: timestamp, format: "%Y-%m-%d", timezone: "+0000"}
```

- **format**: timestamp format used for parsing, required.
- **timezone**: time zone in which the timestamp is parsed; `"+0900"` is used by default.

Here is an example XML document:
```xml
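<data>
  <result>true</result>
  <students>
    <student>
      <name>John</name>
      <age>10</age>
      <hobbies><hobby>music</hobby><hobby>movie</hobby></hobbies>
    </student>
    <student>
      <name>Paul</name>
      <age>16</age>
      <hobbies><hobby>game</hobby></hobbies>
    </student>
    <student>
      <name>George</name>
      <age>17</age>
    </student>
    <student>
      <name>Ringo</name>
      <age>18</age>
    </student>
  </students>
</data>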
```
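
As a rough illustration of what the `xpath` configuration asks for, the Python sketch below applies the same column paths (`name`, `age`, `hobbies/hobby`) to a small document that uses the element names from the configuration. It is not the plugin's implementation (the plugin is written in Ruby). The `SCHEMA` list, the `convert` and `extract_rows` helpers, the choice to return `None` for a missing column, and the mapping of the `json` type to a list of strings are all assumptions made for illustration. Python's `xml.etree.ElementTree` supports only a subset of XPath, so the root expression is written relative to the document element instead of as `//data/students/student`.

```python
import xml.etree.ElementTree as ET

# Small sample document using the element names from the configuration.
XML = """
<data>
  <students>
    <student>
      <name>John</name>
      <age>10</age>
      <hobbies><hobby>music</hobby><hobby>movie</hobby></hobbies>
    </student>
    <student>
      <name>George</name>
      <age>17</age>
    </student>
  </students>
</data>
"""

# Mirrors the `schema` entries of the xpath configuration (illustrative only).
SCHEMA = [
    {"path": "name", "type": "string", "name": "name"},
    {"path": "age", "type": "long", "name": "age"},
    {"path": "hobbies/hobby", "type": "json", "name": "hobbies"},
]


def convert(nodes, col_type):
    """Turn matched nodes into a column value; None when nothing matched (assumption)."""
    if not nodes:
        return None
    if col_type == "json":
        # Assumption: a json column collects the text of every matched node.
        return [n.text for n in nodes]
    text = nodes[0].text
    return int(text) if col_type == "long" else text


def extract_rows(xml_text, root_path, schema):
    """Select row nodes with root_path, then evaluate each column path per row."""
    doc = ET.fromstring(xml_text)
    rows = []
    for node in doc.findall(root_path):
        rows.append(
            {c["name"]: convert(node.findall(c["path"]), c["type"]) for c in schema}
        )
    return rows


rows = extract_rows(XML, "./students/student", SCHEMA)
for row in rows:
    print(row)
# {'name': 'John', 'age': 10, 'hobbies': ['music', 'movie']}
# {'name': 'George', 'age': 17, 'hobbies': None}
```

Each row node selected by the root expression becomes one output record, and every column path is evaluated relative to that node, which is why `hobbies/hobby` can match several elements for one student and none for another.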