https://github.com/takumakanari/embulk-parser-xml

# XML parser plugin for Embulk

Parser plugin for [Embulk](https://github.com/embulk/embulk).

Reads input data as XML and emits each matching entry as an output record.

## Overview

* **Plugin type**: parser
* **Load all or nothing**: yes
* **Resume supported**: no

## Types

- **xml**: Finds rows using a SAX parser.
- **xpath**: Finds rows using XPath, so you can select rows with more complex conditions than the *xml* type allows.

## Configuration

### XML

```yaml
parser:
  type: xml
  root: data/students/student
  schema:
    - {name: name, type: string}
    - {name: age, type: long}
```

- **type**: set to `xml` to use this parser type.
- **root**: path to the node that represents one entry, written in *path/to/node* style, required.
- **schema**: list of output columns, each with a name and a data type, required.
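
To illustrate how `root` and `schema` work together, here is a minimal Python sketch of SAX-style row extraction. It is not the plugin's actual Ruby implementation: `RowHandler` and `parse_rows` are hypothetical names, the inline sample document is made up for the example, and only the `string` and `long` types are handled.

```python
import xml.sax


class RowHandler(xml.sax.ContentHandler):
    """Collects one row per element found at the configured root path."""

    def __init__(self, root, schema):
        super().__init__()
        self.root = root.split("/")   # e.g. ["data", "students", "student"]
        self.schema = schema          # list of {"name": ..., "type": ...}
        self.converters = {"string": str, "long": int}
        self.stack = []               # names of the currently open elements
        self.rows = []
        self.row = None               # row being built, None outside a root node
        self.text = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.text = []
        if self.stack == self.root:
            # Entering a root node: start a new row with every column empty.
            self.row = {col["name"]: None for col in self.schema}

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        # Only direct children of the root node can fill a column.
        if self.row is not None and self.stack[:-1] == self.root:
            for col in self.schema:
                if col["name"] == name:
                    value = "".join(self.text).strip()
                    self.row[name] = self.converters[col["type"]](value)
        if self.stack == self.root:
            # Leaving the root node: the row is complete.
            self.rows.append(self.row)
            self.row = None
        self.stack.pop()
        self.text = []


def parse_rows(xml_text, root, schema):
    handler = RowHandler(root, schema)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.rows


SAMPLE = """<data><students>
  <student><name>John</name><age>10</age></student>
  <student><name>Paul</name><age>16</age></student>
</students></data>"""

rows = parse_rows(
    SAMPLE,
    "data/students/student",
    [{"name": "name", "type": "string"}, {"name": "age", "type": "long"}],
)
print(rows)
```

Because the handler only tracks the stack of open element names, it never holds the whole document in memory, which is the usual reason to prefer SAX for large inputs.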

To parse a column as a timestamp, the *schema* entry accepts two additional parameters:

```yaml
schema:
  - {name: timestamp_column, type: timestamp, format: "%Y-%m-%d", timezone: "+0000"}
```

- **format**: format string used to parse the timestamp, required.
- **timezone**: time zone in which the timestamp is interpreted; defaults to `"+0900"`.
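
The effect of these two parameters can be sketched in Python. This is only an illustration under assumptions: `parse_timestamp` is a hypothetical helper, the input value is made up, and Python's `strptime` is assumed to interpret `%Y-%m-%d` the same way the plugin does.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value, fmt, tz="+0900"):
    """Parse value with a strptime-style format and attach the UTC offset tz."""
    sign = -1 if tz.startswith("-") else 1
    hours, minutes = int(tz[1:3]), int(tz[3:5])
    offset = timezone(sign * timedelta(hours=hours, minutes=minutes))
    # The parsed wall-clock time is interpreted as local time in that offset.
    return datetime.strptime(value, fmt).replace(tzinfo=offset)


ts_utc = parse_timestamp("2015-03-01", "%Y-%m-%d", "+0000")
ts_default = parse_timestamp("2015-03-01", "%Y-%m-%d")  # uses "+0900"

print(ts_utc.isoformat())                                # 2015-03-01T00:00:00+00:00
print(ts_default.astimezone(timezone.utc).isoformat())   # 2015-02-28T15:00:00+00:00
```

The same date string therefore maps to two different instants depending on `timezone`, so it is worth setting the parameter explicitly when the source data is not in `+0900`.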

### XPath

```yaml
parser:
  type: xpath
  root: //data/students/student
  schema:
    - {path: name, type: string, name: name}
    - {path: age, type: long, name: age}
    - {path: hobbies/hobby, type: json, name: hobbies}
```

- **type**: set to `xpath` to use this parser type.
- **root**: XPath expression selecting the nodes that represent one entry each; `/` is used by default.
- **schema**: list of output columns, each with a path, a name, and a data type, required.
- **namespaces**: XML namespaces referenced by the XPath expressions.

To parse a column as a timestamp, the *schema* entry accepts two additional parameters:

```yaml
schema:
  - {name: timestamp_column, type: timestamp, format: "%Y-%m-%d", timezone: "+0000"}
```

- **format**: format string used to parse the timestamp, required.
- **timezone**: time zone in which the timestamp is interpreted; defaults to `"+0900"`.

Here is an example XML document:

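```xml
<data>
  <result>true</result>
  <students>
    <student>
      <name>John</name>
      <age>10</age>
      <hobbies>
        <hobby>music</hobby>
        <hobby>movie</hobby>
      </hobbies>
    </student>
    <student>
      <name>Paul</name>
      <age>16</age>
      <hobbies>
        <hobby>game</hobby>
      </hobbies>
    </student>
    <student>
      <name>George</name>
      <age>17</age>
    </student>
    <student>
      <name>Ringo</name>
      <age>18</age>
    </student>
  </students>
</data>
```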
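
As a rough illustration of what the XPath configuration selects, the following Python sketch applies the same schema to a shortened student document. It rests on several assumptions: `extract_rows` is a hypothetical helper, the standard library's `ElementTree` supports only a subset of XPath, so the root is written relative to the document element (`./students/student`) rather than as `//data/students/student`, and a `json` column is assumed to hold the matched texts as a JSON array, which may differ from the plugin's actual output.

```python
import json
import xml.etree.ElementTree as ET

SAMPLE = """<data>
  <students>
    <student>
      <name>John</name>
      <age>10</age>
      <hobbies><hobby>music</hobby><hobby>movie</hobby></hobbies>
    </student>
    <student>
      <name>George</name>
      <age>17</age>
    </student>
  </students>
</data>"""

SCHEMA = [
    {"path": "name", "type": "string", "name": "name"},
    {"path": "age", "type": "long", "name": "age"},
    {"path": "hobbies/hobby", "type": "json", "name": "hobbies"},
]


def extract_rows(xml_text, root_path, schema):
    """Build one dict per node matched by root_path, one key per schema column."""
    doc = ET.fromstring(xml_text)
    rows = []
    for node in doc.findall(root_path):
        row = {}
        for col in schema:
            # Column paths are evaluated relative to the current root node.
            texts = [match.text for match in node.findall(col["path"])]
            if col["type"] == "json":
                # Assumption: all matches are kept, serialized as a JSON array.
                row[col["name"]] = json.dumps(texts)
            elif not texts:
                row[col["name"]] = None
            elif col["type"] == "long":
                row[col["name"]] = int(texts[0])
            else:
                row[col["name"]] = texts[0]
        rows.append(row)
    return rows


rows = extract_rows(SAMPLE, "./students/student", SCHEMA)
for row in rows:
    print(row)
```

In this sketch a student without a `hobbies` element still produces a row, with an empty JSON array in the `hobbies` column.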