https://github.com/michelin/avro-xml-mapper
Avro XML Mapper is a Java library that converts XML-formatted data to the Apache Avro format
- Host: GitHub
- URL: https://github.com/michelin/avro-xml-mapper
- Owner: michelin
- License: apache-2.0
- Created: 2023-06-02T11:22:54.000Z
- Default Branch: main
- Last Pushed: 2025-01-07T17:04:06.000Z
- Last Synced: 2025-01-07T17:53:48.477Z
- Topics: avro, java, kafka, kafka-streams
- Language: Java

# Avro XML Mapper
[Getting Started](#getting-started) • [Usage](#usage)
Turn XML into Avro and vice versa.
## Table of Contents
* [Getting Started](#getting-started)
* [Usage](#usage)
  * [XPath](#xpath)
  * [Structure](#structure)
    * [Single Element](#single-element)
    * [List](#list)
    * [Map](#map)
  * [Logical Type](#logical-type)
    * [Date](#date)
    * [Big Decimal](#big-decimal)
  * [XML Namespace](#xml-namespace)
* [Keywords](#keywords)
  * [keepEmptyTag](#keepemptytag)
  * [Custom Implementations](#custom-implementations)
* [Contribution](#contribution)
## Getting Started
To get started, add the following dependency:
```xml
<dependency>
  <groupId>com.michelin</groupId>
  <artifactId>avro-xml-mapper</artifactId>
  <version>${avro-xml-mapper.version}</version>
</dependency>
```
## Usage
### XPath
The `xpath` attribute, set on a record or on a field of the AVSC schema, specifies the path of the corresponding element in the XML document.
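To make the idea concrete, here is a minimal sketch that uses only the JDK's `javax.xml.xpath` API. It assumes the record-level XPath is resolved from the document root and each field-level XPath is resolved relative to the record node; that is an assumption for illustration, not a description of the library's internals. The class `XPathSketch` and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Avro XML Mapper.
class XPathSketch {

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Resolves the record XPath from the document root, then the field XPath
    // relative to the record node. Returns null when nothing matches.
    static String fieldText(Document doc, String recordXpath, String fieldXpath) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node record = (Node) xpath.evaluate(recordXpath, doc, XPathConstants.NODE);
            if (record == null) {
                return null;
            }
            Node field = (Node) xpath.evaluate(fieldXpath, record, XPathConstants.NODE);
            return field == null ? null : field.getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<objectRoot><element>content</element></objectRoot>");
        System.out.println(fieldText(doc, "/objectRoot", "element")); // prints "content"
    }
}
```

The same two-step lookup also works for attribute paths such as `element/@attribute`, because `getTextContent()` on an attribute node returns the attribute value.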
### Structure
#### Single Element
A single element is represented as follows:
```avro schema
{
  "name": "Object",
  "type": "record",
  "namespace": "com.example",
  "xpath": "/objectRoot",
  "fields": [
    {"name": "element", "type": "string", "xpath": "element"}
  ]
}
```
```xml
<objectRoot>
  <element>content</element>
</objectRoot>
```
#### List
Lists can be applied to any repeating element in the XML file. The XPath attribute should point to the repeating element.
```avro schema
{
  "name": "Object",
  "type": "record",
  "namespace": "com.example",
  "xpath": "/objectRoot",
  "fields": [
    {
      "name": "stringList",
      "xpath": "child",
      "type": {"type": "array", "items": "string"},
      "default": []
    }
  ]
}
```
```xml
<objectRoot>
  <child>content1</child>
  <child>content2</child>
</objectRoot>
```
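As a rough illustration of how a repeating element maps to a list, the sketch below collects the text of every node matched by an XPath, using only JDK classes. `ListXPathSketch` and `texts` are hypothetical names, and the two-step resolution (record first, then items) is an assumption rather than the library's documented behavior.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Avro XML Mapper.
class ListXPathSketch {

    // Returns the text of every node matched by itemXpath under the record node.
    static List<String> texts(String xml, String recordXpath, String itemXpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> result = new ArrayList<>();
            Node record = (Node) xpath.evaluate(recordXpath, doc, XPathConstants.NODE);
            if (record == null) {
                return result; // no record: empty list, like the schema default
            }
            NodeList items = (NodeList) xpath.evaluate(itemXpath, record, XPathConstants.NODESET);
            for (int i = 0; i < items.getLength(); i++) {
                result.add(items.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read list", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<objectRoot><child>content1</child><child>content2</child></objectRoot>";
        System.out.println(texts(xml, "/objectRoot", "child")); // prints [content1, content2]
    }
}
```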
Complex types can also be defined as follows:
```avro schema
{
  "name": "Object",
  "type": "record",
  "namespace": "com.example",
  "xpath": "/objectRoot",
  "fields": [
    {
      "name": "recordList",
      "xpath": "recordList/listItem",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "SubXMLTestModelMultipleXpath",
          "fields": [
            {"name": "subStringField", "type" : ["null","string"], "default": null, "customXpath1": "subStringField", "customXpath2": "altSubStringField"},
            {"name": "subIntField", "type" : ["null","int"], "default": null, "customXpath1": "subIntField", "customXpath2": "altSubIntField"},
            {"name": "subStringFieldFromAttribute", "type" : ["null","string"], "default": null, "customXpath1": "subIntField/@attribute", "customXpath2": "altSubIntField/@attribute"}
          ],
          "default": {}
        }
      },
      "default": []
    }
  ]
}
```
```xml
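<objectRoot>
  <recordList>
    <listItem>
      <subStringField>item1</subStringField>
      <subIntField>1</subIntField>
    </listItem>
    <listItem>
      <subStringField>item2</subStringField>
      <subIntField>2</subIntField>
    </listItem>
    <listItem>
      <subStringField>item3</subStringField>
      <subIntField>3</subIntField>
    </listItem>
  </recordList>
</objectRoot>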
```
#### Map
Maps have two accepted formats:
- A list of elements with a key attribute
```avro schema
{
  "name": "Object",
  "type": "record",
  "namespace": "com.example",
  "xpath": "/objectRoot",
  "fields": [
    {
      "name": "stringMapFormat1",
      "xpath": { "rootXpath": "element", "keyXpath": "@key", "valueXpath": "." },
      "type": { "type": "map", "values": "string" },
      "default": {}
    }
  ]
}
```
```xml
<objectRoot>
  <element key="key1">content1</element>
  <element key="key2">content2</element>
</objectRoot>
```
- A list of nodes with a key element and a value element
```avro schema
{
  "name": "Object",
  "type": "record",
  "namespace": "com.example",
  "xpath": "/objectRoot",
  "fields": [
    {
      "name": "stringMapFormat2",
      "xpath": { "rootXpath": "element", "keyXpath": "key", "valueXpath": "value" },
      "type": { "type": "map", "values": "string" },
      "default": {}
    }
  ]
}
```
```xml
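<objectRoot>
  <element>
    <key>key1</key>
    <value>content1</value>
  </element>
  <element>
    <key>key2</key>
    <value>content2</value>
  </element>
</objectRoot>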
```
In both cases, the `rootXpath` attribute always points to the repeating element of the list.
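The three-part map XPath can be sketched with plain JDK XPath calls: `rootXpath` selects the repeating entries, then `keyXpath` and `valueXpath` are evaluated relative to each entry. The class `MapXPathSketch`, its `readMap` method, and the `key1`/`key2` attribute values are hypothetical and only illustrate the idea; the library's actual implementation may differ.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Avro XML Mapper.
class MapXPathSketch {

    // Builds a map from the entries matched by rootXpath under the record node.
    static Map<String, String> readMap(String xml, String recordXpath,
                                       String rootXpath, String keyXpath, String valueXpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> result = new LinkedHashMap<>();
            Node record = (Node) xpath.evaluate(recordXpath, doc, XPathConstants.NODE);
            if (record == null) {
                return result;
            }
            NodeList entries = (NodeList) xpath.evaluate(rootXpath, record, XPathConstants.NODESET);
            for (int i = 0; i < entries.getLength(); i++) {
                Node entry = entries.item(i);
                // Key and value paths are relative to the repeating entry node.
                result.put(xpath.evaluate(keyXpath, entry), xpath.evaluate(valueXpath, entry));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read map", e);
        }
    }

    public static void main(String[] args) {
        String attributeForm = "<objectRoot><element key='key1'>content1</element>"
                + "<element key='key2'>content2</element></objectRoot>";
        System.out.println(readMap(attributeForm, "/objectRoot", "element", "@key", "."));
    }
}
```

With the second format, the same method would be called with `"key"` and `"value"` as the key and value paths.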
### Logical Type
#### Date
Only the `timestamp-millis` logical type, which annotates an Avro `long`, is handled. Several input formats are accepted:
- ISO8601 date-time
- ISO8601 date
- Flat date (yyyyMMddz) which gets the UTC 12:00:00.000 time to avoid timezone issues
- Flat date-time (yyyyMMddHHmmssz) which gets the UTC timezone assigned
- ISO8601 date-time without offset
- ISO8601 date without offset
- Flat date without offset (yyyyMMdd) which gets the UTC 12:00:00.000 time to avoid timezone issues
- Flat date-time without offset (yyyyMMdd HHmmss) which gets the UTC timezone assigned
- Flat date-time without offset and without timezone (yyyy-MM-dd HH:mm:ss) which gets the UTC timezone assigned
- Flat date-time with offset (yyyy-MM-dd'T'HH:mm:ss'T'00:00)
They are all converted to the `Instant` Java type.
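A few of these rules can be sketched with `java.time`. The example below covers only four of the listed formats (ISO 8601 date-time with offset, flat date, flat date-time, and `yyyy-MM-dd HH:mm:ss`) and tries them in a fixed order; the class `DateSketch`, the ordering, and the error handling are assumptions for illustration, not the library's parser.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper for illustration; not part of Avro XML Mapper.
class DateSketch {

    private static final DateTimeFormatter FLAT_DATE = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter FLAT_DATE_TIME = DateTimeFormatter.ofPattern("yyyyMMdd HHmmss");
    private static final DateTimeFormatter SPACED_DATE_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Tries each supported format in turn and returns the matching Instant.
    static Instant parse(String text) {
        try {
            // ISO 8601 date-time with offset, e.g. 2023-06-02T12:15:00+02:00
            return OffsetDateTime.parse(text).toInstant();
        } catch (DateTimeParseException ignored) {
            // fall through to the next format
        }
        try {
            // Flat date without offset: pinned to 12:00:00.000 UTC
            return LocalDate.parse(text, FLAT_DATE).atTime(12, 0).toInstant(ZoneOffset.UTC);
        } catch (DateTimeParseException ignored) {
            // fall through to the next format
        }
        try {
            // Flat date-time without offset: interpreted as UTC
            return LocalDateTime.parse(text, FLAT_DATE_TIME).toInstant(ZoneOffset.UTC);
        } catch (DateTimeParseException ignored) {
            // fall through to the next format
        }
        try {
            // yyyy-MM-dd HH:mm:ss: interpreted as UTC
            return LocalDateTime.parse(text, SPACED_DATE_TIME).toInstant(ZoneOffset.UTC);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Unsupported date format: " + text, e);
        }
    }

    public static void main(String[] args) {
        Instant instant = parse("20230602");
        // timestamp-millis stores the number of milliseconds since the Unix epoch
        System.out.println(instant + " -> " + instant.toEpochMilli());
    }
}
```

Pinning a date-only value to noon UTC means that shifting it into most local time zones still lands on the same calendar day, which is the timezone issue the flat date formats mention.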
#### Big Decimal
Only the `decimal` logical type on `bytes` is handled. It is converted to a `BigDecimal` Java type.
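For context, the Avro specification encodes a `decimal` on `bytes` as the big-endian two's-complement form of the unscaled integer, with the scale taken from the schema. The sketch below shows that round trip with JDK classes only; `DecimalSketch` and its methods are hypothetical names and are not the library's conversion code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of Avro XML Mapper.
class DecimalSketch {

    // Encodes the value as the two's-complement bytes of its unscaled integer.
    // setScale without a rounding mode throws ArithmeticException if digits would be lost.
    static ByteBuffer toBytes(BigDecimal value, int scale) {
        BigInteger unscaled = value.setScale(scale).unscaledValue();
        return ByteBuffer.wrap(unscaled.toByteArray());
    }

    // Decodes the bytes back into a BigDecimal using the scale from the schema.
    static BigDecimal fromBytes(ByteBuffer bytes, int scale) {
        byte[] raw = new byte[bytes.remaining()];
        bytes.duplicate().get(raw); // duplicate so the caller's position is untouched
        return new BigDecimal(new BigInteger(raw), scale);
    }

    public static void main(String[] args) {
        ByteBuffer encoded = toBytes(new BigDecimal("12.34"), 2);
        System.out.println(fromBytes(encoded, 2)); // prints 12.34
    }
}
```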
### XML Namespace
The `xmlNamespaces` attribute defined at the root of the AVSC file is used to specify the namespaces used in the XML file.
> This attribute is used differently depending on the conversion direction, as described in the following sections.
#### XML to Avro
The declared namespaces are used to normalize the prefixes of the XML document before conversion.
If several prefixes in the document refer to the same URI, only the prefix defined in the `xmlNamespaces` attribute is kept during conversion.
For instance, with the given AVSC and XML:
```avro schema
{
  "name": "Object",
  "type": "record",
  "namespace": "com.example",
  "xpath": "objectRoot",
  "xmlNamespaces": {
    "null": "http://namespace.uri/default",
    "ns1": "http://namespace.uri/1"
  },
  "fields": [
    {"name": "element", "type": "string", "xpath": "element"},
    {"name": "secondElement", "type": "string", "xpath": "ns1:secondElement"},
    {"name": "thirdElement", "type": "string", "xpath": "ns1:thirdElement"}
  ]
}
```
```xml
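<objectRoot xmlns="http://namespace.uri/default" xmlns:ns1="http://namespace.uri/1">
  <element>content</element>
  <ns1:secondElement>second element content</ns1:secondElement>
  <ns2:thirdElement xmlns:ns2="http://namespace.uri/1">third element content</ns2:thirdElement>
</objectRoot>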
```
Before conversion to Avro, the initial document is tweaked as such:
```xml
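<objectRoot xmlns:noprefixns="http://namespace.uri/default" xmlns:ns1="http://namespace.uri/1">
  <element>content</element>
  <ns1:secondElement>second element content</ns1:secondElement>
  <ns1:thirdElement>third element content</ns1:thirdElement>
</objectRoot>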
```
The root `xmlns` namespace is replaced with `xmlns:noprefixns`, and the `ns1` namespace is simply preserved.
The `ns2` namespace is removed because it refers to the same URI as the `ns1` namespace.
> If `xmlNamespaces` is not provided for XML to Avro conversion, the namespace prefixes used in the XPath attributes must be consistent with those used in the XML document.
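The reason prefixes can be unified at all is that an XML prefix is only an alias for a URI. The sketch below shows this with the JDK's namespace-aware XPath: a `NamespaceContext` maps XPath prefixes to URIs, so `ns1:thirdElement` matches an element written with a different prefix bound to the same URI. This is a different technique from the document rewriting described above. The class `NamespaceSketch` and the `d` prefix chosen for the default namespace are hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Avro XML Mapper.
class NamespaceSketch {

    // Evaluates an XPath whose prefixes are resolved through prefixToUri,
    // independently of the prefixes used in the document itself.
    static String text(String xml, Map<String, String> prefixToUri, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace matching
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }

                @Override
                public String getPrefix(String namespaceUri) {
                    return null; // reverse lookup is not needed for evaluation
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceUri) {
                    return Collections.emptyIterator();
                }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }
}
```

XPath 1.0 has no notion of a default namespace, so elements in the document's default namespace need an explicit prefix in the expression, for example `/d:objectRoot/ns1:thirdElement` with `d` mapped to `http://namespace.uri/default`.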
#### Avro to XML
The namespaces are used for root namespace definition.
> Failing to provide `xmlNamespaces` for Avro to XML conversion means that no namespace should be used in the XPath attributes, because the produced XML would otherwise be invalid.
## Keywords
### keepEmptyTag
The `keepEmptyTag` attribute indicates that the tag must be kept in the Avro to XML conversion when the original Avro field is null:
```avro schema
{
  "name": "Object",
  "type": "record",
  "namespace": "com.example",
  "xpath": "/objectRoot",
  "fields": [
    {
      "name": "emptyElement",
      "xpath": "element",
      "keepEmptyTag": true,
      "type": ["null","string"],
      "default": null
    }
  ]
}
```
```xml
<objectRoot>
  <element/>
</objectRoot>
```
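The effect of such a flag can be sketched with the JDK's DOM API: a null value normally produces no element, unless the flag asks for the tag to be emitted without content. `KeepEmptyTagSketch` and `render` are hypothetical names, and the logic is an assumption about the behavior described above, not the library's code.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper for illustration; not part of Avro XML Mapper.
class KeepEmptyTagSketch {

    // Renders <objectRoot> with an <element> child, honoring a keepEmptyTag-like flag.
    static String render(String value, boolean keepEmptyTag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("objectRoot");
            doc.appendChild(root);
            // A null value is skipped unless the flag asks to keep the tag.
            if (value != null || keepEmptyTag) {
                Element element = doc.createElement("element");
                if (value != null) {
                    element.setTextContent(value);
                }
                root.appendChild(element);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not render XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(null, true));  // root with an empty element child
        System.out.println(render(null, false)); // root without any child
    }
}
```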
### Custom Implementations
The `AvroToXmlMapper#convertAvroToXmlDocument` method gives access to the document before it is converted to a String, which allows custom implementations to edit it.
The conversion can then be finalized with the `GenericUtils#documentToString` method.
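The kind of edit this enables can be sketched on a plain `org.w3c.dom.Document`. The signatures of the two library methods are not given here, so the example builds its own document and serializes it with a JDK `Transformer`; `sampleDocument` and `toXmlString` are hypothetical stand-ins for `AvroToXmlMapper#convertAvroToXmlDocument` and `GenericUtils#documentToString`, and the `source` attribute is an invented example edit.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper for illustration; not part of Avro XML Mapper.
class CustomDocumentSketch {

    // Stand-in for the document a mapper would produce from an Avro record.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("objectRoot");
            Element element = doc.createElement("element");
            element.setTextContent("content");
            root.appendChild(element);
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    // Example of a custom edit applied before serialization.
    static void addRootAttribute(Document doc, String name, String value) {
        doc.getDocumentElement().setAttribute(name, value);
    }

    // Stand-in for the final document-to-String step.
    static String toXmlString(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        addRootAttribute(doc, "source", "custom");
        System.out.println(toXmlString(doc));
    }
}
```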
## Contribution
We welcome contributions from the community! Before you get started, please take a look at
our [contribution guide](https://github.com/michelin/avro-xml-mapper/blob/main/CONTRIBUTING.md) to learn about our guidelines
and best practices. We appreciate your help in making Avro XML Mapper a better tool for everyone.