https://github.com/mysterious-ben/xmlrecords

Utilities to extract tabular data from XML

Host: GitHub
URL: https://github.com/mysterious-ben/xmlrecords
Owner: mysterious-ben
License: apache-2.0
Created: 2020-05-30T12:14:21.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2023-06-15T12:17:00.000Z (about 3 years ago)
Last Synced: 2023-06-15T13:26:23.182Z (about 3 years ago)
Topics: parser, xml
Language: Python
Homepage:
Size: 42 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# XML Records

`xmlrecords` is a user-friendly wrapper around the `lxml` package for extracting tabular data from XML files.

> This data provider sends all his data in... XML. You know nothing about XML, except that it looks kind of weird and you would *definitely* never use it for tabular data. How could you just transform all this XML nightmare into a sensible tabular format, like a DataFrame? Don't worry: you are in the right place!

# Installation

```shell
pip install xmlrecords
```

The package requires Python 3.7+ and one external dependency, `lxml`.

# Usage

## Basic example

Usually, you only need to specify the path to the table rows. Optionally, you can also specify paths to any extra data you'd like to add to your table:

```python
# XML object (the root tag name is illustrative; it is not part of any path below)
xml_bytes = b"""\
<Catalog>
    <Library>
        <Name>Virtual Shore</Name>
    </Library>
    <Shelf>
        <Timestamp>2020-02-02T05:12:22</Timestamp>
        <Book>
            <Title>Sunny Night</Title>
            <Author alive="no" name="Mysterious Mark"/>
            <Year>2017</Year>
            <Price>112.34</Price>
        </Book>
        <Book>
            <Title>Babel-17</Title>
            <Author alive="yes" name="Samuel R. Delany"/>
            <Year>1963</Year>
            <Price>10</Price>
        </Book>
    </Shelf>
</Catalog>
"""

# Transform XML to records (= a list of key-value pairs)
import xmlrecords

records = xmlrecords.parse(
    xml=xml_bytes,
    records_path=['Shelf', 'Book'],  # The rows are XML nodes with the repeating tag <Book>
    meta_paths=[['Library', 'Name'], ['Shelf', 'Timestamp']],  # Add additional "meta" nodes
)
for r in records:
    print(r)

# Output:
# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Sunny Night', 'alive': 'no', 'name': 'Mysterious Mark', 'Year': '2017', 'Price': '112.34'}
# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Babel-17', 'alive': 'yes', 'name': 'Samuel R. Delany', 'Year': '1963', 'Price': '10'}

# Validate record keys
xmlrecords.validate(
    records,
    expected_keys=['Name', 'Timestamp', 'Title', 'alive', 'name', 'Year', 'Price'],
)
```
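To illustrate what `records_path` and `meta_paths` select, here is a rough sketch that uses only the standard library's `xml.etree.ElementTree`. It is not the implementation of `xmlrecords` (which uses `lxml`) and it ignores edge cases such as repeated tags inside a row. The helper names `flatten` and `sketch_parse`, as well as the sample tag names `Catalog` and `Author`, are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; tag names are chosen to match the paths used below.
SAMPLE_XML = b"""<Catalog>
    <Library><Name>Virtual Shore</Name></Library>
    <Shelf>
        <Timestamp>2020-02-02T05:12:22</Timestamp>
        <Book>
            <Title>Sunny Night</Title>
            <Author alive="no" name="Mysterious Mark"/>
            <Year>2017</Year>
        </Book>
        <Book>
            <Title>Babel-17</Title>
            <Author alive="yes" name="Samuel R. Delany"/>
            <Year>1963</Year>
        </Book>
    </Shelf>
</Catalog>"""


def flatten(node):
    """Collect attributes and leaf texts of `node` and its descendants into one dict."""
    out = {}
    for el in node.iter():
        out.update(el.attrib)  # attributes become keys, e.g. 'alive' and 'name'
        if len(el) == 0 and el.text and el.text.strip():
            out[el.tag] = el.text.strip()  # leaf nodes contribute tag -> text
    return out


def sketch_parse(xml, records_path, meta_paths):
    """Return one dict per row node, each prefixed with the shared meta values."""
    root = ET.fromstring(xml)
    meta = {}
    for path in meta_paths:
        node = root.find("/".join(path))  # paths are relative to the root node
        if node is not None:
            meta[path[-1]] = node.text
    return [{**meta, **flatten(row)} for row in root.findall("/".join(records_path))]


rows = sketch_parse(
    SAMPLE_XML,
    records_path=["Shelf", "Book"],
    meta_paths=[["Library", "Name"], ["Shelf", "Timestamp"]],
)
```

Each `Book` node becomes one row, and the values found under the meta paths are repeated on every row.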

## With Pandas

You can easily transform records to a pandas DataFrame:

```python
import pandas as pd

df = pd.DataFrame(records)
```
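Note that all values in the records shown above are strings (for example `'Year': '2017'`). If you want typed columns, one option is to cast selected fields before building the DataFrame. The following is a minimal sketch in plain Python; the `CASTS` mapping and the `cast_record` helper are hypothetical names for this example and are not part of `xmlrecords`.

```python
from datetime import datetime

# Hypothetical mapping from record key to a converter; adjust it to your own schema.
CASTS = {
    "Year": int,
    "Price": float,
    "Timestamp": datetime.fromisoformat,
}


def cast_record(record, casts=CASTS):
    """Return a copy of `record` with the listed keys converted; other keys stay strings."""
    return {
        key: casts[key](value) if key in casts else value
        for key, value in record.items()
    }


# A record shaped like the output of the basic example
sample = {
    "Title": "Babel-17",
    "Year": "1963",
    "Price": "10",
    "Timestamp": "2020-02-02T05:12:22",
}
typed = cast_record(sample)
```

Applying `cast_record` to every record before calling `pd.DataFrame` should give numeric and datetime columns instead of object columns.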

## With SQL

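You can use records directly with INSERT statements if your database driver is [PEP 249 compliant](https://www.python.org/dev/peps/pep-0249/) and accepts dictionaries as parameters (the `named` or `pyformat` parameter styles). Most Python database drivers are PEP 249 compliant.

With positional `?` placeholders, as in the SQLite example below, you have to transform records (= a list of dictionaries) into a list of lists: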

```python
import sqlite3

with sqlite3.connect('maindev.db') as conn:
    c = conn.cursor()
    c.execute("""\
        CREATE TABLE BOOKS (
           LIBRARY_NAME TEXT,
           SHELF_TIMESTAMP TEXT,
           TITLE TEXT,
           AUTHOR_ALIVE TEXT,
           AUTHOR_NAME TEXT,
           YEAR INT,
           PRICE FLOAT,
           PRIMARY KEY (TITLE, AUTHOR_NAME)
        )
        """
    )
    # The values are inserted positionally, so the key order of each record
    # must match the column order of the table
    c.executemany(
        """INSERT INTO BOOKS VALUES (?,?,?,?,?,?,?)""",
        [list(x.values()) for x in records],
    )
    conn.commit()
```
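As an alternative to converting records into lists, the standard `sqlite3` module also accepts dictionaries when the statement uses named placeholders (`:key`). The sketch below assumes records with the same keys as in the basic example (hard-coded here so that it runs on its own) and uses an in-memory database instead of `maindev.db`.

```python
import sqlite3

# Two records shaped like the output of the basic example
records = [
    {"Name": "Virtual Shore", "Timestamp": "2020-02-02T05:12:22", "Title": "Sunny Night",
     "alive": "no", "name": "Mysterious Mark", "Year": "2017", "Price": "112.34"},
    {"Name": "Virtual Shore", "Timestamp": "2020-02-02T05:12:22", "Title": "Babel-17",
     "alive": "yes", "name": "Samuel R. Delany", "Year": "1963", "Price": "10"},
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute("""\
    CREATE TABLE BOOKS (
       LIBRARY_NAME TEXT,
       SHELF_TIMESTAMP TEXT,
       TITLE TEXT,
       AUTHOR_ALIVE TEXT,
       AUTHOR_NAME TEXT,
       YEAR INT,
       PRICE FLOAT,
       PRIMARY KEY (TITLE, AUTHOR_NAME)
    )
    """
)
# Each :key placeholder is looked up by name in the record dictionary,
# so the key order of the records no longer matters
conn.executemany(
    "INSERT INTO BOOKS VALUES (:Name, :Timestamp, :Title, :alive, :name, :Year, :Price)",
    records,
)
conn.commit()

rows = conn.execute("SELECT TITLE, YEAR, PRICE FROM BOOKS ORDER BY YEAR").fetchall()
conn.close()
```

Because the `YEAR` and `PRICE` columns are declared as `INT` and `FLOAT`, SQLite converts the string values to numbers on insert.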

# FAQ

1. **Why not `xmltodict`?** `xmltodict` can convert arbitrary XML to a Python dict. However, it is 2-3 times slower than `xmlrecords` and does not support some features specific to tabular data.

2. **Why not `xml` or `lxml`?** `xmlrecords` uses `lxml` under the hood. Using `xml` or `lxml` directly is a viable option too, in case this package doesn't cover your particular use case.
