
https://github.com/slott56/stingray-reader

Read very complex spreadsheets or COBOL files with a single, uniform Python style that manages schema explicitly

Host: GitHub
URL: https://github.com/slott56/stingray-reader
Owner: slott56
License: mit
Created: 2018-10-28T15:38:07.000Z (about 6 years ago)
Default Branch: main
Last Pushed: 2024-10-14T12:28:30.000Z (about 1 month ago)
Last Synced: 2024-10-14T12:36:00.152Z (about 1 month ago)
Language: Python
Homepage: https://slott56.github.io/Stingray-Reader/
Size: 35.4 MB
Stars: 14
Watchers: 4
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.rst
- Funding: .github/FUNDING.yml
- License: LICENSE


README

================================================================
The Stingray Schema-Based File Reader
================================================================

Spreadsheet format files are the *lingua franca* of data processing.
CSV, Tab, XLS, XLSX and ODS files are used widely. Python's ``csv``
module handles two common formats (CSV and tab-delimited). Add-on packages
are required for the variety of other physical file formats.

The problem is that each add-on package presents its own distinct view of
the underlying data.

The Stingray Schema-Based File Reader offers several features to help
process files in spreadsheet formats.

1.  It wraps format-specific modules with a unified
    "workbook" Facade so that applications can work with any
    of the physical formats.

2.  It extends the workbook concept to non-delimited files, including
    COBOL files encoded in ASCII, EBCDIC, or any of the Unicode encodings.

3.  It provides a uniform way to load and use schema information based on JSONSchema.
    A schema can be as small as the header rows in the individual sheets of a workbook,
    or it can be separate schema information in another spreadsheet, a JSONSchema
    document, or COBOL "copybook" data definitions.

4.  It provides a suite of data conversions that cover the most common cases.
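
The Facade in item 1 can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions, not Stingray's API: the names ``Sheet``, ``CSVSheet``, ``NDJSONSheet``, ``rows()``, and ``names()`` are hypothetical. Each adapter hides one physical format behind the same ``rows()`` iterator, and the CSV header row serves as the smallest possible schema.

::

    import csv
    import io
    import json
    from collections.abc import Iterator
    from typing import Any, Protocol, TextIO


    class Sheet(Protocol):
        """Hypothetical uniform interface: every physical format yields dict rows."""

        def rows(self) -> Iterator[dict[str, Any]]: ...


    class CSVSheet:
        """Adapter over the built-in csv module; the header row names the columns."""

        def __init__(self, source: TextIO, delimiter: str = ",") -> None:
            self.source = source
            self.delimiter = delimiter  # "\t" handles Tab files with the same code

        def rows(self) -> Iterator[dict[str, Any]]:
            yield from csv.DictReader(self.source, delimiter=self.delimiter)


    class NDJSONSheet:
        """Adapter over the built-in json module: one complete document per line."""

        def __init__(self, source: TextIO) -> None:
            self.source = source

        def rows(self) -> Iterator[dict[str, Any]]:
            for line in self.source:
                if line.strip():  # tolerate blank lines between documents
                    yield json.loads(line)


    def names(sheet: Sheet) -> list[str]:
        """Application code depends only on the Facade, never on the format."""
        return [row["name"] for row in sheet.rows()]


    # Usage: the same function works on both physical formats.
    csv_sheet = CSVSheet(io.StringIO("name,qty\nwidget,3\n"))
    ndjson_sheet = NDJSONSheet(io.StringIO('{"name": "widget", "qty": 3}\n'))
    print(names(csv_sheet), names(ndjson_sheet))  # ['widget'] ['widget']

Because ``names()`` accepts anything with a ``rows()`` method, it can be tested with in-memory ``io.StringIO`` data instead of real files.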

Additionally, the Stingray Reader provides some guidance on how to structure
file-processing applications so that they are testable and composable.

Stingray 5.1 requires Python >= 3.12. The code is fully annotated with type hints.

Stingray depends on additional projects to read .XLS, .XLSX, .ODS, and .NUMBERS files.

-   CSV files are built-in using the ``csv`` module.

-   COBOL files are built-in using the ``estruct`` and ``cobol_parser`` modules.

-   NDJSON (newline-delimited JSON) files are JSON with the extra provision that each document must be complete on one physical line.
    These use the built-in ``json`` module.

-   XLS files are read via the ``xlrd`` project:  http://www.lexicon.net/sjmachin/xlrd.htm

-   XLSX files are read via two projects: https://openpyxl.readthedocs.io/en/stable/

-   Numbers files (v13 and higher) use protobuf and snappy compression. See https://pypi.org/project/numbers-parser/.

-   YAML files can be a sequence of documents, permitting a direct mapping to a Workbook with a single Sheet.

-   TOML files are, in effect, large dictionaries with a flexible syntax; they can be described by a JSONSchema.

-   XML files can be wrapped in a Workbook. There's no automated translation from XSD to JSONSchema here.
    A sample is provided, but it may not solve many problems in general.

-   ODS files are read via http://docs.pyexcel.org/. **NOTE**: ODS file processing currently has problems with the 0.7.0 release.
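
COBOL files show why the physical format matters: text may be EBCDIC, and numbers may be packed decimal (``COMP-3``). The sketch below is an illustration only, not the ``estruct`` or ``cobol_parser`` API; the copybook, the field names, and the helpers ``decode_comp3()`` and ``decode_item()`` are hypothetical. It uses Python's built-in ``cp037`` codec (one common EBCDIC code page) for text and unpacks the packed-decimal field by hand.

::

    from decimal import Decimal

    # Hypothetical copybook for one 9-byte record:
    #   01  ITEM.
    #       05  ITEM-NAME   PIC X(6).
    #       05  ITEM-PRICE  PIC S9(3)V99 COMP-3.


    def decode_comp3(data: bytes, scale: int = 0) -> Decimal:
        """Decode packed decimal: two digits per byte, sign in the final nibble."""
        nibbles: list[int] = []
        for byte in data:
            nibbles.extend((byte >> 4, byte & 0x0F))
        *digits, sign = nibbles
        if sign < 0xA or any(d > 9 for d in digits):
            raise ValueError(f"invalid packed decimal: {data.hex()}")
        value = Decimal(int("".join(map(str, digits)))).scaleb(-scale)
        # 0xD (preferred) and 0xB mark negative values; 0xC, 0xF, etc. are positive.
        return -value if sign in (0xB, 0xD) else value


    def decode_item(record: bytes) -> dict[str, object]:
        """Slice the record at the offsets implied by the copybook above."""
        return {
            "ITEM-NAME": record[0:6].decode("cp037").rstrip(),
            "ITEM-PRICE": decode_comp3(record[6:9], scale=2),  # implied decimal (V99)
        }


    sample = "WIDGET".encode("cp037") + bytes([0x12, 0x34, 0x5C])
    print(decode_item(sample))  # {'ITEM-NAME': 'WIDGET', 'ITEM-PRICE': Decimal('123.45')}

A schema-driven reader would derive these offsets and conversions from the copybook instead of hard-coding them.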

A file-suffix registry maps each suffix to the Workbook subclass that handles the physical format.
A decorator adds or replaces suffix mappings, permitting an application to fold in its own extensions.
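
The registry-plus-decorator arrangement can be sketched in a few lines. The names ``Workbook``, ``SUFFIX_REGISTRY``, ``file_suffix()``, ``open_workbook()``, and the two subclasses are hypothetical stand-ins, not Stingray's actual API. Registering a suffix a second time replaces the earlier mapping, which is one way an application could override a built-in handler.

::

    from collections.abc import Callable
    from pathlib import Path


    class Workbook:
        """Hypothetical base class; real subclasses would open a physical format."""

        def __init__(self, path: Path) -> None:
            self.path = path


    # Maps a lower-case suffix such as ".csv" to the Workbook subclass for it.
    SUFFIX_REGISTRY: dict[str, type[Workbook]] = {}


    def file_suffix(*suffixes: str) -> Callable[[type[Workbook]], type[Workbook]]:
        """Class decorator that adds (or replaces) suffix-to-class mappings."""
        def register(cls: type[Workbook]) -> type[Workbook]:
            for suffix in suffixes:
                SUFFIX_REGISTRY[suffix.lower()] = cls
            return cls
        return register


    def open_workbook(name: str | Path) -> Workbook:
        """Choose the Workbook subclass from the file's suffix."""
        path = Path(name)
        try:
            cls = SUFFIX_REGISTRY[path.suffix.lower()]
        except KeyError:
            raise ValueError(f"no Workbook registered for {path.suffix!r}") from None
        return cls(path)


    @file_suffix(".csv", ".tab")
    class CSVWorkbook(Workbook):
        pass


    # An application extension that overrides the ".csv" mapping only.
    @file_suffix(".csv")
    class MyCSVWorkbook(Workbook):
        pass


    print(type(open_workbook("data/sales.CSV")).__name__)  # MyCSVWorkbook
    print(type(open_workbook("data/sales.tab")).__name__)  # CSVWorkbook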

Installation
============

::

    python -m pip install stingray-reader

Or, using ``uv``:

::

    uv add stingray-reader

Note that this installs a tall stack of dependencies.