https://github.com/jayclassless/tabfilereader

Host: GitHub
URL: https://github.com/jayclassless/tabfilereader
Owner: jayclassless
License: mit
Created: 2020-10-25T02:24:57.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2020-11-08T06:05:36.000Z (over 5 years ago)
Last Synced: 2025-01-16T19:53:12.285Z (about 1 year ago)
Language: Python
Size: 5.64 MB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- License: LICENSE.rst

*******
Welcome
*******

.. image:: https://img.shields.io/pypi/v/tabfilereader.svg
   :target: https://pypi.python.org/pypi/tabfilereader

.. image:: https://img.shields.io/pypi/l/tabfilereader.svg
   :target: https://pypi.python.org/pypi/tabfilereader

.. image:: https://github.com/jayclassless/tabfilereader/workflows/Test/badge.svg
   :target: https://github.com/jayclassless/tabfilereader/actions

.. image:: https://github.com/jayclassless/tabfilereader/workflows/Docs/badge.svg
   :target: https://jayclassless.github.io/tabfilereader/

Overview
========

``tabfilereader`` is a small library to make reading flat, tabular data from
files a bit less tedious.

At its base, to use ``tabfilereader``, you simply define your Schema, then use
it to open a Reader. You can then iterate through the Reader to retrieve
records from the file.

    >>> import tabfilereader as tfr
    >>> class MySchema(tfr.Schema):
    ...     column1 = tfr.Column('column_1')
    ...     column2 = tfr.Column('column_2', data_type=tfr.IntegerType(), data_required=True)
    >>> reader = tfr.CsvReader.open('test/data/simple_header.csv', MySchema)
    >>> for record, errors in reader:
    ...     print(record)
    Record(column1='foo', column2=123)
    Record(column1='bar', column2=None)

Schemas
=======

Schema classes tell ``tabfilereader`` what columns to expect in the file, and
what datatypes the values contained in them should be cast as. You create your
schemas by defining a class that inherits from ``tabfilereader.Schema``. In
this class, you define properties that are instances of
``tabfilereader.Column``, which specify where columns are in the file, and what
their datatype is. An example::

    >>> import re
    >>> class ExampleSchema(tfr.Schema):
    ...     first = tfr.Column('First Name')
    ...     last = tfr.Column('Last Name', data_required=True)
    ...     birthdate = tfr.Column(re.compile(r'^Birth.*'), data_type=tfr.DateType())
    ...     weight = tfr.Column('Weight', data_type=tfr.FloatType(), required=False)

Columns require at least one argument that tells ``tabfilereader`` how to find
the column in the file. For files where the first record contains column names,
you can specify one of:

* The exact name of the column as a string.
* An ``re.Pattern`` that will match the column name.
* A sequence of strings or ``re.Pattern`` objects, any of which the column
  could be named.

For files that do not contain a header record, you specify the column's
location with a zero-based integer index.
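
The three header-matching forms listed above can be pictured with a small
standalone sketch. This is not how ``tabfilereader`` resolves columns
internally; ``find_column`` and the sample ``header`` are hypothetical names
made up for illustration, and the use of ``search()`` rather than another
matching method is an assumption.

```python
import re
from typing import Optional, Sequence, Union

# Hypothetical alias and helper, for illustration only; neither is part of
# tabfilereader.
Locator = Union[str, re.Pattern, Sequence[Union[str, re.Pattern]]]


def find_column(header: Sequence[str], locator: Locator) -> Optional[int]:
    """Return the zero-based index of the first header cell that matches."""
    # Wrap a single string or pattern so every case is handled as a list.
    if isinstance(locator, (str, re.Pattern)):
        candidates = [locator]
    else:
        candidates = list(locator)

    for index, name in enumerate(header):
        for candidate in candidates:
            if isinstance(candidate, str):
                # Strings must equal the header cell exactly.
                if name == candidate:
                    return index
            elif candidate.search(name):
                # Patterns are tried with search(); anchor them with ^ or $
                # if a partial match is not wanted.
                return index
    return None


header = ['First Name', 'Last Name', 'Birth Date', 'Weight']
print(find_column(header, 'Last Name'))
print(find_column(header, re.compile(r'^Birth.*')))
print(find_column(header, ['Mass', re.compile(r'^Wei')]))
```

The returned index is the same kind of zero-based position that a headerless
file would need to be given directly.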

Columns also take a series of optional parameters:

``required``
    To indicate whether or not it is required that this column exists in the
    file. Defaults to ``True``.

``data_required``
    To indicate whether or not the column must have a value for every record in
    the file. Defaults to ``False``.

``data_type``
    With this parameter, you can provide a ``callable`` that will receive a
    string value from the file and return a parsed and properly-typed value. If
    the value is invalid, the callable should raise a ``ValueError``.
    ``tabfilereader`` provides an array of pre-defined Types that you can use
    here for the most common data types (numbers, dates, strings, etc).
    See the API documentation for all the available pre-defined Types. This
    parameter defaults to ``tabfilereader.StringType()`` if not specified.
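
As a sketch of the ``data_type`` contract described above (a callable that
takes a string and either returns a typed value or raises ``ValueError``),
here is a plain function that parses percentages. The name ``percentage`` and
its error messages are hypothetical and not part of ``tabfilereader``; whether
the library hands empty strings to the callable is not covered here.

```python
from decimal import Decimal, InvalidOperation


def percentage(value: str) -> Decimal:
    """Parse a string such as '12.5%' into a Decimal fraction."""
    text = value.strip()
    if not text.endswith('%'):
        raise ValueError('Expected a value ending in %')
    try:
        number = Decimal(text[:-1].strip())
    except InvalidOperation:
        # Re-raise as ValueError, the error type the contract asks for.
        raise ValueError('Not a valid percentage') from None
    return number / 100


print(percentage('12.5%'))
```

Going by the description above, a function like this could presumably be
supplied as ``data_type=percentage`` when declaring a Column.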

There are also a handful of optional parameters that can be declared on the
Schema itself. The available options are:

``ignore_unknown_columns``
    To indicate what should be done if a Reader finds columns in the file that
    are not declared in the Schema. Defaults to ``False``, which means the
    Reader will raise an exception.

``ignore_empty_records``
    To indicate what should be done if a Reader encounters a record with no
    columns whatsoever. Defaults to ``False``, which means the reader will
    return a record that is full of errors. This option is particularly useful
    for CSV files when people are a bit sloppy with their newlines at the end
    of a file.

To set these Schema-level options, pass them as keyword arguments in the class
declaration::

    >>> class SchemaWithOptions(tfr.Schema, ignore_unknown_columns=True):
    ...     column1 = tfr.Column('column_1')
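
For readers unfamiliar with keyword arguments in a class declaration, the
sketch below shows one way plain Python can support that syntax, using
``__init_subclass__``. It is an assumption-laden illustration, not
``tabfilereader``'s implementation (which may well use a metaclass instead);
``OptionsBase`` and ``Lenient`` are hypothetical stand-ins.

```python
class OptionsBase:
    """Stand-in base class; this is NOT tabfilereader.Schema."""

    # Defaults mirror the two documented Schema options.
    ignore_unknown_columns = False
    ignore_empty_records = False

    _known_options = ('ignore_unknown_columns', 'ignore_empty_records')

    def __init_subclass__(cls, **options):
        # Python passes class-declaration keywords to this hook.
        super().__init_subclass__()
        for name, value in options.items():
            if name not in cls._known_options:
                raise TypeError('Unknown option: ' + name)
            setattr(cls, name, value)


class Lenient(OptionsBase, ignore_unknown_columns=True):
    pass


print(Lenient.ignore_unknown_columns, Lenient.ignore_empty_records)
```

Options set this way live on the subclass only, so the base class defaults
stay untouched.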

Readers
=======

Readers use the Schemas to interpret the contents of the tabular files.
``tabfilereader`` provides the following Readers to handle various types of
files:

``CsvReader``
    Handles Comma Separated Value files (or similarly-constructed files; TSV,
    etc).

``ExcelReader``
    Handles Excel spreadsheets; either XLS- or XLSX-formatted.

``OdsReader``
    Handles OpenDocument Format spreadsheets.

Readers can be created by either calling the ``open()`` classmethod on the
specific Reader class you want to use, or by defining your own Reader class
that inherits from one provided by ``tabfilereader`` like so::

    >>> class MyReader(tfr.CsvReader):
    ...     schema = MySchema
    ...     delimiter = '|'
    >>> reader = MyReader('test/data/simple_header_pipe.csv')

Each reader allows for a variety of optional parameters (like ``delimiter`` in
the example above). See the API documentation for a full listing of the options
for each.

Readers are iterable. Each iteration returns a tuple of two values. The first
value is a Record that contains the values from the file. The second value is
a collection of all the errors encountered when trying to parse the values in
the columns.

    >>> record, errors = next(reader)
    >>> record.column1
    'foo'
    >>> record['column2']
    123
    >>> bool(errors)
    False
    >>> record, errors = next(reader)
    >>> record.column1
    'bar'
    >>> record['column2'] is None
    True
    >>> bool(errors)
    True
    >>> errors['column2']
    'A value is required'
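
Because every iteration yields a ``(record, errors)`` pair and ``errors`` is
falsy when nothing went wrong, a common pattern is to split a file into clean
and rejected records. The sketch below assumes only that shape; ``partition``
is a hypothetical helper, and the ``rows`` list is stand-in data using plain
dicts rather than a real Reader.

```python
from typing import Any, Iterable, List, Mapping, Tuple

Row = Tuple[Any, Mapping[str, str]]


def partition(rows: Iterable[Row]) -> Tuple[List[Any], List[Tuple[int, Mapping[str, str]]]]:
    """Split (record, errors) pairs into clean records and numbered errors."""
    good: List[Any] = []
    bad: List[Tuple[int, Mapping[str, str]]] = []
    # Number data records from 1 so rejected ones can be reported back.
    for number, (record, errors) in enumerate(rows, start=1):
        if errors:
            bad.append((number, errors))
        else:
            good.append(record)
    return good, bad


# Stand-in data shaped like the two records shown above.
rows = [
    ({'column1': 'foo', 'column2': 123}, {}),
    ({'column1': 'bar', 'column2': None}, {'column2': 'A value is required'}),
]
good, bad = partition(rows)
print(len(good), len(bad))
```

With a real Reader, the same function would presumably be called as
``partition(reader)``.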

License
=======

This project is released under the terms of the `MIT License`_.

.. _MIT License: https://opensource.org/licenses/MIT
