https://github.com/birkenfeld/dconv

Host: GitHub
URL: https://github.com/birkenfeld/dconv
Owner: birkenfeld
Created: 2019-08-22T11:57:26.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2019-08-22T11:59:06.000Z (over 5 years ago)
Last Synced: 2024-10-31T10:43:10.602Z (2 months ago)
Language: Python
Size: 19.5 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.txt
README

dconv data file conversion utility
==================================

:Author: Georg Brandl
:Last updated: Feb 8, 2008

This utility converts data files between different (ASCII) formats.
Input and output formats are described using a simple "format
description" language.

Format descriptions
-------------------

A format description is a block of text that consists of simple
one-line property-value assignments.  Every line is either empty, a
comment (starts with `#`) or a property assignment of the form
`property: value`.

String properties can be given with or without quotes; if quoted, they
are evaluated as a Python string (enabling you to use escape
sequences).

Some properties can be given multiple times; this is noted in their
description.
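
The rules above can be illustrated with a short Python sketch.  This is
not dconv's own parser: the function name `parse_description` and the
`multiple` argument are hypothetical, and using `ast.literal_eval` for
quoted values is an assumption about how "evaluated as a Python string"
could be done safely.

```python
import ast


def parse_description(text, multiple=()):
    """Parse 'property: value' lines into a dict (illustrative sketch).

    `multiple` names the properties that may repeat; their values are
    collected in a list.  Passing this set in is an assumption of the
    sketch, not something the README specifies.
    """
    props = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # empty line or comment
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'property: value'")
        name, value = name.strip(), value.strip()
        # Quoted values are evaluated as Python string literals, so
        # escape sequences such as "\t" are interpreted.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = ast.literal_eval(value)
        if name in multiple:
            props.setdefault(name, []).append(value)
        else:
            props[name] = value
    return props


# A made-up input format description, used only to exercise the parser.
example = r'''
# whitespace-separated table with two header lines
comments: "#"
fieldsep: " \t"
headers: name
headers: unit
extra_skip: 1
'''
parsed = parse_description(example, multiple={"headers"})
```

In this sketch unquoted values stay plain strings (`extra_skip` comes
back as `"1"`); converting them to integers or booleans would be a
separate step.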

Input format
~~~~~~~~~~~~

Input data files consist of three parts (where the first two are
optional):

* Skipped text (which can contain global definitions)
* Headers (descriptions etc. for every field)
* Data, organized as one record per line

The input format description can have these properties:

`comments` : `none`, or a string (default: `none`)
    If this is not `none`, lines in the data file starting with the
    given string are ignored.  Blank lines are always ignored.

`defs` : `auto`, `equals`, `colon` or `none` (default: `auto`)
    If `none`, no definitions are recognized in skipped text.  If
    `equals`, definitions of the form `name = value` are recognized; if
    `colon`, definitions of the form `name: value` are recognized.  The
    value `auto` allows both forms of definitions.

    For definition values of the form ``number unit``, the unit is
    split off and only the number is kept, converted to a float.

`skip_until` : a string (no default)
    If this property is given, all lines at the beginning of the file
    are skipped until a line contains this string.  (The line
    containing the string is not skipped; use `extra_skip` to do that.)

`extra_skip` : an integer (default: 0)
    This number of lines is skipped at the beginning of the file, or
    after the `skip_until` string has been found.

`headers` : [multiple] a string (no default)
    This option marks the first line (or subsequent lines, if given
    multiple times) of the data to be headers.  The headers are read,
    split and stored in the `headers` attribute of the Data instance,
    with the given value as the key.  The headers with the key `name`
    are special: they are used for assigning names to the record
    fields.

`fields` : a comma-separated list of strings (no default)
    If the record field names are not in the data file, or the ones
    provided in the data file are not usable, this property can be used
    to assign names to the record fields.  (It has priority over a
    `names` header, if given.)

`fieldsep` : a string (default: `" \t"`)
    Record lines are split at the delimiters given (each character is a
    separate delimiter).

`linejunk` : [multiple] a string (no default)
    The given strings are ignored when splitting a record line.

`fieldregex` : a string (compilable as a regular expression) (no default)
    If this is given, the `fieldsep` and `linejunk` properties are
    ignored.  Instead, each line containing fields is matched against
    the regular expression, and each subgroup of the regex is treated
    as one field.

`end` : `auto`, or a string (default: `auto`)
    If this is `auto`, reading the data stops as soon as an error is
    encountered.  Otherwise, reading the data stops as soon as a line
    contains the given string.
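
How the skipping, definition and splitting properties fit together can
be sketched in Python as follows.  This is an illustration under stated
assumptions, not dconv's implementation: the function names are
hypothetical, definition names are assumed to be single words, a missing
`skip_until` string is assumed to be an error, and runs of delimiters
are assumed to collapse into one.

```python
import re

# Assumed patterns for the three `defs` modes; names are single words here.
DEF_PATTERNS = {
    "equals": r"\s*(\w+)\s*=\s*(.+)",
    "colon": r"\s*(\w+)\s*:\s*(.+)",
    "auto": r"\s*(\w+)\s*[=:]\s*(.+)",
}


def skip_lines(lines, skip_until=None, extra_skip=0):
    """Return (skipped, rest) following `skip_until` and `extra_skip`."""
    start = 0
    if skip_until is not None:
        for idx, line in enumerate(lines):
            if skip_until in line:
                start = idx  # the matching line itself is not skipped
                break
        else:
            raise ValueError("skip_until string not found")  # assumed
    start += extra_skip
    return lines[:start], lines[start:]


def parse_defs(skipped, mode="auto"):
    """Collect definitions from skipped lines according to `defs`."""
    defs = {}
    if mode == "none":
        return defs
    for line in skipped:
        match = re.match(DEF_PATTERNS[mode], line)
        if match is None:
            continue
        name, value = match.group(1), match.group(2).strip()
        parts = value.split()
        if len(parts) == 2:
            try:
                value = float(parts[0])  # "number unit": keep the number
            except ValueError:
                pass  # not a number, keep the full string
        defs[name] = value
    return defs


def split_record(line, fieldsep=" \t", linejunk=(), fieldregex=None):
    """Split one record line into a list of field strings."""
    if fieldregex is not None:
        # fieldsep and linejunk are ignored; every subgroup is one field
        match = re.match(fieldregex, line)
        if match is None:
            raise ValueError(f"line does not match fieldregex: {line!r}")
        return list(match.groups())
    for junk in linejunk:
        line = line.replace(junk, "")
    # each character of fieldsep is a separate delimiter
    pattern = "[" + re.escape(fieldsep) + "]+"
    return [field for field in re.split(pattern, line.strip()) if field]
```

For example, with `skip_until` set to `***` and `extra_skip` set to 1,
`skip_lines` puts everything up to and including the marker line into
the skipped part, which `parse_defs` can then scan for definitions.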

Output format
~~~~~~~~~~~~~

`fieldnames` : a comma-separated list of strings
    This list renames the fields coming from the input file to the
    given names.

`module` : [multiple] a module name
    All modules given are imported and their contents can be used in
    the code calculating constants, conditions and fields.

`assertion` : [multiple] a Python expression
    The given expressions are evaluated once, before writing individual
    records.  If any expression evaluates to a false value, writing
    stops with an error.  This can be used to make sure the data is in
    the correct format.

`const` : [multiple] a Python statement
    The given statements are executed before individual records are
    converted.  This can be used to set constants for calculation; they
    are available in `condition` and `field` expressions.

`condition` : [multiple] a Python expression
    The given conditions are tested for each record before it is
    written.  If any condition evaluates to a false value, the
    record is skipped.

`field` : [multiple] a Python statement
    These statements are executed once for each record, in the order
    they occur in the definition.  They can be used to calculate output
    fields.

`outfields` : [multiple] a string
    This property selects which fields are written as the output
    record.  The field names available are names created by `field`
    statements and the field names of the input file (possibly renamed
    by a `fieldnames` property).

    The value must be either a comma-separated list of strings, or a
    `!` followed by a Python expression.  If it's a list of strings,
    they will be taken directly as names.  If it's an expression, it
    must evaluate to a list of strings.  All the named fields are then
    added as output fields.

`headers` : True/False (default: False)
    If true, write the output field names as the first line in the
    output file.

`fieldsep` : a string
    This string is used to separate output fields.

`prologue` : [multiple] a string
    All strings given here are written to the output file before any
    data record, one string per line.

    Within the strings, you can use variable substitution of the form
    `$name` or `${name}` to insert constants into the text.

`epilogue` : [multiple] a string
    All strings given here are written to the output file after all
    data records, one string per line.  You can use the same
    substitution mechanism as for `prologue`.

`filename_ext` : a string
    A hint for the output filename extension.
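
Two of the output properties lend themselves to a short Python sketch:
the `$name` / `${name}` substitution in `prologue` and `epilogue`, and
the two accepted forms of `outfields`.  The substitution syntax matches
that of the standard library's `string.Template`, which is used here;
whether dconv itself relies on it is an assumption.  The function names
`render_lines` and `resolve_outfields` are hypothetical.

```python
from string import Template


def render_lines(strings, namespace):
    """Substitute $name / ${name} in prologue or epilogue strings."""
    return [Template(s).substitute(namespace) for s in strings]


def resolve_outfields(values, namespace):
    """Expand all `outfields` values into one flat list of field names."""
    names = []
    for value in values:
        if value.startswith("!"):
            # '!' marks a Python expression that must yield a list of
            # strings; evaluate it in a copy so the namespace is untouched
            names.extend(eval(value[1:], dict(namespace)))
        else:
            names.extend(name.strip() for name in value.split(","))
    return names


# Made-up constants and field names, for illustration only.
ns = {"temperature": 300.0, "fieldnames": ["x", "y"]}
header = render_lines(["# T = $temperature K", "# ${temperature}K"], ns)
columns = resolve_outfields(["x, y", "![n + '_err' for n in fieldnames]"], ns)
```

Note that `Template.substitute` raises `KeyError` for a name that is not
defined; `safe_substitute` would leave such placeholders in place
instead.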

Additional variables available for code execution

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When executing code for `const`, `condition` and `field` items, or
substituting names in `prologue` and `epilogue`, these names are
always defined:

* `records` -- a list of all records in the input data
* `numrecords` -- the total number of records
* `numfields` -- the number of fields in a record
* `fieldnames` -- a list of `numfields` field names from the input file
  (*not* influenced by a `fieldnames` property)
* `headers` -- the dictionary of headers
* `skipped` -- a list of all skipped lines from the input file
* `defs` -- an object that has all definitions from the input file as
  attributes
* and of course, all Python built-in functions.

These names are additionally defined for `condition` and `field` item code:

* `i` -- sequential number of the current record, starting with 0
* `r` -- the current record; its fields can be accessed as attributes
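
The way these names come together during conversion can be sketched in
Python.  This is a simplified illustration, not dconv's code: the
function `convert` is hypothetical, records are assumed to be plain
lists of values, and `headers`, `skipped` and `defs` are left out for
brevity.  Since `const`, `condition` and `field` items are executed as
Python code, format descriptions should only come from trusted sources.

```python
from types import SimpleNamespace


def convert(records, fieldnames, consts=(), conditions=(), fields=(),
            outfields=()):
    """Apply const/condition/field code to records (illustrative sketch)."""
    # Names that are always defined for executed code.
    ns = {
        "records": records,
        "numrecords": len(records),
        "numfields": len(fieldnames),
        "fieldnames": list(fieldnames),
    }
    for stmt in consts:
        exec(stmt, ns)  # constants are set once, before any record
    rows = []
    for i, values in enumerate(records):
        # `r` exposes the fields of the current record as attributes.
        r = SimpleNamespace(**dict(zip(fieldnames, values)))
        env = dict(ns, i=i, r=r)
        if not all(eval(cond, env) for cond in conditions):
            continue  # a false condition skips the record
        for stmt in fields:
            exec(stmt, env)  # executed in definition order
        # Output names come from `field` statements or from the input.
        rows.append([env[n] if n in env else getattr(r, n)
                     for n in outfields])
    return rows


rows = convert(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    ["x", "y"],
    consts=["scale = 10"],
    conditions=["r.x > 1", "i < numrecords"],
    fields=["z = scale * (r.x + r.y)"],
    outfields=["x", "z"],
)
```

Here the first record is dropped by the condition `r.x > 1`, and the
remaining two are written with the input field `x` and the computed
field `z`.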