https://github.com/kosarev/tproc

A small yet powerful text processor in Python
Host: GitHub
URL: https://github.com/kosarev/tproc
Owner: kosarev
License: mit
Created: 2018-08-01T07:06:06.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-10-17T09:55:16.000Z (over 5 years ago)
Last Synced: 2024-10-18T13:15:22.746Z (over 1 year ago)
Topics: macro-processor, mit-license, preprocessor, python, python-generators, template-processor, text-processor, word-processor
Language: Python
Homepage:
Size: 37.1 KB
Stars: 8
Watchers: 2
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# tproc

A small yet powerful text processor written in Python.

[![Build Status](https://travis-ci.org/kosarev/tproc.svg?branch=master)](https://travis-ci.org/kosarev/tproc)

## Features:

* Provides a way to program your documentation.
* Unleashes the full power of Python for organizing, generating,
  validating and debugging your data. Supports arbitrary Python
  code and modules. No new languages to learn.
* Interleaved text and code. The order of definitions is up to you.
* Text pieces are implicitly defined as functions that can be
  called from anywhere in the input file as well as from
  external code that has access to the processor object.
* Supports Python 2.7 and 3.
* Available under the MIT license.

## Contents

* [Installation](#installation)
* [Hello world](#hello-world)
* [Definitions](#definitions)
* [Replacement fields](#replacement-fields)
* [Format specifiers](#format-specifiers)
* [Passing data to generators](#passing-data-to-generators)
* [Escape sequences](#escape-sequences)
* [Tokens](#tokens)
* [Generation of non-text data](#generation-of-non-text-data)
* [Namespaces and processor objects](#namespaces-and-processor-objects)
* [API](#api)
* [Basic design principles](#basic-design-principles)

## Installation

```shell
pip install tproc
```

## Hello world

```python
# hello.tproc

@hello
Hello {world}

@world
World!

@main
{hello}
```

Processing:

```
$ tproc hello.tproc
Hello World!
```

The input contains three definitions, each expanding into its
body text. The names in curly braces are replaced with the body
of the corresponding definition.

Note that tproc expands input only on request, not as it
reads and processes the definitions. Because of this, the
definitions may come in whatever order best suits your needs.

Whitespace just before and just after definition bodies is
stripped, so all three definitions in the example produce
inline output with no newline characters.

The part of the input before the first definition is ignored; it
is meant for describing the purpose of the input and
other relevant information.
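The on-request behavior can be illustrated with a toy model. This is a hypothetical sketch, not tproc's implementation: `parse()` and `expand()` are made-up names, only plain `{name}` fields are handled, and code sections are not supported. Because bodies are stored unexpanded, fields resolve correctly no matter where their definitions appear.

```python
import re

# Hypothetical toy model, not tproc's code: matches plain {name} fields only.
FIELD = re.compile(r'\{(\w+)\}')

def parse(source):
    """Split '@name' sections into a {name: unexpanded body} dict."""
    defs, name, lines = {}, None, []
    for line in source.splitlines():
        if line.startswith('@'):
            if name is not None:
                # Whitespace around the body is stripped.
                defs[name] = '\n'.join(lines).strip()
            name, lines = line[1:].strip(), []
        elif name is not None:
            # Text before the first definition is ignored.
            lines.append(line)
    if name is not None:
        defs[name] = '\n'.join(lines).strip()
    return defs

def expand(name, defs):
    """Expand a definition on request, recursively replacing {name} fields."""
    return FIELD.sub(lambda m: expand(m.group(1), defs), defs[name])

source = """This part is ignored.
@hello
Hello {world}
@world
World!
@main
{hello}
"""
print(expand('main', parse(source)))  # Hello World!
```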

## Definitions

tproc translates text definitions into Python generators that
produce the body text in its original form, that is, before any
expansion. This makes it possible to write definitions as normal
Python functions, like this:

```python
#!/usr/bin/env tproc
@
def hello():
    yield 'Hello {'
    yield 'world'
    yield '}'

@world
World!

@main
{hello}
```

Output:

```
Hello World!
```

Custom generators can yield the whole piece of data at once or
generate it in chunks of arbitrary size.
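For instance, the two hypothetical generators below produce the same body text, one as a single chunk and one split the way `hello()` above splits it. The sketch only shows that the concatenated chunks are identical. It assumes, as the example above suggests, that chunk boundaries carry no meaning for tproc.

```python
def hello_whole():
    # The whole body in a single chunk.
    yield 'Hello {world}'

def hello_chunked():
    # The same body split into chunks of arbitrary size.
    yield 'Hello {'
    yield 'world'
    yield '}'

print(''.join(hello_whole()) == ''.join(hello_chunked()))  # True
```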

## Replacement fields

Replacement fields are portions of text surrounded by curly
braces that tproc replaces with some other content during the
expansion process. For example:

```python
@email
info@{domain}

@domain
example.com
```

The simplest replacement fields contain the name of a text
definition or of a custom generator (which are the same thing). But
they can in fact be arbitrary expressions:

```python
@
import time

@main
Happy {time.strftime('%A')}!
```

On Fridays this results in:

```
Happy Friday!
```

Note that the value of a replacement field is evaluated every
time the field is expanded, and it is expanded every time tproc
encounters its invocation, so such values are never cached. This
allows generators to produce different content for different
invocations, as in this example:

```python
@
counter = 0

def count():
    global counter
    yield '%d' % counter
    counter += 1

@main
{count} {count} {count}
```

Output:

```
0 1 2
```

To guarantee reproducible results, invocations of replacement
fields are always processed in left-to-right order.
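The evaluation rules can be made concrete with a hypothetical `expand_fields()`. It handles only flat `{expression}` fields, with no format specifiers, arguments or nesting. Each field is evaluated with `eval()` in a plain dictionary that stands in for the processor's namespace. Evaluation happens anew every time a field occurs, scanning from left to right. This illustrates the behavior described above and is not tproc's code.

```python
import re

# Hypothetical sketch: flat fields only, no specifiers, arguments or nesting.
FIELD = re.compile(r'\{([^{}]*)\}')

def expand_fields(text, namespace):
    """Evaluate every {expr} field anew, from left to right."""
    def replace(match):
        value = eval(match.group(1), namespace)  # no caching between fields
        if callable(value):
            value = value()
        if hasattr(value, '__next__'):           # a generator: join its chunks
            value = ''.join(value)
        return str(value)
    # re.sub() calls replace() for each match in left-to-right order.
    return FIELD.sub(replace, text)

namespace = {}
exec('''
counter = 0
def count():
    global counter
    yield '%d' % counter
    counter += 1
''', namespace)

print(expand_fields('{count} {count} {count}', namespace))  # 0 1 2
```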

## Format specifiers

In addition to value expressions, replacement fields may contain
format specifiers:

```python
@title
ESIO TROT

@main
{title:-^15}
```

Generates:

```
---ESIO TROT---
```

As you may guess, the syntax of format specifiers is the same as
for the lovely `format()` function.
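Because the syntax is shared with Python's built-in `format()`, you can try out a specifier in plain Python before putting it into a replacement field. The calls below apply the same specifier as `{title:-^15}`: fill with `-`, center with `^`, and pad to a width of 15.

```python
title = 'ESIO TROT'

# Same specifier as the {title:-^15} field above.
print(format(title, '-^15'))    # ---ESIO TROT---
# Equivalent spelling through str.format().
print('{:-^15}'.format(title))  # ---ESIO TROT---
```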

## Passing data to generators

In replacement fields, portions of data delimited with colons may
follow (possibly empty) format specifiers. Each such piece of
data is then passed as an argument to the generator. For
example:

```python
@
def section(title, body):
    yield ''
    yield ''
    for chunk in title:
        yield chunk
    yield ''
    yield ''
    for chunk in body:
        yield chunk
    yield ''
    yield ''

@main
{section::NAME:tproc - A text processor}
{section::SYNOPSIS:tproc [-e DEFINITION] [infile] [outfile]}
```

This gives:

```
NAMEtproc - A text processor
SYNOPSIStproc [-e DEFINITION] [infile] [outfile]
```

And of course such arguments can nest, and each of the nested
arguments gets expanded before being passed to the generator:

```python
@
def p(body):
    yield '\n'
    for chunk in body:
        yield chunk
    yield '\n'

def i(body):
    yield ''
    for chunk in body:
        yield chunk
    yield ''

@main
{p::It is {i::crucial} to support nested arguments.}
```
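The colon-delimited layout of a field can be illustrated with a deliberately naive splitter. The layout is the expression first, then the (possibly empty) format specifier, then the arguments. `split_field()` and the bracket-producing `section()` below are hypothetical. They ignore nesting and escapes, which tproc handles by working on tokens as described in the sections that follow. Each argument is passed as a one-chunk list to mimic an iterable of chunks.

```python
def split_field(field):
    """Hypothetical: split the inside of a field into expr, spec and args.

    Assumes no nesting and no escaped colons.
    """
    expr, _, rest = field.partition(':')
    spec, _, args = rest.partition(':')
    return expr, spec, (args.split(':') if args else [])

def section(title, body):
    # Hypothetical generator that brackets the title.
    yield '['
    for chunk in title:
        yield chunk
    yield '] '
    for chunk in body:
        yield chunk

expr, spec, args = split_field('section::NAME:tproc - A text processor')
generator = {'section': section}[expr]
# Each argument becomes an iterable of chunks, here a one-chunk list.
print(''.join(generator(*[[arg] for arg in args])))  # [NAME] tproc - A text processor
```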

## Escape sequences

To support nested arguments, curly braces and colons must
keep their special meaning everywhere within the bodies of
text definitions. But that also means there has to be a way to
specify the brace and colon characters in their literal meaning,
that is, as part of the body text. Escape sequences are the way to
do that.

Escape sequences start with a backslash (`\`) followed by the
character to escape. For example:

```
@
@main
This example:
{code::
#include <iostream>
int main() \{
    std\:\:cout << "@ Hey! @" << std\:\:endl;
\}
}
just prints:
\@ Hey! \@

@
def code(source):
    yield '```'
    for chunk in source: yield chunk
    yield '```'
```

To represent non-printable characters and for better
interoperability with other sources and consumers of textual
data, tproc also supports the standard C escape sequences:
`\\` `\'` `\"` `\a` `\b` `\f` `\n` `\r` `\t` `\v`
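Decoding these sequences can be sketched as a table lookup. The `unescape()` function below is a hypothetical illustration that works on plain strings. It does not model how tproc turns unescaped braces and colons into delimiters. Keeping unknown sequences such as `\q` unchanged is an assumption made here, not documented tproc behavior.

```python
# Hypothetical sketch of escape decoding, not tproc's code.
C_ESCAPES = {
    '\\': '\\', "'": "'", '"': '"', 'a': '\a', 'b': '\b',
    'f': '\f', 'n': '\n', 'r': '\r', 't': '\t', 'v': '\v',
}
SPECIAL = '{}:@'  # tproc-specific: the escaped character stands for itself

def unescape(text):
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch == '\\' and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in C_ESCAPES:
                out.append(C_ESCAPES[nxt])
            elif nxt in SPECIAL:
                out.append(nxt)
            else:
                out.append(ch + nxt)  # assumption: unknown escapes kept as written
            i += 2
        else:
            out.append(ch)
            i += 1
    return ''.join(out)

print(unescape(r'int main() \{ std\:\:cout; \}'))  # int main() { std::cout; }
```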

## Tokens

Consider this:

```python
@main
'{echo:: {echo:: \: } }'

@
def echo(content):
    return content
```

The code seems obvious: the inner `echo` invocation gets expanded
into a colon character surrounded by spaces, which then becomes
the argument of the outer invocation, which in turn replicates the
colon, adding some more spaces around it, resulting in:

```
'  :  '
```

However, the inner `echo` does receive its argument with the
colon in its literal, de-escaped form. So why doesn't that colon
character work as an argument delimiter when it is passed to the
outer `echo`?

The answer is that before an expansion takes place, all
characters that form the sequence to expand are converted into
tokens. Curly braces designating the bounds of replacement fields and
colons separating format specifiers and arguments within them
become delimiter tokens, and all other data becomes literal
tokens. Once parsed, tokens keep their meaning until the
very end of the expansion process, so once the escaped colon
character in the example above becomes part of a literal token,
it will always be treated as part of the text, and not as a
delimiter.

Let's change the example a bit to see what the generators
actually get:

```python
@main
{eat:: '{outer:: {inner:: \: } }' }

@
inner_chunks = []
outer_chunks = []

def inner(content):
    for chunk in content:
        inner_chunks.append(chunk)
        yield chunk

def outer(content):
    for chunk in content:
        outer_chunks.append(chunk)
        yield chunk

def eat(content):
    for chunk in content:
        pass
    print('inner: %r' % inner_chunks)
    print('outer: %r' % outer_chunks)
    yield ''
```

The output:

```
inner: [, , ]
outer: [, , , , ]
```

For both the inner and outer invocations the content is a
sequence of literal tokens containing spaces and colon
characters. Curly braces and colons that work as delimiters are
consumed and processed by tproc according to their meaning.

In terms of code, literal tokens are instances of the class
`LiteralToken`, which has a public member `.content` storing
the literal as a string.
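The tokenization step can be pictured with a small hypothetical tokenizer. `Literal`, `Delimiter` and `tokenize()` are illustrative names and not part of tproc's API, which exposes literal tokens as `LiteralToken`. For brevity, every backslash sequence is taken as the escaped character itself, and C escapes are not decoded. Running it on the inner field of the example shows the escaped colon ending up inside a literal token. Once there, no later step can mistake it for a delimiter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    content: str  # text, including any escaped special characters

@dataclass(frozen=True)
class Delimiter:
    char: str     # one of '{', '}', ':'

def tokenize(text):
    """Hypothetical: split text into delimiter tokens and literal tokens."""
    tokens, literal, i = [], [], 0

    def flush():
        # Emit the pending literal characters as one token.
        if literal:
            tokens.append(Literal(''.join(literal)))
            del literal[:]

    while i < len(text):
        ch = text[i]
        if ch == '\\' and i + 1 < len(text):
            literal.append(text[i + 1])   # escaped: always literal text
            i += 2
        elif ch in '{}:':
            flush()
            tokens.append(Delimiter(ch))  # unescaped: structural delimiter
            i += 1
        else:
            literal.append(ch)
            i += 1
    flush()
    return tokens

for token in tokenize(r'{echo:: \: }'):
    print(token)
```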

## Generation of non-text data

As mentioned above, the value of a replacement field can be any
expression. If it evaluates to something callable, it is called
and the returned value is used as the field value. Then, if
the value is a generator, it becomes the source of the value
chunks. Any other value is converted into a literal token with
its `.content` field storing the original value.

Here's how it works:

```python
@content
{55} {[5, 7, 9]} {tuple(range(3))} {'{year}'}
# {lambda\: [(yield [11] * 5)]}

@year
2018

@main
{dump::{content}}

@
def dump(content):
    for chunk in content:
        print('%r' % chunk)
    yield ''
```

The values of the replacement fields in `content` are evaluated
and expanded, and then passed to `dump` as a sequence of literal
tokens:

```

```

On full expansion, tokens are converted back to their literals and appear
in the resulting output in their stringized form:

```python
@main
{55} {[5, 7, 9]} {tuple(range(3))} {'{year}'}
# {lambda\: [(yield [11] * 5)]}

@year
2018
```

```
55 [5, 7, 9] (0, 1, 2) 2018
# [11, 11, 11, 11, 11]
```
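The conversion rules from the beginning of this section can be sketched in plain Python. `field_chunks()` and `render()` are hypothetical helpers. The first applies the callable, generator and literal rules to an already evaluated field value. The second mimics the final stringization. Unlike tproc, the sketch does not expand string values further, so a value such as `'{year}'` would stay as is.

```python
import inspect

def field_chunks(value):
    """Hypothetical: call callables, take chunks from generators,
    and wrap any other value as a single chunk."""
    if callable(value):
        value = value()
    if inspect.isgenerator(value):
        return list(value)
    return [value]

def render(chunks):
    """Mimic full expansion: every chunk appears in its stringized form."""
    return ''.join(str(chunk) for chunk in chunks)

values = [55, [5, 7, 9], tuple(range(3)), lambda: [(yield [11] * 5)]]
chunks = [chunk for value in values for chunk in field_chunks(value)]
print(chunks)          # [55, [5, 7, 9], (0, 1, 2), [11, 11, 11, 11, 11]]
print(render(chunks))  # 55[5, 7, 9](0, 1, 2)[11, 11, 11, 11, 11]
```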

Using nested replacement fields that expand into non-text data
makes it possible to translate custom markup directly into
Python data structures. For example:

```python
@main
{section::TITLE:
{p::
First paragraph.}
{p::
Second paragraph.}
}

@
def collect(tokens):
    return [x.content for x in tokens]

def p(body):
    yield ('p', collect(body))

def section(title, body):
    yield ('section', collect(title), collect(body))
```

Results in:

```
('section', ['TITLE'], ['\n', ('p', ['\nFirst paragraph.']), '\n', ('p', ['\nSecond paragraph.']), '\n'])
```

## Namespaces and processor objects

Every processor instance has its own space for global names. This
namespace is independent of tproc's own code namespace, so users
are free to name their generators and other global entities as
they like.

The only name that comes predefined in the input's code namespace
is `tproc`. That name refers to the processor object that handles
the input source. Through this name the input code can access the
public API of the processor class described in the corresponding
section below. For example, `tproc.LiteralToken` refers to the
type of tokens passed to generators that have arguments:

```python
@main
{'%r' % tproc.LiteralToken}
```

```

```
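Per-instance namespaces can be sketched as follows. The sketch assumes each processor runs input code with `exec()` in its own dictionary, which is an illustration and not a documented detail of tproc. `ToyProcessor` and `run()` are hypothetical names, not part of tproc's API.

```python
class ToyProcessor(object):
    """Hypothetical stand-in for a processor with its own global namespace."""

    def __init__(self):
        # Assumption: input code runs in a per-instance dict in which the
        # only predefined name, 'tproc', refers to the processor itself.
        self.namespace = {'tproc': self}

    def run(self, code):
        exec(code, self.namespace)

first, second = ToyProcessor(), ToyProcessor()
# The same global name in two inputs does not clash.
first.run('def greeting(): yield "from the first input"')
second.run('def greeting(): yield "from the second input"')
first.run('me = tproc')

print(''.join(first.namespace['greeting']()))   # from the first input
print(''.join(second.namespace['greeting']()))  # from the second input
```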

## API

### `tproc.LiteralToken`

* `LiteralToken.content`

  Contains the literal of the token as a string.

### `tproc.Processor`

* `Processor.expand(input)`

  Returns a generator producing the fully expanded input. The
  `input` parameter is a generator of source data.

* `Processor.LiteralToken`

  The type of literal tokens. See `tproc.LiteralToken`.

## Basic design principles

* Input files are Python programs, presented in a form suitable
  for text processing. They may import, define and execute
  arbitrary Python code as they get processed. They may define a
  `main()` function to implement the default action.
* All sources of input data, including text definitions, are
  Python generators. Similarly, the `Processor.expand()` method is
  a generator producing output data. The data is consumed and
  generated in chunks that may be of any type and size. String
  chunks are subject to expansion. Chunks of other types are passed
  to the output without any additional processing unless they
  constitute an input of a custom generator.
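The chunk-handling principle can be sketched with a hypothetical `process()` function. String chunks go through an expansion step, represented here by a stand-in `expand_text` callable that just upper-cases the text. Chunks of any other type are yielded unchanged. The sketch leaves out the custom-generator case mentioned above.

```python
def process(chunks, expand_text):
    """Hypothetical: expand string chunks, pass all others through."""
    for chunk in chunks:
        if isinstance(chunk, str):
            for piece in expand_text(chunk):
                yield piece
        else:
            yield chunk  # non-text data goes to the output untouched

def source():
    # Chunks of any type and size may come from an input generator.
    yield 'Hello, '
    yield {'kind': 'image', 'src': 'logo.png'}  # non-text data
    yield 'world!'

# Stand-in for real expansion: upper-case each string chunk.
output = list(process(source(), lambda text: [text.upper()]))
print(output)  # ['HELLO, ', {'kind': 'image', 'src': 'logo.png'}, 'WORLD!']
```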