# Pydialect: build languages on Python

**NOTE** *April 2021: This project is **obsolete**. This technology now lives on in [`mcpyrate`](https://github.com/Technologicat/mcpyrate), and the example dialects can be found in [`unpythonic`](https://github.com/Technologicat/unpythonic).*

```python
from __lang__ import lispython

def fact(n):
    def f(k, acc):
        if k == 1:
            return acc
        f(k - 1, k * acc)  # tail call; TCO, so the stack won't blow up
    f(n, acc=1)  # tail position, so this is the implicit return value
assert fact(4) == 24
print(fact(5000))
```

Pydialect makes Python into a language platform, à la [Racket](https://racket-lang.org/).
It provides the plumbing that allows you to create, in Python, dialects that compile into Python
at import time. Pydialect is geared toward creating languages that look almost
like Python, but extend or modify its syntax and/or semantics.
Hence *dialects*.

As examples, we currently provide the following dialects:

- [**Lispython**: Python with tail-call optimization (TCO), implicit return, multi-expression lambdas](lispython/)
- [**Pytkell**: Python with automatic currying and lazy functions](pytkell/)
- [**LisThEll**: Python with prefix syntax and automatic currying](listhell/)

All three dialects support [unpythonic's](https://github.com/Technologicat/unpythonic)
``continuations`` block macro (to add ``call/cc`` to the language), but do not enable it automatically.
Lispython aims at production quality; the others are intended just for testing.

Pydialect itself is only a lightweight infrastructure hook that makes
it convenient to define and use dialects. To implement the actual semantics
for your dialect (which is where all the interesting things happen), you may
want to look at [MacroPy](https://github.com/azazel75/macropy). Examples can be
found in [unpythonic](https://github.com/Technologicat/unpythonic); see especially
the macros. On packaging a set of semantics into a dialect, look at the example
dialects; all three are thin wrappers around ``unpythonic``.

Note that what Pydialect does is similar to the rejected [PEP 511](https://www.python.org/dev/peps/pep-0511/),
but via import hooks, as indeed suggested in the rejection notice. Thus, besides dialects proper,
it is possible to use Pydialect to hook in a custom AST optimizer, by defining a dialect whose
``ast_transformer`` is actually an optimizer. For ideas, see [here](http://compileroptimizations.com/).
Some possibilities are e.g. constant folding, hoisting, if-optimization and loop unrolling.
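
To make this concrete, here is a minimal sketch of such an optimizer dialect
(hypothetical, not bundled with Pydialect; it uses the pre-3.8 ``ast.Num``
node type, in line with the Python versions discussed below):

```python
import ast
import operator

_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub,
             ast.Mult: operator.mul, ast.Div: operator.truediv}

class ConstantFolder(ast.NodeTransformer):
    """Fold binary operations on numeric literals, e.g. ``2 * 3`` -> ``6``."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first, bottom-up
        if (isinstance(node.left, ast.Num) and isinstance(node.right, ast.Num)
                and type(node.op) in _FOLDABLE):
            try:
                value = _FOLDABLE[type(node.op)](node.left.n, node.right.n)
            except ArithmeticError:  # e.g. 1 / 0; leave it to fail at run time
                return node
            return ast.copy_location(ast.Num(n=value), node)
        return node

def ast_transformer(module_body):
    # Pydialect's hook: a list of AST nodes in, a list of AST nodes out.
    return [ConstantFolder().visit(stmt) for stmt in module_body]
```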

### Why dialects?

An extension to the Python language doesn't need to make it into the Python core,
*or even be desirable for inclusion* into the Python core, in order to be useful.

Building on functions and syntactic macros, customization of the language itself
is one more tool for the programmer to extract patterns, at a higher level.
Hence, besides language experimentation, such extensions can be used as a
framework that allows for shorter and/or more readable programs.

Pydialect places language-creation power in the hands of its users, without the
need to go to extreme lengths to hack CPython itself or implement from scratch
a custom language that compiles to Python AST or bytecode.

Pydialect dialects compile to Python and are implemented in Python, allowing
the rest of the user program to benefit from new versions of Python, mostly
orthogonally to the development of any dialect.

At its simplest, a custom dialect can alleviate the need to spam a combination
of block macros in every module of a project that uses a macro-based language
extension, such as ``unpythonic.syntax``. Being named as a dialect, a particular
combination of macros becomes instantly recognizable,
and [DRY](https://en.wikipedia.org/wiki/Don't_repeat_yourself):
the dialect definition becomes the only place in the codebase that defines
the macro combination to be used by each module in the project.

The same argument applies to custom builtins: functions or macros that
feel like they "should be" part of the language layer can be provided by the
dialect, so that they won't have to be explicitly imported in each module
where they are used.

### Using dialects

To use a dialect, place a **lang-import** at the start of your module:

```python
from __lang__ import piethon
```

Run your program (in this example written in the ``piethon`` dialect)
through the ``pydialect`` bootstrapper instead of ``python3``. The bootstrapper
imports the main program instead of running it directly, which triggers the
import hook that performs the dialect processing. (Installing Pydialect
installs the bootstrapper.)

Any imported module that has a *lang-import* will be detected, and the appropriate
dialect module (if and when found) will be invoked. The result is then sent to
the macro expander (if MacroPy is installed and the code uses macros at that
point), after which the final result is imported normally.

The lang-import must appear as the first statement of the module; only the
module docstring is allowed to appear before it. This is to make it explicit
that **a dialect applies to the whole module**. (Local changes to semantics
are better represented as a block macro.)

At import time, the dialect importer replaces the lang-import with an
assignment that sets the module's ``__lang__`` attribute to the dialect name,
for introspection. If a module does not have a ``__lang__`` attribute at
run time, then it was not compiled by Pydialect. Note that just like with
MacroPy, at run time the code is pure Python.
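
For example, a quick run-time check (a sketch; ``mymodule`` is a placeholder
name):

```python
import mymodule  # hypothetical module, possibly written in some dialect

lang = getattr(mymodule, "__lang__", None)
if lang:
    print("compiled by Pydialect; dialect:", lang)
else:
    print("not compiled by Pydialect")
```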

The lang-import is a construct specific to Pydialect. This ensures that the
module will immediately fail if run under standard Python, because there is
no actual module named ``__lang__``.

If you use MacroPy, the Pydialect import hook must be installed at index ``0``
in ``sys.meta_path``, so that the dialect importer triggers before MacroPy's
standard macro expander. The ``pydialect`` bootstrapper takes care of this.
If you need to enable Pydialect manually for some reason, the incantation
to install the hook is ``import dialects.activate``.
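
In other words, manual activation amounts to something like this (a sketch;
``myprogram`` is a placeholder for your main module):

```python
import dialects.activate  # installs the Pydialect finder at sys.meta_path[0]

import myprogram  # *imported*, not run directly, so the hook sees its lang-import
```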

The lang-import syntax was chosen as a close pythonic equivalent to Racket's
``#lang foo``.

### Defining a dialect

In Pydialect, a dialect is any module that provides one or both of the following
callables:

- ``source_transformer``: source text -> source text

  The **full source code** of the module being imported (*including*
  the lang-import) is sent to the source transformer. The data type
  is whatever the loader's ``get_source`` returns, usually ``str``.

  Source transformers are useful e.g. for defining custom infix
  operators. For example, the monadic bind syntax ``a >>= b``
  could be made to transform into the syntax ``a.__mbind__(b)``.

  Although the input is text, in practice a token-based approach is
  recommended; see stdlib's ``tokenize`` module as a base to work from,
  and the sketch after this list. (Be sure to untokenize when done,
  because the next stage expects text.)

  **After the source transformer**, the source text must be valid
  surface syntax for **standard Python**, i.e. valid input for
  ``ast.parse``.

- ``ast_transformer``: ``list`` of AST nodes -> ``list`` of AST nodes

  After the source transformer, but before macro expansion, the full AST
  of the module being imported (*minus* the module docstring and the
  lang-import) is sent to this whole-module AST transformer.

  This allows injecting implicit imports to create builtins for the
  dialect (a sketch follows below), as well as e.g. lifting the whole
  module (except the docstring and the code to set ``__lang__``) into
  a ``with`` block to apply some MacroPy block macro(s) to the whole module.

  **After the AST transformer**, the module is sent to MacroPy for
  macro expansion (if MacroPy is installed, and the module has macros
  at that point), and after that, the result is finally imported normally.
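
As a concrete example of the token-based approach, here is a sketch of a
source transformer that accepts ``λ`` as a synonym for ``lambda`` (a
hypothetical bit of surface syntax, not one of the bundled dialects):

```python
import io
import tokenize

def source_transformer(text):
    """Rewrite the identifier ``λ`` into the keyword ``lambda``."""
    tokens = []
    readline = io.StringIO(text).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and tok.string == "λ":
            tok = tok._replace(string="lambda")
        tokens.append(tok)
    return tokenize.untokenize(tokens)  # the next stage expects source text
```

With this in place, ``square = λ x: x * x`` in a user module becomes
``square = lambda x: x * x`` before the result is handed to ``ast.parse``.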

The AST transformer can use MacroPy if it wants, but doesn't have to; this
decision is left up to each developer implementing a dialect.

If you make an AST transformer, and have MacroPy, then see ``dialects.util``,
which can help with the boilerplate task of pasting in the code from the
user module (while handling macro-imports correctly in both the dialect
template and in the user module).
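
Continuing the builtin-injection idea mentioned above, a minimal AST
transformer might look like this (a sketch; it assumes ``curry`` is importable
from the top level of ``unpythonic``, and relies on the docstring and
lang-import already having been stripped by the importer):

```python
import ast

def ast_transformer(module_body):
    """Make ``curry`` appear as a builtin in modules that use the dialect."""
    implicit_imports = ast.parse("from unpythonic import curry").body
    return implicit_imports + module_body
```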

**The name** of a dialect is simply the name of the module or package that
implements the dialect. In other words, it's the name that needs to be imported
to find the transformer functions.

Note that a dotted name in place of the ``xxx`` in ``from __lang__ import xxx``
is not valid Python syntax, so (currently) **a dialect should be defined in a
top-level module** (no dots in the name). Strictly, the dialect finder doesn't
need to care about this (though it currently does), but IDEs and tools in
general are much happier with code that does not contain syntax errors.
(This allows using standard Python tools with dialects that do not introduce
any new surface syntax.)

A dialect can be implemented using another dialect, as long as there are no
dependency loops. *Whenever* a lang-import is detected, the dialect importer
is invoked (especially, also during the import of a module that defines a
new dialect). This allows creating a tower of languages.

### Combining existing dialects

*Dangerous things should be difficult to do _by accident_.* --[John Shutt](http://fexpr.blogspot.com/2011/05/dangerous-things-should-be-difficult-to.html)

Due to the potentially unlimited complexity of interactions between language
features defined by different dialects, there is *by design* no automation for
combining dialects. In the general case, this is something that requires human
intervention.

If you know (or at least suspect) that two or more dialects are compatible,
you can define a new dialect whose ``source_transformer`` and ``ast_transformer``
simply chain those of the existing dialects (in the desired order; consider how
the macros expand), and then use that dialect.
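
Such a combination dialect can be as simple as this (a sketch; ``dialect_a``
and ``dialect_b`` are placeholder names, and both are assumed to define both
transformers):

```python
# mydialect.py: chain two existing dialects, dialect_a first.
import dialect_a
import dialect_b

def source_transformer(text):
    return dialect_b.source_transformer(dialect_a.source_transformer(text))

def ast_transformer(module_body):
    return dialect_b.ast_transformer(dialect_a.ast_transformer(module_body))
```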

### When to make a dialect

Often explicit is better than implicit. There is however a tipping point with
regard to complexity, and/or simply length, after which implicit becomes
better.

This already applies to functions and macros; code in a ``with continuations``
block (see macros in [unpythonic](https://github.com/Technologicat/unpythonic))
is much more readable and maintainable than code manually converted to
continuation-passing style (CPS). There's obviously a tradeoff; as Paul Graham
reminds us in [On Lisp](http://paulgraham.com/onlisp.html), each abstraction is another
entity for the reader to learn and remember, so it must save several times
its own length to become an overall win.

So, when to make a dialect depends on how much it will save (in a project or
across several), and on the other hand on how important it is to have a shared
central definition that specifies a "language-level" common ground for a set of
user modules.

### Dialect implementation considerations

A dialect can do anything from simply adding some surface syntax (such as a
monadic bind operator), to completely changing Python's semantics, e.g. by adding
automatic tail-call optimization, continuations, and/or lazy functions. The only
limitation is that the desired functionality must be (macro-)expressible in Python.

The core of a dialect is defined as a set of functions and macros, which typically
live inside a library. The role of the dialect module is to package that core
functionality into a whole that can be concisely loaded with a single lang-import.

Typically a dialect implicitly imports its core functions and macros, to make
them appear as builtins (in the sense of *defined by default*) to modules that
use the dialect.

A dialect may also define non-core functions and macros that live in the same
library; those essentially comprise *the standard library* of the dialect.

For example, the ``lispython`` dialect itself is defined by a subset of
``unpythonic`` and ``unpythonic.syntax``; the rest of the library is available
to be imported manually; it makes up the standard library of ``lispython``.

For macros, Pydialect supports MacroPy. Technically, macros are optional,
so Pydialect's dependency on MacroPy is strictly speaking optional.
Pydialect already defines a hook for a full-module AST transform;
if that (and/or a source transform) is all you need, there's no need
to have your dialect depend on MacroPy.

However, where MacroPy shines is its infrastructure. It provides a uniform
syntax to direct AST transformations to apply to particular expressions or
blocks in the user code, and it provides a hygienic quasiquote system, which
is absolutely essential for avoiding inadvertent name conflicts (identifier
capture, free variable injection). It's also good at fixing missing source
location info for macro-generated AST nodes, which is extremely useful, since
in Python the source location info is compulsory for every AST node.

Since you're reading this, you probably already know, but be aware that, unlike
how it was envisioned during the *extensible languages* movement in the 1960s-70s,
language extension is hardly an exercise requiring only *modest amounts of labor
by unsophisticated users* [[1]](http://fexpr.blogspot.com/2013/12/abstractive-power.html).
Especially the interaction between different macros needs a lot of thought,
and as the number of language features grows, the complexity skyrockets.
(For an example, look at what hoops other parts of ``unpythonic`` must jump
through to make ``lazify`` happy.) Seams between parts of the user program
that use or do not use some particular feature (or a combination of features)
also require special attention.

Python is a notoriously irregular language, not to mention a far cry from
homoiconic, so it is likely harder to extend than a Lisp. Be prepared for some
head-scratching, especially when dealing with function call arguments,
assignments and the two different kinds of function definitions (``def`` and
``lambda``, one of which supports the full Python language while the other is
limited to the expression sublanguage).
[Green Tree Snakes - the missing Python AST docs](https://greentreesnakes.readthedocs.io/en/latest/index.html)
is a highly useful resource here.

Be sure to understand the role of ``ast.Expr`` (the *expression statement*) and
its implications when working with MacroPy. (E.g. ``ast_literal[tree]`` by itself
in a block-quasiquote is an ``Expr``, where the value is the ``Subscript``
``ast_literal[tree]``. This may not be what you want, if you're aiming to
splice in a block of statements, but it's unavoidable given Python's AST
representation.)
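
A stdlib-only illustration of the distinction:

```python
import ast

tree = ast.parse("f(x)")          # a module containing one statement
stmt = tree.body[0]
print(type(stmt).__name__)        # Expr: the *statement* wrapping the expression
print(type(stmt.value).__name__)  # Call: the expression itself
```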

Dialects implemented via macros will mainly require maintenance when Python's
AST representation changes (if incompatible changes or interesting new features
hit a relevant part). Large parts of the AST representation have remained
stable over several of the latest minor releases, or even all of Python 3.
Perhaps most notably, the handling of function call arguments changed in an
incompatible way in 3.5, along with the introduction of ``MatMult`` and the
async machinery. The other changes between 3.4 (2014) and 3.7 (2018) are just
a couple of new node types.

Long live [language-oriented programming](https://en.wikipedia.org/wiki/Language-oriented_programming),
and have fun!

### Notes

``dialects/importer.py`` is based on ``core/import_hooks.py`` in MacroPy 1.1.0b2,
and was then heavily customized.

In the Lisp community, surface syntax transformations are known as *reader macros*
(although technically it's something done at the parsing step, unrelated to
syntactic macros).

**Further reading**:

- [Hacking Python without hacking Python](http://stupidpythonideas.blogspot.com/2015/06/hacking-python-without-hacking-python.html)
- [Operator sectioning for Python](http://stupidpythonideas.blogspot.com/2015/05/operator-sectioning-for-python.html)
- [How to use loader and finder objects in Python](http://www.robots.ox.ac.uk/~bradley/blog/2017/12/loader-finder-python.html)

**Ground-up extension efforts** that replace Python's syntax:

- [Dogelang](https://pyos.github.io/dg/), Python with Haskell-like syntax,
compiles to Python bytecode.
- [Hy](http://docs.hylang.org/en/stable/), a Lisp-2 that compiles to Python AST.