https://github.com/c42f/julialowering.jl

Julia code lowering with precise provenance
Host: GitHub
URL: https://github.com/c42f/julialowering.jl
Owner: c42f
License: mit
Created: 2024-03-25T23:46:21.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-05-22T10:08:08.000Z (about 2 years ago)
Last Synced: 2024-05-22T11:28:22.950Z (about 2 years ago)
Language: Julia
Size: 220 KB
Stars: 41
Watchers: 5
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# JuliaLowering

[![Build Status](https://github.com/c42f/JuliaLowering.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/c42f/JuliaLowering.jl/actions/workflows/CI.yml?query=branch%3Amain)

JuliaLowering.jl is an experimental port of Julia's code lowering compiler

passes, written in Julia itself. "Code lowering" is the set of compiler passes

which *symbolically* transform and simplify Julia's syntax prior to type

inference.

## Goals

This work is intended to

* Bring precise code provenance to Julia's lowered form (and eventually

  downstream in type inference, stack traces, etc.). This has many benefits:

    - Talk to users precisely about their code via character-precise error and

      diagnostic messages from lowering 

    - Greatly simplify the implementation of critical tools like Revise.jl

      which rely on analyzing how the user's source maps to the compiler's data

      structures

    - Allow tools like JuliaInterpreter to use type-inferred and optimized

      code, with the potential for huge speed improvements.

* Bring improvements for macro authors

    - Prototype "automatic hygiene" (no more need for `esc()`!)

    - Precise author-defined error reporting from macros

    - Sketch better interfaces for syntax trees (hopefully!)

## Trying it out

Note this is a work in progress; many types of syntax are not yet handled.

1. You need a 1.12-DEV build of Julia: At least 1.12.0-DEV.512. Commit `263928f9ad4` is currently known to work. Note that JuliaLowering relies on Julia internals and may be broken on the latest Julia dev version from time to time. (In fact it is currently broken on the latest `1.12-DEV`.)

2. Check out the main branch of [JuliaSyntax](https://github.com/JuliaLang/JuliaSyntax.jl)

3. Get the latest version of [JuliaSyntaxFormatter](https://github.com/c42f/JuliaSyntaxFormatter.jl)

4. Run the demo `include("test/demo.jl")`

# Design notes

## Syntax trees

We want something better than `JuliaSyntax.SyntaxNode`! `SyntaxTree` and

`SyntaxGraph` provide this. Some future version of these should end up in

`JuliaSyntax`.

We want to allow arbitrary attributes to be attached to tree nodes by analysis

passes. This separates the analysis pass implementation from the data

structure, allowing passes which don't know about each other to act on a shared

data structure.

Design and implementation inspiration comes in several analogies:

Analogy 1: the ECS (Entity-Component-System) pattern for computer game design.

This pattern is highly successful because it separates game logic (systems)

from game objects (entities) by providing flexible storage.

* Compiler passes are "systems"

* AST tree nodes are "entities"

* Node attributes are "components"

Analogy 2: The AoS to SoA transformation. But here we've got a kind of

tree-of-structs-with-optional-attributes to struct-of-Dicts transformation.

The data alignment / packing efficiency and concrete type safe storage benefits

are similar.

Analogy 3: Graph algorithms which represent graphs as a compact array of node

ids and edges with integer indices, rather than using a linked data structure.
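The three analogies can be combined in a small sketch. The following is a toy illustration only: `ToyGraph`, `newnode!`, `setattr!`, `getattr` and `hasattr` are hypothetical names, not the actual `SyntaxGraph` API, and the attribute columns are typed as `Any` for brevity where a real implementation would presumably use concretely typed columns.

```julia
# Toy "struct-of-Dicts" syntax graph. All names here are hypothetical.
struct ToyGraph
    children::Vector{Vector{Int}}            # node id => ids of its children
    attributes::Dict{Symbol,Dict{Int,Any}}   # attribute name => (node id => value)
end

ToyGraph() = ToyGraph(Vector{Vector{Int}}(), Dict{Symbol,Dict{Int,Any}}())

# Nodes ("entities") are just integer ids.
function newnode!(g::ToyGraph, children::Vector{Int}=Int[])
    push!(g.children, children)
    return length(g.children)
end

# Attributes ("components") live in per-attribute tables, so a pass ("system")
# can add a new attribute without changing any node type.
function setattr!(g::ToyGraph, id::Int, name::Symbol, value)
    get!(() -> Dict{Int,Any}(), g.attributes, name)[id] = value
    return g
end

getattr(g::ToyGraph, id::Int, name::Symbol) = g.attributes[name][id]
hasattr(g::ToyGraph, id::Int, name::Symbol) =
    haskey(g.attributes, name) && haskey(g.attributes[name], id)

g = ToyGraph()
a = newnode!(g);            setattr!(g, a, :name_val, "a")
b = newnode!(g);            setattr!(g, b, :name_val, "b")
call = newnode!(g, [a, b]); setattr!(g, call, :kind, :call)

# A later pass attaches its own attribute to an existing node.
setattr!(g, a, :scope_layer, 1)
```

Because the passes only share attribute names, a pass which knows nothing about `:scope_layer` can still traverse the same graph unchanged.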

### References

Sander Mertens, the author of the Flecs ECS has a blog post series discussing

ECS data structures and the many things that may be done with them. We may want

to use some of these tricks to make `SyntaxTree` faster, eventually. See, for

example,

[Building Games in ECS with Entity Relationships](https://ajmmertens.medium.com/building-games-in-ecs-with-entity-relationships-657275ba2c6c)

### Structural assertions / checking validity of syntax trees

Syntax trees in Julia `Expr` form are very close to lisp lists: a symbol at the

`head` of the list which specifies the syntactic form, and a sequence of

children in the syntax tree. This is a representation which `JuliaSyntax` and

`JuliaLowering` follow but it does come with certain disadvantages. One of the

most problematic is that the number of children affects the validity (and

sometimes semantics) of an AST node, as much as the `head` symbol does.

In `JuliaSyntax` we've greatly reduced the overloading of `head` in order to

simplify the interpretation of child structures in the tree. For example,

broadcast calls like `f.(x,y)` use the `K"dotcall"` kind rather than being a

node with `head == Symbol(".")` and a tuple as children.

However, there are still many ways for lowering to encounter invalid expressions

of type `SyntaxTree` and these must be checked. In JuliaSyntax we have several

levels of effort corresponding to the type of error conditions we desire to

check and report:

* For invalid syntax which is accepted by the `JuliaSyntax`

  parser but is invalid in lowering we use manual `if` blocks followed by

  throwing a `LoweringError`. This is more programming effort but allows for

  the highest quality error messages for the typical end user.

* For invalid syntax which can only be produced by macros (ie, not by the

  parser) we mostly use the `@chk` macro. This is a quick tool for validating

  input but gives lesser quality error messages.

* For JuliaLowering's internal invariants we just use `@assert` - these should

  never be hit and can be compiled out in principle.

## Provenance tracking

Expression provenance is tracked through lowering by attaching provenance

information in the `source` attribute to every expression as it is generated.

For example when parsing a source file we have

```julia

julia> ex = parsestmt(SyntaxTree, "a + b", filename="foo.jl")

SyntaxTree with attributes kind,value,name_val,syntax_flags,source

[call-i]                                │ 

  a                                     │ 

  +                                     │ 

  b                                     │ 

julia> ex[3].source

a + b

#   ╙ ── these are the bytes you're looking for 😊

```

The `provenance` function should be used to look up the `source` attribute and

the `showprov` function used to inspect the content (this is preferred because

the encoding of `source` is an implementation detail). For example:

```julia

julia> showprov(ex[3])

a + b

#   ╙ ── in source

# @ foo.jl:1

```

During macro expansion and lowering provenance gets more complicated because an

expression can arise from multiple sources. For example, we want to keep track

of the entire stack of macro expansions an expression was generated by, while

also recording where it occurred in the original source file.

For this, we use a tree data structure. Let's look at the following pair of

macros

```julia

julia> JuliaLowering.include_string(Main, raw"""

       module M

           macro inner()

               :(2)

           end

           macro outer()

               :((1, @inner))

           end

       end

       """, "some_macros.jl")

```

The tree which arises from macro expanding this is pretty simple:

```julia

julia> expanded = JuliaLowering.macroexpand(Main, parsestmt(SyntaxTree, "M.@outer()"))

SyntaxTree with attributes scope_layer,kind,value,var_id,name_val,syntax_flags,source

[tuple-p]                               │ 

  1                                     │ 

  2                                     │ 

```

but the provenance information recorded for the second element `2` of this

tuple is not trivial; it includes the macro call expressions for `@inner` and

`@outer`. We can show this in tree form:

```julia

julia> showprov(expanded[2], tree=true)

2

├─ 2

│  └─ @ some_macros.jl:3

└─ (macrocall @inner)

   ├─ (macrocall @inner)

   │  └─ @ some_macros.jl:7

   └─ (macrocall-p (. M @outer))

      └─ @ foo.jl:1

```

or as a more human-readable flattened list highlighting the source ranges:

```julia

module M

    macro inner()

        :(2)

#         ╙ ── in source

    end

# @ some_macros.jl:3

    macro outer()

        :((1, @inner))

#             └────┘ ── in macro expansion

    end

end

# @ some_macros.jl:7

M.@outer()

└────────┘ ── in macro expansion

# @ foo.jl:1

```
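The relationship between the tree form and the flattened list can be sketched with a toy data structure. This is an illustration under assumptions, not the real encoding of `source` (which the text above describes as an implementation detail): `ToyLoc`, `ToyProv` and `flatten_prov` are hypothetical names.

```julia
# Toy provenance tree. A node's source is either a location in a file,
# or a list of the nodes it was derived from.
struct ToyLoc
    file::String
    line::Int
end

struct ToyProv
    label::String
    source::Union{ToyLoc,Vector{ToyProv}}
end

# Depth-first walk collecting the leaf locations in order.
flatten_prov(p::ToyProv) =
    p.source isa ToyLoc ? [p.source] : mapreduce(flatten_prov, vcat, p.source)

# The tree shown above for the expanded literal `2`.
inner_quote = ToyProv("2", ToyLoc("some_macros.jl", 3))
inner_call  = ToyProv("(macrocall @inner)", ToyLoc("some_macros.jl", 7))
outer_call  = ToyProv("(macrocall-p (. M @outer))", ToyLoc("foo.jl", 1))
expansion   = ToyProv("(macrocall @inner)", [inner_call, outer_call])
root        = ToyProv("2", [inner_quote, expansion])

locs = flatten_prov(root)
```

Flattening the toy tree visits the locations in the same order as the list above: the quoted `2`, then the `@inner` call site, then the `M.@outer()` call site.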

## Problems with Hygiene in Julia's existing macro system

To write correct hygienic macros in Julia (as of 2024), macro authors must use

`esc()` on any syntax passed to the macro so that passed identifiers escape

to the macro caller scope. However:

* This is not automatic and the correct use of `esc()` is one of the things

  that new macro authors find most confusing. (My impression, based on various

  people complaining about how confusing `esc()` is.)

* `esc()` wraps expressions in `Expr(:escape)`, but this doesn't work well when

  macros pass such escaped syntax to an inner macro call. As discussed in

  [Julia issue #37691](https://github.com/JuliaLang/julia/issues/37691), macros

  in Julia's existing system are not composable by default. Writing

  composable macros in the existing system would require preserving the escape

  nesting depth when recursing into any macro argument nested expressions.

  Almost no macro author knows how to do this and is prepared to pay for the

  complexity of getting it right.

The requirement to use `esc()` stems from Julia's pervasive use of the simple

`Expr` data structure which represents an unadorned AST in which names are plain

symbols. For example, a macro call `@foo x` gets passed the symbol `:x`

which is just a name without any information attached to indicate that it came

from the scope where `@foo` was called.
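A minimal example of this behaviour in the existing macro system, using only standard Julia (the macro and function names are made up for illustration):

```julia
# Without `esc()`, the assigned name is treated as belonging to the macro
# and is renamed to a fresh local during expansion.
macro zero_bad(v)
    :($v = 0)
end

# With `esc()`, the name refers to the variable in the caller's scope.
macro zero_good(v)
    :($(esc(v)) = 0)
end

function esc_demo()
    x = 1
    @zero_bad x      # assigns to a renamed local, not to `x`
    before = x
    @zero_good x     # assigns to the caller's `x`
    return (before, x)
end

esc_demo()
```

The call returns `(1, 0)`: the first macro silently fails to update `x`, which is the kind of mistake that automatic hygiene aims to rule out.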

### Hygiene References

* [Toward Fearless Macros](https://lambdaland.org/posts/2023-10-17_fearless_macros) -

  a blog post by Ashton Wiersdorf

* [Towards the Essence of Hygiene](https://michaeldadams.org/papers/hygiene/hygiene-2015-popl-authors-copy.pdf) - a paper by Michael Adams

* [Bindings as sets of scopes](https://www-old.cs.utah.edu/plt/scope-sets/) - a description of Racket's scope set mechanism by Matthew Flatt

# Overview of lowering passes

JuliaLowering uses six symbolic transformation passes:

1. Macro expansion - expanding user-defined syntactic constructs by running the

   user's macros. This pass also includes a small amount of other symbolic

   simplification.

2. Syntax desugaring - simplifying Julia's rich surface syntax down to a small

   number of syntactic forms.

3. Scope analysis - analyzing identifier names used in the code to discover

   local variables, closure captures, and associate global variables to the

   appropriate module. Transform all names (kind `K"Identifier"`) into binding

   IDs (kind `K"BindingId"`) which can be looked up in a table of bindings.

4. Closure conversion - convert closures to types and deal with captured

   variables efficiently where possible.

5. Flattening to untyped IR - convert code in hierarchical tree form to a

   flat array of statements; convert control flow into gotos.

6. Convert untyped IR to `CodeInfo` form for integration with the Julia runtime.

## Pass 1: Macro expansion

This pass expands macros and quoted syntax, and does some very light conversion

of a few syntax `Kind`s in preparation for syntax desugaring.

### Hygiene in JuliaLowering

In JuliaLowering we make hygiene automatic and remove `esc()` by combining names

with scope information. In the language of the paper [*Towards the Essence of

Hygiene*](https://michaeldadams.org/papers/hygiene/hygiene-2015-popl-authors-copy.pdf)

by Michael Adams, this combination is called a "syntax object". In

JuliaLowering our representation is the tuple `(name,scope_layer)`, also called

`VarId` in the scope resolution pass.

JuliaLowering's macro expander attaches a unique *scope layer* to each

identifier in a piece of syntax. A "scope layer" is an integer identifier

combined with the module in which the syntax was created.

When expanding macros,

* Any identifiers passed to the macro are tagged with the scope layer they were

  defined within.

* A new unique scope layer is generated for the macro invocation, and any names

  in the syntax produced by the macro are tagged with this layer.

Subsequently, the `(name,scope_layer)` pairs are used when resolving bindings.

This ensures that, by default, we satisfy the basic rules for hygienic macros

discussed in Adams' paper:

1. A macro can't insert a binding that can capture references other than those

   inserted by the macro.

2. A macro can't insert a reference that can be captured by bindings other than

   those inserted by the macro.
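A toy sketch of how `(name,scope_layer)` pairs keep bindings apart. Everything here is hypothetical: `ToyVarId` and `resolve!` are illustrative names, and the layer is reduced to a bare integer, whereas the text above describes a layer as an integer combined with a module.

```julia
# Hypothetical stand-in for the `(name, scope_layer)` pair described above.
ToyVarId = Tuple{Symbol,Int}

bindings = Dict{ToyVarId,Int}()   # (name, layer) => binding id

# Look up the binding for a name in a layer, creating it on first use.
function resolve!(bindings, name::Symbol, layer::Int)
    return get!(bindings, (name, layer), length(bindings) + 1)
end

# Imagine user code `tmp = 1; @m tmp`, where `@m` also uses its own `tmp`.
user_tmp  = resolve!(bindings, :tmp, 1)   # written by the user (layer 1)
macro_tmp = resolve!(bindings, :tmp, 2)   # introduced by the macro (layer 2)
arg_tmp   = resolve!(bindings, :tmp, 1)   # macro argument keeps the caller's layer
```

The macro's own `tmp` gets a different binding from the user's `tmp`, while the argument passed into the macro still resolves to the user's binding, so neither rule above is violated.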

TODO: Write more here...

## Pass 2: Syntax desugaring

This pass recursively converts many special surface syntax forms to a smaller

set of syntax `Kind`s, following the AST's hierarchical tree structure. Some

such as `K"scope_block"` are internal to lowering and removed during later

passes. See `kinds.jl` for a list of these internal forms.

This pass is implemented in `desugaring.jl`. It's quite large because Julia has

many special syntax features.

### Desugaring of function definitions

Desugaring of function definitions is particularly complex because of the cross

product of features which need to work together consistently:

* Positional arguments (with and without defaults, with and without types)

* Keyword arguments (with and without defaults, with and without types)

* Type parameters with `where` syntax

* Argument slurping syntax with `...`

* Fancy arguments (argument destructuring)

The combination of positional arguments with defaults and keyword arguments is

particularly complex. Here's an example.  Suppose we're given the function

definition

```julia

function f(a::A=a_default, b::B=b_default; x::X=x_default,y::Y=y_default)

    body

end

```

This generates

* One method of `f` for each number of positional arguments which can be

  called when `f` is called without keyword args

* One overload of `Core.kwcall(kws, ::typeof(f), ...)` for each number of

  positional arguments (when called with a nonzero number of keyword args; the

  tuple `kws` being constructed by the caller)

* One internal method for the body of the function (we can call it `f_kw`

  though it will be named something like `#f#18`)

First, partially expanding the kw definitions this roughly looks like

```julia

function f_kw(x::X, y::Y, f_self::typeof(f), a::A, b::B)

    body

end

function f(a::A=a_default, b::B=b_default)

    f_kw(x_default, y_default, var"#self#", a, b)

end

function Core.kwcall(kws::NamedTuple, self::typeof(f), a::A=a_default, b::B=b_default)

    if Core.isdefined(kws, :x)

        x_tmp = Core.getfield(kws, :x)

        if x_tmp isa X

            nothing

        else

            Core.throw($(Expr(:new, Core.TypeError, Symbol("keyword argument"), :x, X, x_tmp)))

        end

        x = x_tmp

    else

        x = x_default

    end

    if Core.isdefined(kws, :y)

        y_tmp = Core.getfield(kws, :y)

        if y_tmp isa Y

            nothing

        else

            Core.throw($(Expr(:new, Core.TypeError, Symbol("keyword argument"), :y, Y, y_tmp)))

        end

        y = y_tmp

    else

        y = y_default

    end

    if Base.isempty(Base.diff_names(Base.keys(kws), (:x, :y)))

        nothing

    else

        # Else unsupported kws

        Base.kwerr(kws, self, a, b)

    end

    f_kw(x, y, self, a, b)

end

```

We can then pass this to function expansion for default arguments, which expands

each of the two definitions with default arguments (`f` and `Core.kwcall`) into

three methods. For example, we conceptually expand

`f(a::A=a_default, b::B=b_default)` into the methods

```julia

# The body

function f(a::A, b::B)

    f_kw(x_default, y_default, var"#self#", a, b)

end

# And two methods for the different numbers of default args

function f(a::A)

    var"#self#"(a, b_default)

end

function f()

    var"#self#"(a_default, b_default)

end

```

In total, this expands a single "function definition" into seven methods.

Note that the above is only a sketch! There's more fiddly details when `where`

syntax comes in
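Part of this expansion can be observed from ordinary Julia code. The sketch below uses a made-up function `f_demo`; it assumes a Julia version which provides `Core.kwcall` (1.9 or later), and it only looks at the methods attached to `f_demo` itself.

```julia
# Two positional defaults plus two keyword arguments, as in the example above.
f_demo(a::Int=1, b::Int=2; x::Int=10, y::Int=20) = (a, b, x, y)

# One method of `f_demo` per number of positional arguments: (), (a), (a, b).
n_positional_methods = length(methods(f_demo))

# Calls without keywords go through those methods directly.
no_kw = f_demo(5)

# Calls with keywords are routed via `Core.kwcall`, with the keywords
# collected into a NamedTuple by the caller.
sugar  = f_demo(5; y=0)
direct = Core.kwcall((; y=0), f_demo, 5)
```

The internal body method and the `Core.kwcall` overloads do not show up in `methods(f_demo)`, which is why only three methods are counted there.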

### Desugaring of generated functions

A brief description of how this works. Let's consider the generated function

```julia

function gen(x::NTuple{N}, y) where {N,T}

    shared = :shared

    # Unnecessary use of @generated, but it shows what's going on.

    if @generated

        quote

            maybe_gen = ($x, $N)

        end

    else

        maybe_gen = (typeof(x), N)

    end

    (shared, maybe_gen)

end

```

This is desugared into the following two function definitions. First, a code

generator which will generate code for the body of the function, given the

static parameters `N`, `T` and the positional arguments `x`, `y`.

(`var"#self#"::Type{typeof(gen)}` is also provided by the Julia runtime to

complete the full signature of `gen`, though the user won't normally use this.)

```julia

function var"#gen@generator#0"(__context__::JuliaSyntax.MacroContext, N, T, var"#self#", x, y)

    gen_stuff = quote

        maybe_gen = ($x, $N)

    end

    quote

        shared = :shared

        $gen_stuff

        (shared, maybe_gen)

    end

end

```

Second, the non-generated version, using the `if @generated` else branches, and

containing mostly normal code.

```julia

function gen(x::NTuple{N}, y) where {N,T}

    $(Expr(:meta, :generated,

        Expr(:call, JuliaLowering.GeneratedFunctionStub,

             :var"#gen@generator#0", sourceref_of_gen,

             :(Core.svec(:var"#self#", :x, :y)),

             :(Core.svec(:N, :T)))))

    shared = :shared

    maybe_gen = (typeof(x), N)

    (shared, maybe_gen)

end

```

The one extra thing added here is the `Expr(:meta, :generated)` which is an

expression creating a callable wrapper for the user's generator, to be

evaluated at top level. This wrapper will then be invoked by the runtime

whenever the user calls `gen` with a new signature and it's expected that a

`CodeInfo` be returned from it. `JuliaLowering.GeneratedFunctionStub` differs

from `Core.GeneratedFunctionStub` in that it contains extra provenance

information (the `sourceref_of_gen`) and expects a `SyntaxTree` to be returned

by the user's generator code.

## Pass 3: Scope analysis / binding resolution

This pass replaces variables with bindings of kind `K"BindingId"`,

disambiguating variables when the same name is used in different scopes. It

also fills in the list of non-global bindings within each lambda and metadata

about such bindings as will be used later during closure conversion.

Scopes are documented in the Julia documentation on

[Scope of Variables](https://docs.julialang.org/en/v1/manual/variables-and-scoping/).

During scope resolution, we maintain a stack of `ScopeInfo` data structures.

When a new `lambda` or `scope_block` is discovered, we create a new `ScopeInfo` as follows:

1. Find all identifiers bound or used within a scope. New *bindings* may be

   introduced by one of the `local`, `global` keywords, implicitly by

   assignment, as function arguments to a `lambda`, or as type arguments in a

   method ("static parameters"). Identifiers are *used* when they are

   referenced.

2. Infer which bindings are newly introduced local or global variables (and

   thus require a distinct identity from names already in the stack)

3. Assign a `BindingId` (unique integer) to each new binding

We then push this `ScopeInfo` onto the stack and traverse the expressions

within the scope translating each `K"Identifier"` into the associated

`K"BindingId"`. While we're doing this we also resolve some special forms like

`islocal` by making use of the scope stack.

The detailed rules for whether assignment introduces a new variable depend on

the `scope_block`'s `scope_type` attribute when we are processing top-level

code.

* `scope_type == :hard` (as for bindings inside a `let` block) means an

  assignment always introduces a new binding

* `scope_type == :neutral` - inherit soft or hard scope from the parent scope.

* `scope_type == :soft` - assignments are to globals if the variable

  exists in global module scope. Soft scope doesn't have surface syntax and is

  introduced for top-level code by REPL-like environments.
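The hard scope rule can be checked against the existing Julia implementation. This sketch evaluates a small program in a fresh module with `Base.include_string`, which runs it as non-interactive top-level code; the module name `ScopeDemo` is made up for illustration.

```julia
m = Module(:ScopeDemo)

Base.include_string(m, """
    x = 1
    let
        x = 2          # hard scope: assignment always creates a new local `x`
        global y = x   # reads the local `x`, so the global `y` becomes 2
    end
    """)

global_x = Core.eval(m, :x)
global_y = Core.eval(m, :y)
```

The global `x` keeps its value of 1 because the assignment inside `let` was resolved to a distinct binding, which is what the scope analysis pass must reproduce.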

## Pass 4: Closure conversion / lower bindings

The main goal of this pass is closure conversion, but it's also used for

lowering typed bindings and global assignments. Roughly, this is passes 3 and 4

in the original `julia-syntax.scm`. In JuliaLowering it also comes in two steps:

The first step (part of `scope_resolution.jl`) is to compute metadata related

to bindings, both per-binding and per-binding-per-closure-scope.

Properties which are computed per-binding which can help with symbolic

optimizations include:

* Type is declared (`x::T` syntax in a statement): type conversions must be

  inserted at every assignment of `x`.

* Never undefined: value is always assigned to the binding before being read

  hence this binding doesn't require the use of `Core.NewvarNode`.

* Single assignment: (TODO how is this defined, what is it for and does it go

  here or below?)

Properties of non-globals which are computed per-binding-per-closure include:

* Read: the value of the binding is used.

* Write: the binding is assigned to.

* Captured: Bindings defined outside the closure which are either Read or Write

  within the closure are "captured" and need to be one of the closure's fields.

* Called: the binding is called as a function, ie, `x()`. (TODO - what is this

  for?)

The second step uses this metadata to

* Convert closures into `struct` types

* Lower bindings captured by closures into references to boxes as necessary

* Deal with typed bindings (`K"decl"`) and their assignments

* Lower const and non-const global assignments

* TODO: probably more here.

### Q&A

#### When does `function` introduce a closure?

Closures are just functions where the name of the function is *local* in scope.

How does the function name become a local? The `function` keyword acts like an

assignment to the function name for the purposes of scope resolution. Thus

`function f() body end` is rather like `f = ()->body` and may result in the

symbol `f` being either `local` or `global`. Like other assignments, `f` may be

declared global or local explicitly, but if not `f` is subject to the usual

rules for assignments inside scopes. For example, inside a `let` scope

`function f() ...` would result in the symbol `f` being local.

Examples:

```julia

begin

    # f is global because `begin ... end` does not introduce a scope

    function f()

        body

    end

    # g is a closure because `g` is explicitly declared local

    local g

    function g()

        body

    end

end

let

    # f is local so this is a closure because `let ... end` introduces a scope

    function f()

        body

    end

    # g is not a closure because `g` is declared global

    global g

    function g()

        body

    end

end

```

#### How do captures work with non-closures?

Yes it's true, you can capture local variables into global methods. For example:

```julia

begin

    local x = 1

    function f(y)

        x + y

    end

    x = 2

end

```

The way this works is to put `x` in a `Box` and interpolate it into the AST of

`f` (the `Box` can be eliminated in some cases, but not here). Essentially this

lowers to code which is almost-equivalent to the following:

```julia

begin

    local x = Core.Box(1)

    @eval function f(y)

        $(x).contents + y

    end

    x.contents = 2

end

```
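The difference between interpolating the box and interpolating its contents can be tried out directly. In this sketch a `Ref{Any}` stands in for `Core.Box`, and the function names are made up; it is an emulation of the idea, not the code which lowering emits.

```julia
box = Ref{Any}(1)     # stand-in for the `Core.Box` holding `x`

# The box object itself is spliced into the method body, so later writes
# to the box are visible to the method.
@eval f_boxed(y) = $(box)[] + y

# For contrast: splicing the current contents bakes the value 1 into the body.
@eval f_snapshot(y) = $(box[]) + y

box[] = 2             # corresponds to `x = 2` in the example above

boxed_result    = f_boxed(1)
snapshot_result = f_snapshot(1)
```

Only the boxed version sees the assignment made after the method definition, matching the semantics of the original `begin ... end` block.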

#### How do captures work with closures with multiple methods?

Sometimes you might want a closure with multiple methods, but those methods

might capture different local variables. For example,

```julia

let

    x = 1

    y = 1.5

    function f(xx::Int)

        xx + x

    end

    function f(yy::Float64)

        yy + y

    end

    f(42)

end

```

In this case, the closure type must capture both `x` and `y` and the generated

code looks rather like this:

```julia

struct TheClosureType

    x

    y

end

let

    x = 1

    y = 1.5

    f = TheClosureType(x,y)

    function (self::TheClosureType)(xx::Int)

        xx + self.x

    end

    function (self::TheClosureType)(yy::Float64)

        yy + self.y

    end

    f(42)

end

```
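A runnable variant of the same idea, with the method definitions placed at top level as a callable struct. `ToyClosure` is a made-up name, and the field types are written out here for clarity even though the sketch above leaves them untyped.

```julia
# One closure type holding every variable captured by any of its methods.
struct ToyClosure
    x::Int
    y::Float64
end

# Each method only reads the captured field it needs.
(self::ToyClosure)(xx::Int)     = xx + self.x
(self::ToyClosure)(yy::Float64) = yy + self.y

f_closure = ToyClosure(1, 1.5)

int_result   = f_closure(42)
float_result = f_closure(0.5)
```

Both methods dispatch on the same object, which is why a single type has to carry the union of the captured variables.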

#### When are `method` defs lifted to top level?

Closure method definitions must be lifted to top level whenever the definitions

appear inside a function. This is to allow efficient compilation and avoid world

age issues.

Conversely, when method defs appear in top level code, they are executed

inline.

## Pass 5: Convert to untyped IR

This pass is implemented in `linear_ir.jl`.

### Untyped IR (JuliaLowering form)

JuliaLowering's untyped IR is very close to the runtime's `CodeInfo` form (see

below), but is more concretely typed as `JuliaLowering.SyntaxTree`.

Metadata is generally represented differently:

* The statements retain full code provenance information as `SyntaxTree`

  objects. See `kinds.jl` for a list of which `Kind`s occur in the output IR

  but not in surface syntax.

* The list of slots is `Vector{Slot}`, including `@nospecialize` metadata

### Lowering of exception handlers

Exception handling involves a careful interplay between lowering and the Julia

runtime. The forms `enter`, `leave` and `pop_exception` dynamically modify the

exception-related state on the `Task`; lowering and the runtime work together

to maintain correct invariants for this state.

Lowering of exception handling must ensure that

* Each `enter` is matched with a `leave` on every possible non-exceptional

  program path (including implicit returns generated in tail position).

* Each `catch` block which is entered and handles the exception - by exiting

  via a non-exceptional program path - is matched with a `pop_exception`

* Each `finally` block runs, regardless of the way it's entered - either by

  normal program flow, an exception, early `return` or a jump out of an inner

  context via `break`/`continue`/`goto` etc.

The following special forms are emitted into the IR:

* `(= tok (enter catch_label dynscope))` -

  push exception handler with catch block at `catch_label` and dynamic

  scope `dynscope`, yielding a token which is used by `leave` and

  `pop_exception`. `dynscope` is only used in the special `tryfinally` form

  without associated source level syntax (see the `@with` macro)

* `(leave tok)` -

    pop exception handler back to the state of the `tok` from the associated

    `enter`. Multiple tokens can be supplied to pop multiple handlers using

    `(leave tok1 tok2 ...)`.

* `(pop_exception tok)` - pop exception stack back to state of associated enter

When an `enter` is encountered, the runtime pushes a new handler onto the

`Task`'s exception handler stack which will jump to `catch_label` when an

exception occurs.

There are two ways that the exception-related task state can be restored:

1. By encountering a `leave` which will restore the handler state with `tok`.

2. By throwing an exception. In this case the runtime will pop one handler

   automatically and jump to the catch label with the new exception pushed

   onto the exception stack. On this path the exception stack state must be

   restored back to the associated `enter` by encountering `pop_exception`.

Note that the handler and exception stack represent two distinct types of

exception-related state restoration which need to happen. Note also that the

"handler state restoration" actually includes several pieces of runtime state

including GC flags - see `jl_eh_restore_state` in the runtime for that.

#### Lowering finally code paths

When lowering `finally` blocks we want to emit the user's finally code once but

multiple code paths may traverse the finally block. For example, consider the

code

```julia

function foo(x)

    while true

        try

            if x == 1

                return f(x)

            elseif x == 2

                g(x)

                continue

            else

                break

            end

        finally

            h()

        end

    end

end

```

In this situation there are four distinct code paths through the finally block:

1. `return f(x)` needs to call `val = f(x)`, leave the `try` block, run `h()` then

   return `val`.

2. `continue` needs to call `h()` then jump to the start of the while loop

3. `break` needs to call `h()` then jump to the exit of the while loop

4. If an exception occurs in `f(x)` or `g(x)`, we need to call `h()` before

   falling back into the while loop.

To deal with these we create a `finally_tag` variable to dynamically track

which action to take after the finally block exits. Before jumping to the block

we set this variable to a unique integer tag identifying the incoming code

path. At the exit of the user's code (`h()` in this case) we perform the jump

appropriate to the `break`, `continue` or `return` as necessary based on the tag.

(TODO - these are the only four cases which can occur, but, for example,

multiple `return`s create multiple tags rather than assigning to a single

variable. Collapsing these into a single case might be worth considering? But

also might be worse for type inference in some cases?)
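The tag mechanism can be imitated by hand in ordinary Julia. The following is a simplified sketch of `foo` above, not the IR which JuliaLowering emits: the exception path is left out, `f(x)` and `g(x)` are replaced by placeholders, `log` records which steps ran, and `x` is reassigned on the `continue` path only so that the toy loop terminates.

```julia
function foo_lowered(x, log)
    while true
        finally_tag = 0
        val = nothing
        if x == 1
            val = :f_result       # stands in for `val = f(x)`
            finally_tag = 1       # incoming path: `return`
        elseif x == 2
            push!(log, :g)        # stands in for `g(x)`
            x = 3                 # not in the original; ends the toy loop
            finally_tag = 2       # incoming path: `continue`
        else
            finally_tag = 3       # incoming path: `break`
        end
        push!(log, :h)            # the user's `finally` body, emitted once
        # Dispatch on the tag to resume the interrupted control flow.
        if finally_tag == 1
            return val
        elseif finally_tag == 2
            continue
        else
            break
        end
    end
    return nothing
end

log_return = Symbol[]
ret = foo_lowered(1, log_return)

log_continue = Symbol[]
foo_lowered(2, log_continue)
```

In the second call the finally body runs twice, once for the `continue` path and once for the `break` path on the following iteration, yet its code appears only once in the function.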

## Pass 6: Convert IR to `CodeInfo` representation

This pass converts JuliaLowering's internal representation of untyped IR into

a form the Julia runtime understands. This is a necessary decoupling which

separates the development of JuliaLowering.jl from the evolution of the Julia

runtime itself.

### Untyped IR (`CodeInfo` form)

The final lowered IR is expressed as `CodeInfo` objects which are a sequence of

`code` statements containing

* Literals

* Restricted forms of `Expr` (with semantics different from surface syntax,

  even for the same `head`! for example the arguments to `Expr(:call)` in IR

  must be "simple" and aren't evaluated in order)

* `Core.SlotNumber` 

* Other special forms from `Core` like `Core.ReturnNode`, `Core.EnterNode`, etc.

* `Core.SSAValue`, indexing any value generated from a statement in the `code`

  array.

* Etc (todo)

The IR obeys certain invariants which are checked by the downstream code in

base/compiler/validation.jl.

See also https://docs.julialang.org/en/v1/devdocs/ast/#Lowered-form
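For orientation, the `CodeInfo` produced by Julia's existing lowering can be inspected with the standard `code_lowered` function. The function `add_one` is made up for this sketch, and the exact statements in `code` vary between Julia versions, so only stable properties are examined.

```julia
add_one(x) = x + 1

# `code_lowered` returns one `CodeInfo` per matching method.
ci = only(code_lowered(add_one, (Int,)))

statements = ci.code         # Vector{Any} of IR statements
slots      = ci.slotnames    # includes `#self#` and the argument `x`
last_stmt  = statements[end] # lowered function bodies end in a return
```

Printing `ci` at the REPL shows the statements together with their `Core.SSAValue` numbering.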

CodeInfo layout (as of early 1.12-DEV):

```julia

mutable struct CodeInfo

    code::Vector{Any}             # IR statements

    codelocs::Vector{Int32}       # `length(code)` Vector of indices into `linetable`

    ssavaluetypes::Any            # `length(code)` or Vector of inferred types after opt

    ssaflags::Vector{UInt32}      # flag for every statement in `code`

                                  #   0 if meta statement

                                  #   inbounds_flag - 1 bit (LSB)

                                  #   inline_flag   - 1 bit

                                  #   noinline_flag - 1 bit

                                  #   ... other 8 flags which are defined in compiler/optimize.jl

                                  #   effects_flags - 9 bits

    method_for_inference_limit_heuristics::Any

    linetable::Any

    slotnames::Vector{Symbol}     # names of parameters and local vars used in the code

    slotflags::Vector{UInt8}      # vinfo flags from flisp

    slottypes::Any                # nothing (used by typeinf)

    rettype::Any                  # Any (used by typeinf)

    parent::Any                   # nothing (used by typeinf)

    edges::Any

    min_world::UInt64

    max_world::UInt64

    inferred::Bool

    propagate_inbounds::Bool

    has_fcall::Bool

    nospecializeinfer::Bool

    inlining::UInt8

    constprop::UInt8

    purity::UInt16

    inlining_cost::UInt16

end

```

## Notes on toplevel-only forms and eval-related functions

In the current Julia runtime,

`Base.eval()`

- Uses `jl_toplevel_eval_in` which calls `jl_toplevel_eval_flex`

`jl_toplevel_eval_flex(mod, ex)`

- Lowers if necessary

- Evaluates certain blessed top level forms

  * `:.`

  * `:module`

  * `:using`

  * `:import`

  * `:public`

  * `:export`

  * `:global`

  * `:const`

  * `:toplevel`

  * `:error`

  * `:incomplete`

  * Identifier and literals

- Otherwise expects `Expr(:thunk)`

  * Use codegen "where necessary/profitable" (eg ccall, has_loops etc)

  * Otherwise interpret via `jl_interpret_toplevel_thunk`

Should we lower the above blessed top level forms to julia runtime calls?

Pros:

- Semantically sound. Lowering should do syntax checking in things like

  `Expr(:using)` rather than doing this in the runtime support functions.

- Precise lowering error messages

- Replaces more Expr usage

- Replaces a whole pile of C code with significantly less Julia code

- Lowering output becomes more consistently imperative

Cons: 

- Lots more code to write

- May need to invent intermediate data structures to replace `Expr`

- Bootstrap?

- Some forms require creating toplevel thunks

In general, we'd be replacing current *declarative* lowering targets like

`Expr(:using)` with an *imperative* call to a `Core` API instead. The call and

the setup of its arguments would need to go in a thunk. We've currently got an

odd mixture of imperative and declarative lowered code.

## Bugs in Julia's lowering

A subset of bugs which exist upstream in the flisp implementation, but which are fixed here:

* `f()[begin]` has the side effect `f()` twice.

* `a[(begin=1; a=2)]` gives a weird error

* `function A.ccall() ; end` allows `ccall` as a name but it's not allowed without the `A.`

* `a .< b .< c` expands to `(a .< b) .& (b .< c)` where the scope of the `&` is

  the expansion module but should be `top.&` to avoid scope-dependence

  (especially in the presence of macros)

## Notes on Racket's hygiene

People look at [Racket](https://racket-lang.org/) as an example of a very

complete system of hygienic macros. We should learn from them, but keeping in

mind that Racket's macro system is inherently more complicated. Racket's

current approach to hygiene is described in an [accessible talk](https://www.youtube.com/watch?v=Or_yKiI3Ha4)

and in more depth in [a paper](https://www-old.cs.utah.edu/plt/publications/popl16-f.pdf).

Some differences between Racket's macro expander and Julia's:

* Racket allows *local* definitions of macros. Macro code can be embedded in an

  inner lexical scope and capture locals from that scope, but still needs to be

  executed at compile time. Julia supports macros at top level scope only.

* Racket goes to great lengths to execute the minimal package code necessary to

  expand macros; the "pass system". Julia just executes all top level

  statements in order when precompiling a package.

* As a lisp, Racket's surface syntax is dramatically simpler and more uniform