Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/Chymyst/curryhoward

Automatic code generation for Scala functions and expressions via the Curry-Howard isomorphism
https://github.com/Chymyst/curryhoward

automated-theorem-provers code-generation curry-howard-isomorphism functional-programming intuitionistic-logic lambda-calculus lambda-terms scala scala-macros theorem-prover

Last synced: 3 months ago
JSON representation

Automatic code generation for Scala functions and expressions via the Curry-Howard isomorphism

Awesome Lists containing this project

README

        

[![Build Status](https://travis-ci.com/Chymyst/curryhoward.svg?branch=master)](https://travis-ci.com/Chymyst/curryhoward)
[![Coverage Status](https://codecov.io/gh/Chymyst/curryhoward/coverage.svg?branch=master)](https://codecov.io/gh/Chymyst/curryhoward?branch=master)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Github Tag](https://img.shields.io/github/tag/Chymyst/curryhoward.svg?label=release&colorB=blue)](https://github.com/Chymyst/curryhoward/tags)
[![Maven Central](https://img.shields.io/maven-central/v/io.chymyst/curryhoward_2.12.svg)](http://search.maven.org/#search%7Cga%7C1%7Cio.chymyst)

# Quotation

_These are all interesting examples, but is there a practical side to Curry-Howard isomorphism? Probably not in everyday programming._ -- Bartosz Milewski (2015), [Category Theory for Programmers, Chapter 9: Function types.](https://bartoszmilewski.com/2015/03/13/function-types/)

The `curryhoward` library aims to use the Curry-Howard isomorphism as a tool for practical applications.

# curryhoward

This is a library for automatic implementation of Scala expressions via the Curry-Howard isomorphism.

Quick start:

```scala
// build.sbt
libraryDependencies += "io.chymyst" %% "curryhoward" % "latest.integration"

// Scala
import io.chymyst.ch.implement

// Automatically derive code for this function, based on its type signature:
def f[X, Y]: (X ⇒ Y) ⇒ (Int ⇒ X) ⇒ (Int ⇒ Y) = implement

// The macro `implement` will automatically generate this code for the function body:
// {
// (g: X ⇒ Y) ⇒ (r: Int ⇒ X) ⇒ (e: Int) ⇒ g(r(e))
// }

```

## Features

- Automatically fill in the function body, given the function's type alone (`implement`)
- Automatically generate an expression of a specified type from given arguments (`ofType`)
- Works as a macro at compile time; when a type cannot be implemented, emits a compile-time error
- Supports function types, tuples, sealed traits / case classes / case objects
- Can use conventional Scala syntax `def f[T](x: T): T` and curried syntax `def f[T]: T ⇒ T`
- Java-style argument groups can be used, e.g. `A ⇒ (B, C) ⇒ D`, in addition to using a tuple type, e.g. `A ⇒ ((B, C)) ⇒ D`
- When a type can be implemented in more than one way, heuristics ("least information loss") are used to prefer implementations that are more likely to satisfy algebraic laws
- Emits a compile-time error when a type can be implemented in more than one way despite using heuristics
- Debugging and logging options are available
- DSL for inspecting the generated code (STLC lambda-terms) at run time; facilities are provided for symbolic evaluation and checking equational laws
- [Tutorial](docs/Tutorial.md) explains how to do that in detail

# Usage

There are two main functions, `implement` and `ofType`.

The function `implement` works when defining methods, and is used syntactically as a placeholder for the method's code:

```scala
import io.chymyst.ch.implement

def f[X, Y]: X ⇒ Y ⇒ X = implement

// The code `(x: X) ⇒ (y: Y) ⇒ x` is generated for the body of the function `f`.

f(123)("abc") // returns 123

// The applicative `map2` function for the Reader monad.
def map2[E, A, B, C](readerA: E ⇒ A, readerB: E ⇒ B, f: A ⇒ B ⇒ C): E ⇒ C = implement
// Generates the code (e: E) ⇒ f(readerA(e), readerB(e))

```

The function `ofType` is designed for generating expressions using previously computed values:

```scala
import io.chymyst.ch.ofType

case class User(name: String, id: Long)

val x: Int = 123
val s: String = "abc"
val f: Int ⇒ Long = _.toLong // whatever

val userId = ofType[User](f, s, x).id
// Generates the expression User(s, f(x)) from previously computed values f, s, x, and returns the `id`

```

The generated code is purely functional and assumes that all given values and types are free of side effects.

The code is generated by the `implement` and `ofType` macros at compile time.

A compile-time error occurs when there are several inequivalent implementations for a given type, or if the type cannot be implemented at all.

See the [tutorial](docs/Tutorial.md) for more details.

## How it works

The macros `implement` and `ofType` examine the given type expression via reflection (at compile time).
The type expression is rewritten as a formula in the intuitionistic propositional logic (IPL) with universally quantified propositions.

This is possible due to the Curry-Howard isomorphism, which maps functions with fully parametric types to theorems in the (IPL) with universally quantified propositions.

For example, the type signature of the function

```scala
def f[X, Y]: X ⇒ Y ⇒ X = (x: X) ⇒ (y: Y) ⇒ x

```

is mapped to the propositional theorem `∀ X : ∀ Y : X ⇒ (Y ⇒ X)` in the IPL.

The resulting IPL formula is passed on to an IPL theorem prover.
The theorem prover performs an exhaustive proof search to look for possible lambda-terms (terms in the simply-typed lambda-calculus, STLC) that implement the given type.
After that, the terms are simplified, so that equivalent terms that are different only by alpha-conversion, beta-conversion, or eta-conversion are eliminated.

Finally, the terms are measured according to their "information loss score", and sorted so that one or more terms with the least information loss are returned (and all other terms ignored).
The Scala macro then converts the lambda-term(s) into a Scala expression,
which is substituted instead of `implement` into the right-hand side of the function definition.

All this happens at compile time, so compilation may take longer if a lot of terms are being generated.

If there are additional values available for constructing the expression, the types of those additional values are added as extra function arguments.

For example, if required to implement a type `Option[B]` given value `x` of type `Option[A]` and value `f` of type `A ⇒ Option[B]`, the library will first rewrite the problem as that of implementing a function type `Option[A] ⇒ (A ⇒ Option[B]) ⇒ Option[B]` with type parameters `A` and `B`.
Having obtained a solution, i.e. a term of that type, the library will then apply this term to arguments `x` and `f`.
The resulting term will be returned as Scala code that uses the given values `x` and `f`.

In addition, the `curryhoward` library provides a DSL for manipulating the generated lambda-calculus terms symbolically.
This DSL can be used to rigorously verify algebraic laws (at run time).

The current implementation uses an IPL theorem prover based on the sequent calculus called LJT as presented in:

[D. Galmiche, D. Larchey-Wendling. _Formulae-as-Resources Management for an Intuitionistic Theorem Prover_ (1998)](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.2618). In 5th Workshop on Logic, Language, Information and Computation, WoLLIC'98, Sao Paulo.

Some changes were made to the order of LJT rules, and some trivial additional rules implemented, in order to generate as many as possible different implementations, and also in order to support Scala case classes and case objects (i.e. named conjunctions).

The original presentation of LJT is found in:

[R. Dyckhoff, _Contraction-Free Sequent Calculi for Intuitionistic Logic_, The Journal of Symbolic Logic, Vol. 57, No. 3, (Sep., 1992), pp. 795-807](https://rd.host.cs.st-andrews.ac.uk/publications/jsl57.pdf).

For a good overview of approaches to IPL theorem proving, see these talk slides:

[R. Dyckhoff, _Intuitionistic Decision Procedures since Gentzen_ (2013)](http://apt13.unibe.ch/slides/Dyckhoff.pdf)

See the [youtube presentation](https://youtu.be/cESdgot_ZxY) for more details about how `curryhoward` works.

[This lecture](https://youtu.be/v=KcfD3Iv--UM) is a pedagogical explanation of the Curry-Howard correspondence in the context of functional programming.

# Unit tests

`sbt test`

Build the tutorial (thanks to the [tut plugin](https://github.com/tpolecat/tut)):

`sbt tut`

# Revision history

- 0.3.7 Implement the `typeExpr` macro instead of the old test-only API. Detect and use `val`s from the immediately enclosing class. Minor performance improvements and bug fixes (alpha-conversion for STLC terms). Tests for automatic discovery of some monads.
- 0.3.6 STLC terms are now emitted for `implement` as well; the JVM bytecode limit is obviated; fixed bug with recognizing `Function10`.
- 0.3.5 Added `:@@` and `@@:` operations to the STLC interpreter. Fixed a bug whereby `Tuple2(x._1, x._2)` was not simplified to `x`. Fixed other bugs in alpha-conversion of type parameters.
- 0.3.4 Reduced verbosity by default. Fixed a bug uncovered during the demo in the February 2018 meetup presentation.
- 0.3.3 Automatic renaming of type variables in lambda-terms; `anyOfType`; minor bug fixes.
- 0.3.2 More aggressive simplification of named conjunctions; a comprehensive lambda-term API with a new tutorial section.
- 0.3.1 Code cleanups, minor fixes, first proof-of-concept for symbolic law checking via lambda-terms API.
- 0.3.0 Experimental API for obtaining lambda-terms. Simplified the internal code by removing the type parameter `T` from AST types.
- 0.2.4 Support named type aliases and ordering heuristics for conjunctions and disjunctions; bug fixes for conventional function types not involving type parameters and for eta-contraction
- 0.2.3 Fix stack overflow when using recursive types (code is still not generated for recursive functions); implement loop detection in proof search; bug fixes for alpha-conversion of type-Lambdas
- 0.2.2 Bug fix for back-transform in rule named-&R
- 0.2.1 Checking monad laws for State monad; fix some problems with alpha-conversion of type-Lambdas
- 0.2.0 Implement `allOfType`; use eta-contraction to simplify and canonicalize terms (simplify until stable); cache sequents already seen, limit the search tree by information loss score to avoid exponential blow-up in some examples
- 0.1.0 Initial release; support case classes and tuples; support `implement` and `ofType`; full implementation of LJT calculus

# Status

- The theorem prover for the full IPL is working and appears to be error-free
- Heuristics ("least information loss") seem to be working adequately in many cases
- STLC interpreter is reasonably full-featured
- 92% unit test coverage
- Cross-compiled to Scala 2.11.11 and 2.12.4, published on Maven
- This project is now included in the [Scala community build](https://github.com/scala/community-builds)

# Roadmap

- Implement type class derivation for standard type classes (Functor, Contravariant, Foldable, Traversable, Monad, Applicative, etc.) whenever possible and unambiguous; make this compatible with `scalaz` and `cats` libraries, so that one can write `implicit val x: Functor[P] = implement`
- Keep trying to use `curryhoward` for production code and see what new features may be desirable

# Known limitations

- No support for non-case classes, classes with non-standard constructors, or class hierarchies other than a flat `sealed trait` / `final case class` hierarchies.
- Very limited support for recursive case classes (including `List`): generated code may fail and, in particular, cannot contain recursive functions. A non-recursive example that fails to generate sensible code: `T ⇒ List[T]` (the generated code always returns empty list).
- Functions with zero arguments are currently not supported, e.g. `ofType[Int ⇒ () ⇒ Int]` will not compile.
- If the type contains lots of repeated copies of the same type parameter, or lots of `Option[T]`, heuristics will sometimes fail to produce the desired implementation.

# Examples of functionality

The following code examples show how various functions are implemented automatically, given their type.

"Weak" Peirce's law:

```scala
def f[A, B]: ((((A ⇒ B) ⇒ A) ⇒ A) ⇒ B) ⇒ B = implement

```

"Weak" law of _tertium non datur_:

```scala
def f[A, B]: (Either[A, A ⇒ B] ⇒ B) ⇒ B = implement

```

Automatic implementations of `pure`, `map`, and `flatMap` for the `Reader` monad:

```scala
def pure[E, A]: A ⇒ (E ⇒ A) = implement
def map[E, A, B]: (E ⇒ A) ⇒ (A ⇒ B) ⇒ (E ⇒ B) = implement
def flatMap[E, A, B]: (E ⇒ A) ⇒ (A ⇒ E ⇒ B) ⇒ (E ⇒ B) = implement

```

Constant (non-parametric) types are treated as type parameters:

```scala

def f[A, B]: A ⇒ Int ⇒ (A, Int) = implement

f("abc")(123) // returns the tuple ("abc", 123)

```

Tuples, sealed traits, and case classes/case objects are supported, including `Option` and `Either`:

```scala
def eitherCommut[A, B]: Either[A, B] ⇒ Either[B, A] = implement

def eitherAssoc[A, B, C]: Either[A, Either[B, C]] ⇒ Either[Either[A, B], C] = implement

```

Case objects (and case classes with zero-argument constructors) are treated as named versions of the `Unit` type.

Case classes and sealed traits can be nested and can have type parameters.

Type aliases (`type P[T] = ...`) are supported as well.

Lambda-terms can be obtained and manipulated symbolically.

```scala
type R[X, A] = X ⇒ A

def mapReader[X, A, B]: R[X, A] ⇒ (A ⇒ B) ⇒ R[X, B] = implement
// mapReader is now a compiled function of the required type
val mapReaderTerm = mapReader.lambdaTerm
// mapReaderTerm is a lambda-term representing the code of mapReader
mapReaderTerm.prettyPrint // returns the string "a ⇒ b ⇒ c ⇒ b (a c)"

```

Symbolic computations with lambda-terms can be used for a rigorous verification of equational laws for the generated code.
See the [tutorial](docs/Tutorial.md#working-with-lambda-terms) for some examples of such computations.

## Supported syntax

Code can be generated based on a given type and possibly on given values:

1. `def f[...](...): ... = implement` -- the type and extra values are specified on the left-hand side
2. `ofType[...](...)` -- the type and extra values are specified within an expression
3. `allOfType[...](...)` -- similar to `ofType[...](...)`, except that now all inequivalent implementations with the lowest information loss are returned
4. `anyOfType[...](...)` - similar to `allOfType` except all found implementations are returned, including those with higher information loss

```scala
import io.chymyst.ch._

// Conventional Scala syntax for functions.
def f1[T, U](x: T, y: T ⇒ U) : (T, U) = implement

// Fully or partially curried functions are supported.
def f2[T, U](x: T): (T ⇒ U) ⇒ (T, U) = implement

def f3[T, U]: T ⇒ (T ⇒ U) ⇒ (T, U) = implement

// Generating code within expressions.
final case class User(name: String, id: Long)

val f: Int ⇒ Long = _.toLong

ofType[User](123, "abc", f).id // This expression evaluates to `123L`.

```

If the `implement` macro is used as the body of a class method, values from the class constructor and `val` members may be used:

```scala
import io.chymyst.ch._

final case class User[A](name: String, id: A) {
def map[B](f: A ⇒ B): User[B] = implement
}

```

# Debugging and logging

The logging options are controlled via the JVM property `"curryhoward.log"`.
The value of this property is a comma-separated list of keywords.
The full logging is switched on by putting `-Dcurryhoward.log=macros,terms,prover,verbose` on the Java or SBT command line.

The `verbose` logging option will print the lambda-term that the macro functions generate, and a warning when more than one term was found.
The `macros` logging option will print the code that the macro functions generate.
The `prover` logging option will print the steps in the proof search, including the new sequents generated by each applied rule.
The `trace` logging option will print more debugging information about the proof search.
The `terms` logging option will print the terms generated, in the short notation.

With none of these options given, only minimal diagnostic messages are printed, with terms in a short notation:

- information message when a term is returned
- warning message when several implementations are found with the same (lowest) information loss

## Heuristics for choosing different implementations

If the theorem prover finds several alternative implementations of a type, it attempts to find the implementation with the smallest "information loss".

The "information loss" of a term is defined as an integer number being the sum of:

- the number of (curried) arguments that are ignored by some function,
- the number of tuple parts that are computed but subsequently not used,
- the number of `case` clauses that do not use their arguments,
- the number of disjunction or conjunction parts that are inserted out of order,
- the number of times an input value is used (if more than once).

Choosing the smallest "information loss" is a heuristic that enables automatic choice of the correct implementations for a large number of cases (but, of course, not in all cases).

For example, here are correct implementations of `pure`, `map`, and `flatMap` for the `State` monad:

```scala
def pure[S, A]: A ⇒ (S ⇒ (A, S)) = implement
def map[S, A, B]: (S ⇒ (A, S)) ⇒ (A ⇒ B) ⇒ (S ⇒ (B, S)) = implement
def flatMap[S, A, B]: (S ⇒ (A, S)) ⇒ (A ⇒ S ⇒ (B, S)) ⇒ (S ⇒ (B, S)) = implement

```

Note that there are several inequivalent implementations for the State monad's `map` and `flatMap`,
but only one of them loses no information -- and thus has a higher chance of satisfying the correct laws.

The theorem prover does not check any equational laws.
It selects the terms with the smallest level of information loss, in hopes that there will be only one term selected,
and that it will be the desired term that satisfies equational laws of whatever sort the users expect.

The theorem prover will generate a (compile-time) error when there are two or more implementations with the smallest level of information loss.

If there are several possible implementations but only one implementation with the smallest level of information loss,
the theorem prover will choose that implementation but print a warning message such as

```
Warning:scalac: type (S → (A, S)) → (A → B) → S → (B, S) has 2 implementations (laws need checking?)

```

This message means that the resulting implementation is _probably_ the right one, but there was a choice to be made.
If there exist some equational laws that apply to this function, the laws need to be checked by the user (e.g. in unit tests).

In some cases, there are several inequivalent implementations that all have the same level of "information loss."
The function `allOfType` returns all these implementations.

An experimental API is provided for examining the lambda-terms corresponding to the returned implementations.
The [tutorial](docs/Tutorial.md#working-with-lambda-terms) gives more detail about using that API.

The lambda-term API is experimental because, in particular, it exposes the internal implementation of STLC, which could change in future versions of `curryhoward`.

### Examples of heuristic choice

The "smallest information loss" heuristic allows us to select the "better" implementation in the following example:

```scala
def optionId[X]: Option[X] ⇒ Option[X] = implement

optionId(Some(123)) == 123
optionId(None) == None

```

There are two possible implementations of the type `Option[X] ⇒ Option[X]`: the "trivial" implementation (always returns `None`), and the "interesting" implementation (returns the same value as given).
The "trivial" implementation is rejected by the algorithm because it ignores the information given in the original data, so it has higher "information loss".

Another heuristic is to prefer implementations that use more parts of a disjunction.
This avoids implementations that e.g. always generate the `Some` subtype of `Option` and never generate `None`.

Another heuristic is to prefer implementations that preserve the order of parts in conjunctions and disjunctions.

For example,

```scala
def f[X]: ((X, X, X)) ⇒ (X, X, X) = implement

```

will generate the code for the identity function on triples, that is, `((a, b, c)) ⇒ (a, b, c)`.
There are many other implementations of this type, for example `((a, b, c)) ⇒ (b, c, a)`.
However, these implementations do not preserve the order of the elements in the input data, which is considered as a "loss of information".

The analogous parts-ordering heuristic is used for disjunctions, which selects the most reasonable implementation of

```scala
def f[X]: Either[X, X] ⇒ Either[X, X] = implement

```