https://github.com/ruuda/dfc
Dataflow compiler experiment
- Host: GitHub
- URL: https://github.com/ruuda/dfc
- Owner: ruuda
- License: apache-2.0
- Created: 2018-12-08T14:12:42.000Z
- Default Branch: master
- Last Pushed: 2018-12-08T14:20:10.000Z
- Last Synced: 2025-03-30T10:30:22.063Z
- Language: Haskell
- Homepage:
- Size: 43.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: license
README
# Dataflow Compiler
A proof-of-concept optimizing compiler for a dataflow-based intermediate stream
processing language. The compiler takes as input a program that, for every
element in the input stream, yields zero or more elements. (Think list
comprehensions in Python, `for` comprehensions in Scala, or LINQ in C#.) It optimizes the program
by analyzing the data flow, taking advantage of purity, and of the limited
opportunity for control flow in such programs.
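
As a rough illustration of the kind of program meant here, the following Haskell list comprehension yields zero or more outputs per input element. The function `program` and its body are made up for this example; they are not taken from the dfc source or its input language.

```haskell
-- Hypothetical stream program, written as a Haskell list comprehension.
-- Every input element yields zero outputs (when it is 0) or two outputs.
program :: [Int] -> [Int]
program xs =
  [ y
  | x <- xs
  , x /= 0                      -- filter: a zero element yields nothing
  , y <- [100 `div` x, x * 2]   -- two outputs per remaining element
  ]
```

In GHCi, `program [0, 5, 10]` evaluates to `[20,10,10,20]`.
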
My goal is to target a strict language, which is an interesting code generation
problem because the data flow leaves a lot of freedom for scheduling operations.
We need to infer the control flow from the data flow, both to avoid doing
useless work and for correctness: a conditional division that first checks that
the denominator is nonzero should not compile to a program that divides by zero
anyway. Targeting a lazy language is simpler in this regard,
because dependencies are tracked dynamically at runtime, rather than statically
at compile time. For example, if a value is only used conditionally, we should
only compute it in the branch where it is used. In a lazy language we could get
away with unconditionally producing a thunk, as it would only be forced inside
the branch.
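
The difference can be sketched in Haskell itself. The two functions below are hypothetical and not part of dfc: `guardedDiv` relies on laziness, while `eagerDiv` uses `seq` to imitate what a strict target would do if the division were scheduled before the check.

```haskell
-- Lazy version: `q` is a thunk that is only forced in the branch that uses
-- it, so `guardedDiv 7 0` returns 0 without ever dividing.
guardedDiv :: Int -> Int -> Int
guardedDiv n d =
  let q = n `div` d
  in  if d /= 0 then q else 0

-- Naive strict schedule: the division is forced before the check.
-- `eagerDiv 7 0` raises a divide-by-zero exception even though the
-- result is guarded by the conditional.
eagerDiv :: Int -> Int -> Int
eagerDiv n d =
  let q = n `div` d
  in  q `seq` (if d /= 0 then q else 0)
```

A compiler that targets a strict language has to produce the behaviour of `guardedDiv` by placing the division inside the branch, because no thunk defers it at runtime.
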
## Implementation Notes
* The variable and expression types use GADTs for type safety. An optimization
pass that would change the type of a value would not typecheck.
* The use of `PatternSynonyms` and `ViewPatterns` makes for quite readable
peephole optimization passes.
* I started out allowing both variables and constants in expressions, but
allowing only variables (and making constants expressions themselves) makes
writing optimizations more uniform.
* Having an identity expression is useful to modularize optimization passes.
One pass would rewrite `$2 = $1 + 0` to `$2 = $1`, and it does not need to
be cluttered by anything else. Another pass would then replace references to
`$2` with references to `$1`, at which point `$2` becomes dead code.
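
The notes above can be sketched with a miniature IR. Everything in this block is an assumption made for illustration: the names `Var`, `Expr`, `Binding`, `peephole`, `propagate` and `optimize` are hypothetical and do not come from the dfc source. Because expressions refer to variables only, the constant `0` appears as its own binding `$0 = 0`. The sketch uses `ViewPatterns` but leaves out `PatternSynonyms`.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ViewPatterns #-}

-- A variable is a number tagged with the type of the value it holds.
newtype Var a = Var Int

-- Expressions only refer to variables; constants are expressions themselves.
data Expr a where
  Const :: Int -> Expr Int
  Id    :: Var a -> Expr a                  -- identity expression
  Add   :: Var Int -> Var Int -> Expr Int
  Less  :: Var Int -> Var Int -> Expr Bool

-- A binding `$n = expr`. The value type is hidden, but the variable and
-- the expression must agree on it.
data Binding where
  Bind :: Var a -> Expr a -> Binding

-- Is this variable bound to the constant 0?
isZero :: [Binding] -> Var Int -> Bool
isZero prog (Var i) = any match prog
  where
    match :: Binding -> Bool
    match (Bind (Var j) (Const 0)) = i == j
    match _                        = False

-- Pass 1: rewrite `x + 0` and `0 + x` to the identity expression.
-- Returning e.g. `Less x x` here would be rejected by the type checker.
peephole :: [Binding] -> Expr a -> Expr a
peephole prog (Add x (isZero prog -> True)) = Id x
peephole prog (Add (isZero prog -> True) y) = Id y
peephole _    e                             = e

-- If the variable is bound to `Id y`, return `y`, else the variable itself.
resolve :: [Binding] -> Var a -> Var a
resolve prog (Var i) = Var (go prog)
  where
    go :: [Binding] -> Int
    go (Bind (Var j) (Id (Var k)) : _) | i == j = k
    go (_ : rest)                               = go rest
    go []                                       = i

-- Pass 2: replace references to identity-bound variables by their source.
propagate :: [Binding] -> Expr a -> Expr a
propagate prog expr = case expr of
  Const k  -> Const k
  Id x     -> Id (resolve prog x)
  Add x y  -> Add (resolve prog x) (resolve prog y)
  Less x y -> Less (resolve prog x) (resolve prog y)

-- Run pass 1 over every binding, then pass 2 over the result.
optimize :: [Binding] -> [Binding]
optimize prog0 = map step2 prog1
  where
    prog1 = map step1 prog0
    step1, step2 :: Binding -> Binding
    step1 (Bind v e) = Bind v (peephole prog0 e)
    step2 (Bind v e) = Bind v (propagate prog1 e)

showVar :: Var a -> String
showVar (Var i) = "$" ++ show i

showExpr :: Expr a -> String
showExpr (Const k)  = show k
showExpr (Id x)     = showVar x
showExpr (Add x y)  = showVar x ++ " + " ++ showVar y
showExpr (Less x y) = showVar x ++ " < " ++ showVar y

render :: Binding -> String
render (Bind v e) = showVar v ++ " = " ++ showExpr e

example :: [Binding]
example =
  [ Bind (Var 0) (Const 0)
  , Bind (Var 1) (Const 7)
  , Bind (Var 2) (Add (Var 1) (Var 0))   -- $2 = $1 + $0
  , Bind (Var 3) (Add (Var 2) (Var 2))   -- $3 = $2 + $2
  ]
```

Here `map render (optimize example)` gives `$0 = 0`, `$1 = 7`, `$2 = $1` and `$3 = $1 + $1`. Nothing refers to `$2` any more, so a separate dead code elimination pass (not shown) could drop it. Note that the phantom index on `Var` is only as trustworthy as the code that allocates variable numbers; this sketch does not enforce that.
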
## Building
    stack build
    stack exec dfc