https://github.com/bollu/lz

A minimal in MLIR dialect along the lines of STG to represent laziness.
https://github.com/bollu/lz
Last synced: 6 months ago
JSON representation
A minimal in MLIR dialect along the lines of STG to represent laziness.
Host: GitHub
URL: https://github.com/bollu/lz
Owner: bollu
License: other
Created: 2020-11-06T13:01:56.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2022-01-07T13:27:03.000Z (over 3 years ago)
Last Synced: 2025-03-29T12:23:41.195Z (6 months ago)
Language: LLVM
Size: 398 MB
Stars: 15
Watchers: 3
Forks: 1
Open Issues: 24
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project

README

          # Core-MLIR

- add API to clone region with entry handle.

# Points

- Experience report on writing a LEAN backend.

- inline C string is a huge pain.

- Optimisations that you'd be interested to implement?

- Upstreaming?

- Help formalizing the document?

# Notes on GHC

- smallest size is `32` bit word. Can't pack stuff!

- GHC plugin that strictifies/unboxes most things and prints out the new

  file.

- IORefs are bad

- Example GHC perf slide because of laziness: https://gitlab.haskell.org/ghc/ghc/-/issues/19102#note_319557

- [Novel GC algorithm for pure funcitonal languages](https://www.reddit.com/r/haskell/comments/knzfhn/novel_garbage_collection_technique_for_immutable/)

- [supercompiler by evaluation](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/supercomp-by-eval.pdf)

- [GHC grin has benchmark suite](https://github.com/grin-compiler/ghc-grin/tree/master/ghc-grin-benchmark/boquist-grin-bench)

- [`fast-math` haskell library has some RULES limitations](https://github.com/liyang/fast-math/)

- [Another example where fusion is bad for `Text`](https://github.com/haskell/text/pull/348)

# LEAN wishlist

- mlir.lean:246:17: error: unknown constant 'IO.readTextFile': Print possible solutions?

# Thoughts on writing a new LEAN backend

- Why only `closureMaxArgs` for `app` and not `pap`?

- Also, I should generate `llvm.switch` for efficiency.

- Similary, I should check that me calling the intrinsics such as `lean_nat_sub`

  does not impact my performance!

- I should tag the values in the `library.ll` to be `alwaysinline` for performance.

- I should generate `llvm.musttail`.

- Please don't use things like function overloading (looking at you `lean_inc`).

- The existence of `extern C inline` within the compiler / prelude makes stuff very complicated. Eg.

  the fact that adding `uint` is implemented using `[extern c inline "#1 + #2]` makes it complex to use,

  since I can't lower this to MLIR (or any other lowering mechanism, really). I am concerned this feature will lead to 

  a lock-in into C(++) syntax.

- One massive quality of life improvement would be if lambdapure printed in MLIR syntax.

  That way, it's unambiguous about semantics! and can potentially eventually round-trip

  through the compiler!

- It's both too high and too low level. `case` of `int` in lambdapure generates

  as calls to runtime `lean_dec_eq` + a boolean `int` case on return value,

  while `case` of objects is represented as a real `case.`

- Initialization machinery is confusing. I still don't understand the

  invariants around why certain things are initialized the way they are.

- Quite minimal and pleasant to work with, all said and done.

- Can tell LLVM about tail calls instead of hand rolling a tail call.

- Can maybe use TBAA to teach LLVM about different object types, instead of erasing all info

  at the lambdapure level. 

- Can potentially use the objective-c machinery + [LLVM GC](https://llvm.org/docs/GarbageCollection.html) to implement correct

  refcounting.

- `jmp` encodes nicely in MLIR thanks to nested regions.

- LEAN4 APIS: foldable/traversable/divisible/decidable?

- I saw the [bachelor thesis on snake lemma](https://pp.ipd.kit.edu/thesis.php?id=313) (I wanted the snake lemma recently...). 

  [How is homology computed? Can we make it faster?](https://pastel.archives-ouvertes.fr/pastel-00605836/document)

  (sparse linear algebra).

- Which optimisation to do at LEAN level?

- Can we leverage proofs at the LEAN level?

- Interactive compliation: write tactics to prove properties about code.

# Log:  [newest] to [oldest]

# Oct 10

- Configuring `elan` for `lean` development:

```

/home/bollu/work/lean4-contrib/tests/bench$ elan toolchain link lean-contrib /home/bollu/work/lean4-contrib/build/release/stage2/

/home/bollu/work/lean4-contrib/tests/bench$ elan override set lean-contrib

info: override toolchain for '/home/bollu/work/lean4-contrib/tests/bench' set to 'lean-contrib'

/home/bollu/work/lean4-contrib/tests/bench$ 

```

# Aug 08

- Bump allocator should obey stack discipline!

# Jul 29

- `slow.lean`: minimal version of `binarytrees.lean` without parallelism.

```

  36.16%  exe.out  exe.out           [.] l_check

  20.60%  exe.out  exe.out           [.] lean_del

  13.17%  exe.out  exe.out           [.] l_make_x27

  12.07%  exe.out  exe.out           [.] lean_alloc_ctor_memory

   9.89%  exe.out  exe.out           [.] lean_free_small

   7.32%  exe.out  exe.out           [.] lean_alloc_small

   0.08%  exe.out  libc-2.33.so      [.] __memset_avx2_erms

   0.06%  exe.out  exe.out           [.] l_depth

   0.05%  exe.out  [kernel.vmlinux]  [k] check_preemption_disabled

```

```

  34.46%  exe-ref.out  exe-ref.out       [.] l_check

  24.58%  exe-ref.out  exe-ref.out       [.] lean_del

  16.53%  exe-ref.out  exe-ref.out       [.] l_make_x27.part.0

  12.46%  exe-ref.out  exe-ref.out       [.] lean_free_small

  10.69%  exe-ref.out  exe-ref.out       [.] lean_alloc_small

   0.45%  exe-ref.out  exe-ref.out       [.] l_sumT

   0.18%  exe-ref.out  exe-ref.out       [.] lean::allocator::alloc_page

   0.08%  exe-ref.out  [kernel.vmlinux]  [k] irqentry_exit_to_user_mode

   0.04%  exe-ref.out  ld-2.33.so        [.] _dl_lookup_symbol_x

   0.04%  exe-ref.out  [kernel.vmlinux]  [k] get_page_from_freelist

   0.04%  exe-ref.out  [kernel.vmlinux]  [k] free_unref_page_list

```

- From `exe-ref.ll`

```ll

define dso_local %struct.lean_object* @l_make(i32 %0) local_unnamed_addr #0 {

  %2 = tail call %struct.lean_object* @l_make_x27(i32 %0, i32 %0)

  ret %struct.lean_object* %2

}

```

- On the other hand, `exe.ll`:

```ll

define i8* @l_make(i32 %0) !dbg !67 {

  %2 = call i8* @lean_box(i32 0), !dbg !68

  %3 = call i8* @l_make_x27(i32 %0, i32 %0), !dbg !70

  ret i8* %3, !dbg !71

}

```

- I fixed the random `lean_box` being generated due to my handling of irrelevant args. The new perf data:

```

(ours)

  36.84%  exe.out  exe.out           [.] l_check

  18.98%  exe.out  exe.out           [.] lean_del

  13.57%  exe.out  exe.out           [.] l_make_x27

  11.43%  exe.out  exe.out           [.] lean_alloc_ctor_memory

   9.66%  exe.out  exe.out           [.] lean_free_small

   7.11%  exe.out  exe.out           [.] lean_alloc_small

   0.70%  exe.out  exe.out           [.] main

   0.37%  exe.out  libc-2.33.so      [.] __memset_avx2_erms

   0.17%  exe.out  [kernel.vmlinux]  [k] irqentry_exit_to_user_mode

   0.16%  exe.out  [kernel.vmlinux]  [k] check_preemption_disabled

   0.11%  exe.out  exe.out           [.] l_depth___lambda__1

   0.09%  exe.out  [kernel.vmlinux]  [k] clear_page_erms

   0.07%  exe.out  [kernel.vmlinux]  [k] native_irq_return_iret

   0.07%  exe.out  [kernel.vmlinux]  [k] error_entry

   0.05%  exe.out  [kernel.vmlinux]  [k] swapgs_restore_regs_and_return_to_usermode

   0.04%  exe.out  [kernel.vmlinux]  [k] get_page_from_freelist

   0.04%  exe.out  [kernel.vmlinux]  [k] __mod_memcg_state.part.0

   0.03%  exe.out  [kernel.vmlinux]  [k] handle_mm_fault

   0.03%  exe.out  [kernel.vmlinux]  [k] __mod_node_page_state

   0.03%  exe.out  [kernel.vmlinux]  [k] __mod_memcg_lruvec_state

   0.02%  exe.out  [kernel.vmlinux]  [k] __mod_lruvec_state

   0.02%  exe.out  [kernel.vmlinux]  [k] __free_one_page

   0.02%  exe.out  [kernel.vmlinux]  [k] sync_regs

   0.02%  exe.out  [kernel.vmlinux]  [k] mem_cgroup_charge

   0.02%  exe.out  [kernel.vmlinux]  [k] prep_new_page

   0.02%  exe.out  [kernel.vmlinux]  [k] __this_cpu_preempt_check

   0.02%  exe.out  [kernel.vmlinux]  [k] unmap_page_range

```

```

(theirs)

  34.81%  exe-ref.out  exe-ref.out         [.] l_check

  24.03%  exe-ref.out  exe-ref.out         [.] lean_del

  15.56%  exe-ref.out  exe-ref.out         [.] l_make_x27.part.0

  11.24%  exe-ref.out  exe-ref.out         [.] lean_free_small

  10.79%  exe-ref.out  exe-ref.out         [.] lean_alloc_small

   0.85%  exe-ref.out  exe-ref.out         [.] lean_mark_persistent

   0.47%  exe-ref.out  exe-ref.out         [.] l_depth___lambda__1___boxed

   0.41%  exe-ref.out  exe-ref.out         [.] lean::allocator::alloc_page

   0.17%  exe-ref.out  [kernel.vmlinux]    [k] check_preemption_disabled

   0.16%  exe-ref.out  [kernel.vmlinux]    [k] clear_page_erms

   0.15%  exe-ref.out  [kernel.vmlinux]    [k] irqentry_exit_to_user_mode

   0.08%  exe-ref.out  [kernel.vmlinux]    [k] error_entry

   0.07%  exe-ref.out  [kernel.vmlinux]    [k] try_charge

   0.07%  exe-ref.out  [kernel.vmlinux]    [k] native_irq_return_iret

   0.07%  exe-ref.out  [kernel.vmlinux]    [k] swapgs_restore_regs_and_return_to_usermode

   0.06%  exe-ref.out  [kernel.vmlinux]    [k] handle_mm_fault

   0.06%  exe-ref.out  [kernel.vmlinux]    [k] __mod_node_page_state

   0.05%  exe-ref.out  [kernel.vmlinux]    [k] __pagevec_lru_add_fn

   0.05%  exe-ref.out  [kernel.vmlinux]    [k] __free_one_page

   0.05%  exe-ref.out  [kernel.vmlinux]    [k] release_pages

   0.05%  exe-ref.out  [kernel.vmlinux]    [k] page_remove_rmap

   0.04%  exe-ref.out  [kernel.vmlinux]    [k] __mod_memcg_lruvec_state

   0.04%  exe-ref.out  [kernel.vmlinux]    [k] unmap_page_range

   0.03%  exe-ref.out  [kernel.vmlinux]    [k] kernel_init_free_pages

   0.03%  exe-ref.out  [kernel.vmlinux]    [k] __rcu_read_unlock

   0.03%  exe-ref.out  [kernel.vmlinux]    [k] get_page_from_freelist

   0.03%  exe-ref.out  [kernel.vmlinux]    [k] __mod_zone_page_state

   0.03%  exe-ref.out  [kernel.vmlinux]    [k] free_pages_and_swap_cache

   0.03%  exe-ref.out  [kernel.vmlinux]    [k] free_unref_page_list

```

- If we now look at `l_make`:

```ll

(ours)

; Function Attrs: nounwind sspstrong

define i8* @l_make_x27(i32 %0, i32 %1) local_unnamed_addr #2 !dbg !74 {

  %.not = icmp eq i32 %1, 0, !dbg !75

  br i1 %.not, label %14, label %lean_ctor_set.exit1, !dbg !77

lean_ctor_set.exit1:                              ; preds = %2

  %3 = add i32 %1, -1, !dbg !78

  %4 = tail call i8* @l_make_x27(i32 %0, i32 %3), !dbg !79

  %5 = add i32 %0, 1, !dbg !80

  %6 = tail call i8* @l_make_x27(i32 %5, i32 %3), !dbg !81

  ;; SLOW ALLOCATOR

  %7 = tail call %struct.lean_object* @lean_alloc_ctor_memory(i32 26) #3, !dbg !82

  ;; SLOW ALLOCATOR

  %8 = getelementptr inbounds %struct.lean_object, %struct.lean_object* %7, i64 0, i32 0, !dbg !82

  store i64 72620543991349249, i64* %8, align 8, !dbg !82, !tbaa !46

  %9 = getelementptr inbounds %struct.lean_object, %struct.lean_object* %7, i64 1, !dbg !83

  %10 = bitcast %struct.lean_object* %9 to i8**, !dbg !83

  store i8* %4, i8** %10, align 8, !dbg !83, !tbaa !71

  %11 = bitcast %struct.lean_object* %7 to i8*, !dbg !82

  %12 = getelementptr inbounds %struct.lean_object, %struct.lean_object* %7, i64 2, !dbg !84

  %13 = bitcast %struct.lean_object* %12 to i8**, !dbg !84

  store i8* %6, i8** %13, align 8, !dbg !84, !tbaa !71

  ret i8* %11, !dbg !85

14:                                               ; preds = %2

  %15 = load i8*, i8** @l_make_x27___closed__1, align 8, !dbg !86

  ret i8* %15, !dbg !87

}

```

- versus:

```ll

(theirs)

define dso_local %struct.lean_object* @l_make_x27(i32 %0, i32 %1) local_unnamed_addr #0 {

  %3 = icmp eq i32 %1, 0

  br i1 %3, label %16, label %4

4:                                                ; preds = %2

  %5 = add i32 %1, -1

  %6 = tail call %struct.lean_object* @l_make_x27(i32 %0, i32 %5)

  %7 = add i32 %0, 1

  %8 = tail call %struct.lean_object* @l_make_x27(i32 %7, i32 %5)

  ;; BETTER ALLOC

  %9 = tail call i8* @lean_alloc_small(i32 24, i32 2) #3 <- BETTER ALLOC

  ;; BETTER ALLOC

  %10 = bitcast i8* %9 to %struct.lean_object*

  %11 = bitcast i8* %9 to i64*

  store i64 72620543991349249, i64* %11, align 8, !tbaa !4

  %12 = getelementptr inbounds i8, i8* %9, i64 8

  %13 = bitcast i8* %12 to %struct.lean_object**

  store %struct.lean_object* %6, %struct.lean_object** %13, align 8, !tbaa !17

  %14 = getelementptr inbounds i8, i8* %9, i64 16

  %15 = bitcast i8* %14 to %struct.lean_object**

  store %struct.lean_object* %8, %struct.lean_object** %15, align 8, !tbaa !17

  br label %18

16:                                               ; preds = %2

  %17 = load %struct.lean_object*, %struct.lean_object** @l_make_x27___closed__1, align 8, !tbaa !17

  br label %18

18:                                               ; preds = %16, %4

  %19 = phi %struct.lean_object* [ %10, %4 ], [ %17, %16 ]

  ret %struct.lean_object* %19

}

```

# Jul 28

Here's the LLVM:

```

https://gist.github.com/bollu/89b4b6f412433305022fbbbccd82614b

define i8* @l_even(i8* %0) local_unnamed_addr !dbg !7 {

  ...

  %9 = tail call i8* @l_odd(i8* %8), !dbg !19

  ...

}

define i8* @l_odd(i8* %0) local_unnamed_addr !dbg !23 {

  ...

  %9 = tail call i8* @l_even(i8* %8), !dbg !32

  ...  

}

```

Here's the LLC:

```

56      in /home/bollu/work/lz/test/lambdapure/compile/

(gdb) bt

#0  l_even () at /home/bollu/work/lz/test/lambdapure/compile/:56

#1  0x0000000000406612 in l_odd () at /home/bollu/work/lz/test/lambdapure/compile/:95

#2  0x000000000040659e in l_even () at /home/bollu/work/lz/test/lambdapure/compile/:69

#3  0x0000000000406612 in l_odd () at /home/bollu/work/lz/test/lambdapure/compile/:95

#4  0x000000000040659e in l_even () at /home/bollu/work/lz/test/lambdapure/compile/:69

#5  0x0000000000406612 in l_odd () at /home/bollu/work/lz/test/lambdapure/compile/:95

#6  0x000000000040659e in l_even () at /home/bollu/work/lz/test/lambdapure/compile/:69

#7  0x0000000000406612 in l_odd () at /home/bollu/work/lz/test/lambdapure/compile/:95

#8  0x000000000040659e in l_even () at /home/bollu/work/lz/test/lambdapure/compile/:69

#9  0x000000000040685e in main_lean_custom_entrypoint_hack () at /home/bollu/work/lz/test/lambdapure/compile/:223

#10 0x0000000000432d86 in main ()

(gdb)  

```

    

so I don't actually understand the guarantees provided by `tail call`. I presume the point is the call

can't be tailed, since it's followed by more code:

```lvm

define i8* @l_even(i8* %0) local_unnamed_addr !dbg !7 {

  ...

  %9 = tail call i8* @l_odd(i8* %8), !dbg !19

  tail call void bitcast (void (%struct.lean_object*)* @lean_dec to void (i8*)*)(i8* %8), !dbg !20

  ret i8* %9, !dbg !21

}

```

# Jul 24

- binarytrees.lean:

```

(theirs)

  30.09%  exe-ref.out  exe-ref.out       [.] l_check

  20.28%  exe-ref.out  exe-ref.out       [.] l_make_x27.part.0

  18.12%  exe-ref.out  exe-ref.out       [.] lean_del

  11.48%  exe-ref.out  exe-ref.out       [.] lean_alloc_small

  10.27%  exe-ref.out  exe-ref.out       [.] lean_free_small

   2.63%  exe-ref.out  [kernel.vmlinux]  [k] clear_page_erms

   1.40%  exe-ref.out  ld-2.33.so        [.] _dl_relocate_object

   1.03%  exe-ref.out  [kernel.vmlinux]  [k] irqentry_exit_to_user_mode

   0.95%  exe-ref.out  exe-ref.out       [.] initialize_Init_Data_ByteArray_Basic

   0.82%  exe-ref.out  exe-ref.out       [.] lean_mark_persistent

   0.64%  exe-ref.out  [kernel.vmlinux]  [k] __memcg_kmem_charge_page

   0.64%  exe-ref.out  exe-ref.out       [.] lean::allocator::alloc_page

   0.41%  exe-ref.out  [kernel.vmlinux]  [k] unmap_page_range

   0.41%  exe-ref.out  [kernel.vmlinux]  [k] kmem_cache_alloc

   0.40%  exe-ref.out  [kernel.vmlinux]  [k] free_pcp_prepare

   0.40%  exe-ref.out  [kernel.vmlinux]  [k] free_unref_page_commit

   0.03%  perf         [kernel.vmlinux]  [k] perf_event_exec

   0.00%  perf         [kernel.vmlinux]  [k] intel_bts_enable_local

   0.00%  perf         [kernel.vmlinux]  [k] native_write_msr

```

```

(ours)

  31.02%  exe.out  exe.out           [.] l_check

  17.97%  exe.out  exe.out           [.] lean_del

  16.39%  exe.out  exe.out           [.] l_make_x27

  10.70%  exe.out  exe.out           [.] lean_alloc_small

   9.51%  exe.out  exe.out           [.] lean_alloc_ctor_memory

   6.81%  exe.out  exe.out           [.] lean_free_small

   1.49%  exe.out  libc-2.33.so      [.] __memset_avx2_erms

   1.05%  exe.out  [kernel.vmlinux]  [k] prep_new_page

   1.00%  exe.out  ld-2.33.so        [.] do_lookup_x

   0.94%  exe.out  exe.out           [.] main

   0.74%  exe.out  [kernel.vmlinux]  [k] clear_page_erms

   0.58%  exe.out  [kernel.vmlinux]  [k] unmap_page_range

   0.46%  exe.out  [kernel.vmlinux]  [k] __this_cpu_preempt_check

   0.43%  exe.out  [kernel.vmlinux]  [k] try_charge

   0.29%  exe.out  [kernel.vmlinux]  [k] native_irq_return_iret

   0.29%  exe.out  [kernel.vmlinux]  [k] __mod_node_page_state

   0.29%  exe.out  [kernel.vmlinux]  [k] __free_one_page

   0.03%  perf     [kernel.vmlinux]  [k] strrchr

   0.00%  perf     [kernel.vmlinux]  [k] native_sched_clock

   0.00%  perf     [kernel.vmlinux]  [k] native_write_msr

```

- binary-trees-int.lean

```

  16.74%  exe-ref.out  [kernel.vmlinux]  [k] swapgs_restore_regs_and_return_to_usermode

  14.09%  exe-ref.out  [kernel.vmlinux]  [k] clear_page_erms

  11.31%  exe-ref.out  [kernel.vmlinux]  [k] vma_interval_tree_insert

  11.19%  exe-ref.out  ld-2.33.so        [.] check_match

   9.33%  exe-ref.out  ld-2.33.so        [.] lookup_malloc_symbol

   7.63%  exe-ref.out  [kernel.vmlinux]  [k] handle_mm_fault

   6.56%  exe-ref.out  exe-ref.out       [.] initialize_Init_Data_Format_Basic

   6.29%  exe-ref.out  [kernel.vmlinux]  [k] page_remove_rmap

   5.88%  exe-ref.out  [kernel.vmlinux]  [k] lru_add_drain_cpu

   5.68%  exe-ref.out  [kernel.vmlinux]  [k] free_pages_and_swap_cache

   5.03%  exe-ref.out  [kernel.vmlinux]  [k] _find_next_bit.constprop.0

   0.24%  perf         [kernel.vmlinux]  [k] native_write_msr

   0.01%  perf         [kernel.vmlinux]  [k] intel_bts_enable_local

```

```

  16.43%  exe.out  [kernel.vmlinux]  [k] __split_vma

  15.66%  exe.out  ld-2.33.so        [.] _dl_lookup_symbol_x

  14.31%  exe.out  libc-2.33.so      [.] __memset_avx2_erms

  13.09%  exe.out  [kernel.vmlinux]  [k] release_pages

  12.04%  exe.out  [kernel.vmlinux]  [k] native_irq_return_iret

  10.37%  exe.out  exe.out           [.] lean_mark_persistent

   9.57%  exe.out  [kernel.vmlinux]  [k] unlink_file_vma

   8.05%  exe.out  [kernel.vmlinux]  [k] perf_event_mmap_output

   0.44%  perf     [kernel.vmlinux]  [k] perf_event_exec

   0.02%  perf     [kernel.vmlinux]  [k] native_sched_clock

   0.00%  perf     [kernel.vmlinux]  [k] native_write_msr

```

Try batshit insane options to get performance --- disable all stack

related stuff, inline everything, try to expose maximum information

to the compiler. no dice!

```sh

# compile_lean.sh

#!/usr/bin/env bash

set -e

set -o xtrace

rm $1-exe.out || true

# lean -c fails if relative path walks upward. eg. lean -c ../exe.c -o foo

(lean $1 -c exe-ref.c && clang -I /home/bollu/work/lean4/build/stage0/include  -O2 -S  -emit-llvm exe-ref.c -o exe-lean-ref.ll) || true

# compile MLIR file

lean $1 -m exe.mlir

hask-opt exe.mlir | \

  hask-opt  --convert-scf-to-std | hask-opt --lean-lower  | hask-opt --ptr-lower | \

  mlir-translate --mlir-to-llvmir -o exe.ll

# | opt -S -O3 | llc -filetype=obj -o exe.o

llvm-link exe.ll \

  /home/bollu/work/lz/lean-linking-incantations/lib-includes/library.ll \

  /home/bollu/work/lz/lean-linking-incantations/lib-runtime/runtime.ll \

  /home/bollu/work/lz/lean-linking-incantations/lean-shell.ll \

  -S -o exe-linked.ll

opt exe-linked.ll -passes=bitcast-call-converter -S -o exe-linked-nobitcast.ll

# opt exe-linked.ll  -S -o exe-linked-nobitcast.ll

opt -always-inline -O3  exe-linked-nobitcast.ll -S  -o exe-linked-o3.ll

sed -i "s/attributes \(.*\) = { \(.*\) }/attributes \1 = { alwaysinline \2 }/" exe-linked-o3.ll

sed -i "s/nounwind/nounwind alwaysinline /g" exe-linked-o3.ll

sed -i "s/safestack//g" exe-linked-o3.ll

sed -i "s/sspstrong//g" exe-linked-o3.ll

sed -i "s/ssp//g" exe-linked-o3.ll

sed -i "s/sspreq//g" exe-linked-o3.ll

# v remove empty attributes

sed -i "s/attributes .*= {[ ]*}$//g" exe-linked-o3.ll

opt exe-linked-o3.ll -passes=bitcast-call-converter | opt -always-inline -O3  -S  -o exe-linked-o3-2.ll

cp exe-linked-o3-2.ll exe-linked-o3.ll

opt -verify exe-linked-o3.ll

echo "@@@@HACK: REMOVING TAIL ANNOTATIONS!"

sed -i "s/musttail/tail/g" exe-linked-o3.ll

llc --relocation-model=static -O3 -march=x86-64 -filetype=obj exe-linked-o3.ll -o exe.o

# leancpp: undefined reference to lean_name_eq

# `l_Lean_Syntax_isOfKind':

# Lean: lean_name_hash 

c++ -O3 -D LEAN_MULTI_THREAD -I/home/bollu/work/lean4/build/stage1/include \

    exe.o \

    -no-pie -Wl,--start-group  -lleancpp -lInit -lStd -lLean -Wl,--end-group \

    -L/home/bollu/work/lean4/build/stage1/lib/lean -lgmp -ldl -pthread \

    -Wno-unused-command-line-argument -o exe.out

```

I actually don't get what's happening. The [extracted LLVM file](https://gist.github.com/bollu/d67fa754f82e5982bfcf4c87316202ac)

should be able to inline `lean_del`, but it doesn't even though `lean_del` has the `alwaysinline` attribute.

I'm unsure what's going on.

# Jul 23

- Latest perf numbers for `binarytrees.lean`:

```

MLIR (ours)

  33.64%  exe.out  exe.out           [.] l_check

  18.24%  exe.out  exe.out           [.] lean_del

  13.28%  exe.out  exe.out           [.] l_make_x27

  10.55%  exe.out  exe.out           [.] lean_free_small

   8.82%  exe.out  exe.out           [.] lean_alloc_ctor_memory

   7.15%  exe.out  exe.out           [.] lean_alloc_small

   1.57%  exe.out  libc-2.33.so      [.] __memset_avx2_erms

   1.15%  exe.out  [kernel.vmlinux]  [k] get_page_from_freelist

   0.93%  exe.out  ld-2.33.so        [.] _dl_map_object_from_fd

   0.90%  exe.out  ld-2.33.so        [.] do_lookup_x

   0.82%  exe.out  exe.out           [.] main

   0.55%  exe.out  [kernel.vmlinux]  [k] error_entry

   0.48%  exe.out  [kernel.vmlinux]  [k] irqentry_exit_to_user_mode

   0.38%  exe.out  [kernel.vmlinux]  [k] __task_pid_nr_ns

   0.37%  exe.out  [kernel.vmlinux]  [k] __pagevec_lru_add_fn

   0.29%  exe.out  [kernel.vmlinux]  [k] lock_page_memcg

   0.29%  exe.out  [kernel.vmlinux]  [k] unmap_page_range

   0.29%  exe.out  [kernel.vmlinux]  [k] check_preemption_disabled

   0.29%  exe.out  [kernel.vmlinux]  [k] free_pcppages_bulk

   0.02%  perf     [kernel.vmlinux]  [k] perf_event_exec

   0.00%  perf     [kernel.vmlinux]  [k] check_preemption_disabled

   0.00%  perf     [kernel.vmlinux]  [k] native_write_msr

```

```

  26.63%  exe-ref.out  exe-ref.out       [.] l_check

  24.14%  exe-ref.out  exe-ref.out       [.] lean_del

  16.74%  exe-ref.out  exe-ref.out       [.] l_make_x27.part.0

  11.81%  exe-ref.out  exe-ref.out       [.] lean_alloc_small

   9.21%  exe-ref.out  exe-ref.out       [.] lean_free_small

   2.88%  exe-ref.out  ld-2.33.so        [.] do_lookup_x

   1.47%  exe-ref.out  exe-ref.out       [.] lean_mark_persistent

   1.15%  exe-ref.out  exe-ref.out       [.] lean::allocator::alloc_page

   1.05%  exe-ref.out  [kernel.vmlinux]  [k] __mod_lruvec_state

   0.96%  exe-ref.out  [kernel.vmlinux]  [k] __pagevec_lru_add_fn

   0.89%  exe-ref.out  [kernel.vmlinux]  [k] clear_page_erms

   0.71%  exe-ref.out  [kernel.vmlinux]  [k] copy_page

   0.69%  exe-ref.out  [kernel.vmlinux]  [k] exc_page_fault

   0.41%  exe-ref.out  [kernel.vmlinux]  [k] unmap_page_range

   0.41%  exe-ref.out  [kernel.vmlinux]  [k] native_irq_return_iret

   0.41%  exe-ref.out  [kernel.vmlinux]  [k] check_preemption_disabled

   0.41%  exe-ref.out  [kernel.vmlinux]  [k] free_unref_page_list

   0.04%  perf         [kernel.vmlinux]  [k] memcpy_erms

   0.00%  perf         [kernel.vmlinux]  [k] intel_pmu_handle_irq

   0.00%  perf         [kernel.vmlinux]  [k] native_write_msr

```

Places for performance diff:

- I should lower into `case`, not `if` laddder.

- `i32` vs `i64`?

- `musttail`? doubtful.

- Run more rounds of `-O3`? :) 

- Need to pull more stuff into `runtime`? Currently not all of `-llean` is in `runtime.ll`.

# Jul 20

For a while, I thought I neeeded by own pass. It doesn't look like it, maybe I can just

generate slightly different IR and get away with it:

```

%struct.lean_object = type { i64 }

define %struct.lean_object* @cant_inline_1(i8* %0, i32 %1, i32 %2) alwaysinline {

    unreachable

}

define %struct.lean_object* @cant_inline_2(i8* %0, i64 %1, i64 %2) alwaysinline {

    unreachable

}

define %struct.lean_object* @cant_inline_3(%struct.lean_object* %0, i64 %1, i64 %2) alwaysinline {

    unreachable

}

define i1 @main (i8* %in) {

  ;; %out = tail call i8* bitcast (%struct.lean_object* (i8*, i32, i32)* @cant_inline_1 to i8* (i8*, i64, i64)*)(i8* %in, i64 3, i64 0)

   %out2 = tail call i8* bitcast (%struct.lean_object* (i8*, i64, i64)* @cant_inline_2 to i8* (i8*, i64, i64)*)(i8* %in, i64 3, i64 0)

  %out3 = tail call i8* bitcast (%struct.lean_object* (%struct.lean_object*, i64, i64)* @cant_inline_3 to i8* (i8*, i64, i64)*)(i8* %in, i64 3, i64 0)

  ret i1 1

}

```

In this program, only `cant_inline_1` fails due to the need to sign truncate the calls of an `i32` function pretending

to be `i64`. 

Here's the list of functions with a screwed up call blocked by bitcasts (found by using sed):

```

cat exe-linked-o3.ll| sed -r -n "s/.*call.*bitcast.*(@.* ).*/\1/p"

```

The list:

```

@l_make_x27_match__1___rarg___boxed to i8*), i64 3, i64 0), !dbg 

@lean_alloc_ctor to i8* (i64, i64, i64)*)(i64 1, i64 2, i64 2), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %1, i64 0, i8* nonnull inttoptr (i64 1 to i8*)), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %1, i64 1, i8* nonnull inttoptr (i64 1 to i8*)), !dbg 

@lean_alloc_ctor to i8* (i64, i64, i64)*)(i64 1, i64 2, i64 2), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %8, i64 0, i8* %5), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %8, i64 1, i8* %7), !dbg 

@lean_obj_tag to i64 (i8*)*)(i8* %0), !dbg 

@l_check_match__1___rarg to i8*), i64 3, i64 0), !dbg 

@lean_obj_tag to i64 (i8*)*)(i8* %0), !dbg 

@lean_obj_tag to i64 (i8*)*)(i8* %8), !dbg 

@l_sumT_match__1___rarg___boxed to i8*), i64 4, i64 0), !dbg 

@l_depth_match__1___rarg to i8*), i64 3, i64 0), !dbg 

@l_depth___lambda__1___boxed to i8*), i64 3, i64 2), !dbg 

@lean_closure_set to void (i8*, i64, i8*)*)(i8* %14, i64 0, i8* %0), !dbg 

@lean_closure_set to void (i8*, i64, i8*)*)(i8* %14, i64 1, i8* %13), !dbg 

@lean_alloc_ctor to i8* (i64, i64, i64)*)(i64 0, i64 2, i64 2), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %19, i64 0, i8* %0), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %19, i64 1, i8* %18), !dbg 

@lean_alloc_ctor to i8* (i64, i64, i64)*)(i64 0, i64 2, i64 2), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %20, i64 0, i8* %13), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %20, i64 1, i8* %19), !dbg 

@lean_alloc_ctor to i8* (i64, i64, i64)*)(i64 1, i64 2, i64 2), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %24, i64 0, i8* %20), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %24, i64 1, i8* %23), !dbg 

@l_main_match__1___rarg to i8*), i64 2, i64 0), !dbg 

@lean_obj_tag to i64 (i8*)*)(i8* %0), !dbg 

@lean_alloc_ctor to i8* (i64, i64, i64)*)(i64 0, i64 2, i64 2), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %6, i64 0, i8* nonnull inttoptr (i64 1 to i8*)), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %6, i64 1, i8* %.tr1.lcssa), !dbg 

@lean_obj_tag to i64 (i8*)*)(i8* %36), !dbg 

@lean_obj_tag to i64 (i8*)*)(i8* %39), !dbg 

@lean_alloc_ctor to i8* (i64, i64, i64)*)(i64 1, i64 2, i64 2), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %54, i64 0, i8* %51), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %54, i64 1, i8* %53), !dbg 

@lean_obj_tag to i64 (i8*)*)(i8* %5), !dbg 

@lean_obj_tag to i64 (i8*)*)(i8* %13), !dbg 

@lean_alloc_ctor to i8* (i64, i64, i64)*)(i64 1, i64 2, i64 2), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %34, i64 0, i8* %31), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %34, i64 1, i8* %33), !dbg 

@lean_alloc_ctor to i8* (i64, i64, i64)*)(i64 1, i64 2, i64 2), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %40, i64 0, i8* %37), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %40, i64 1, i8* %39), !dbg 

@lean_alloc_ctor to i8* (i64, i64, i64)*)(i64 1, i64 2, i64 2), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %4, i64 0, i8* nonnull inttoptr (i64 1 to i8*)), !dbg 

@lean_ctor_set to void (i8*, i64, i8*)*)(i8* %4, i64 1, i8* nonnull inttoptr (i64 1 to i8*)), !dbg 

@__dso_handle) 

@__dso_handle) 

@__dso_handle) 

@__dso_handle) 

@_ZN4lean9throwableD2Ev to i8*)) 

@_ZN4lean9throwableD2Ev to i8*)) 

@_ZN4lean9throwableD2Ev to i8*)) 

@_ZN4lean9throwableD2Ev to i8*)) 

```

# Jul 15

```

23.77%  exe.out  exe.out           [.] lean_del

```

`lean_del` comes from the lean runtime, so I'm now writing scripts to extract out the runtime

functions.

After folding in `runtime`, we don't have `lean_del` as topmost for `binarytrees`.

Rather we have `lean_ctor_get`, which is strange since it comes from `lean.h` that's

already in `include.ll`. But we still have calls like:

```llvm

%140 = tail call i8* 

  bitcast (%struct.lean_object* (%struct.lean_object*, i32)* 

   @lean_ctor_get to i8* (i8*, i64)*)(i8* nonnull %5, i 64 1), !dbg !439

```

floating around which I find mysterious, since  wrote a pass to get rid of exactly

this type of thing! I need to debug more to find out what's happening.

# Jul 14

- Perf report on `binarytrees.lean`, the file that shows a large delta.

- It seems like we don't correctly inline some functions like `lean_del`?

```

# Report, OURS (MLIR/LLVM backend)

  23.77%  exe.out  exe.out           [.] lean_del                   

  13.06%  exe.out  exe.out           [.] l_check                    

  11.77%  exe.out  exe.out           [.] lean_ctor_get              

  10.32%  exe.out  exe.out           [.] lean_free_small            

   9.37%  exe.out  exe.out           [.] lean_alloc_ctor            

   9.28%  exe.out  exe.out           [.] lean_obj_tag               

   8.88%  exe.out  exe.out           [.] l_make_x27                 

   4.38%  exe.out  exe.out           [.] lean_alloc_small           

   4.04%  exe.out  exe.out           [.] lean_ctor_set              

   1.11%  exe.out  [kernel.vmlinux]  [k] irqentry_exit_to_user_mode 

   0.80%  exe.out  exe.out           [.] lean_mark_persistent       

   0.55%  exe.out  ld-2.33.so        [.] strcmp                     

   0.54%  exe.out  exe.out           [.] lean::allocator::alloc_page

   0.50%  exe.out  [kernel.vmlinux]  [k] clear_page_erms            

   0.42%  exe.out  exe.out           [.] lean::mix                  

   0.29%  exe.out  [kernel.vmlinux]  [k] do_user_addr_fault         

   0.18%  exe.out  exe.out           [.] l_depth___lambda__1        

   0.18%  exe.out  [kernel.vmlinux]  [k] unmap_page_range           

   0.18%  exe.out  [kernel.vmlinux]  [k] error_entry                

   0.18%  exe.out  [kernel.vmlinux]  [k] __free_one_page            

   0.18%  exe.out  [kernel.vmlinux]  [k] __mod_memcg_lruvec_state   

   0.02%  perf     [kernel.vmlinux]  [k] strrchr                    

   0.00%  perf     [kernel.vmlinux]  [k] intel_pmu_handle_irq       

   0.00%  perf     [kernel.vmlinux]  [k] native_write_msr

```

- Versus theirs:

```

  20.75%  exe-ref.out  exe-ref.out       [.] l_check                    

  18.29%  exe-ref.out  exe-ref.out       [.] lean_free_small            

  11.67%  exe-ref.out  exe-ref.out       [.] l_make_x27.part.0          

   7.74%  exe-ref.out  exe-ref.out       [.] lean_alloc_small           

   2.13%  exe-ref.out  ld-2.33.so        [.] do_lookup_x                

   1.16%  exe-ref.out  exe-ref.out       [.] lean_mark_persistent       

   0.84%  exe-ref.out  [kernel.vmlinux]  [k] try_charge                 

   0.77%  exe-ref.out  [kernel.vmlinux]  [k] sync_regs                  

   0.70%  exe-ref.out  [kernel.vmlinux]  [k] filemap_map_pages          

   0.65%  exe-ref.out  [kernel.vmlinux]  [k] get_page_from_freelist     

   0.60%  exe-ref.out  [kernel.vmlinux]  [k] unmap_page_range           

   0.52%  exe-ref.out  [kernel.vmlinux]  [k] memcg_slab_post_alloc_hook 

   0.47%  exe-ref.out  [kernel.vmlinux]  [k] page_mapping               

   0.30%  exe-ref.out  [kernel.vmlinux]  [k] __free_one_page            

   0.29%  exe-ref.out  exe-ref.out       [.] lean::allocator::alloc_page

   0.22%  exe-ref.out  exe-ref.out       [.] l_depth___lambda__1___boxed

   0.03%  perf         [kernel.vmlinux]  [k] strlen                     

   0.00%  perf         [kernel.vmlinux]  [k] intel_pmu_handle_irq       

   0.00%  perf         [kernel.vmlinux]  [k] native_write_msr           

```

# Jun 25

```

; not converted to the right call?

%4 = call i64 bitcast (i32 (%struct.lean_object*)* @lean_obj_tag to i64 (i8*)*)(i8* %0), !dbg !13

```

Calls like the above are not inlined for whatever reason. Unsure why.

# June 18

# June 12

- MLIR: try to use last stable release, and find the delta.

# June 11

```

The following tests FAILED:

        568 - leancomptest_closure_bug1.lean (Failed)

        569 - leancomptest_closure_bug2.lean (Failed)

        570 - leancomptest_closure_bug3.lean (Failed)

        576 - leancomptest_expr.lean (Failed)

        585 - leancomptest_phashmap3.lean (Failed)

        642 - leanplugintest_SnakeLinter.lean (Failed)

```

# June 8

- Scheme's design of laziness is very nice. they have force, delay, and clean semantics for what all of these mean.

  Part of R5RS. Also, their thunks can have side effects.

# June 3

- CMake settings for test files:

```

# src/shell/CMakeLists.txt

# also look at tests/test_single.sh

file(GLOB LEANT0TESTS "${LEAN_SOURCE_DIR}/../tests/lean/trust0/*.lean")

FOREACH(T ${LEANT0TESTS})

  GET_FILENAME_COMPONENT(T_NAME ${T} NAME)

  add_test(NAME "leant0test_${T_NAME}"

           WORKING_DIRECTORY "${LEAN_SOURCE_DIR}/../tests/lean/trust0"

           COMMAND bash -c "PATH=${LEAN_BIN}:$PATH ./test_single.sh ${T_NAME}")

ENDFOREACH(T)

```

```

# https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html#manual:cmake-generator-expressions(7)

$

List of objects resulting from build of objLib.

```

This is used to link against `runtime`, `kernel`, etc.

```

if(LEANCPP)

  add_library(leancpp STATIC IMPORTED)

  set_target_properties(leancpp PROPERTIES

    IMPORTED_LOCATION "${CMAKE_BINARY_DIR}/lib/lean/libleancpp.a")

  add_custom_target(copy-leancpp

    COMMAND cmake -E copy_if_different "${LEANCPP}" "${CMAKE_BINARY_DIR}/lib/lean/libleancpp.a")

  add_dependencies(leancpp copy-leancpp)

else()

  add_subdirectory(runtime)

  set(LEAN_OBJS ${LEAN_OBJS} $)

  add_subdirectory(util)

  set(LEAN_OBJS ${LEAN_OBJS} $)

  add_subdirectory(kernel)

  set(LEAN_OBJS ${LEAN_OBJS} $)

  add_subdirectory(library)

  set(LEAN_OBJS ${LEAN_OBJS} $)

  add_subdirectory(library/constructions)

  set(LEAN_OBJS ${LEAN_OBJS} $)

  add_subdirectory(library/compiler)

  set(LEAN_OBJS ${LEAN_OBJS} $)

  add_subdirectory(initialize)

  set(LEAN_OBJS ${LEAN_OBJS} $)

  if(${STAGE} EQUAL 0)

    add_subdirectory(../stdlib stdlib)

    set(LEAN_OBJS ${LEAN_OBJS} $)

  endif()

```

Running `filter.lean` with large enough program sizes causes the C++ backend

to stack overflow:

```

def main : IO Unit := let l := length (filter (make 80000)); IO.println (toString l)

```

- Test for stack overflow: `leancomptest_StackOverflow.lean`

# June 2

Why does the backend have BOTH things like `insertReset/insertReuse` which

emit calls to `lean_ctor_set_tag` as well as PRIMITIVES like `setTg`

which emit calls to `lean_ctor_set_tag`? This seems very schiozophrenic to me.

A call to `ensureHasDefault` (my lean commit `e20ee48959078cb40aa19ee4ffd22a65fd6b0195`)

changed the CORRECTNESS of the compliation. This seems dodgy at best?!

```lean

-- | pretty sure emitDec is broken, there is no variant of lean_dec

-- that can take more than 1 arg.

def emitDec (x : VarId) (n : Nat) (checkRef : Bool) : M Unit := do

  emit (if checkRef then "lean_dec" else "lean_dec_ref");

  emit "("; emit x;

  if n != 1 then emit ", "; emit n

  emitLn ");"

```

- In all of `render.lean`, there is no telltale sign of either reset or reuse.

  I could find no calls to reuse's `lean_ctor_set_tag` and reset's `lean_ctor_release`.

  So the crash must be from something else.

- I changed `sext` to `zext` because `zext` extends true into `1`, while `sext` extends true into `-1`.

- I also tried to see if sharing was the problem, so I forced LEAN to consider everything as shared all the time.

  This still only allows `render.lean` to crash `x(`.

```lean

-- | Code to force everything to be considered as sharing.

-- | when writing into a variable sign-extend boolean i1s into i8s.

-- This is SUCH a clusterfuck.

def emitIsShared (z : VarId) (x : VarId) : M Unit := do

  emitLhs z; emit " std.constant 1 : i8"; -- nothing is ever exclusive!

  -- let excl <- gensym "exclusive";

  -- emit $ "%" ++ excl ++ " = call @lean_is_exclusive(%" ++ (toString x) ++ ")";

  -- emitLn $ " : (!lz.value) -> i1";

  -- emitLhs z; emit $ (escape "ptr.not") ++ "(%" ++ excl ++ ")";

  -- emitLn $ " : (i1) -> i8"

```

- Disable `dec` (decrement) fixes the crash in `render.lean`. Of course, this is a disgusting hack!

```

-- | Hack to fix crash in `render.lean`.

def emitDec (x : VarId) (n : Nat) (checkRef : Bool) : M Unit := do

  -- if n != 1 then panicM "there is no lean_dec for more than 1 parameter"

  -- let nv <- emitI64 "n" n;

  -- emit $ "call " ++ (if checkRef then "@lean_dec" else "@lean_dec_ref");

  -- emit "(%"; emit x; emitLn ") : (!lz.value) -> ()"

  return ()

```

- I wanted to get a sense of our backend v/s the LEAN backend. For one, we can tolerate larger problem sizes.

  For example, set `n=15` on `const_fold.lean`. This program allows us to succeed, while the C backend fails.

- We are also much faster. For example, on `qsort.lean` with `n=100`:

```

/home/bollu/work/lz/test/lambdapure/compile/bench$ time ./exe.out    

./exe.out  0.64s user 0.01s system 99% cpu 0.652 total

/home/bollu/work/lz/test/lambdapure/compile/bench$ time ./exe-ref.out

./exe-ref.out  0.87s user 0.01s system 99% cpu 0.880 total

```

- We need to control `musttail`, which I've addded as a `TODO`.

# May 28

Some kind of miscompile of case statements from `render.lean`. I generate

an empty `case:`

```

"lz.caseRet"(%2) ( {

},  {

}) : (!lz.value) -> ()

```

The larger context is from the function `l_IO_randFloat`:

```

caseop parent:

func @l_IO_randFloat(%arg0: f64, %arg1: f64, %arg2: !lz.value) -> !lz.value {

  %c0_i64 = constant 0 : i64

  %0 = call @lean_box(%c0_i64) : (i64) -> !lz.value

  %1 = "ptr.loadglobal"() {value = @l_IO_stdGenRef} : () -> !lz.value

  %2 = call @lean_st_ref_get(%1, %arg2) : (!lz.value, !lz.value) -> !lz.value

  %3 = call @lean_obj_tag(%2) : (!lz.value) -> i64

  %c0_i64_0 = constant 0 : i64

  %4 = cmpi eq, %3, %c0_i64_0 : i64

  cond_br %4, ^bb1, ^bb2

  "lz.caseRet"(%2) ( {

  },  {

  }) : (!lz.value) -> ()

^bb1:  // pred: ^bb0

  %5 = "lz.project"(%2) {value = 0 : i64} : (!lz.value) -> !lz.value

  %6 = "lz.project"(%2) {value = 1 : i64} : (!lz.value) -> !lz.value

  %7 = call @l_randomFloat___at_IO_randFloat___spec__1(%5) : (!lz.value) -> !lz.value

  "lz.caseRet"(%7) ( {

    %10 = "lz.project"(%7) {value = 0 : i64} : (!lz.value) -> !lz.value

    %11 = "lz.project"(%7) {value = 1 : i64} : (!lz.value) -> !lz.value

    %12 = call @lean_st_ref_set(%1, %11, %6) : (!lz.value, !lz.value, !lz.value) -> !lz.value

    "lz.caseRet"(%12) ( {

      %13 = "lz.project"(%12) {value = 1 : i64} : (!lz.value) -> !lz.value

      %14 = call @lean_float_sub(%arg1, %arg0) : (f64, f64) -> f64

      %15 = call @lean_unbox_float(%10) : (!lz.value) -> f64

      %16 = call @lean_float_mul(%14, %15) : (f64, f64) -> f64

      %17 = call @lean_float_add(%arg0, %16) : (f64, f64) -> f64

      %18 = call @lean_box_float(%17) : (f64) -> !lz.value

      %19 = "lz.construct"(%18, %13) {dataconstructor = @"0", size = 2 : i64} : (!lz.value, !lz.value) -> !lz.value

      lz.return %19 : !lz.value

    },  {

    }) : (!lz.value) -> ()

  }) : (!lz.value) -> ()

^bb2:  // pred: ^bb0

  %8 = call @lean_obj_tag(%2) : (!lz.value) -> i64

  %c1_i64 = constant 1 : i64

  %9 = cmpi eq, %8, %c1_i64 : i64

}

```

The offending function, with comments:

```

// ERR: emitDeclAux (l_IO_randFloat) | isExtern?false

// ERR: emitDeclAux Decl.fdecl (l_IO_randFloat)

func @"l_IO_randFloat"(%x_1: f64, %x_2: f64, %x_3: !lz.value) -> !lz.value{

%c0_irr = std.constant 0 : i64

%irrelevant = call @lean_box(%c0_irr) : (i64) -> (!lz.value)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.fap

%x_4 = "ptr.loadglobal"(){value=@"l_IO_stdGenRef"} : () -> !lz.value// <== ERR: emitFullApp (pointer)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.fap

%x_5 = call @"lean_st_ref_get"(%x_4, %x_3) : (!lz.value,!lz.value) -> (!lz.value) // <== ERR: emitSimpleExternalCall // <== ys: [x_4, x_3]| tys: 

// ^^ ERR: ExternEntry.standard

// ^^^ ERR: emitFullApp (Decl.extern)

// ERR: FnBody.case

"lz.caseRet"(%x_5)({

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.proj

%x_6 = "lz.project"(%x_5){value=0}:(!lz.value)  -> (!lz.value)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.proj

%x_7 = "lz.project"(%x_5){value=1}:(!lz.value)  -> (!lz.value)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.fap

%x_8 = call @"l_randomFloat___at_IO_randFloat___spec__1"(%x_6):(!lz.value)->(!lz.value)// <== ERR: emitFullApp (fncall)

// ERR: FnBody.case

"lz.caseRet"(%x_8)({

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.proj

%x_9 = "lz.project"(%x_8){value=0}:(!lz.value)  -> (!lz.value)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.proj

%x_10 = "lz.project"(%x_8){value=1}:(!lz.value)  -> (!lz.value)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.fap

%x_11 = call @"lean_st_ref_set"(%x_4, %x_10, %x_7) : (!lz.value,!lz.value,!lz.value) -> (!lz.value) // <== ERR: emitSimpleExternalCall // <== ys: [x_4, x_10, x_7]| tys: 

// ^^ ERR: ExternEntry.standard

// ^^^ ERR: emitFullApp (Decl.extern)

// ERR: FnBody.case

"lz.caseRet"(%x_11)({

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.proj

%x_12 = "lz.project"(%x_11){value=1}:(!lz.value)  -> (!lz.value)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.fap

%x_13 = call @"lean_float_sub"(%x_2, %x_1) : (f64,f64) -> (f64) // <== ERR: emitSimpleExternalCall // <== ys: [x_2, x_1]| tys: 

// ^^ ERR: ExternEntry.standard

// ^^^ ERR: emitFullApp (Decl.extern)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.unbox

%x_14 = call @lean_unbox_float(%x_9) : (!lz.value) -> (f64)

//^UNBOX type: (f64)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.fap

%x_15 = call @"lean_float_mul"(%x_13, %x_14) : (f64,f64) -> (f64) // <== ERR: emitSimpleExternalCall // <== ys: [x_13, x_14]| tys: 

// ^^ ERR: ExternEntry.standard

// ^^^ ERR: emitFullApp (Decl.extern)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.fap

%x_16 = call @"lean_float_add"(%x_1, %x_15) : (f64,f64) -> (f64) // <== ERR: emitSimpleExternalCall // <== ys: [x_1, x_15]| tys: 

// ^^ ERR: ExternEntry.standard

// ^^^ ERR: emitFullApp (Decl.extern)

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.box

%x_17 = call @lean_box_float(%x_16) : (f64) -> (!lz.value) // ERR: xType: float

//ERR: fnBody.vdecl (non-tail) call

%x_18 = "lz.construct"(%x_17, %x_12){dataconstructor = @"0", size=2} : (!lz.value, !lz.value) -> (!lz.value)

//ERR: FnBody.ret

lz.return %x_18 : !lz.value

}

, {

// ERR: FnBody.unreachable

}

)

 : (!lz.value) -> ()

}

)

 : (!lz.value) -> ()

}

, {

// ERR: FnBody.unreachable

}

)

 : (!lz.value) -> ()

}

```

At least some of the problem is caused by `// ERR: FnBody.unreachable`. Let me teach LEAN how to generate

an unreachable.

```

Writing to out.ppm

./exe-ref.out  532.94s user 8.26s system 712% cpu 1:15.99 total

```

```

line 133 of 133

Writing to out.ppm

./exe.out  514.85s user 7.02s system 746% cpu 1:09.91 total

/home/bollu/work/lz/test/lambdapure/compile$ 

```

Next step, turn on all the passes in the LEAN compiler. This will force me to implement `reset/reuse` etc.

Current status before I turn on more passes:

```

/home/bollu/work/lz/test/lambdapure/compile$ llvm-lit -j1 .

-- Testing: 28 tests, 1 workers --

PASS: HASK_OPT :: lambdapure/compile/bench/deriv.lean (1 of 28)

PASS: HASK_OPT :: lambdapure/compile/bench/qsort.lean (2 of 28)

PASS: HASK_OPT :: lambdapure/compile/bench/unionfind.lean (3 of 28)

PASS: HASK_OPT :: lambdapure/compile/bench/rbmap_checkpoint.lean (4 of 28)

PASS: HASK_OPT :: lambdapure/compile/render.lean (5 of 28)

PASS: HASK_OPT :: lambdapure/compile/pap.lean (6 of 28)

PASS: HASK_OPT :: lambdapure/compile/bench/const_fold.lean (7 of 28)

PASS: HASK_OPT :: lambdapure/compile/jmp.lean (8 of 28)

PASS: HASK_OPT :: lambdapure/compile/ctor.lean (9 of 28)

PASS: HASK_OPT :: lambdapure/compile/loop.lean (10 of 28)

PASS: HASK_OPT :: lambdapure/compile/bench/binarytrees-int.lean (11 of 28)

PASS: HASK_OPT :: lambdapure/compile/bench/binarytrees.lean (12 of 28)

PASS: HASK_OPT :: lambdapure/compile/case.lean (13 of 28)

PASS: HASK_OPT :: lambdapure/compile/main-print.lean (14 of 28)

PASS: HASK_OPT :: lambdapure/compile/ctor-simple.lean (15 of 28)

PASS: HASK_OPT :: lambdapure/compile/bench/filter.lean (16 of 28)

PASS: HASK_OPT :: lambdapure/compile/mutualrec.lean (17 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/bench/exe.mlir (18 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/bench/rbmap3.lean (19 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/bench/map-destruct.mlir (20 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/bench/rbmap2.lean (21 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/bench/map-ref.mlir (22 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/bench/rbmap4.lean (23 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/bench/rbmap500k.lean (24 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/bench/lambdapure.mlir (25 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/bench/map-runtime.mlir (26 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/exe.mlir (27 of 28)

UNRESOLVED: HASK_OPT :: lambdapure/compile/bench/loop.mlir (28 of 28)

********************

Unresolved Tests (11):

  HASK_OPT :: lambdapure/compile/bench/exe.mlir

  HASK_OPT :: lambdapure/compile/bench/lambdapure.mlir

  HASK_OPT :: lambdapure/compile/bench/loop.mlir

  HASK_OPT :: lambdapure/compile/bench/map-destruct.mlir

  HASK_OPT :: lambdapure/compile/bench/map-ref.mlir

  HASK_OPT :: lambdapure/compile/bench/map-runtime.mlir

  HASK_OPT :: lambdapure/compile/bench/rbmap2.lean

  HASK_OPT :: lambdapure/compile/bench/rbmap3.lean

  HASK_OPT :: lambdapure/compile/bench/rbmap4.lean

  HASK_OPT :: lambdapure/compile/bench/rbmap500k.lean

  HASK_OPT :: lambdapure/compile/exe.mlir

Testing Time: 47.74s

  Passed    : 17

  Unresolved: 11

```

#### debugging `lz.jmp` crash

`simpCase` causes `lz.jmp` to crash. Diff between `nocrash.mlir` and `crash.mlir`

```

58,63d57

<           %3 = "lz.papExtend"(%arg3, %arg0) : (!lz.value, !lz.value) -> !lz.value

<           lz.return %3 : !lz.value

<         },  {

<           %3 = "lz.papExtend"(%arg3, %arg0) : (!lz.value, !lz.value) -> !lz.value

<           lz.return %3 : !lz.value

<         },  {

66a61,63

>         },  {

>           %3 = "lz.papExtend"(%arg3, %arg0) : (!lz.value, !lz.value) -> !lz.value

>           lz.return %3 : !lz.value

70,75d66

<           %3 = "lz.construct"() {dataconstructor = @"0", size = 0 : i64} : () -> !lz.value

<           "lz.jump"(%3) {value = 12 : i64} : (!lz.value) -> ()

<         },  {

<           %3 = "lz.construct"() {dataconstructor = @"0", size = 0 : i64} : () -> !lz.value

<           "lz.jump"(%3) {value = 12 : i64} : (!lz.value) -> ()

<         },  {

77,78c68,69

<             %3 = "lz.project"(%1) {value = 0 : i64} : (!lz.value) -> !lz.value

<             %4 = "lz.papExtend"(%arg2, %3, %2) : (!lz.value, !lz.value, !lz.value) -> !lz.value

---

>             %3 = "lz.project"(%2) {value = 0 : i64} : (!lz.value) -> !lz.value

>             %4 = "lz.papExtend"(%arg1, %1, %3) : (!lz.value, !lz.value, !lz.value) -> !lz.value

84,87d74

<           },  {

<             %3 = "lz.project"(%2) {value = 0 : i64} : (!lz.value) -> !lz.value

<             %4 = "lz.papExtend"(%arg1, %1, %3) : (!lz.value, !lz.value, !lz.value) -> !lz.value

<             lz.return %4 : !lz.value

88a76,78

>         },  {

>           %3 = "lz.construct"() {dataconstructor = @"0", size = 0 : i64} : () -> !lz.value

>           "lz.jump"(%3) {value = 11 : i64} : (!lz.value) -> ()

94,96d83

<     },  {

<       %1 = "lz.papExtend"(%arg3, %arg0) : (!lz.value, !lz.value) -> !lz.value

<       lz.return %1 : !lz.value

135c122

<           %3 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__1} : () -> !lz.value

---

>           %3 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__2} : () -> !lz.value

140,142d126

<         },  {

<           %3 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__2} : () -> !lz.value

<           lz.return %3 : !lz.value

146,151d129

<           %3 = "lz.construct"() {dataconstructor = @"0", size = 0 : i64} : () -> !lz.value

<           "lz.jump"(%3) {value = 8 : i64} : (!lz.value) -> ()

<         },  {

<           %3 = "lz.construct"() {dataconstructor = @"0", size = 0 : i64} : () -> !lz.value

<           "lz.jump"(%3) {value = 8 : i64} : (!lz.value) -> ()

<         },  {

153c131

<             %3 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__3} : () -> !lz.value

---

>             %3 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__2} : () -> !lz.value

158,160d135

<           },  {

<             %3 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__2} : () -> !lz.value

<             lz.return %3 : !lz.value

161a137,139

>         },  {

>           %3 = "lz.construct"() {dataconstructor = @"0", size = 0 : i64} : () -> !lz.value

>           "lz.jump"(%3) {value = 7 : i64} : (!lz.value) -> ()

167,169d144

<     },  {

<       %1 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__1} : () -> !lz.value

<       lz.return %1 : !lz.value

176,181d150

<       %1 = "lz.papExtend"(%arg2, %arg0) : (!lz.value, !lz.value) -> !lz.value

<       lz.return %1 : !lz.value

<     },  {

<       %1 = "lz.papExtend"(%arg2, %arg0) : (!lz.value, !lz.value) -> !lz.value

<       lz.return %1 : !lz.value

<     },  {

184a154,156

>     },  {

>       %1 = "lz.papExtend"(%arg2, %arg0) : (!lz.value, !lz.value) -> !lz.value

>       lz.return %1 : !lz.value

197c169

<       %1 = "lz.int"() {value = 420 : i64} : () -> !lz.value

---

>       %1 = "lz.project"(%arg0) {value = 0 : i64} : (!lz.value) -> !lz.value

201,203d172

<       lz.return %1 : !lz.value

<     },  {

<       %1 = "lz.project"(%arg0) {value = 0 : i64} : (!lz.value) -> !lz.value

```

This is caused by the pass `simpCase.lean` which shuffles around case branches.

This now exposes the compiler to more complex code. For example, without

optimisation, I can assume that the case alternatives are in order, such as `0, 1, 2`.

After `simplifyCase`, this is no longer the case, it could have been

transformed into 2, default since the cases for 0 and 1 do the same thing. This

needs me to change lowering.

#### Turning on non-risky passes

To turn on: reset/reuse, borrow, RC.

```

private def compileAux (decls : Array Decl) : CompilerM Unit := do

  -- log (LogEntry.message "// compileAux")

  -- logDecls `init decls

  -- logPreamble (LogEntry.message mlirPreamble)

  -- logDeclsUnconditional decls

  checkDecls decls

  let decls ← elimDeadBranches decls

  logDecls `elim_dead_branches decls

  let decls := decls.map Decl.pushProj

  logDecls `push_proj decls

  -- let decls := decls.map Decl.insertResetReuse

  -- logDecls `reset_reuse decls

  let decls := decls.map Decl.elimDead

  logDecls `elim_dead decls

  let decls := decls.map Decl.simpCase

  logDecls `simp_case decls

  let decls := decls.map Decl.normalizeIds

  -- logDeclsUnconditional decls

  -- let decls ← inferBorrow decls

  -- logDecls `borrow decls

  let decls ← explicitBoxing decls

  logDecls `boxing decls

  -- let decls ← explicitRC decls

  -- logDecls `rc decls

  -- let decls := decls.map Decl.expandResetReuse

  -- logDecls `expand_reset_reuse decls

  let decls := decls.map Decl.pushProj

  logDecls `push_proj decls

  let decls ← updateSorryDep decls

  logDecls `result decls

  checkDecls decls

  addDecls decls

```

Failing tests:

```

HASK_OPT :: lambdapure/compile/bench/deriv.lean

HASK_OPT :: lambdapure/compile/jmp.lean

```

# May 27

Continuing my debugging saga of getting `USize` and its equalities to work:

For whatever reason, it needs the option `bootstrap.genMatcherCode`:

```

set_option bootstrap.genMatcherCode false in

-- @[extern c inline "#1 == #2"]

@[extern c "lean_uint64_eq"]

def UInt64.decEq (a b : UInt64) : Decidable (Eq a b) :=

  match a, b with

  | ⟨n⟩, ⟨m⟩ =>

    dite (Eq n m) (fun h => isTrue (h ▸ rfl)) (fun h => isFalse (fun h' => UInt64.noConfusion h' (fun h' => absurd h' h)))

```

which is documented in `Meta/Match.lean`

```

stage0/src/Lean/Meta/Match/Match.lean

571:register_builtin_option bootstrap.genMatcherCode : Bool := {

583:  let compile := bootstrap.genMatcherCode.get (← getOptions)

```

On implementing new primops, `stage1` fails compiling because of mismatch, `stage2` fails compiling because of incorrect return type?!

Apparently one should return `uint8_t`, not `bool` (as I did when I wrote the include).

```

../build/stage1/lib/temp/Init/Prelude.c:53:9: error: conflicting types for ‘lean_uint64_eq’

   53 | uint8_t lean_uint64_eq(uint64_t, uint64_t);

      |         ^~~~~~~~~~~~~~

In file included from ../build/stage1/lib/temp/Init/Prelude.c:4:

/home/bollu/work/lean4/build/stage1/bin/../include/lean/lean.h:1316:20: note: previous definition of ‘lean_uint64_eq’ was here

 1316 | static inline bool lean_uint64_eq(uint64_t a, uint64_t b) {

      |                    ^~~~~~~~~~~~~~

../build/stage1/lib/temp/Init/Prelude.c:67:9: error: conflicting types for ‘lean_uint8_eq’

   67 | uint8_t lean_uint8_eq(uint8_t, uint8_t);

      |         ^~~~~~~~~~~~~

In file included from ../build/stage1/lib/temp/Init/Prelude.c:4:

/home/bollu/work/lean4/build/stage1/bin/../include/lean/lean.h:1325:20: note: previous definition of ‘lean_uint8_eq’ was here

 1325 | static inline bool lean_uint8_eq(uint8_t a, uint8_t b) {

      |                    ^~~~~~~~~~~~~

../build/stage1/lib/temp/Init/Prelude.c:77:9: error: conflicting types for ‘lean_usize_eq’

   77 | uint8_t lean_usize_eq(size_t, size_t);

      |         ^~~~~~~~~~~~~

In file included from ../build/stage1/lib/temp/Init/Prelude.c:4:

/home/bollu/work/lean4/build/stage1/bin/../include/lean/lean.h:1313:20: note: previous definition of ‘lean_usize_eq’ was here

 1313 | static inline bool lean_usize_eq(size_t a, size_t b) {

      |                    ^~~~~~~~~~~~~

../build/stage1/lib/temp/Init/Prelude.c:544:9: error: conflicting types for ‘lean_uint32_eq’

  544 | uint8_t lean_uint32_eq(uint32_t, uint32_t);

      |         ^~~~~~~~~~~~~~

In file included from ../build/stage1/lib/temp/Init/Prelude.c:4:

/home/bollu/work/lean4/build/stage1/bin/../include/lean/lean.h:1319:20: note: previous definition of ‘lean_uint32_eq’ was here

 1319 | static inline bool lean_uint32_eq(uint32_t a, uint32_t b) {

      |                    ^~~~~~~~~~~~~~

../build/stage1/lib/temp/Init/Prelude.c:732:9: error: conflicting types for ‘lean_uint16_eq’

  732 | uint8_t lean_uint16_eq(uint16_t, uint16_t);

      |         ^~~~~~~~~~~~~~

In file included from ../build/stage1/lib/temp/Init/Prelude.c:4:

/home/bollu/work/lean4/build/stage1/bin/../include/lean/lean.h:1322:20: note: previous definition of ‘lean_uint16_eq’ was here

 1322 | static inline bool lean_uint16_eq(uint16_t a, uint16_t b) {

      |                    ^~~~~~~~~~~~~~

```

The document at `lean4/doc/make/index.md` was very helpful since it talks about the entire build process.

There is something funky about external calls. In particular, I added a TODO where the previous code

would emit incorrect code, while the new does not (see the `TODO: understand why the line does not work`).

```lean

-- | ps: description of formal parameters to the function f.

def emitSimpleExternalCall (f : String) (ps : Array Param) (ys : Array Arg)

  (tys: HashMap VarId IRType) (retty: IRType) : M Unit := do

  -- let fname <- toCName f; -- added by bollu

  let fname := f;

  emit "call "; emit "@"; emit (escape fname)

  -- We must remove irrelevant arguments to extern calls

  let psys := (ps.zip ys).filter (fun py => not (py.fst.ty.isIrrelevant))

  let ys := Array.map Prod.snd psys

  emit "("

  -- We must remove irrelevant arguments to extern calls.

  ys.size.forM (fun i => do

         if i > 0 then emit ", ";

         emitArg ys[i])

  emit ")"

  -- TODO: understand why the line

  --    emit " : ("; emitArgsOnlyTys ys tys; emit ")";

  -- does not work!!!!

  emit " : ("; 

  psys.size.forM (fun i => do

    if i > 0 then emit ","

    emit (toCType psys[i].fst.ty);

  )

  emit ")";

  emit " -> "; emit "(";  emit (toCType retty);  emit ")";

  emit " // <== ERR: emitSimpleExternalCall";

  emit $ " // <== ys: " ++ toString (toStringArgs ys) ++ "| tys: ";

  emit "\n"

  pure ()

```

My implementation of adding types at the beginning of  function was broken.

Fixing that allows us to codegen `binarytrees.lean`.

One of the big hacks of the day is:

```

/home/bollu/work/lean4/lean4-mode$ cp ~/work/lean4/src/include/lean/lean.h ~/work/lean4/build/stage0/include/lean/lean.h

/home/bollu/work/lean4/lean4-mode$ cp ~/work/lean4/src/include/lean/lean.h ~/work/lean4/build/stage1/include/lean/lean.h

```

Ie, I edit the prelude files and force-copy them into the build system. I'm not too sure if this is necessary,

or even correct, but it works once, and I'm too scared to modify the ritual since.

# May 26:

We get code generated like:

```

//ERR: fnBody.vdecl (non-tail) call

// ERR: Expr.fap

%x_8 = x_6 + x_7;

//^^ ERR: ExternEntry.inline [pat: #1 + #2]

// ^^^ ERR: emitFullApp (Decl.extern)

```

because the pattern compiles directly to `a + b`, not some kind of function call `x(`

```

| some (ExternEntry.inline _ pat)     => do 

       emit (expandExternPattern pat (toStringArgs ys)); emitLn ";"

       emitLn $ "//^^ ERR: ExternEntry.inline [pat: " ++ pat ++ "]"; 

```

This expand pattern thing expands into a pattern that's in C-like syntax. It carries

no semantic information for me to generate an actual function call x(. Can I find out

where this pattern comes from? Looking for the pattern `#1 + #2` gives:

```

/home/bollu/work/lean4$ ag -F "#1 + #2"        

stage0/src/Lean/Compiler/ExternAttr.lean

27:- `@[extern cpp inline "#1 + #2"]`

28:   encoding: ```.entries = [inline `cpp "#1 + #2"]```

stage0/src/Init/Data/UInt.lean

17:@[extern c inline "#1 + #2"]

82:@[extern c inline "#1 + #2"]

148:@[extern c inline "#1 + #2"]

200:@[extern c inline "#1 + #2"]

279:@[extern c inline "#1 + #2"]

stage0/src/Init/Data/Float.lean

35:@[extern c inline "#1 + #2"]  constant Float.add : Float → Float → Float

src/Lean/Compiler/ExternAttr.lean

27:- `@[extern cpp inline "#1 + #2"]`

28:   encoding: ```.entries = [inline `cpp "#1 + #2"]```

src/Init/Data/Float.lean

35:@[extern c inline "#1 + #2"]  constant Float.add : Float → Float → Float

src/Init/Data/UInt.lean

17:@[extern c inline "#1 + #2"]

82:@[extern c inline "#1 + #2"]

148:@[extern c inline "#1 + #2"]

200:@[extern c inline "#1 + #2"]

279:@[extern c inline "#1 + #2"]

```

To get a sense of how much this is going to screw me, I looked for all

instances of `extern c inline`: (I deleted the entries from `stage0`, because

those are double):

```

tmp/PreludeNew.lean:@[extern c inline "lean_nat_sub(#1, lean_box(1))"]

tmp/PreludeNew.lean:@[extern c inline "#1 == #2"]

tmp/PreludeNew.lean:@[extern c inline "#1 == #2"]

tmp/PreludeNew.lean:@[extern c inline "#1 == #2"]

tmp/PreludeNew.lean:@[extern c inline "#1 == #2"]

tmp/PreludeNew.lean:@[extern c inline "#1 == #2"]

tmp/PreludeNew.lean:@[extern c inline "#3"]

tests/compiler/lazylist.lean:@[extern c inline "#2"]

src/Lean/Expr.lean:@[extern c inline "(uint8_t)((#1 << 24) >> 61)"]

src/Lean/Expr.lean:@[extern c inline "(uint64_t)#1"]

src/Lean/Util/Path.lean:@[extern c inline "LEAN_IS_STAGE0"]

src/Lean/Compiler/IR/Checker.lean:@[extern c inline "lean_box(LEAN_MAX_CTOR_FIELDS)"]

src/Lean/Compiler/IR/Checker.lean:@[extern c inline "lean_box(LEAN_MAX_CTOR_SCALARS_SIZE)"]

src/Lean/Compiler/IR/Checker.lean:@[extern c inline "lean_box(sizeof(size_t))"]

src/Init/Core.lean:@[extern c inline "#1 || #2"] def strictOr  (b₁ b₂ : Bool) := b₁ || b₂

src/Init/Core.lean:@[extern c inline "#1 && #2"] def strictAnd (b₁ b₂ : Bool) := b₁ && b₂

src/Init/Prelude.lean:@[extern c inline "lean_nat_sub(#1, lean_box(1))"]

src/Init/Prelude.lean:@[extern c inline "#1 == #2"]

src/Init/Prelude.lean:@[extern c inline "#1 == #2"]

src/Init/Prelude.lean:@[extern c inline "#1 == #2"]

src/Init/Prelude.lean:@[extern c inline "#1 < #2"]

src/Init/Prelude.lean:@[extern c inline "#1 <= #2"]

src/Init/Prelude.lean:@[extern c inline "#1 == #2"]

src/Init/Prelude.lean:@[extern c inline "#1 == #2"]

src/Init/Prelude.lean:@[extern c inline "#3"]

src/Init/Data/Float.lean:@[extern c inline "#1 + #2"]  constant Float.add : Float → Float → Float

src/Init/Data/Float.lean:@[extern c inline "#1 - #2"]  constant Float.sub : Float → Float → Float

src/Init/Data/Float.lean:@[extern c inline "#1 * #2"]  constant Float.mul : Float → Float → Float

src/Init/Data/Float.lean:@[extern c inline "#1 / #2"]  constant Float.div : Float → Float → Float

src/Init/Data/Float.lean:@[extern c inline "(- #1)"]   constant Float.neg : Float → Float

src/Init/Data/Float.lean:@[extern c inline "#1 == #2"] constant Float.beq (a b : Float) : Bool

src/Init/Data/Float.lean:@[extern c inline "#1 < #2"]   constant Float.decLt (a b : Float) : Decidable (a < b) :=

src/Init/Data/Float.lean:@[extern c inline "#1 <= #2"] constant Float.decLe (a b : Float) : Decidable (a ≤ b) :=

src/Init/Data/Float.lean:@[extern c inline "(uint8_t)#1"] constant Float.toUInt8 : Float → UInt8

src/Init/Data/Float.lean:@[extern c inline "(uint16_t)#1"] constant Float.toUInt16 : Float → UInt16

src/Init/Data/Float.lean:@[extern c inline "(uint32_t)#1"] constant Float.toUInt32 : Float → UInt32

src/Init/Data/Float.lean:@[extern c inline "(uint64_t)#1"] constant Float.toUInt64 : Float → UInt64

src/Init/Data/Float.lean:@[extern c inline "(size_t)#1"] constant Float.toUSize : Float → USize

src/Init/Data/UInt.lean:@[extern c inline "#1 + #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 - #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 * #2"]

src/Init/Data/UInt.lean:@[extern c inline "#2 == 0 ? 0 : #1 / #2"]

src/Init/Data/UInt.lean:@[extern c inline "#2 == 0 ? #1 : #1 % #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 & #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 | #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 ^ #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 << #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 >> #2"]

src/Init/Data/UInt.lean:@[extern c inline "~ #1"]

src/Init/Data/UInt.lean:@[extern c inline "#1 < #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 <= #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 + #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 - #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 * #2"]

src/Init/Data/UInt.lean:@[extern c inline "#2 == 0 ? 0 : #1 / #2"]

src/Init/Data/UInt.lean:@[extern c inline "#2 == 0 ? #1 : #1 % #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 & #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 | #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 ^ #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 << #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 >> #2"]

src/Init/Data/UInt.lean:@[extern c inline "~ #1"]

src/Init/Data/UInt.lean:@[extern c inline "#1 < #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 <= #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 + #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 - #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 * #2"]

src/Init/Data/UInt.lean:@[extern c inline "#2 == 0 ? 0 : #1 / #2"]

src/Init/Data/UInt.lean:@[extern c inline "#2 == 0 ? #1 : #1 % #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 & #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 | #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 ^ #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 << #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 >> #2"]

src/Init/Data/UInt.lean:@[extern c inline "((uint8_t)#1)"]

src/Init/Data/UInt.lean:@[extern c inline "((uint16_t)#1)"]

src/Init/Data/UInt.lean:@[extern c inline "((uint32_t)#1)"]

src/Init/Data/UInt.lean:@[extern c inline "~ #1"]

src/Init/Data/UInt.lean:@[extern c inline "#1 + #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 - #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 * #2"]

src/Init/Data/UInt.lean:@[extern c inline "#2 == 0 ? 0 : #1 / #2"]

src/Init/Data/UInt.lean:@[extern c inline "#2 == 0 ? #1 : #1 % #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 & #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 | #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 ^ #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 << #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 >> #2"]

src/Init/Data/UInt.lean:@[extern c inline "((uint8_t)#1)"]

src/Init/Data/UInt.lean:@[extern c inline "((uint16_t)#1)"]

src/Init/Data/UInt.lean:@[extern c inline "((uint32_t)#1)"]

src/Init/Data/UInt.lean:@[extern c inline "((uint64_t)#1)"]

src/Init/Data/UInt.lean:@[extern c inline "~ #1"]

src/Init/Data/UInt.lean:@[extern c inline "(uint64_t)#1"]

src/Init/Data/UInt.lean:@[extern c inline "#1 < #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 <= #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 + #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 - #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 * #2"]

src/Init/Data/UInt.lean:@[extern c inline "#2 == 0 ? 0 : #1 / #2"]

src/Init/Data/UInt.lean:@[extern c inline "#2 == 0 ? #1 : #1 % #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 & #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 | #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 ^ #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 << #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 >> #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1"]

src/Init/Data/UInt.lean:@[extern c inline "((size_t)#1)"]

src/Init/Data/UInt.lean:@[extern c inline "(uint32_t)#1"]

src/Init/Data/UInt.lean:@[extern c inline "~ #1"]

src/Init/Data/UInt.lean:@[extern c inline "#1 < #2"]

src/Init/Data/UInt.lean:@[extern c inline "#1 <= #2"]

src/Init/Meta.lean:@[extern c inline "lean_box(LEAN_VERSION_MAJOR)"]

src/Init/Meta.lean:@[extern c inline "lean_box(LEAN_VERSION_MINOR)"]

src/Init/Meta.lean:@[extern c inline "lean_box(LEAN_VERSION_PATCH)"]

src/Init/Meta.lean:-- @[extern c inline "lean_mk_string(LEAN_GITHASH)"]

src/Init/Meta.lean:@[extern c inline "LEAN_VERSION_IS_RELEASE"]

src/Init/Meta.lean:@[extern c inline "lean_mk_string(LEAN_SPECIAL_VERSION_DESC)"]

src/Init/Fix.lean:@[extern c inline "lean_fixpoint(#4, #5)"]

src/Init/Fix.lean:@[extern c inline "lean_fixpoint2(#5, #6, #7)"]

src/Init/Fix.lean:@[extern c inline "lean_fixpoint3(#6, #7, #8, #9)"]

src/Init/Fix.lean:@[extern c inline "lean_fixpoint4(#7, #8, #9, #10, #11)"]

src/Init/Fix.lean:@[extern c inline "lean_fixpoint5(#8, #9, #10, #11, #12, #13)"]

src/Init/Fix.lean:@[extern c inline "lean_fixpoint6(#9, #10, #11, #12, #13, #14, #15)"]

```

Oh god damn, I can't just directly change a line such as:

```

src/Init/Data/UInt.lean:@[extern c inline "#1 + #2"]

```

into:

```

src/Init/Data/UInt.lean:@[extern c inline "add #1, #2"]

```

because the code is *self-hosting*. So if I try to compile stuff using the MLIR backend,

I need to compile LEAN using the MLIR backend x(

One way to "solve" this is to convert `Uint32` to `Int` (The example comes from `test/lambdapure/compile/bench/deriv.lean`):

```diff

-def count : Expr → UInt32

+def count : Expr → Int

| Val _   => 1

| Var _   => 1

| Add f g   => count f + count g

| Mul f g   => count f + count g

| Pow f g   => count f + count g

| Ln f      => count f

```

I'm now trying to hack into the compiler, and change the meaning `Uint` to be an external call:

```diff

--- a/src/Init/Data/UInt.lean

+++ b/src/Init/Data/UInt.lean

@@ -14,8 +14,11 @@ def UInt8.ofNat (n : @& Nat) : UInt8 := ⟨Fin.ofNat n⟩

 abbrev Nat.toUInt8 := UInt8.ofNat

 @[extern "lean_uint8_to_nat"]

 def UInt8.toNat (n : UInt8) : Nat := n.val.val

-@[extern c inline "#1 + #2"]

+-- @[extern c inline "#1 + #2"]

+-- @[extern c inline "add #1, #2"]

+@[extern "lean_uint8_add"]

 def UInt8.add (a b : UInt8) : UInt8 := ⟨a.val + b.val⟩

+

 @[extern c inline "#1 - #2"]

 def UInt8.sub (a b : UInt8) : UInt8 := ⟨a.val - b.val⟩

 @[extern c inline "#1 * #2"]

@@ -145,7 +148,8 @@ def UInt32.ofNat (n : @& Nat) : UInt32 := ⟨Fin.ofNat n⟩

 @[extern "lean_uint32_of_nat"]

 def UInt32.ofNat' (n : Nat) (h : n < UInt32.size) : UInt32 := ⟨⟨n, h⟩⟩

 abbrev Nat.toUInt32 := UInt32.ofNat

-@[extern c inline "#1 + #2"]

+@[extern "lean_uint32_add"]

+-- @[extern c inline "#1 + #2"]

```

This hack works, and I now generate the "expected" MLIR code. I'm surprised that the compiler

still self-hosts; I expected linker errors for missing reference to `lean_uint32_add`.

It seems like the lean compiler does not in fact use `uint32` all that much.

Remember, if you builds run slower than real LEAN, it's probably because you're encoding equality of `UIntN`

is some extremely stupid way!

```

-- set_option bootstrap.genMatcherCode false in

-- @[extern c inline "#1 == #2"]

```

what is `genMatcherCode`? Will be interesting to find out

# May 23:

should `caseRet` be a terminator? If it is a terminator, how should it lower? if it lowers to a `scf.if`,

what instruction should it keep "after" its insertion? should it emit a `std.return` or a `scf.yield`?

this would depend on the parent (!) 

What is the return type of a `lz.caseRet`? So far, I was doing weird

shit like looking at the case branch. Maybe it's possible to do this in some

other way?

[WIP] give up on SCF.if

The `SCF.if` by default inserts a `scf.yield` which is first of all

annoying.

Furthermore, the problem is that `scf.if` creates a region from which

we need to region values from. This complicates basically everything

about generating code, because I can't generate code with the semantics:

```

int foo(int x) {

    if (x == 1) { return -1; }

    if (x == 2) { return -2; }

    return -42;

}

```

because `return` can only return from a *region*, not escape out of an enclosing region. We would

need this power to be able to useful things.

I'm gonna say fuck this and just directly generate BBs.

Seriously, LLVM is JUST BETTER!

Is `replaceOpWithNewOp(oldOp ...)` supposed to trigger a `isDynamicallyLegal(NewOpTy)`?

It doesn't seem to for me, and I don't know if this me doing something wrong.

There is a failure mode where:

- We are working on some op (say, `LzJoinPoint`)

- This op needs to replace its children (say, `LzJump(lz.construct(..))`)

- `LzJump` is rewritten into a STANDARD op (say, `BranchOp`)

- The ARGUMENT for `BranchOp` is a value `lz.construct(..)` of  illegal type (say, `!lz.value`)

- This type WILL BE FIXED LATER ON! at the end of lowering.

- However, the act of creating a `BranchOp` causes legalization to fail, saying that the type `!lz.value` is illegal.

- What we need is for `BranchOp` to attempt to legalize `lz.construct(..)`

- We can't let `LzJumpOp` be processed in a separate pass, because once `LzJoinPoint` is lowered, `LzJump` does not know

  where to JUMP TO!

- MLIR needs to recognize that sometimes, having a TYPE MISMATCH is not AN ERROR, but is a STAGE of LOWERING.

  Other passes like async seem to emit a `bitcast` and then assume (?) that the bitcast lowers correctly.

- More importantly, MLIR needs to recognize that during lowering, someone might want to generate more illegal

  ops that can be legalized.

God damn it, I can't even directly generate a new BB `^jpblock` for a join point and a `br ^jpblock`, because the

branch instruction may need to jump  outside a region x(

```llvm

func @foo() {

    ^jp12(...):

    ...

    case %x 

    1 -> {

    %foo = ...;

    br ^jp12(...); // ERROR! can't jump to join point that is outside the region.

    }]

}

```

- Observation 1: Okay, screw this, I'm going to disable join points as tobias suggested, and just deal with code bloat.

- Observation 2: model nested case/multi-dimensional case directly within MLIR.

We also lose out on our `case`s. Before, I could generate a `case` as:

```

%x = case v of alt0 -> { y0 }, alt1 -> { y1 }, alt2 -> { y2 }

%user = foo %x

```

when we lower this, we generate sth like:

```

^entry:

  switch v bb0 bb1 bb2

^bb0: br ^landingpad(y0)

^bb1: br ^landingpad(y1)

^bb2: br ^landingpad(y2)

^landingpad(%x):

  %user = foo %x

```

where we have a landing pad. Now imagine we have a case with a `jmp` inside it:

```

joinpoint @jp { ... }

{

    %wrench = jmp @jp

    %x = case v of alt0 -> { y0 }, alt1 -> { y1 }, alt2 -> { jmp @jp }

    %user = foo %x

}

```

This code doesn't make sense, since we're saying "continue executing code from `@jp` at both `%wrench`

and at `%x`. When we lower this, there is no way to get clean control flow, because the control flow of `case`

is no longer CONTAINED WITHIN `case`! we can't always convert a `lz.caseRet` [which is a terminator/continuation]

into a `lz.case` which returns a value, because of these jumps. IDK what this is actually called.

# May 20th, list of jumps:

- occruences of `mkJmp` in `src/`:

```

Lean/Compiler/IR/Basic.lean

288:@[export lean_ir_mk_jmp] def mkJmp (j : JoinPointId) (ys : Array Arg) : FnBody := FnBody.jmp j ys

Lean/Elab/Do.lean

302:def mkJmp (ref : Syntax) (rs : NameSet) (val : Syntax) (mkJPBody : Syntax → MacroM Code) : StateRefT (Array JPDecl) TermElabM Code := do

325:  | rs, Code.«return» ref val      => mkJmp ref rs val (fun y => pure $ Code.«return» ref y)

330:      mkJmp ref rs y (fun yFresh => do pure $ Code.action (← `(Pure.pure $yFresh)))

978:def mkJmp (ref : Syntax) (j : Name) (args : Array Syntax) : Syntax :=

987:  | Code.jmp ref j args     => pure $ mkJmp ref j args

/home/bollu/work/lean4/src$ ag "lean_ir_mk_jmp"

Lean/Compiler/IR/Basic.lean

288:@[export lean_ir_mk_jmp] def mkJmp (j : JoinPointId) (ys : Array Arg) : FnBody := FnBody.jmp j ys

library/compiler/ir.cpp

39:extern "C" object * lean_ir_mk_jmp(object * j, object * ys);

87:    return fn_body(lean_ir_mk_jmp(j.to_obj_arg(), to_array(ys)));

```

- occruences of `jmp` in `src/`

```

Lean/Compiler/IR/OldFormatMLIR.lean:--   | FnBody.jmp j ys            => "jmp " ++ format j ++ formatArray ys

Lean/Compiler/IR/OldFormatMLIR.lean:--     | FnBody.jmp j ys            => (escape "lz.jump") ++ "("  ++ formatArray ys ++ ")"

Lean/Compiler/IR/ResetReuse.lean:       then it must also live in `b` since `j` is reachable from `b` with a `jmp`.

Lean/Compiler/IR/ResetReuse.lean:           since `instr` is not a `FnBody.jmp` (it is not a terminal) nor it is a `FnBody.jdecl`. -/

Lean/Compiler/IR/RC.lean:  | b@(FnBody.jmp j xs), ctx =>

Lean/Compiler/IR/EmitC.lean:  | FnBody.jmp j xs            => emitJmp j xs

Lean/Compiler/IR/Checker.lean:  | FnBody.jmp j ys         => checkJP j *> checkArgs ys

Lean/Compiler/IR/FreeVars.lean:  | FnBody.jmp j ys         => collectJP j >> collectArgs ys

Lean/Compiler/IR/FreeVars.lean:  | FnBody.jmp j ys         => collectJP j >> collectArgs ys

Lean/Compiler/IR/FreeVars.lean:  | FnBody.jmp j ys         => visitJP w j || visitArgs w ys

Lean/Compiler/IR/NormIds.lean:  | FnBody.jmp j ys        => return FnBody.jmp (← normJP j) (← normArgs ys)

Lean/Compiler/IR/NormIds.lean:  | FnBody.jmp j ys              => FnBody.jmp j (mapArgs f ys)

Lean/Compiler/IR/Boxing.lean:  | FnBody.jmp j ys          => do

Lean/Compiler/IR/Boxing.lean:    castArgsIfNeeded ys ps fun ys => pure $ FnBody.jmp j ys

Lean/Compiler/IR/Format.lean:  | FnBody.jmp j ys            => "jmp " ++ format j ++ formatArray ys

Lean/Compiler/IR/Format.lean:    | FnBody.jmp j ys            => "jmp " ++ format j ++ formatArray ys

Lean/Compiler/IR/LiveVars.lean:   jmp block_1 x

Lean/Compiler/IR/LiveVars.lean:  | FnBody.jmp j ys         => visitArgs w ys <||> do

Lean/Compiler/IR/LiveVars.lean:   `FnBody.jmp j ys` and `j` is not local. -/

Lean/Compiler/IR/LiveVars.lean:  | FnBody.jmp j xs,         m => collectJP m j ∘ collectArgs xs

Lean/Compiler/IR/EmitMLIR.lean:  | FnBody.jmp j xs            => do

Lean/Compiler/IR/EmitMLIR.lean:      emitLn "// ERR: FnBody.jmp"

Lean/Compiler/IR/Basic.lean:  | jmp (j : JoinPointId) (ys : Array Arg)

Lean/Compiler/IR/Basic.lean:@[export lean_ir_mk_jmp] def mkJmp (j : JoinPointId) (ys : Array Arg) : FnBody := FnBody.jmp j ys

Lean/Compiler/IR/Basic.lean:  | FnBody.jmp _ _       => true

Lean/Compiler/IR/Basic.lean:  | ρ, FnBody.jmp j₁ ys₁,             FnBody.jmp j₂ ys₂             => j₁ == j₂ && aeqv ρ ys₁ ys₂

Lean/Compiler/IR/ElimDeadBranches.lean:  | FnBody.jmp j xs => do

Lean/Compiler/IR/Borrow.lean:  | FnBody.jmp j ys      => do

library/compiler/ir.cpp:extern "C" object * lean_ir_mk_jmp(object * j, object * ys);

library/compiler/ir.cpp:fn_body mk_jmp(jp_id const & j, buffer const & ys) {

library/compiler/ir.cpp:    return fn_body(lean_ir_mk_jmp(j.to_obj_arg(), to_array(ys)));

library/compiler/ir.cpp:    static bool is_jmp(expr const & e) {

library/compiler/ir.cpp:        return is_llnf_jmp(get_app_fn(e));

library/compiler/ir.cpp:    ir::fn_body visit_jmp(expr const & e) {

library/compiler/ir.cpp:        return ir::mk_jmp(to_jp_id(jp), ir_args);

library/compiler/ir.cpp:        } else if (is_jmp(e)) {

library/compiler/ir.cpp:            return visit_jmp(e);

library/compiler/llnf.cpp:static expr * g_jmp         = nullptr;

library/compiler/llnf.cpp:/* The `_jmp` instruction is a "jump" to a join point. */

library/compiler/llnf.cpp:expr mk_llnf_jmp() { return *g_jmp; }

library/compiler/llnf.cpp:bool is_llnf_jmp(expr const & e) { return e == *g_jmp; }

library/compiler/llnf.cpp:        is_llnf_jmp(e)     ||

library/compiler/llnf.cpp:                return mk_app(mk_app(mk_llnf_jmp(), fn), args);

library/compiler/llnf.cpp:    g_jmp       = new expr(mk_constant("_jmp"));

library/compiler/llnf.cpp:    mark_persistent(g_jmp->raw());

library/compiler/llnf.cpp:    delete g_jmp;

library/compiler/cse.cpp:    expr replace_target(expr const & e, expr const & target, expr const & jmp) {

library/compiler/cse.cpp:                    return some_expr(jmp);

library/compiler/cse.cpp:        buffer> target_jmp_pairs;

library/compiler/cse.cpp:                expr jmp         = mk_app(new_fvar, unit_mk);

library/compiler/cse.cpp:                            target_jmp_pairs.emplace_back(target, jmp);

library/compiler/cse.cpp:                body = replace_target(body, target, jmp);

library/compiler/cse.cpp:        if (is_let && !target_jmp_pairs.empty()) {

library/compiler/cse.cpp:                    for (pair const & p : target_jmp_pairs) {

library/compiler/llnf.h:bool is_llnf_jmp(expr const & e);

library/compiler/ir_interpreter.cpp:jp_id const & fn_body_jmp_jp(fn_body const & b) { lean_assert(fn_body_tag(b) == fn_body_kind::Jmp); return cnstr_get_ref_t(b, 0); }

library/compiler/ir_interpreter.cpp:array_ref const & fn_body_jmp_args(fn_body const & b) { lean_assert(fn_body_tag(b) == fn_body_kind::Jmp); return cnstr_get_ref_t>(b, 1); }

library/compiler/ir_interpreter.cpp:                    fn_body const & jp = *m_jp_stack[get_frame().m_jp_bp + fn_body_jmp_jp(b).get_small_value()];

library/compiler/ir_interpreter.cpp:                    lean_assert(fn_body_jdecl_params(jp).size() == fn_body_jmp_args(b).size());

library/compiler/ir_interpreter.cpp:                        var(param_var(fn_body_jdecl_params(jp)[i])) = eval_arg(fn_body_jmp_args(b)[i]);

Lean/Elab/Do.lean:  - `jmp`       a goto to a join-point

Lean/Elab/Do.lean:  - For every `jmp ref j as` in `C`, there is a `joinpoint j ps b k` and `jmp ref j as` is in `k`, and

Lean/Elab/Do.lean:  | jmp          (ref : Syntax) (jpName : Name) (args : Array Syntax)

Lean/Elab/Do.lean:    | Code.jmp _ j xs             => m!"jmp {j.simpMacroScopes} {xs.toList}"

Lean/Elab/Do.lean:    | Code.jmp _ _ _          => false

Lean/Elab/Do.lean:/- Convert `action _ e` instructions in `c` into `let y ← e; jmp _ jp (xs y)`. -/

Lean/Elab/Do.lean:      let jmpArgs := xs.map $ mkIdentFrom ref

Lean/Elab/Do.lean:      let jmpArgs := jmpArgs.push y

Lean/Elab/Do.lean:      pure $ Code.jmp ref jp jmpArgs

Lean/Elab/Do.lean:    return Code.jmp ref jp #[unit]

Lean/Elab/Do.lean:    return Code.jmp ref jp (xs.map $ mkIdentFrom ref)

Lean/Elab/Do.lean:  pure $ Code.jmp ref jp args

Lean/Elab/Do.lean:  | rs, c@(Code.jmp _ _ _)         => pure c

Lean/Elab/Do.lean:jmp jp x

Lean/Elab/Do.lean:jmp jp x

Lean/Elab/Do.lean:jmp jp y x

Lean/Elab/Do.lean:   and replace the `return _ y` with `jmp us y`

Lean/Elab/Do.lean:   and replace the `break` with `jmp us`.

Lean/Elab/Do.lean:  | Code.jmp ref j args     => pure $ mkJmp ref j args

```

# May 19th

```

src/shell/lean

add_executable(lean lean.cpp)

```

which calls `run_new_frontend`. This gets a `pair_ref`.,

which is then passed to `cout << lean::ir::emit_c(env, *main_module_name).data();`

So the call to the type checker must happen somewhere in the fronend after elaboration.

Where? Also, there is a `src/Lean/Meta/{Basic.lean,InferType.lean}` that seems to contain

everything needed to implement NbE / type checking / inference. So I don't really understand

where the LEAN kernel interfaces. I found a part of the link by looking for `is_def_eq` as

it is part of the public facing API of the type checker:

```

Lean/Environment.lean

namespace Kernel

/- Kernel API -/

/--

  Kernel isDefEq predicate. We use it mainly for debugging purposes.

  Recall that the Kernel type checker does not support metavariables.

  When implementing automation, consider using the `MetaM` methods. -/

@[extern "lean_kernel_is_def_eq"]

constant isDefEq (env : Environment) (lctx : LocalContext) (a b : Expr) : Bool

/--

  Kernel WHNF function. We use it mainly for debugging purposes.

  Recall that the Kernel type checker does not support metavariables.

  When implementing automation, consider using the `MetaM` methods. -/

@[extern "lean_kernel_whnf"]

constant whnf (env : Environment) (lctx : LocalContext) (a : Expr) : Expr

end Kernel

```

Grepping for uses of `lean_kernel_` gave me just this:

```

/home/bollu/work/lean4/src$ ag "lean_kernel_"

Lean/Environment.lean

689:@[extern "lean_kernel_is_def_eq"]

696:@[extern "lean_kernel_whnf"]

kernel/type_checker.cpp

1019:extern "C" uint8 lean_kernel_is_def_eq(lean_object * env, lean_object * lctx, lean_object * a, lean_object * b) {

1023:extern "C" lean_object * lean_kernel_whnf(lean_object * env, lean_object * lctx, lean_object * a) {

```

I now remember my mental model that I had forgotten:

Join points are awkward since we can `jmp` from anywhere, including a `case`. So LEAN can generate

code that looks like:

```llvm

joinpoint {

  // stuff that runs after a jmp

}, {

 // stuff that runs first

 %out = case x of 

   [@Foo -> {

        ...

        jmp

   }]

   [@Bar -> {

        ...

        return 42 (?)

   }]

}

```

This makes CFG generation SUPER awkward, since we generally generate a `case`

as a ladder of `if-then-else` [due to the lack of `mlir.llvm.switch` in MLIR].

So now, we need to generate the equivalent of:

```cpp

out = undef;

if (x.tag == FOO) {

  goto jp; // joinpoint

} else if (x.tag == BAR) {

  ...

  out = 42

}

jp:

 ...

```

I'm now double-checking that my mental model indeed matches reality, because

this seems very convoluted.

This is generated by running my version of `lean` () against `./test/lambdapure/simple/jmp.lean`

```llvm

// -- before converting `caseRet` -> `case`.

"lz.joinpoint"() ( { // :AFTER JUMP CODE.

^bb0(%arg4: !lz.value):  // no predecessors

  "lz.caseRet"(%1) ( {

    %2 = "lz.papExtend"(%arg3, %arg0) : (!lz.value, !lz.value) -> !lz.value

    lz.return %2 : !lz.value

  },  {

    %2 = "lz.papExtend"(%arg3, %arg0) : (!lz.value, !lz.value) -> !lz.value

    lz.return %2 : !lz.value

  },  {

    %2 = "lz.project"(%1) {value = 0 : i64} : (!lz.value) -> !lz.value

    %3 = "lz.papExtend"(%arg1, %0, %2) : (!lz.value, !lz.value, !lz.value) -> !lz.value

    lz.return %3 : !lz.value

  }) : (!lz.value) -> ()

},  {

// BEFORE JUMP CODE.

// For one, I'm unsure if my encoding of "jump" as "jump to enclosing join-point"

// is even correct, because in the LEAN encoding, join points carry labels, so maybe it's

// possible to have nested join-points that one can jump out of. I'll have to re-read

// the GHC paper to be sure...

  "lz.caseRet"(%0) ( {

    %2 = "lz.construct"() {dataconstructor = @"0", size = 0 : i64} : () -> !lz.value

    "lz.jump"(%2) {value = 12 : i64} : (!lz.value) -> () // <- JUMP out of case

  },  {

    %2 = "lz.construct"() {dataconstructor = @"0", size = 0 : i64} : () -> !lz.value

    "lz.jump"(%2) {value = 12 : i64} : (!lz.value) -> () // <- JUMP out of case

  },

```

The problem with the code above is that I can't assume that the correct way to

codegen a `case` is to (1) produce a switch-case (2) set "output" variable to

value (3) merge control flow back into landingpad BB.

```llvm

// after converting `caseRet` into `case:

"lz.joinpoint"() ( {

^bb0(%arg1: !lz.value):  // no predecessors

  %3 = "lz.case"(%2) ( {

    %4 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__1} : () -> !lz.value

    lz.return %4 : !lz.value

  },  {

    %4 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__1} : () -> !lz.value

    lz.return %4 : !lz.value

  },  {

    %4 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__2} : () -> !lz.value

    lz.return %4 : !lz.value

  }) {alt0 = @"0", alt1 = @"1", alt2 = @"2"} : (!lz.value) -> !lz.value

  lz.return %3 : !lz.value

},  {

  %3 = "lz.case"(%1) ( {

    %4 = "lz.construct"() {dataconstructor = @"0", size = 0 : i64} : () -> !lz.value

    // vvv case is supposed to end with a `ret`, not a `jump`.

    "lz.jump"(%4) {value = 8 : i64} : (!lz.value) -> () // <- DOES NOT MAKE SENSE

  },  {

    %4 = "lz.construct"() {dataconstructor = @"0", size = 0 : i64} : () -> !lz.value

    // vvv case is supposed to end with a `ret`, not a `jump`.

    "lz.jump"(%4) {value = 8 : i64} : (!lz.value) -> () // <- DOES NOT MAKE SENSE

  },  {

    %4 = "lz.case"(%2) ( {

      %5 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__3} : () -> !lz.value

      lz.return %5 : !lz.value

    },  {

      %5 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__3} : () -> !lz.value

      lz.return %5 : !lz.value

    },  {

      %5 = "ptr.loadglobal"() {value = @l_Expr_eval___closed__2} : () -> !lz.value

      lz.return %5 : !lz.value

    }) {alt0 = @"0", alt1 = @"1", alt2 = @"2"} : (!lz.value) -> !lz.value

    lz.return %4 : !lz.value

  }) {alt0 = @"0", alt1 = @"1", alt2 = @"2"} : (!lz.value) -> !lz.value

  lz.return %3 : !lz.value

}) : () -> ()

```

See that the above does not make sense. I need to either change a `jump` into

an instruction that generates into a `call` or something. However, then I don't

understand how to deal with local variables/scoping. For example,

this is generated from `./test/lambdapure/bench/const_fold.lean`, the first

`joinpoint` use:

```

%1 = "lz.project"(%arg1) {value = 0 : i64} : (!lz.value) -> !lz.value

%2 = "lz.project"(%arg1) {value = 1 : i64} : (!lz.value) -> !lz.value

"lz.joinpoint"() ( {

^bb0(%arg6: !lz.value):  // no predecessors

  // vvvvUSE of %2 inside the joinpoint BB. vvvv

  "lz.caseRet"(%2) ( { 

    %3 = "lz.papExtend"(%arg5, %arg0, %arg1) : (!lz.value, !lz.value, !lz.value) -> !lz.value

    lz.return %3 : !lz.value

  },  {

    %3 = "lz.project"(%2) {value = 0 : i64} : (!lz.value) -> !lz.value

    %4 = "lz.papExtend"(%arg3, %0, %1, %3) : (!lz.value, !lz.value, !lz.value, !lz.value) -> !lz.value

    lz.return %4 : !lz.value

  },  {

    %3 = "lz.papExtend"(%arg5, %arg0, %arg1) : (!lz.value, !lz.value, !lz.value) -> !lz.value

    lz.return %3 : !lz.value

  },  {

    %3 = "lz.papExtend"(%arg5, %arg0, %arg1) : (!lz.value, !lz.value, !lz.value) -> !lz.value

    lz.return %3 : !lz.value

  }) : (!lz.value) -> ()

},  {

  ...

}

```

I think the simplest way to proceed is to assume that case does not need to return a value,

and simply generate code as nested cases. This makes me somewhat disgruntled, since we 

essentially lose all SSA niceness. We are like, encoding the continuation-style directly

using nested regions or whatever?

# May 4

- Experimenting with LLVM, [trying to understand what example of mutual recursion is useful](https://gist.github.com/bollu/1da616fb3cd13501f1fdb3c44d4370be#file-main-o3-ll-L97-L109).

- Pinged Leo asking for a file that shows off the example he had in mind, so I don't solve a problem

  that's orthogonal to what the LEAN folks have.

- It's a little unclear to me what to work on next. I *think* in terms of

  completing the paper, it would be useful to either have an example where (1) we

  do something non-trivial to LEAN IR by lowering carefully, or (2) embed a DSL

  into Lean for things like `affine.for`. The latter is annoying due to LEANisms.

  I had tried generating combinators on the LEAN level. So a program like this:

```lean

inductive Vec : Type

| VecNum (l: Nat) (i: Nat): Vec

| VecAdd (v: Vec) (w: Vec) : Vec

| VecSum (v: Vec): Vec

open Vec

-- | consume all values so the optimiser doesn't remove everything.

def runvec : Vec -> IO Unit

| VecNum _ _ => IO.print "vecnum"

| VecAdd x y => runvec x *> runvec y

| VecSum v => runvec v

def vecnum (l: Nat) (i: Nat): Vec := (VecNum l i)

def vecadd (v: Vec) (w: Vec): Vec := (VecAdd v w)

def vecsum (v: Vec) : Vec := (VecSum v)

def main (xs: List String) : IO Unit := do

  runvec (vecsum (vecadd (vecnum 10 41) (vecnum 10 1)))

```

This generates lambdapure where all the "computation" of building these

combinators happens in the init functions, and main simply prints the _already

computed_ initialized data. This is a problem, since the "building the

combinators" (which LEAN decides to precompute) is in fact that real

(description of) the computation! For reference, the lambdapure is:

```

def main._closed_1 : obj :=

  let x_1 : obj := 10;

  let x_2 : obj := 41;

  let x_3 : obj := ctor_0[Vec.VecNum] x_1 x_2;

  ret x_3

def main._closed_2 : obj :=

  let x_1 : obj := 10;

  let x_2 : obj := 1;

  let x_3 : obj := ctor_0[Vec.VecNum] x_1 x_2;

  ret x_3

def main._closed_3 : obj :=

  let x_1 : obj := main._closed_1;

  let x_2 : obj := main._closed_2;

  let x_3 : obj := ctor_1[Vec.VecAdd] x_1 x_2;

  ret x_3

def main._closed_4 : obj :=

  let x_1 : obj := main._closed_3;

  let x_2 : obj := ctor_2[Vec.VecSum] x_1;

  ret x_2

def main (x_1 : obj) (x_2 : obj) : obj :=

  let x_3 : obj := main._closed_4;

  let x_4 : obj := runvec x_3 x_2;

  ret x_4// Lean compiler output

```

which on lowering naively, becomes something like:

```cpp

lean_object* _lean_main(lean_object* x_1, lean_object* x_2) {

_start:

{

lean_object* x_3; lean_object* x_4;

x_3 = l_main___closed__4;

x_4 = l_runvec(x_3, x_2);

return x_4;

}

}

```

So the interesting part of "build the combinator" (which for us, is the real

description of the computation itself) is relegated to initialization. 

I asked Leo if there

an easy way to generate the code differently, where LEAN emits actual calls for

the combinators. He replied tersely, saying that

> you can disable the extraction of closed terms using the option.

```

set_option compiler.extract_closed false

```

This comment was perfect; If I enable the option, I generate the following MLIR:

```llvm

  func @main_lean_custom_entrypoint_hack(%arg0: !lz.value, %arg1: !lz.value) -> !lz.value {

    %0 = "lz.int"() {value = 10 : i64} : () -> !lz.value

    %1 = "lz.int"() {value = 41 : i64} : () -> !lz.value

    %2 = "lz.construct"(%0, %1) {dataconstructor = @"0", name = "Vec.VecNum", size = 2 : i64} : (!lz.value, !lz.value) -> !lz.value

    %3 = "lz.int"() {value = 1 : i64} : () -> !lz.value

    %4 = "lz.construct"(%0, %3) {dataconstructor = @"0", name = "Vec.VecNum", size = 2 : i64} : (!lz.value, !lz.value) -> !lz.value

    %5 = "lz.construct"(%2, %4) {dataconstructor = @"1", name = "Vec.VecAdd", size = 2 : i64} : (!lz.value, !lz.value) -> !lz.value

    %6 = "lz.construct"(%5) {dataconstructor = @"2", name = "Vec.VecSum", size = 1 : i64} : (!lz.value) -> !lz.value

    %7 = call @l_runvec(%6, %arg1) : (!lz.value, !lz.value) -> !lz.value

    lz.return %7 : !lz.value

  }

```

where we can clearly see that the data structure representing the computation is built, followed up by a call to `l_runvec.`

This is IR that I can optimize!

Also, I've been reading through the LEAN sources. It seems their parsing framework is very extensible. In particular,

[`src/Parser/Basic.lean](https://github.com/leanprover/lean4/blob/master/src/Lean/Parser/Basic.lean#L34-L41) says:

> * flexibility: Lean's grammar is complex and includes indentation and other whitespace sensitivity. It should be

>   possible to introduce such custom "tweaks" locally without having to adjust the fundamental parsing approach.

> * extensibility: Lean's grammar can be extended on the fly within a Lean file, and with Lean 4 we want to extend this

>   to cover embedding domain-specific languages that may look nothing like Lean, down to using a separate set of tokens.

> Given these constraints, we decided to implement a combinatoric, non-monadic,

> lexer-less, memoizing recursive-descent parser. Using combinators instead of

> some more formal and introspectible grammar representation ensures ultimate

> flexibility as well as efficient extensibility: there is (almost) no

> pre-processing necessary when extending the grammar with a new parser.

Their `do`-notation as well as all function call related features such as implicit arguments are implemented

directly in the elaborator. I almost wonder if it's possible to embed generic MLIR into LEAN directly, since the MLIR

grammar is quite straightforward.

# May 3

- Lean has a `#if defined(LEAN_LLVM)`. What for? How do I make it better? `:)`.

- Who calls the type checker? Start at `shell/lean.cpp`. No leads

- Look for `infer_type`. Find it in `library/compiler/lcnf.cpp`. Now find users of `to_lcnf` (lambda-core-normal-form)?

```

/home/bollu/work/lean4/src$ ag "to_lcnf"

library/compiler/compiler.cpp

202:    ds = apply(to_lcnf, env, ds);

```

```

library/compiler/compiler.cpp

162 environment compile(environment const & env, options const & opts, names cs) {

202    ds = apply(to_lcnf, env, ds);

203    ds = apply(find_jp, env, ds);

```

and `compile` is called at:

```

library/compiler/compiler.cpp

extern "C" object * lean_compile_decl(object * env, object * opts, object * decl) {

    return catch_kernel_exceptions([&]() {

            return compile(environment(env), options(opts, true), get_decl_names_for_code_gen(declaration(decl, true)));

        });

}

```

I am completely unsure as to how this relates to the actual `shell`?

It's also in the header:

```

library/compiler/compiler.h

environment compile(environment const & env, options const & opts, names cs);

inline environment compile(environment const & env, options const & opts, name const & c) {

    return compile(env, opts, names(c));

}

```

so maybe shell calls it?

```

shell/lean.cpp

contents = read_file(mod_fn);

main_module_name = module_name_of_file(mod_fn, root_dir, /* optional */ !olean_fn && !c_output);

if (!main_module_name)

    main_module_name = name("_stdin");

pair_ref r = run_new_frontend(contents, opts, mod_fn, *main_module_name);

if (run && ok) {

    uint32 ret = ir::run_main(env, opts, argc - optind, argv + optind);

    // environment_free_regions(std::move(env));

    return ret;

}

```

- `run_new_frontent` calls into `Lean/Elab/Frontend.lean`, the entrypoint for frontend related shenanigans.

```

shell/lean.cpp

326:extern "C" object * lean_run_frontend(object * input, object * opts, object * filename, object * main_module_name, object * w);

329:        lean_run_frontend(mk_string(input), opts.to_obj_arg(), mk_string(file_name), main_module_name.to_obj_arg(), io_mk_world()));

Lean/Elab/Frontend.lean

91:@[export lean_run_frontend]

```

Righto, so shell calls `lean_run_frontend` which somehow magically calls `compile`. Let's see the exports!

```

/home/bollu/work/lean4$ ag "lean_compile_decl"

...

stage0/src/Lean/Environment.lean

132:@[extern "lean_compile_decl"]

src/Lean/Environment.lean

133:@[extern "lean_compile_decl"]

src/library/compiler/compiler.cpp

261:extern "C" object * lean_compile_decl(object * env, object * opts, object * decl) {

stage0/src/library/compiler/compiler.cpp

262:extern "C" object * lean_compile_decl(object * env, object * opts, object * decl) {

```

Look at what this `lean_compile_decl` looks like in `src/Lean/Environment.lean`:

```

src/lean/Environment.lean

namespace Environment

/- Type check given declaration and add it to the environment -/

@[extern "lean_add_decl"]

constant addDecl (env : Environment) (decl : @& Declaration) : Except KernelException Environment

/- Compile the given declaration, it assumes the declaration has already been added to the environment using `addDecl`. -/

@[extern "lean_compile_decl"]

constant compileDecl (env : Environment) (opt : @& Options) (decl : @& Declaration) : Except KernelException Environment

```

Great, now let's go see who calls compileDecl.

```

/home/bollu/work/lean4/src$ ag "compileDecl" --type-add "lean:*.lean" -t lean

Lean/Environment.lean

134:constant compileDecl (env : Environment) (opt : @& Options) (decl : @& Declaration) : Except KernelException Environment

138:  compileDecl env opt decl

Lean/MonadEnv.lean

133:def compileDecl [Monad m] [MonadEnv m] [MonadError m] [MonadOptions m] (decl : Declaration) : m Unit := do

134:  match (← getEnv).compileDecl (← getOptions) decl with

142:  compileDecl decl

Lean/Elab/Declaration.lean

86:        compileDecl decl

Lean/Elab/BuiltinNotation.lean

101:  compileDecl decl

Lean/Elab/PreDefinition/Basic.lean

113:      compileDecl decl

134:    compileDecl decl

Lean/Elab/Binders.lean

71:    compileDecl decl

Lean/Meta/Match/Match.lean

600:      compileDecl decl

Lean/Meta/Closure.lean

364:    compileDecl decl

```

So it seems like the users are in `Lean/Elab` (sensible) and `Lean/Meta` (unsure what this is).

Here's what `Lean/Meta/Basic.lean` says:

```

src/Lean/Meta/Basic.lean

/-

This module provides four (mutually dependent) goodies that are needed for building the elaborator and tactic frameworks.

1- Weak head normal form computation with support for metavariables and transparency modes.

2- Definitionally equality checking with support for metavariables (aka unification modulo definitional equality).

3- Type inference.

4- Type class resolution.

They are packed into the MetaM monad.

-/

```

# April 12

Try and use passes:

```

  --mlir-pretty-debuginfo                               - Print pretty debug info in MLIR output

  --mlir-print-debug-counter                            - Print out debug counter information after all counters have been accumulated

  --mlir-print-debuginfo                                - Print debug info in MLIR output

```

# April 4

- Measure how much "flattening the IR" by lifting stuff out of case improves cyclomatic complexity

# April 2

Quick update: I can now roundtrip the simplest of LEAN programs using the LEAN

runtime. We generate MLIR (`lz`, though not a lot of it because the example has

no, say, data constructors), lower to `mlir-llvm`, convert to `llvm`, then link

against the LEAN runtime + [eldritch horrors](https://github.com/bollu/lz/blob/master/lean-linking-incantations/lean-shell.c)

to produce an executable. 

1. We have a [`lean-linking-incantations/library/library.c/o`](https://github.com/bollu/lz/blob/master/lean-linking-incantations/lib-includes/library.c).

   This has separate copy of [`include/lean/lean.h`](https://github.com/bollu/lz/blob/master/lean-linking-incantations/lib-includes/lean/lean.h)

  **which has be un-`static`d** so we still have symbols in the object files.

  Compare the ORIGINAL [`include/lean/lean.h`](https://github.com/leanprover/lean4/blob/master/stage0/src/include/lean/lean.h#L553-L562)

  where all declarations are `static`, and thus won't be present in the final `lean-shell.o` we build.

1. We have a 

  [`lean_shell.c/o`](https://github.com/bollu/lz/blob/master/lean-linking-incantations/lean-shell.c)

  that contains the `main()`, along with the LEAN C preamble that does not change across files.

  This is compiled into a `lean-shell.o`. 

1. This `lean-shell.o` defines two entrypoints:

  `main_lean_custom_entrypoint_hack(lean_io_mk_world())`, and

  `init_lean_custom_entrypoint_hack(lean_io_mk_world())`.

   `init_...` is used to perform initialisation of constants, 

   while `main_...` is the entrypoint.

1. We generate MLIR which that does the "obvious" things; 

  (1) Defines forward declarations for all the LEAN runtime functions that it uses, 

  (2) generates a `init_lean_custom....` and dumps all initialization code there, 

  (3) generates a `main_lean_custom...` and dumps all the runtime code there. 

1. From this point, we are up and running. 

   We generate MLIR, convert to `mlir-llvm`,

   translate our way to `llvm`, use `llc` to generate an object file.

   We link this object file against the lean runtime plus our wrapper entry

   point from `lean-shell.o`. This produces an executable that runs :) 

```

// lz: a4f28b4

$ /home/bollu/work/lz/test/lambdapure/simple$ cat run-lean.sh

#!/usr/bin/env bash

set -e

set -o xtrace

lean $1 -c exe.c 2>&1 | \

        hask-opt | tee exe.mlir | \

        hask-opt --lean-lower --ptr-lower | \

        mlir-translate --mlir-to-llvmir | tee exe.ll  | llc -filetype=obj -o exe.o

c++ -D LEAN_MULTI_THREAD -I/home/bollu/work/lean4/build/stage1/include \

    exe.o \

    /home/bollu/work/lz/lean-linking-incantations/lean-shell.o \

    /home/bollu/work/lz/lean-linking-incantations/lib-includes/library.o  \

    -no-pie -Wl,--start-group -lleancpp -lInit -lStd -lLean -Wl,--end-group \

    -L/home/bollu/work/lean4/build/stage1/lib/lean -lgmp -ldl -pthread \

    -Wno-unused-command-line-argument -o exe.out

./exe.out

```

We can run the simplest lean program:

```

// main-print.lean

set_option trace.compiler.ir.init true

def main (xs: List String) : IO Unit := IO.println (7.9)

```

to produce the output:

```

/home/bollu/work/lz/test/lambdapure/simple$ ./run-lean.sh main-print.lean 2>/dev/null

7.900000

```

So, this is good, since we have a solid foundation of talking to the runtime

that I can now easily extend. I don't need anything special now, I should be

able to mechanically translate all of the rest of the LEAN ops into vanilla

MLIR. The risky part of the process seems to work.

- Teach `LEAN` to work with out-of-order definitions?

# April 1

- I understand the problem. The definitions in `lean/lean.h` are all defined IN THE HEADER FILE,

  so I need to recompile this into a shared object which I need to link in. call it `libleanheader` or something.

  It makes sense why they do this (inlined code for performance). Not sure if it's worth the trouble.

  Will need to see if LEAN optimises out `Nat.sub(x, x)` or not!

# March 30

I do need to know the differences between types at the LLVM level.

So for example, if there is code

```

%c0 = lz.int(0): () -> !lz.value

%out = std.call @Nat.decEq(%c0, %c0) : (!lz.value, !lz.value) -> !lz.value

```

on lowering, I do need the `lz.int(0)` to become an `llvm.i64`. But then

at the call site `@Nat.decEq`, I don't know what type it should be!

This is quite untenable.

Will learn how to retain type info from lambdapure -> MLIR

# March 29

- Fixup the `run-optimised.sh` code.  

- `affine-loop-fusion` stopped working??

```

$ hask-opt lower-lz-and-affine.mlir --lz-worker-wrapper --affine-loop-fusion

func @main() -> i64 {

  %c1024 = constant 1024 : index

  %c0_i64 = constant 0 : i64

  %0 = memref.alloc(%c1024) : memref

  affine.for %arg0 = 0 to 1024 {

    %2 = index_cast %arg0 : index to i64

    affine.store %2, %0[%arg0] : memref

  }

  %1 = affine.for %arg0 = 0 to 1024 iter_args(%arg1 = %c0_i64) -> (i64) {

    %2 = affine.load %0[%arg0] : memref

    %3 = addi %arg1, %2 : i64

    affine.yield %3 : i64

  }

  call @printInt(%1) : (i64) -> ()

  return %c0_i64 : i64

}

```

I don't understand why this is not optimised away. Makes no sense.

#### Why tests in `lambdapure/bench` break now:

```

********************

Failed Tests (4):

  HASK_OPT :: lambdapure/bench/const_fold.lean

  HASK_OPT :: lambdapure/bench/deriv.lean

  HASK_OPT :: lambdapure/bench/qsort.lean

  HASK_OPT :: lambdapure/bench/rbmap_checkpoint.lean

Testing Time: 2.38s

  Passed    : 4

  Unresolved: 9

  Failed    : 4

```

##### `const_fold.lean`:

```

:163:55: error: duplicate key 'alt3' in dictionary attribute

    "lz.return"(%x_8): (!lz.value) -> ()}){alt3=@"3", alt3=@default}:(!lz.value)->()

```

##### `deriv.lean`:

```

--

:1939:78: error: duplicate key 'alt5' in dictionary attribute

    "lz.return"(%x_10): (!lz.value) -> ()}){alt0=@"0", alt1=@"1", alt5=@"5", alt5=@default}:(!lz.value)->()

```

##### `qsort.lean`:  

```

--interpreting function:|UInt32.ofNat|--

:16:1: error: incorrect number of arguments to region. Given: |1|.Expected: |0| 

func private@"UInt32.ofNat"(!lz.value)->(!lz.value)

^

hask-opt: ../hask-opt/Interpreter.cpp:941: llvm::Optional Interpreter::interpretRegion(mlir::Region&, llvm::ArrayRef, Env): Assertion `false && "unable to interpret region"' failed.

```

##### `rbmap_checkpoint.lean`:

```

:72:12: error: unregistered operation 'lz.sproj' found in dialect ('lz') that does not allow unknown operations

    %x_6 = "lz.sproj"(%x_1){ix=3, offset=0}:(!lz.value)->(!lz.value)

           ^

```

##### `render.lean`

```

:1008:3: error: unregistered operation 'lz.sset' found in dialect ('lz') that does not allow unknown operations

  "lz.sset"(%x_71,%x_70){ix=7, offset=0}:(!lz.value, !lz.value)->()

  ^

```

# March 26

```

analyzing constructor: |

analyzing constructor: |%%51 =  = ""lzl.zc.ocnonssttrruucctt"("()) { {dataconstructordataconstructor =  = @"@"00""}} :  : (() -> ) -> !!lzlz.value.value|

```

- Something really funky is going on, either with memory or parallelism!.

# March 25

- Need to fix `lambdapure/simple/jmp.lean` that makes use of default alternative case.

- `ClosedTermCache` keeps tracks of names.

terms for render.lean

```

func private@"Float.add"(!lz.value, !lz.value)->(!lz.value)

func private@"Float.div"(!lz.value, !lz.value)->(!lz.value)

func private@"Nat.decLe"(!lz.value, !lz.value)->(!lz.value)

func private@"IO.Prim.Handle.putStr"(!lz.value, !lz.value, !lz.value)->(!lz.value)

func private@"IO.Prim.fopenFlags"(!lz.value, !lz.value)->(!lz.value)

func private@"IO.Prim.Handle.mk"(!lz.value, !lz.value, !lz.value)->(!lz.value)

func private@"USize.decLt"(!lz.value, !lz.value)->(!lz.value)

```

- Move the code gen to be after some simplifications:

```

private def compileAux (decls : Array Decl) : CompilerM Unit := do

  log (LogEntry.message "// compileAux")

  -- logDecls `init decls

  logPreamble (LogEntry.message mlirPreamble)

  -- logDeclsUnconditional decls

  checkDecls decls

  let decls ← elimDeadBranches decls

  logDecls `elim_dead_branches decls

  let decls := decls.map Decl.pushProj

  logDecls `push_proj decls

  --vvvvvv DISABLE THIS

  -- let decls := decls.map Decl.insertResetReuse

  -- logDecls `reset_reuse decls

  let decls := decls.map Decl.elimDead

  logDecls `elim_dead decls

  let decls := decls.map Decl.simpCase

  logDecls `simp_case decls

  let decls := decls.map Decl.normalizeIds

  logDeclsUnconditional decls <- CODEGEN HERE 

```

# March 24

- Got simple examples to work.

- TODO: get `render.lean` to work!

- Next: immutable beans tomorrow.

# March 23: Lean compiler entrypoint

```

src/shell/lean.cpp

int main() { ...

```

```

/home/bollu/work/lean4/src$ ag initialize_compiler

library/compiler/compiler.cpp

267:void initialize_compiler() {

```

```

/home/bollu/work/lean4/src$ vim library/compiler/extern_attribute.cpp 

/home/bollu/work/lean4/src$ ag "lean_get_extern_attr_data"

Lean/Compiler/ExternAttr.lean

75:@[export lean_get_extern_attr_data]

library/compiler/extern_attribute.cpp

22:extern "C" object* lean_get_extern_attr_data(object* env, object* n);

25:    return to_optional(lean_get_extern_attr_data(env.to_obj_arg(), fn.to_obj_arg()));

```

# March 19th

MLIR strangeness of the day: Walking over uses of an argument does not give me the first use!

```

/home/bollu/work/lz/test/lambdapure/simple$ hask-opt error.mlir --lz-lazify

forcifying user: |%6 = call @Nat_dot_decEq(%arg0, %4) : (!lz.thunk, !lz.value) -> !lz.value

        after: |%6 = call @Nat_dot_decEq(%5, %4) : (!lz.value, !lz.value) -> !lz.value

vvvvv:module:vvvvv

module  {

  func private @Nat_dot_sub(!lz.value, !lz.value) -> !lz.value

  func private @Nat_dot_decEq(!lz.value, !lz.value) -> !lz.value

  func @ackermann_dot_match_1_dot__rarg(%arg0: !lz.thunk) {

    %0 = "lz.int"() {value = 0 : i64} : () -> !lz.value

    %1 = call @Nat_dot_decEq(%arg0, %0) : (!lz.thunk, !lz.value) -> !lz.value

    %2 = "lz.tagget"(%1) : (!lz.value) -> i64

    %3 = "lz.case"(%2) ( {

      %4 = "lz.int"() {value = 1 : i64} : () -> !lz.value

      %5 = "lz.force"(%arg0) : (!lz.thunk) -> !lz.value

      %6 = call @Nat_dot_decEq(%5, %4) : (!lz.value, !lz.value) -> !lz.value

      lz.return %6 : !lz.value

    }) {alt0 = @"0"} : (i64) -> !lz.value

    return %3 : !lz.value

  }

}

^^^^^^

error.mlir:6:10: error: 'std.call' op operand type mismatch: expected operand

type '!lz.value', but provided '!lz.thunk' for operand number 0

    %1 = call @Nat_dot_decEq(%arg0, %0) : (!lz.value, !lz.value) -> !lz.value

         ^

error.mlir:6:10: note: see current operation:

%1 = "std.call"(%arg0, %0) {callee = @Nat_dot_decEq} : (!lz.thunk, !lz.value) -> !lz.value

```

- See that it never gave me `forcifying user: |%1 = ...|` even though it clearly uses `%arg0`! What's going on?!

- Fix worker/wrapper bugs so that I can worker/wrapper the example code at "ackermann-ought-to-be-output.mlir"

   

# March 18th

- So (1) I was lowering indirect calls to `lz.ap` that is completely wrong, because it doesn't

  actually makes the call, just creates a thunk. The "problem" is that since the dialect is type erased,

  I need to use an `lz.ap` :(. So I guess I do need to revive the `lz.apeager` after all.

  I can't use `std.indirectcall` because it needs me to typecast the `func : !lz.value`

  into `func: (!lz.value, !lz.value) -> !lz.value` which is just as stupid. Matt avoided

  this by having only one type in his dialect, and a separate `ApEagerOp` as I am cornered

  into doing.

- Note to self: DO NOT BUY INTO MLIR DESIGN PRINCIPLES. Just do whatever is convenient, as attempting

  to appease MLIR is simply pain.

- Can't believe I burned half an hour on this; the correct way to use pass options is to say

  `--lz-interpret='qwerty=foo'` where `qwerty` is the option declared in `lz-interpret`.

- I need to fix the lexer and parser eventually. Some programs can't be run because they don't lex properly.

- [ ] `binarytrees.lean` needs `Task` to be implement to be able to run. Will look into this later, maybe use `async` 

  dialect? unclear!

- `deriv.lean` needs the parser to be fixed? (`let x_18 : obj := prec(_)._closed_3;`)  

- `qsort.lean` needs the parser to be fixed (`let x_1 : obj := "term↑__1";`) Can't parse uparrow (`↑`) right now.

- `rbmap_checkpointlean` uses a new kind of projection: `let x_6 : u8 := sproj[3, 0] x_1;` I don't know the semantics

   of this, sadly.

  

- `unionfind.lean` needs me to implement `String_dot_instInhabitedString`

- From the file `loop.lean`:

```

def mkRandomArray : Nat -> Array Nat -> Array Nat

...

| i+1, as => mkRandomArray i (as.push (i+1))

```

generates the MLIR:

```mlir

func @mkRandomArray(%arg0: !lz.value, %arg1: !lz.value) -> !lz.value {

  ...

  %7 = "lz.erasedvalue"() : () -> !lz.value -- | what is this a proof *of*?

  %8 = call @Array_dot_push(%7, %arg1, %6) : (!lz.value, !lz.value, !lz.value) -> !lz.value

```

So the call to `push` generates an erased value. This means I shouldn't *crash* on erased

values. I should, well, *erase* them when I code generate useful code. Strange.

Similarly for `get`:

```

func @sumAux(%arg0: !lz.value, %arg1: !lz.value) -> !lz.value {

  ...

  %5 = call @Nat_dot_sub(%arg0, %4) : (!lz.value, !lz.value) -> !lz.value

  %6 = call @instInhabitedNat() : () -> !lz.value

  %7 = "lz.erasedvalue"() : () -> !lz.value

  %8 = call @Array_dot_get_bang_(%7, %6, %arg1, %5) : (!lz.value, !lz.value, !lz.value, !lz.value) -> !lz.value

```

The first two arguments `%7, %6` are proof terms of stuff being inhabited. The `%arg1` is the array, and `%5`

is the index.

#### Parser bug

I'm a little depressed. The lexer/parser do something incorrect. They generate the AST:

```

test (object -> object -> -> object) x_1 x_2 

  Let object x_3 = 2

  Let int x_4 = Call Nat_dot_decLt x_1 x_3

  Case  on x_4 : 

      Let object x_5 = Call mkNodes x_1 x_2

      Case  on x_5 : 

          Let object x_6 = Proj[0] x_5

          Let object x_7 = Proj[1] x_5

          Case  on x_6 : 

              Let object x_8 = Proj[0] x_6

              Let object x_9 = Ctor 0 x_8

              Let object x_10 = Ctor 0 x_9 x_7

              return x_10

                Let object x_11 = 50000

                Let object x_12 = Call mergePack x_11 x_7

                Case  on x_12 : 

                    Let object x_13 = Proj[0] x_12

                    Let object x_14 = Proj[1] x_12

                    Case  on x_13 : 

                        Let object x_15 = Proj[0] x_13

                        Let object x_16 = Ctor 0 x_15

                        Let object x_17 = Ctor 0 x_16 x_14

                        return x_17

                          Let object x_18 = 10000

                          Let object x_19 = Call mergePack x_18 x_14

                          Case  on x_19 : 

                              Let object x_20 = Proj[0] x_19

                              Let object x_21 = Proj[1] x_19

                              Case  on x_20 : 

                                  Let object x_22 = Proj[0] x_20

                                  Let object x_23 = Ctor 0 x_22

                                  Let object x_24 = Ctor 0 x_23 x_21

                                  return x_24

                                    Let object x_25 = 5000

                                    Let object x_26 = Call mergePack x_25 x_21

                                    Case  on x_26 : 

                                        Let object x_27 = Proj[0] x_26

                                        Let object x_28 = Proj[1] x_26

                                        Case  on x_27 : 

                                            Let object x_29 = Proj[0] x_27

                                            Let object x_30 = Ctor 0 x_29

                                            Let object x_31 = Ctor 0 x_30 x_28

                                            return x_31

                                              Let object x_32 = 1000

                                              Let object x_33 = Call mergePack x_32 x_28

                                              Case  on x_33 : 

                                                  Let object x_34 = Proj[0] x_33

                                                  Let object x_35 = Proj[1] x_33

                                                  Case  on x_34 : 

                                                      Let object x_36 = Proj[0] x_34

                                                      Let object x_37 = Ctor 0 x_36

                                                      Let object x_38 = Ctor 0 x_37 x_35

                                                      return x_38

                                                        Let object x_39 = Call numEqs x_35

                                                        return x_39

                                                          Let object x_40 = Call test_dot__closed_2

                                                          Let object x_41 = Ctor 0 x_40 x_2

                                                          return x_41

```

which has no matching "case or" branches (these are only the happy paths).

The real AST is:

```

def test (x_1 : obj) (x_2 : obj) : obj :=

  let x_3 : obj := 2;

  let x_4 : u8 := Nat.decLt x_1 x_3;

  case x_4 : obj of

  Bool.false →

    let x_5 : obj := mkNodes x_1 x_2;

    case x_5 : obj of

    Prod.mk →

      let x_6 : obj := proj[0] x_5;

      let x_7 : obj := proj[1] x_5;

      case x_6 : obj of

      Except.error →

        let x_8 : obj := proj[0] x_6;

        let x_9 : obj := ctor_0[Except.error] x_8;

        let x_10 : obj := ctor_0[Prod.mk] x_9 x_7;

        ret x_10

      Except.ok →

        let x_11 : obj := 50000;

        let x_12 : obj := mergePack x_11 x_7;

        case x_12 : obj of

        Prod.mk →

          let x_13 : obj := proj[0] x_12;

          let x_14 : obj := proj[1] x_12;

          case x_13 : obj of

          Except.error →

            let x_15 : obj := proj[0] x_13;

            let x_16 : obj := ctor_0[Except.error] x_15;

            let x_17 : obj := ctor_0[Prod.mk] x_16 x_14;

            ret x_17

          Except.ok →

            let x_18 : obj := 10000;

            let x_19 : obj := mergePack x_18 x_14;

            case x_19 : obj of

            Prod.mk →

              let x_20 : obj := proj[0] x_19;

              let x_21 : obj := proj[1] x_19;

              case x_20 : obj of

              Except.error →

                let x_22 : obj := proj[0] x_20;

                let x_23 : obj := ctor_0[Except.error] x_22;

                let x_24 : obj := ctor_0[Prod.mk] x_23 x_21;

                ret x_24

              Except.ok →

                let x_25 : obj := 5000;

                let x_26 : obj := mergePack x_25 x_21;

                case x_26 : obj of

                Prod.mk →

                  let x_27 : obj := proj[0] x_26;

                  let x_28 : obj := proj[1] x_26;

                  case x_27 : obj of

                  Except.error →

                    let x_29 : obj := proj[0] x_27;

                    let x_30 : obj := ctor_0[Except.error] x_29;

                    let x_31 : obj := ctor_0[Prod.mk] x_30 x_28;

                    ret x_31

                  Except.ok →

                    let x_32 : obj := 1000;

                    let x_33 : obj := mergePack x_32 x_28;

                    case x_33 : obj of

                    Prod.mk →

                      let x_34 : obj := proj[0] x_33;

                      let x_35 : obj := proj[1] x_33;

                      case x_34 : obj of

                      Except.error →

                        let x_36 : obj := proj[0] x_34;

                        let x_37 : obj := ctor_0[Except.error] x_36;

                        let x_38 : obj := ctor_0[Prod.mk] x_37 x_35;

                        ret x_38

                      Except.ok →

                        let x_39 : obj := numEqs x_35;

                        ret x_39

  Bool.true →

    let x_40 : obj := test._closed_2;

    let x_41 : obj := ctor_0[Prod.mk] x_40 x_2;

    ret x_41

```

That is, it has both a `Bool.false -> ` branch and a `Bool.true ->` branch (which

is missing). The real code is:

```

def test (x_1 : obj) (x_2 : obj) : obj :=

  let x_3 : obj := 2;

  let x_4 : u8 := Nat.decLt x_1 x_3;

  case x_4 : obj of

  Bool.false →

    let x_5 : obj := mkNodes x_1 x_2;

    case x_5 : obj of

    ...

  Bool.true → <= MISSING

    let x_40 : obj := test._closed_2;

    let x_41 : obj := ctor_0[Prod.mk] x_40 x_2;

    ret x_41

```

# March 16th

```

// interesting: semantics of jump is determined by "enclosing block",

// something that regions help make precise!

if (auto jumpop = dyn_cast(op)) { }

```

- Wow WTF, I have no idea how to generate a `lz.pap` in a "real function".

  It seems to only show up in proof erased terms?!

# March 15th

```

-- encoding of OK

let x_4 : obj := ctor_0[EStateM.Result.ok] x_3 x_2;

```

- Todo for tomorrow: get larger problem sizes working in `const_fold.lean`.

- Add support for arrays in `quicksort.lean`.

- Add checks for `IncOp/DecOp` by interpreting the immutable beans rewrites.

# March 12th

- `const_fold.lean` fails by `prec(_).`

- `deriv.lean` fails by `let x_18 : obj := prec(_)._closed_3;`

- `qsort.lean` fails at  ` let x_1 : obj := "term↑__1"`. It dies at the "up arrow".

   Need to make lexer robust.

- `rbmap_checkpoint.lean` fails at `error: expected command, but found term;

  this error may be due to parsing precedence levels, consider parenthesizing

  the term` (that is, the file is corrupt?)

- Get `unionfind.lean` running

- Generate `.c` using `lean --c=foo.c`

# Match 11th, 2021

- Another use case a `%x = mlir.sese { ... }` instruction. You can't have a child region

  jump to a BB of its parent. What you can have is a `regioncall %x` instruction to "call"

  the region `%x`

We pass more tests now:

```

********************

Failed Tests (4):

  HASK_OPT :: lambdapure/bench/const_fold.lean

  HASK_OPT :: lambdapure/bench/deriv.lean

  HASK_OPT :: lambdapure/bench/qsort.lean

  HASK_OPT :: lambdapure/bench/rbmap_checkpoint.lean

Testing Time: 5.72s

  Passed    : 13

```

- Tomorrow: bring the interpreter online, for both the "regular ops", and the `Inc/Dec`

  reference counting ops.

# March 9th, 2021

- Formatting is defined in `Lean/Compiler/IR/Format.lean`:

```

Lean/Compiler/IR/Format.lean

private def formatExpr : Expr → Format

  | Expr.ctor i ys      => format i ++ formatArray ys

  | Expr.reset n x      => "reset[" ++ format n ++ "] " ++ format x

  | Expr.reuse x i u ys => "reuse" ++ (if u then "!" else "") ++ " " ++ format x ++ " in " ++ format i ++ formatArray ys

  | Expr.proj i x       => "proj[" ++ format i ++ "] " ++ format x

  | Expr.uproj i x      => "uproj[" ++ format i ++ "] " ++ format x

  | Expr.sproj n o x    => "sproj[" ++ format n ++ ", " ++ format o ++ "] " ++ format x

  | Expr.fap c ys       => format c ++ formatArray ys

  | Expr.pap c ys       => "pap " ++ format c ++ formatArray ys

  | Expr.ap x ys        => "app " ++ format x ++ formatArray ys

  | Expr.box _ x        => "box " ++ format x

  | Expr.unbox x        => "unbox " ++ format x

  | Expr.lit v          => format v

  | Expr.isShared x     => "isShared " ++ format x

  | Expr.isTaggedPtr x  => "isTaggedPtr " ++ format x

```

- How `logDecls` works: it creates a `step` that is `format`d. Look for `format` of a `Decl`

```

inductive LogEntry where

  | step (cls : Name) (decls : Array Decl)

  | message (msg : Format)

namespace LogEntry

protected def fmt : LogEntry → Format

  | step cls decls => Format.bracket "[" (format cls) "]" ++ decls.foldl (fun fmt decl => fmt ++ Format.line ++ format decl) Format.nil

  | message msg    => msg

```

- Where logging is used: `Lean/Compiler/IR.lean`

```

Lean/Compiler/IR.lean

28:  logDecls `init decls

```

- Where logging is defined: `CompilerM.lean`:

```

Lean/Compiler/IR/CompilerM.lean:

def tracePrefixOptionName := `trace.compiler.ir

private def isLogEnabledFor (opts : Options) (optName : Name) : Bool :=

  match opts.find optName with

  | some (DataValue.ofBool v) => v

  | other => opts.getBool tracePrefixOptionName

private def logDeclsAux (optName : Name) (cls : Name) (decls : Array Decl) : CompilerM Unit := do

  let opts ← read

  if isLogEnabledFor opts optName then

    log (LogEntry.step cls decls)

```

- `src/Lean/Compiler/IR/EmitC.lean`

I believe the `(_).closed_3` comes from cached closed terms:

- `/home/bollu/work/lean4/src$ git grep "get_closed_term_name"`

- `library/compiler/closed_term_cache.cpp:extern "C" object * lean_get_closed_term_name(object * env, object * e);`

- `library/compiler/closed_term_cache.cpp:    return to_optional(lean_get_closed_term_name(env.to_obj_arg(), e.to_obj_arg()));`

- `library/compiler/closed_term_cache.h:optional get_closed_term_name(environment const & env, expr const & e);`

- `library/compiler/extract_closed.cpp:        if (optional c = get_closed_term_name(m_env, e)) {`

- `library/compiler/lambda_lifting.cpp:        if (optional opt_new_fn = get_closed_term_name(m_env, e)) {`

# March 5th, 2021

Need to add support for join points:

```

block_14 (x_24 : obj) :=

  case x_13 : obj of

  Expr.Var →

    let x_25 : obj := app x_6 x_1 x_2;

    ret x_25

  Expr.Val →

    let x_26 : obj := proj[0] x_13;

    let x_27 : obj := app x_4 x_8 x_12 x_26;

    ret x_27

  Expr.Add →

    let x_28 : obj := app x_6 x_1 x_2;

    ret x_28

  Expr.Mul →

    let x_29 : obj := app x_6 x_1 x_2;

    ret x_29;

case x_12 : obj of

Expr.Var →

  let x_15 : obj := ctor_0[PUnit.unit];

  jmp block_14 x_15

```

###### `deriv.lean`

```

let x_18 : obj := prec(_)._closed_3;

```

WTF, so it seems like the 'name' of a thing on the RHS can have whatever the fuck? what is the actual

grammar for lambdapure?

##### `qsort.lean`:

```

def term↑__1._closed_1 : obj :=

  let x_1 : obj := "term↑__1";

  ret x_1

```

ROFLmao, OK, I need unicode support, or at least a grammar to consult if I

am going to continue using `char*`.

# March 1, 2021

```cpp

if (desUpdates) {

pm.addNestedPass(mlir::lambdapure::createDestructiveUpdatePattern());

}

if (refCount) {

pm.addNestedPass(mlir::lambdapure::createReferenceRewriterPattern());;

}

if (runtimeLowering) {

pm.addPass(mlir::lambdapure::createLambdapureToLeanLowering());

}

```

So, this is the order we need to run these passes in. First destructive

updates, then reference rewriter.

# Feb 19 2021

- Fix lexer/parser to be able to parse input LEAN file.

- Pull lowering to c++ pass into `mlir-translate`.

- Generate code correctly (?) from lowering.

# Feb 17, 2021

- Cool, seems like I have lambdapure working. LEAN program:

```

set_option trace.compiler.ir.init true

inductive L

| Nil

| Cons : Nat -> L -> L

open L

instance : Inhabited L := ⟨Nil⟩

def filter : L -> L

| Nil => Nil

| Cons n l => if n > 5 then filter l else Cons n (filter l)

partial def make' : Nat -> Nat -> L

| n,d =>

  if d = 0 then Cons n Nil

  else Cons (n-d) (make' n (d -1))                     

def make (n : Nat) : L := make' n n

unsafe def main : List String → IO UInt32

| _ => let x := make 100; pure 0

def main2 : L := make 100 

```

All I had to change in the generated code from matt's lambdapure:

```

sed -i "s|runtime/lean.h|lean/lean.h|g"  out.cpp

sed -i "s|return 0;|main2(); return 0;|g" out.cpp

leanc out.cpp -o out

```

- It appears that `leanc` knows what paths to use to get things working.

- Time to pull all code from `lambdapure` into `lz`.

- I also want to overhaul the part of `lambdapure` that generates the MLIR

  to deal with the erased stuff (the boxes).

# Fri, 29th Jan

- GHC-wpc feedback: consider splitting into a `Maybe AltDefault`? This type of

  factoring of the default is quite ungainly to work with

```

[Alt' idBnd idOcc dcOcc tcOcc]      -- The DEFAULT case is always *first*

                                     -- if it is there at all

```

# Wed, 27th Jan

- `ghc-wpc` is amazing, it's tooling that _actually works_.

```

bollu@cantordust:~/temp/ > cat foo.hs  

{-# LANGUAGE NoImplicitPrelude #-}

module Foo where

data Bar = MkBar

foo :: Bar

foo = MkBar

bollu@cantordust:~/temp/ > lsfoo  foo.ghc_stgapp  foo.hi  foo.hs  foo.o  foo.o_modpak

bollu@cantordust:~/temp/ > rm foo.hi foo.o_modpak foo.o foo.ghc_stgapp 

bollu@cantordust:~/temp/ > ~/work/ghc-whole-program-compiler-project/ghc-wpc/_build/stage1/bin/ghc foo.hs

[1 of 1] Compiling Foo              ( foo.hs, foo.o )

bollu@cantordust:~/temp/ > /home/bollu/work/ghc-whole-program-compiler-project/external-stg/dist-newstyle/build/x86_64-linux/ghc-8.8.3/external-stg-0.1.0.1/x/ext-stg/build/ext-stg/ext-stg show foo.o_modpak

{- stg -}

package main

module Foo where

using ghc-prim : GHC.Types

using main : Foo

externals

  (ghc-prim_GHC.Types.[] : LiftedRep (forall a. [a]))

  (ghc-prim_GHC.Types.krep$* : LiftedRep (KindRep))

type

  ghc-prim_GHC.Types.KindRep

    ghc-prim_GHC.Types.KindRepTyConApp :: AlgDataCon [LiftedRep,LiftedRep]

    ghc-prim_GHC.Types.KindRepVar :: AlgDataCon [IntRep]

    ghc-prim_GHC.Types.KindRepApp :: AlgDataCon [LiftedRep,LiftedRep]

    ghc-prim_GHC.Types.KindRepFun :: AlgDataCon [LiftedRep,LiftedRep]

    ghc-prim_GHC.Types.KindRepTYPE :: AlgDataCon [LiftedRep]

    ghc-prim_GHC.Types.KindRepTypeLitS :: AlgDataCon [LiftedRep,AddrRep]

    ghc-prim_GHC.Types.KindRepTypeLitD :: AlgDataCon [LiftedRep,LiftedRep]

  

  ghc-prim_GHC.Types.Module

    ghc-prim_GHC.Types.Module :: AlgDataCon [LiftedRep,LiftedRep]

  

  ghc-prim_GHC.Types.TrName

    ghc-prim_GHC.Types.TrNameS :: AlgDataCon [AddrRep]

    ghc-prim_GHC.Types.TrNameD :: AlgDataCon [LiftedRep]

  

  ghc-prim_GHC.Types.TyCon

    ghc-prim_GHC.Types.TyCon :: AlgDataCon [WordRep,WordRep,LiftedRep,LiftedRep,IntRep,LiftedRep]

  

  ghc-prim_GHC.Types.[]

    ghc-prim_GHC.Types.[] :: AlgDataCon []

    ghc-prim_GHC.Types.: :: AlgDataCon [LiftedRep,LiftedRep]

  

  main_Foo.Bar

    main_Foo.MkBar :: AlgDataCon []

  

(main_Foo.MkBar : LiftedRep (Bar)) =

  main_Foo.MkBar :: AlgDataCon [] 

(main_Foo.$tc'MkBar1 : AddrRep (Addr#)) =

  "'MkBar"

(main_Foo.$tc'MkBar2 : LiftedRep (TrName)) =

  ghc-prim_GHC.Types.TrNameS :: AlgDataCon [AddrRep] main_Foo.$tc'MkBar1

(main_Foo.$tcBar1 : AddrRep (Addr#)) =

  "Bar"

(main_Foo.$tcBar2 : LiftedRep (TrName)) =

  ghc-prim_GHC.Types.TrNameS :: AlgDataCon [AddrRep] main_Foo.$tcBar1

(main_Foo.$trModule3 : AddrRep (Addr#)) =

  "Foo"

(main_Foo.$trModule4 : LiftedRep (TrName)) =

  ghc-prim_GHC.Types.TrNameS :: AlgDataCon [AddrRep] main_Foo.$trModule3

(main_Foo.$trModule1 : AddrRep (Addr#)) =

  "main"

(main_Foo.$trModule2 : LiftedRep (TrName)) =

  ghc-prim_GHC.Types.TrNameS :: AlgDataCon [AddrRep] main_Foo.$trModule1

(main_Foo.$trModule : LiftedRep (Module)) =

  ghc-prim_GHC.Types.Module :: AlgDataCon [LiftedRep,LiftedRep] main_Foo.$trModule2 main_Foo.$trModule4

(main_Foo.$tcBar : LiftedRep (TyCon)) =

  ghc-prim_GHC.Types.TyCon :: AlgDataCon [WordRep,WordRep,LiftedRep,LiftedRep,IntRep,LiftedRep] #Word#9087065647546038855 #Word#17224336406314564570 main_Foo.$trModule main_Foo.$tcBar2 #Int#0 ghc-prim_GHC.Types.krep$*

(main_Foo.$krep : LiftedRep (KindRep)) =

  ghc-prim_GHC.Types.KindRepTyConApp :: AlgDataCon [LiftedRep,LiftedRep] main_Foo.$tcBar ghc-prim_GHC.Types.[]

(main_Foo.$tc'MkBar : LiftedRep (TyCon)) =

  ghc-prim_GHC.Types.TyCon :: AlgDataCon [WordRep,WordRep,LiftedRep,LiftedRep,IntRep,LiftedRep] #Word#3328458281052523173 #Word#5691101527919328307 main_Foo.$trModule main_Foo.$tc'MkBar2 #Int#0 main_Foo.$krep

(main_Foo.foo : LiftedRep (Bar)) =

  main_Foo.MkBar :: AlgDataCon [] 

foreign stub C header {

}

foreign stub C source {

}

foreign files

```

# Tue, 26th Jan

thinking more carefully about the story, I realise that I can't actually

compare the results of the """demand analysis""" because I don't have an

analysis in the first place. I can only witness the results of the analysis by

proxy, by witnessing whether the worker/wrapper took place or not. So in a

sense, we don't actually compute any intermediate information!

```

testsuite/tests/stranal

testsuite/tests/cpranal

testsuite/tests/arityanal

```

- To read: [CPR analysis](http://research.microsoft.com/~simonpj/Papers/cpr/index.htm)

# Tue, 19th Jan

- https://github.com/csabahruska/p4f-control-flow-analysis

- https://twitter.com/csaba_hruska/status/1287701943863980032

- https://github.com/grin-compiler/haskell-code-spot

- https://www.patreon.com/posts/introducing-ghc-38173710

- https://github.com/u235axe/FeOFu

- https://github.com/grin-compiler/souffle-cfa-optimization-experiment

- https://github.com/grin-compiler/ghc-grin/tree/master/lambda-grin/test

 

# Saturday, Jan 17th

- Trying to fix an MLIR bug with respect to use-after-def and regions.

- When we ops defined one after the other in the wrong order, it gives the 

  error:

> bollu@cantordust:~/work/mlir/llvm-project/mlir/lib/ > cat ../test/IR/reuse-name-later.mlir // RUN: mlir-opt %s 

> 

> func @main() {

>     %one = constant 1 : i64

>     %x = addi %y, %y : i64

>     %y = constant 10 : i64

>     return

> }

>

> 

> ../test/IR/reuse-name-later.mlir:5:10: error: operand #0 does not dominate this use

>     %x = addi %y, %y : i64

>          ^

> ../test/IR/reuse-name-later.mlir:5:10: note: see current operation: %0 = GENERIC OP

> 

> ../test/IR/reuse-name-later.mlir:6:10: note: operand defined here (op in the same block)

>     %y = constant 10 : i64

but this does not work for  stuff in a region.

# Wednesday, Jan 6th

- Tensor is insufficient for my purposes because I see no way to express something like a `zip` in any way that MLIR 

  will know how to optimize. This is because it seems that all the effort in MLIR is spent on affine/linalg/.. all of which

  work on `memref`s.

- I'm going to weaken the checks and balances on `memref`s to see how far it takes me.

  In particular, I'm commenting out:

```diff

diff --git a/mlir/include/mlir/IR/BuiltinTypes.h b/mlir/include/mlir/IR/BuiltinTypes.h

index 3bfb3ce4c79b..ace5b1a24c05 100644

--- a/mlir/include/mlir/IR/BuiltinTypes.h

+++ b/mlir/include/mlir/IR/BuiltinTypes.h

@@ -445,7 +445,8 @@ public:

   /// Return true if the specified element type is ok in a memref.

   static bool isValidElementType(Type type) {

-    return type.isIntOrIndexOrFloat() || type.isa();

+    return true;

+    // return type.isIntOrIndexOrFloat() || type.isa();

   }

   /// Methods for support type inquiry through isa, cast, and dyn_cast.

diff --git a/mlir/lib/Parser/TypeParser.cpp b/mlir/lib/Parser/TypeParser.cpp

index ab7f85a645e4..f777963fd9a7 100644

--- a/mlir/lib/Parser/TypeParser.cpp

+++ b/mlir/lib/Parser/TypeParser.cpp

@@ -217,9 +217,10 @@ Type Parser::parseMemRefType() {

     return nullptr;

   // Check that memref is formed from allowed types.

-  if (!elementType.isIntOrIndexOrFloat() &&

-      !elementType.isa())

-    return emitError(typeLoc, "invalid memref element type"), nullptr;

+  // allow arbitrary element types.

+  // if (!elementType.isIntOrIndexOrFloat() &&

+  //     !elementType.isa())

+  //   return emitError(typeLoc, "invalid memref element type"), nullptr;

```

which checks that the memref has an element type of `int/float/complex/index`.

# Tuesday, Jan 5th

- There's a [`for_with_yield`](https://github.com/llvm/llvm-project/blob/main/mlir/test/Conversion/AffineToStandard/lower-affine.mlir#L29)

  which is what I need. time to update MLIR!

- Seems like `OneResult` needs a separate trait to get result types.

```

commit 9eb3e564d3b1c772a64eef6ecaa3b1705d065218

Author: Chris Lattner 

Date:   Wed Dec 23 18:13:39 2020 -0800

    [ODS] Make the getType() method on a OneResult instruction return a specific type.

```

- Seems also that the LLVM dialet's helpers like `LLVMType::getI64Type` re

  removed.

- `StandardTypes.h` no longer exists, moved to `BuiltinTypes.h`.

OK this is failing to link against stuff:

```

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::LLVM::LLVMType::getFunctionParamType(unsigned int)'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::MutableDictionaryAttr::set(mlir::Identifier, mlir::Attribute)'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::LLVM::LLVMType::getVectorElementCount()'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::LLVM::LLVMType::getVectorElementType()'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::LLVM::LLVMType::getIntegerBitWidth()'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::LLVM::LLVMType::isArrayTy()'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::LLVM::LLVMType::isVectorTy()'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::IntegerType::get(unsigned int, mlir::MLIRContext*)'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::LLVM::LLVMType::getArrayNumElements()'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::LLVM::LLVMFuncOp::build(mlir::OpBuilder&, mlir::OperationState&, llvm::StringRef, mlir::LLVM::LLVMType, mlir::LLVM::Linkage, llvm::Ar

te> >, llvm::ArrayRef)'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::NamedAttrList::append(mlir::Identifier, mlir::Attribute)'

/home/bollu/work/mlir/llvm-project/build/lib/libMLIRTargetLLVMIR.so.12git: undefined reference to `mlir::LLVM::LLVMType::getArrayElementType()'

clang: error: linker command failed with exit code 1 (use -v to see invocation)

ninja: build stopped: subcommand failed.

```

Mh, I should not only build `mlir-opt`, but just run `ninja` on the

top directory. There seem to be things I depend on (what?) that aren't

used my `mlir-opt`.

- Great, I can lower affine!

# Friday

- Got my tests working for end-to-end

- TODO (1): test memref

- TODO (2): test all other examples in Lowering

- TODO (3): implement vector benchmarks

- TODO (10): implement unification

- TODO (12): implement GRIN optimisations

- TODO (13): implement tabled typeclass resolution

- TODO (14): read call by push value

# Thursday, 31st Dec

- Lower `maybe-int-non-tail-recursive` and see that the generated LLVM is optimized.

- Lower `memref`.

- Listen to [coffee house sounds](https://www.youtube.com/watch?v=gaGrHUekGrc) for 

  productivity!

# Monday 28th Dec

For whatever reason, on trying to lower my `Ptr+Standard` dialect to LLVM,

I get a failure in a nonsensical location:

```cpp

class OffsetSizeAndStrideOpInterface : public

::mlir::OpInterface {

```

This happens at the line:

```cpp

failed(mlir::verify(getOperation()))

```

It seems like `mlir::verify()` picks up some nonsensical `OpInterface`?!

The exact traceback is:

```

0.	Program arguments: hask-opt lower-ap.mlir --lz-lower --ptr-lower

 #0 0x000000000046dceb backtrace (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x46dceb)

 #1 0x0000000000632c6c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x632c6c)

 #2 0x0000000000630844 llvm::sys::RunSignalHandlers() (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x630844)

 #3 0x00000000006309b3 SignalHandler(int) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x6309b3)

 #4 0x00007f9f79ee0980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)

 #5 0x00007f9f7892ffb7 raise /build/glibc-S7xCS9/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0

 #6 0x00007f9f78931921 abort /build/glibc-S7xCS9/glibc-2.27/stdlib/abort.c:81:0

 #7 0x00007f9f7892148a __assert_fail_base /build/glibc-S7xCS9/glibc-2.27/assert/assert.c:89:0

 #8 0x00007f9f78921502 (/lib/x86_64-linux-gnu/libc.so.6+0x30502)

 #9 0x0000000000985f8a mlir::detail::Interface, mlir::OpTrait::TraitBase>::Interface(mlir::Operation*) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x985f8a)

#10 0x0000000000985de0 mlir::OpInterface::OpInterface(mlir::Operation*) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x985de0)

#11 0x000000000097cf40 mlir::OffsetSizeAndStrideOpInterface::OffsetSizeAndStrideOpInterface(mlir::Operation*) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x97cf40)

#12 0x000000000097b767 LowerPointerPass::runOnOperation() (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x97b767)

#13 0x0000000000a9eda1 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0xa9eda1)

#14 0x0000000000a9f051 mlir::detail::OpToOpPassAdaptor::runPipeline(llvm::iterator_range >*, mlir::Pass> >, mlir::Operation*, mlir::AnalysisManager, bool) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0xa9f051)

#15 0x0000000000aa1a11 mlir::PassManager::run(mlir::ModuleOp) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0xaa1a11)

#16 0x00000000009d5525 performActions(llvm::raw_ostream&, bool, bool, llvm::SourceMgr&, mlir::MLIRContext*, mlir::PassPipelineCLParser const&) (.constprop.101) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x9d5525)

#17 0x00000000009d5a97 processBuffer(llvm::raw_ostream&, std::unique_ptr >, bool, bool, bool, bool, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x9d5a97)

#18 0x00000000009d5c66 mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr >, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&, bool, bool, bool, bool, bool) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x9d5c66)

#19 0x00000000009d60ad mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&, bool) (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x9d60ad)

#20 0x00000000004db274 main (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x4db274)

#21 0x00007f9f78912bf7 __libc_start_main /build/glibc-S7xCS9/glibc-2.27/csu/../csu/libc-start.c:344:0

#22 0x00000000004373fa _start (/home/bollu/work/mlir/lz/build/bin/hask-opt+0x4373fa)

fish: Job 1, 'hask-opt lower-ap.mlir --lz-l...' terminated by signal SIGABRT (Abort)

```

The solution is:

```cpp

ModuleOp mod = mlir::cast(getOperation());

if (failed(mod.verify())) {

  llvm::errs() << "===Ptr lowering failed at Verification===\n";

  getOperation()->print(llvm::errs());

  llvm::errs() << "\n===\n";

  signalPassFailure();

}

```

I find this sort of thing disturing, since it implies that the heavy use of

"C++ interface" that is very common in MLIR may start failing peculiarly?

- Why can LLVM ops only have LLVM types? This is so annoying. I want to use `LLVMUndefOp`

  but I need some song and dance to use it because I can't say `undef : !ptr.void`

  or `undef: !lz.value`. 

  ```cpp

auto undef = rewriter.create(

  rewriter.getUnknownLoc(),

  typeConverter->convertType(caseop.getResult().getType()));

```

fails with:

```

    //===-------------------------------------------===//

  } -> FAILURE : generated operation 'llvm.mlir.undef'(0x0000607000006D80) was illegal

} -> FAILURE : no matched legalization pattern

```

because for whatever reason, `LLVM::UndefOp` can only take a 

but succeeds with:

```cpp

auto undef = rewriter.create(

  rewriter.getUnknownLoc(),

  typeConverter->convertType(caseop.getResult().getType()));

```

Because `llvm.mlir.undef` can only return LLVM types. What is this nonsense?

# Thursday 24th Dec

```

===

"func"() ( {

^bb0(%arg0: !ptr.void, %arg1: !ptr.void):  // no predecessors

  %0 = "lz.force"(<>) : (!lz.thunk) -> i64

  %1 = "lz.force"(<>) : (!lz.thunk) -> i64

  "std.return"(%0) : (i64) -> ()

}) {sym_name = "f", type = (!ptr.void, !ptr.void) -> !ptr.void} : () -> ()

===

```

- The call `rewriter.applySignatureConversion(&newFuncOp.getBody(), inputs);` seems

  to completely fuck up arguments?! Without it, the new function is identical to the

  old one.

- The correct API is `if (failed(rewriter.convertRegionTypes(&newFuncOp.getBody(), *typeConverter, &inputs)))`,

  as I learnt from `StandardToLLVM::FuncOpConversionBase`.

- In a `ForceOp`, should I *manually* call the source materializer? That seems janky as fuck!

   Becase the `ForceOp` lowers to a thing that returns `!ptr.void`, but it should be `!i64`.

```

    // vvv HACK: I shouldn't have to call this manually?!

    typeConverter->materializeSourceConversion(builder, out.getLoc(), )

```

What the blazes is this error now?

```

** Insert  : 'ptr.ptrtoint'(0x60c0000040c0)

hask-opt: /usr/local/include/mlir/IR/Builders.h:400: OpTy mlir::OpBuilder::create(mlir::Location, Args &&...) [OpTy = mlir::ptr::PtrPtrToIntOp, Args = ]: Assertion `result && "builder didn't return the right type"' failed.

```

The location it fails at:

```cpp

// MLIR/IR/Builders.h

OpTy create(Location location, Args &&... args) {

    ...

    OpTy::build(*this, state, std::forward(args)...);

    auto *op = createOperation(state);

    auto result = dyn_cast(op);

    assert(result && "builder didn't return the right type");

    ...

}

```

How the fuck can `create` fail?! 

OK, fucking CRTP mistakes.

- Now I have a real puzzler: Do I create a `FunctionPtrToVoidPtr` in my `ptr` dialect?

  Or do I treat this wit more delicacy? Hmm. Unsure what the right way to do this is.

- Also, should the casting of `FunctionPtr` to `VoidPtr` be *automatic*? Ie,

  do we agree that at this stage we indeed lose all knowledge of function pointer

  types? Decisions, decisions;

- Fuck me, so apparently `std.constant` function references ALSO don't get 

  rewrittern. Storms, has anyone written ANYTHING nontrivial with this shit?

```cpp

lower-ap-and-force.mlir:20:10: error: 'std.constant' op reference to function with mismatched type

    %f = constant @f : (!lz.thunk, !lz.thunk) -> i64

         ^

lower-ap-and-force.mlir:20:10: note: see current operation:

%f_0 = "std.constant"() {value = @f} : () -> ((!lz.thunk, !lz.thunk) -> i64)

===Hask -> LLVM lowering failed at Verification===

module  {

  func private @mkClosure_capture0_args2(!ptr.void, !ptr.void, !ptr.void) -> !ptr.void

  func private @mkClosure_capture0_args0(!ptr.void) -> !ptr.void

  func private @evalClosure(!ptr.void) -> !ptr.void

  func @f(%arg0: !ptr.void, %arg1: !ptr.void) -> i64 {

    ...

  }

  func @main() -> i64 {

    // vvv UNCHANGED!

    %f_0 = constant @f : (!lz.thunk, !lz.thunk) -> i64

    // ^^^ UNCHANGED!

    ...

  }

}

```

This is just sad.

# Monday 21st december

- Then you read the API to see the nugget:

```cpp

  /// Replaces the result op with a new op that is created without verification.

  /// The result values of the two ops must be the same types.

  template 

  void replaceOpWithNewOp(Operation *op, Args &&... args) {

    auto newOp = create(op->getLoc(), std::forward(args)...);

    replaceOpWithResultsOfAnotherOp(op, newOp.getOperation());

  }

```

- `The result values of the two ops must be the same types.` :(

- What a waste of a day, I lost an entire night on this?! Fuck me.

- Why does `async` have a `CallOpConversionPattern`? This is bizarre.

# Friday Dec 18th

```

lower-case-single-alt-no-args.mlir:9:10: error: 'scf.if' op expects region #0 to have 0 or 1 blocks

    %y = lz.case @Maybe %boxedx

         ^

lower-case-single-alt-no-args.mlir:9:10: note: see current operation: %8 = "scf.if"(%7) ( {

^bb1:  // no predecessors

  %9 = "llvm.mlir.addressof"() {global_name = @Nothing} : () -> !llvm.ptr>

  %10 = "llvm.mlir.constant"() {value = 0 : index} : () -> !llvm.i64

  %11 = "llvm.getelementptr"(%9, %10, %10) : (!llvm.ptr>, !llvm.i64, !llvm.i64) -> !llvm.ptr

  %12 = "llvm.call"(%11) {callee = @mkConstructor0} : (!llvm.ptr) -> !llvm.ptr

  "scf.yield"(%12) : (!llvm.ptr) -> ()

},  {

}) : (!llvm.i1) -> !llvm.ptr

```

What does it mean! It clearly has a single basic block `^bb1`!

For reasons I don't understand well, 

```

module {

func @main() -> i1 {

    %foo = constant 1 : i1

    lz.return %foo : i1

}

}

```

- can NEVER be lowered directly to LLVM? The problem is that essentially, if I 

   convert an `lz.return` to an `std.return` the `std.return` fails??? I have

   no idea why. Anyway, it seems like I should only use `lz.return` inside a `case`.

    Fuck my pattern matches.

- Joy, this means that I need to convert the outermost `lz.return` to an `std.return`

  everywhere.

##### The solution

- First lower `lz` to `standard+SCF`. Have that then be lowered to `LLVM`.

# Friday Dec 18th

##### Pretty sure I have an MLIR bug. 

Say I have the source IR:

```

module {

  func @main() {

    %y = constant 42 : i64

    %boxy = lz.construct(@Just, %y: i64) // check box of i64

    return

  }

}

```

What needs to happen is for us to insert a lower form of `construct` that

deals with pointers directly, and for the `%y : i64` to get converted

with an `inttoptr` operation. On adding the correct materialization:

```cpp

// int -> !ptr.void

addTargetMaterialization([&](OpBuilder &rewriter, ptr::VoidPtrType resultty,

                              ValueRange vals,

                              Location loc) -> mlir::Optional {

  if (vals.size() != 1 || !vals[0].getType().isa()) {

    return {};

  }

  ptr::PtrIntToPtrOp op = rewriter.create(loc, vals[0]);

  return op.getResult();

});

```

I get the failure:

```cpp

module  {

  func private @mkConstructor1(!ptr.char, !ptr.void) -> !ptr.void

  func @main() {

    %c42_i64 = constant 42 : i64

    %0 = "ptr.inttoptr"(%c42_i64) : (i64) -> !ptr.void

    %1 = "ptr.string"() {value = "Just"} : () -> !ptr.char

    %2 = "lz.construct"(%c42_i64) {dataconstructor = @Just} : (i64) -> !lz.value

    // vvv %c42_i64 should be %0 vvv!

    %3 = call @mkConstructor1(%1, %c42_i64) : (!ptr.char, i64) -> !ptr.void

    return

  }

}

```

which is illegal! The `%c42_i64` should have been _replaced_ by `%0` at its use

site in `@mkConstructor1: (!ptr.char, !ptr.void) -> !ptr.void` but it is not!  

- I need to perform the *technically illegal* (according to what MLIR asks us to do):

```cpp

// int -> !ptr.void

addTargetMaterialization([&](OpBuilder &rewriter, ptr::VoidPtrType resultty,

                              ValueRange vals,

                              Location loc) -> mlir::Optional {

  if (vals.size() != 1 || !vals[0].getType().isa()) {

    return {};

  }

  ptr::PtrIntToPtrOp op = rewriter.create(loc, vals[0]);

  llvm::SmallPtrSet exceptions;

  exceptions.insert(op);

  // vvv isn't this a hack? why do I need this?

  vals[0].replaceAllUsesExcept(op.getResult(), exceptions);

  return op.getResult();

});

```

to get the correct IR:

```cpp

module  {

  func private @mkConstructor1(!ptr.char, !ptr.void) -> !ptr.void

  func @main() {

    %c42_i64 = constant 42 : i64

    %0 = "ptr.inttoptr"(%c42_i64) : (i64) -> !ptr.void

    %1 = "ptr.string"() {value = "Just"} : () -> !ptr.char

    %2 = call @mkConstructor1(%1, %0) : (!ptr.char, !ptr.void) -> !ptr.void

    return

  }

}

```

# Thursday Dec 17th

Wow, does the MLIR legalizer literally erase ops it doesn't understand?!

```

    Legalizing operation : 'scf.if'(0x60f0000009a8) {

      * Fold {

      } -> FAILURE : unable to fold

      * Pattern : 'scf.if -> ()' {

        ** Insert  : 'std.br'(0x60c0000046c0)

        ** Insert  : 'std.br'(0x60e000000820)

        ** Erase   : 'lz.return'(0x60c000004480)

```

which leads me down the line to "unknown terminator". Yeah, if you erase

my terminator, you sure as hell won't know! This leads to the error:

```

Error: KNOWN NON TERMINATOR:%14 = scf.if %13 -> (!lz.value) {

^bb1(%15: i64):  // no predecessors

  %c1_i64 = constant 1 : i64

  %16 = addi %15, %c1_i64 : i64

  %17 = "lz.construct"(%16) {dataconstructor = @Just} : (i64) -> !lz.value

  lz.return %17 : !lz.value

}

ERROR IN OPERATION:

%9 = scf.if %6 -> (!lz.value) {

} else {

  %10 = llvm.mlir.addressof @Just : !llvm.ptr>

  %11 = llvm.mlir.constant(0 : index) : !llvm.i64

  %12 = llvm.getelementptr %10[%11, %11] : (!llvm.ptr>, !llvm.i64, !llvm.i64) -> !llvm.ptr

  %13 = llvm.call @isConstructorTagEq(%1, %12) : (!lz.value, !llvm.ptr) -> !llvm.i1

  %14 = scf.if %13 -> (!lz.value) {

  ^bb1(%15: i64):  // no predecessors

    %c1_i64 = constant 1 : i64

    %16 = addi %15, %c1_i64 : i64

    %17 = "lz.construct"(%16) {dataconstructor = @Just} : (i64) -> !lz.value

    lz.return %17 : !lz.value

  }

}

```

- MLIR todo: Add `hasTerminator()` and `getOptionalTerminator()` to deal with still-in-construction OPS

I was doing something naive like:

```cpp

void convertReturnsToYields(mlir::Region *r, mlir::PatternRewriter &rewriter) {

  for (Block &b : r->getBlocks()) {

    HaskReturnOp ret = mlir::dyn_cast(b.getTerminator());

    if (!ret) {

      continue;

    }

    llvm::errs() << "vvvvvbefore convertReturnsToYields:vvvvv\n";

    b.print(llvm::errs());

    rewriter.setInsertionPointAfter(ret);

    rewriter.replaceOpWithNewOp(ret.getOperation(),

                                                    ret.getOperand());

    llvm::errs() << "===after convertReturnsToYields:=====\n";

    b.print(llvm::errs());

    llvm::errs() << "\n^^^^^\n";

  }

}

```

against an op:

```

Error: EMPTY BB!

ERROR IN OPERATION:

%7 = "scf.if"(%6) ( {

^bb1:  // no predecessors

  %8 = "lz.construct"() {dataconstructor = @Nothing} : () -> !lz.value

  "lz.return"(%8) : (!lz.value) -> ()

},  {

}) : (!llvm.i1) -> !lz.value

hask-opt: /home/bollu/work/mlir/llvm-project/mlir/lib/IR/Block.cpp:231: mlir::Operation* mlir::Block::getTerminator(): Assertion `!empty() && !back().isKnownNonTerminator()' faile

```

which ofc will not work since the else block is empty `x(`.

- PROTIP: switch between `applyPartialConversion` and `applyFullConversion` to debug

  what the fuck is going on. They give _different_ error messages. When one is useless,

  the other tends to be useful.

- So the API can only deal with incorrect types from `src -> target` not `target -> target`.

  This is _really_ problematic.  For example, say that we want `i64 -> !llvm.i64`. But sometimes,

  we want to have a `!llvm.i64 -> !llvm.ptr` since we are marshalling an `int` into a pointer

  using `inttoptr`. This conversion is impossible to perform(?) using the MLIR lowering.

```

lower-case.mlir:6:15: error: 'llvm.call' op operand type mismatch for operand 1: '!llvm.i64' != '!llvm.ptr'

    %boxedx = lz.construct(@Just, %x: i64)

              ^

lower-case.mlir:6:15: note: see current operation: %4 = "llvm.call"(%3, %0) {callee = @mkConstructor1} : (!llvm.ptr, !llvm.i64) -> !llvm.ptr

===Hask -> LLVM lowering failed===

module  {

  llvm.mlir.global internal constant @Nothing("Nothing")

  llvm.func @isConstructorTagEq(!llvm.ptr, !llvm.ptr) -> !llvm.i1

  llvm.mlir.global internal constant @Just("Just")

  llvm.func @mkConstructor1(!llvm.ptr, !llvm.ptr) -> !llvm.ptr

  llvm.func @main() -> !llvm.ptr {

    %0 = llvm.mlir.constant(42 : i64) : !llvm.i64

    %1 = llvm.mlir.addressof @Just : !llvm.ptr>

    %2 = llvm.mlir.constant(0 : index) : !llvm.i64

    %3 = llvm.getelementptr %1[%2, %2] : (!llvm.ptr>, !llvm.i64, !llvm.i64) -> !llvm.ptr

    %4 = llvm.call @mkConstructor1(%3, %0) : (!llvm.ptr, !llvm.i64) -> !llvm.ptr

    %5 = llvm.mlir.addressof @Nothing : !llvm.ptr>

    %6 = llvm.mlir.constant(0 : index) : !llvm.i64

    %7 = llvm.getelementptr %5[%6, %6] : (!llvm.ptr>, !llvm.i64, !llvm.i64) -> !llvm.ptr

    %8 = llvm.call @isConstructorTagEq(%4, %7) : (!llvm.ptr, !llvm.ptr) -> !llvm.i1

    llvm.cond_br %8, ^bb1, ^bb3

  ^bb1:  // pred: ^bb0

  ^bb2:  // no predecessors

    %9 = "lz.construct"() {dataconstructor = @Nothing} : () -> !lz.value

    %10 = llvm.inttoptr %9 : !lz.value to !llvm.ptr

    llvm.br ^bb8(%10 : !llvm.ptr)

  ^bb3:  // pred: ^bb0

    %11 = llvm.mlir.addressof @Just : !llvm.ptr>

    %12 = llvm.mlir.constant(0 : index) : !llvm.i64

    %13 = llvm.getelementptr %11[%12, %12] : (!llvm.ptr>, !llvm.i64, !llvm.i64) -> !llvm.ptr

    %14 = llvm.call @isConstructorTagEq(%4, %13) : (!llvm.ptr, !llvm.ptr) -> !llvm.i1

    llvm.cond_br %14, ^bb4, ^bb6

  ^bb4:  // pred: ^bb3

  ^bb5(%15: !llvm.i64):  // no predecessors

    %c1_i64 = constant 1 : i64

    %16 = "std.addi"(%15, %c1_i64) : (!llvm.i64, i64) -> i64

    %17 = "lz.construct"(%16) {dataconstructor = @Just} : (i64) -> !lz.value

    %18 = llvm.inttoptr %17 : !lz.value to !llvm.ptr

    llvm.br ^bb6(%18 : !llvm.ptr)

  ^bb6(%19: !llvm.ptr):  // 2 preds: ^bb3, ^bb5

    llvm.br ^bb7

  ^bb7:  // pred: ^bb6

    %20 = llvm.inttoptr %19 : !llvm.ptr to !llvm.ptr

    llvm.br ^bb8(%20 : !llvm.ptr)

  ^bb8(%21: !llvm.ptr):  // 2 preds: ^bb2, ^bb7

    llvm.br ^bb9

  ^bb9:  // pred: ^bb8

    llvm.return %21 : !llvm.ptr

  }

}

```

# Wednesday Dec 16th

```

lower-linalg.mlir:14:3: error: failed to legalize operation 'func'

  func @sum(%buffert: !lz.thunk>) -> i64 {

```

- WHAT DOES IT FUCKING MEAN? If I try to ask it to lower a dummy `foo.mlir`:

```

//foo.mlir

module {

  func @main () -> i64 {

    %size = std.constant 1024 : i64

    return %size : i64

  }

}

```

it succeeds!

```

[I] /home/bollu/work/mlir/lz/test/ToLLVM > ninja -C ~/work/mlir/lz/build/ &&  hask-opt --lz-lower-to-llvm foo.mlir

module  {

  llvm.func @main() -> !llvm.i64 {

    %0 = llvm.mlir.constant(1024 : i64) : !llvm.i64

    llvm.return %0 : !llvm.i64

  }

}

```

I suspect that it's because of the `lz.thunk` in the type?

Yes indeed. Consider this:

```

[I] /home/bollu/work/mlir/lz/test/ToLLVM > ninja -C ~/work/mlir/lz/build/ &&  hask-opt --lz-lower-to-llvm foo.mlir

foo.mlir:6:3: error: failed to legalize operation 'func'

  func @main (%x: !lz.thunk) -> i64 {

  ^

foo.mlir:6:3: note: see current operation: "func"() ( {

^bb0(%arg0: !lz.thunk):  // no predecessors

  %c1024_i64 = "std.constant"() {value = 1024 : i64} : () -> i64

  "std.return"(%c1024_i64) : (i64) -> ()

}) {sym_name = "main", type = (!lz.thunk) -> i64} : () -> ()

===Hask -> LLVM lowering failed===

module  {

  func @main(%arg0: !lz.thunk) -> i64 {

    %c1024_i64 = constant 1024 : i64

    return %c1024_i64 : i64

  }

}

```

See that `%arg0` is `!lz.thunk`. I guess I need to teach the `LLVMTypeConverter`

than a `!lz.thunk` is a void pointer? This kind of thing is deeply annoying.

Amazing, it seems the `linalg` dialect doesn't know how to legalize `dim`?

Or I'm being amazingly stupid. Don't know which:

```

// lower-linalg.mlir

lower-linalg.mlir:17:10: error: failed to legalize operation 'std.dim'

    %N = dim %buffer, %c0 : memref

         ^

lower-linalg.mlir:17:10: note: see current operation: %3 = "std.dim"(%1, %c0) : (memref, index) -> index

module  {

  func @sum(%arg0: !lz.thunk>) -> i64 {

    %0 = "lz.force"(%arg0) : (!lz.thunk>) -> memref

    %c0 = constant 0 : index

    %1 = dim %0, %c0 : memref

    %c0_i64 = constant 0 : i64

    %2 = affine.for %arg1 = 0 to %1 iter_args(%arg2 = %c0_i64) -> (i64) {

      %3 = affine.load %0[%arg1] : memref

      %4 = addi %arg2, %3 : i64

      affine.yield %4 : i64

    }

    return %2 : i64

  }

  func @seq(%arg0: i64) -> memref {

    %0 = index_cast %arg0 : i64 to index

    %1 = alloc(%0) : memref

    affine.for %arg1 = 0 to %0 {

      %2 = index_cast %arg1 : index to i64

      affine.store %2, %1[%arg1] : memref

    }

    return %1 : memref

  }

  func @main() -> i64 {

    %f = constant @seq : (i64) -> memref

    %c1024_i64 = constant 1024 : i64

    %0 = "lz.ap"(%f, %c1024_i64) : ((i64) -> memref, i64) -> !lz.thunk>

    %f_0 = constant @sum : (!lz.thunk>) -> i64

    %1 = "lz.ap"(%f_0, %0) : ((!lz.thunk>) -> i64, !lz.thunk>) -> !lz.thunk

    %2 = "lz.force"(%1) : (!lz.thunk) -> i64

    return %2 : i64

  }

}

```

# Thursday Dec 11th

```

    mlir::FuncOp outlinedFn = parentfn.clone();

    outlinedFn.setName(outlinedFnName);

    outlinedFn.setType(outlinedFnty);

```

this *does not set the type* correctly?!

The full module:

```

"module"() ( {

  "func"() ( {

  ^bb0(%arg0: !lz.value):  // no predecessors

    %c42_i64 = "std.constant"() {value = 42 : i64} : () -> i64

    %c1_i64 = "std.constant"() {value = 1 : i64} : () -> i64

    %f = "std.constant"() {value = @f_outline_case_arg} : () -> ((i64) -> !lz.value)

    %0 = "lz.case"(%arg0) ( {

    ^bb0(%arg1: i64):  // no predecessors

      %1 = "lz.caseint"(%arg1) ( {

        %2 = "lz.construct"(%c42_i64) {dataconstructor = @SimpleInt} : (i64) -> !lz.value

        "lz.return"(%2) : (!lz.value) -> ()

      },  {

        %2 = "std.subi"(%arg1, %c1_i64) : (i64, i64) -> i64

        %3 = "lz.apEager"(%f, %2) : ((i64) -> !lz.value, i64) -> !lz.value

        "lz.return"(%3) : (!lz.value) -> ()

      }) {alt0 = 0 : i64, alt1 = @default} : (i64) -> !lz.value

      "lz.return"(%1) : (!lz.value) -> ()

    }) {alt0 = @SimpleInt, constructorName = @SimpleInt} : (!lz.value) -> !lz.value

    "std.return"(%0) : (!lz.value) -> ()

  }) {sym_name = "f", type = (!lz.value) -> !lz.value} : () -> ()

  "func"() ( {

    %c3_i64 = "std.constant"() {value = 3 : i64} : () -> i64

    %f = "std.constant"() {value = @f} : () -> ((!lz.value) -> !lz.value)

    %0 = "lz.construct"(%c3_i64) {dataconstructor = @SimpleInt} : (i64) -> !lz.value

    %1 = "lz.apEager"(%f, %0) : ((!lz.value) -> !lz.value, !lz.value) -> !lz.value

    "std.return"(%1) : (!lz.value) -> ()

  }) {sym_name = "main", type = () -> !lz.value} : () -> ()

  "func"() ( {

  ^bb0(%arg0: !lz.value):  // no predecessors

    %0 = "lz.construct"(%arg0) {dataconstructor = @SimpleInt} : (!lz.value) -> !lz.value

    %1 = "lz.case"(%0) ( {

    ^bb0(%arg1: i64):  // no predecessors

      %2 = "lz.caseint"(%arg1) ( {

        %c42_i64 = "std.constant"() {value = 42 : i64} : () -> i64

        %3 = "lz.construct"(%c42_i64) {dataconstructor = @SimpleInt} : (i64) -> !lz.value

        "lz.return"(%3) : (!lz.value) -> ()

      },  {

        %c1_i64 = "std.constant"() {value = 1 : i64} : () -> i64

        %3 = "std.subi"(%arg1, %c1_i64) : (i64, i64) -> i64

        %f = "std.constant"() {value = @f_outline_case_arg} : () -> ((i64) -> !lz.value)

        %4 = "lz.apEager"(%f, %3) : ((i64) -> !lz.value, i64) -> !lz.value

        "lz.return"(%4) : (!lz.value) -> ()

      }) {alt0 = 0 : i64, alt1 = @default} : (i64) -> !lz.value

      "lz.return"(%2) : (!lz.value) -> ()

    }) {alt0 = @SimpleInt, constructorName = @SimpleInt} : (!lz.value) -> !lz.value

    "std.return"(%1) : (!lz.value) -> ()

  }) {sym_name = "f_outline_case_arg", type = (i64) -> !lz.value} : () -> ()

  "module_terminator"() : () -> ()

}) : () -> ()

```

See that we had:

- `{sym_name = "f_outline_case_arg", type = (i64) -> !lz.value} : () -> ()`

- BUT the fucking entry BB type of this function is: `^bb0(%arg0: !lz.value):  // no predecessors`

- I was hoping the `TypeConverter` did the "sensible thing" on trying to lower `FunctionType`.

  But alas, it does not, probably for flexibility.

```cpp

Type resultTy = typeConverter->convertType(fnty);

```

```cpp

===

===

asked to lower incorrect constant op:

===

%f_0 = "std.constant"() {value = @f} : () -> ((!lz.thunk, !lz.thunk) -> i64)

new type: |(!lz.thunk, !lz.thunk) -> i64|

```

- I guess I need to manually convert the FunctionType using the `TypeConverter`.

- Another MLIR bug (?) Anything whose legality is checked with `isDynamicallyLegal` needs to be `rewrite.erase`d and then `rewriter.create`d. It seems like you can't use `rewriter.replaceOpWithNewOp` for such operations.

# Wednesday Dec 10th

What are the guarantees of the use/def chain? Does it guarantee us that the

order of visiting expressions? If I have:

```

%a = ... 

%b = f(%a) {

    g(%a)

  }

```

am I guaranteed that I will visit use `f` before use `g`? Ie, what is

the semantics of the walk-the-use-*chain* with respect to nesting of regions.

If I want to find the "first outermost use", how do I do so?

- Seems like all I need to do was to go from `return` to `lz.return` and all

  is well `:)`. 

# Tuesday, Dec 9th

- I'm not tracking number of thunkifies correctly: `ap` should also count as

  thunkify! Fixed this.

- Now I'm trying to debug what's wrong with my `CaseOfFnInput`. Man it's a real

  doozy. Consider the code:

```rust

def f(si: SimpleInt) -> SimpleInt {

  let out = match si {

    SimpleInt(i) =>

      match i {

        0 => return 42;

        _ => {

          let sidec = SimpleInt(i - 1);

          let sj = f(sidec);

          return match sj {

            SimleInt(j) => return (j + 1); 

          } 

        }

      }

  };

  outWrap = SimpleInt(out);

  return out;

}

```

Now the part that makes this annoying is that naively, what we want to do

is to peel the case of `si` into a `fSimpleInt`, giving us

of the `match si { SimpleInt(i) => fSimpleInt(i)}`. Then we want to

replace all recursive calls `f(SimpleInt(y))` with `fSimpleInt(y)`. If

we *naively* perform the translation, here's what we get:

```rust

def fSimpleInt(i: int) -> int {

      match i {

        0 => return 42;

        _ => {

          // let sidec = SimpleInt(i - 1);

          let idec = i - 1;

          // vvv WRONG TYPE! fSimpleInt returns an `int`,

          // vvv but sj is a `SimpleInt`.

          let sj = fSimpleInt(i - 1); 

          return match sj {

            SimpleInt(j) => return (j + 1); 

          } 

        }

}

def f(si: SimpleInt) -> SimpleInt {

  let out = match si {

    SimpleInt(i) => return fSimpleInt(i);

  };

  outWrap = SimpleInt(out);

  return out;

}

```

where did we go wrong?! A little thought shows us that the problem is that

we didn't perform transfer of control flow correctly. What we *should* do

is to copy *all the instructions in `f`* after the `match si { ... }` to

decide how to return from `fSimpleInt`. aa;a;k

# Friday, Dec 5th

- Printing an operation in its generic form:

```cpp

llvm::errs() << "outlinedFn:\n";

mlir::OpPrintingFlags flags;

outlinedFn.print(llvm::errs(), flags.printGenericOpForm());

assert(false);

```

# Thursday, Dec 4th

- Our transformation of outline/inline is very similar to converting a `while(c){..}`

  into an `if(c) { do{..}while(c)}`. What other "classical loop knowledge"

  can we take?

- Is this inlining/outlining nonsense literally just performing CPS? Aren't

  we encoding things as "continuations" when we outline+call? isn't SSA supposed

  to free us from this? why isn't it freeing us from this? is it because of

  recursion? If so, should we convert a `scf.recurse`?

On giving haskell the complicated program I'm interested in:

```hs

{-# LANGUAGE MagicHash #-}

module GHCMaybeIntNonTailRecursive(main) where

import GHC.Int

import GHC.Prim

data MaybeHash = JustHash Int# | NothingHash deriving(Show)

f :: MaybeHash -> MaybeHash

f mi = case mi of

        JustHash i# -> 

               case i# of

                 0# -> JustHash 5#

                 _ -> case f (JustHash (i# -# 1#)) of

                      NothingHash -> NothingHash

                      JustHash j# -> JustHash (j# +# 7#)

        NothingHash -> NothingHash

main :: IO ()

main = print (f (JustHash 100#))

```

and compiled with `-O2 -ddump-simple` it produces the core:

- Entry point: `main`

```

main

  = GHC.IO.Handle.Text.hPutStr'

      GHC.IO.Handle.FD.stdout

      GHCMaybeIntNonTailRecursive.main1

      GHC.Types.True

```

- `GHCMaybeIntNonTailRecursive.main1`: calls `GHCMaybeIntNonTailRecursive.main_$sf` and then does the printing work here.

```

GHCMaybeIntNonTailRecursive.main1

  = case GHCMaybeIntNonTailRecursive.main_$sf 100# of {

      JustHash b1_aLr ->

        ++

          @ Char

          GHCMaybeIntNonTailRecursive.$fShowMaybeHash6

          (case GHC.Show.$wshowSignedInt

                  0# b1_aLr GHCMaybeIntNonTailRecursive.$fShowMaybeHash8

           of

           { (# ww5_a1RD, ww6_a1RE #) ->

           GHC.Types.: @ Char ww5_a1RD ww6_a1RE

           });

      NothingHash -> GHCMaybeIntNonTailRecursive.$fShowMaybeHash3

    }

```

- `GHCMaybeIntNonTailRecursive.main_$sf`: GHC does not optimize this. It just bakes a stupid

   recursive call. It's unable to prove that the wrapper in un-necessary!

   Please let us be able to prove this using `SCEV`ness?

```

GHCMaybeIntNonTailRecursive.main_$sf [Occ=LoopBreaker]

  :: Int# -> MaybeHash

[GblId, Arity=1, Caf=NoCafRefs, Str=, Unf=OtherCon []]

GHCMaybeIntNonTailRecursive.main_$sf

  = \ (sc_s1X4 :: Int#) ->

      case sc_s1X4 of ds_d1IV {

        __DEFAULT ->

          case GHCMaybeIntNonTailRecursive.main_$sf (-# ds_d1IV 1#) of {

            JustHash j#_aHM ->

              GHCMaybeIntNonTailRecursive.JustHash (+# j#_aHM 7#);

            NothingHash -> GHCMaybeIntNonTailRecursive.NothingHash

          };

        0# -> lvl_r1XW

      }

```

- TODO for tomorrow: write equivalent C code and see what LLVM generates.

# Thursday, Nov 26th

- it seems like using `clang++` is **mandatory** to get correct builds with MLIR. When

  anurudh was attempting to compile the project, we were getting divergent results

  till we both standardized on clang. Super super weird. It seems like `g++` miscompiles.

# Wednesday, Nov 25

- I need to allow my `Identifier` to keep pointers to values...

- I want to read the rust implementation of [pretty error message printing](https://github.com/rust-lang/rust/blob/master/compiler/rustc_errors/src/emitter.rs)

- And the [Elm error reporting code](https://github.com/elm/compiler/blob/master/compiler/src/Reporting/Error/Main.hs)

OK, I'm using the program

```hs

module Main where (main)

main = print (sum ([1..4040] :: [Int]))

```

# Mon, Nov 23

- Function body: see that `%arg0` v/s `%arg1`:

```

  func @factorial(%arg0: !lz.thunk) -> !lz.value {

    %0 = "lz.caseint"(%arg1) ( {

      %c1_i64 = constant 1 : i64

      %2 = "lz.construct"(%c1_i64) {dataconstructor = @SimpleInt} : (i64) -> !lz.value

      return %2 : !lz.value

    },  {

      %c1_i64 = constant 1 : i64

      %2 = subi %arg1, %c1_i64 : i64

      %3 = "lz.ref"() {sym_name = "factorial"} : () -> !lz.fn<(i64) -> i64>

      %4 = "lz.ap"(%3, %2) : (!lz.fn<(i64) -> i64>, i64) -> !lz.thunk

      %5 = "lz.force"(%4) : (!lz.thunk) -> i64

      %6 = "lz.ref"() {sym_name = "mulSimpleInt"} : () -> !lz.fn<(i64, i64) -> i64>

      %7 = "lz.ap"(%6, %arg1, %5) : (!lz.fn<(i64, i64) -> i64>, i64, i64) -> !lz.thunk

      return %7 : !lz.thunk

    }) {alt0 = 0 : i64, alt1 = @default} : (i64) -> i64

    return %0 : i64

    %1 = "lz.caseint"(%arg0) ( {

    ^bb0(%arg1: i64):  // no predecessors

    }) {alt0 = @SimpleInt} : (!lz.thunk) -> i64

    return %1 : i64

}

```

```

unable to find key: ||

```

```

owning block:

^bb0(%arg1: i64):  // no predecessors

```

```

owning op:

%1 = "lz.caseint"(%arg0) ( {

^bb0(%arg1: i64):  // no predecessors

}) {alt0 = @SimpleInt} : (!lz.thunk) -> i64

```

- I have no idea where it hallucinates a `%arg1` attached to the basic block?

  Why does it do this?

- I suspect it's because it comes from a `match fc of SimpleInt(..) => ... `

  which means that we have a `SimpleInt` we are matching on, which tries to

  generate an identifier for the case value. However, the fact that this

  region argument is not printed now worries me.

- FIXED! wasn't moving builder to the correct location.

- TODO: 0. add a custom `caseint` into the surface lang to quickly check that our

           codegen *actually* works.

- TODO: 1. get type info to decide if I need a `caseint` or a `caseconstrcutor`.

- TODO: 2. Track types in the surface lang to have enough info to deduce this.

# Friday: Nov 11th

- https://dl.acm.org/doi/abs/10.1145/99583.99590

- Goal: code generate examples I have, along with new `vector.rs`.

- What do we do with mutually recursive lambdas? x( This kind of thing is annoying.

- Make strictness annotations "less strict". Strictness annotations have

  a side-effect. We don't have side effects on strictness. They have an 

  effect on performance. 

# Thursday, Nov 10th

- Stuff GHC could do better: Nested CPR, deeper dataflow analysis,

  data parallel haskell (Manuel Charkravarty, Jeff Mainland) : Stream fusion with

  packetization, optimizing using laziness. Refcounting?

  For debugging, can compile in slow path; due to purity, allows for

  precise effect tracking. Stream fusion was important.

  What do I call my semantics? There's no easy way to defined the semantics

  that we have in mind, we can only provide an /operational/ description.

- Argument order matters for worker/wrapper, because GHC can only partially

  apply functions in the worker/wrapper, and not reorder parameters. So if we

  have `f x y` where `x` is reused, we can worker/wrapper around `y`.

# Friday. Nov 6th

- [Core Spec](https://gitlab.haskell.org/ghc/ghc/-/blob/master/docs/core-spec/core-spec.pdf)

```

data X = X1 !Int | X2 !Char

foo :: X -> X;

foo x = case x of X1 i -> X1 (i * 2); X2 c = X2 (c + 'a')

(%x1, %x2) = lz.variant(%x)

%x1_plus_1 = add(%x1, 1)

%y1 = lz.construct(@X1, %x1_plus_1)

%x2_plus_a = add(%x1, 97) -- ord(a) = 97

%y2 = lz.construct(@X2, %x2_plus_a)

%out = lz.union(%y1, %y2)

```

# Monday, Nov 2nd

#### From `fast-math`:

> How does this interact with LLVM?  The LLVM backend can perform a number of

> these optimizations for us as well if we pass it the right flags. It does not

> perform all of them, however. (Possibly GHC's optimization passes remove the

> opportunity?) In any event, executables from the built-in code generator and

> llvm generator will both see speed improvements.

- [GHC page on rewrite rules has more food for thought](https://wiki.haskell.org/GHC/Using_rules)

# Friday, Oct 30th

# Thursday, Oct 29th

- Email arnaud, we're talking Nov 10th, maybe? He's busy because of his

  haskell exchange talk. Until then, I can shore up my code and implement

  things properly.

- Getting a generic MLIR printer infrastructure up in Haskell. Will use it for

  my GHC plugin, as well as to optimize cherry picked examples from `Data.Vector.Unboxed`

- I maybe able to perform [`freeJIT`](https://github.com/bollu/freejit) now that MLIR exists!

```

module {

  hask.func @two () -> !hask.adt<@SimpleInt>  {

     %v = hask.make_i64(2)

     %boxed = hask.construct(@SimpleInt, %v:!hask.value) : !hask.adt<@SimpleInt> 

     hask.return(%boxed): !hask.adt<@SimpleInt> 

  }

  hask.func @main (%v: !hask.adt<@SimpleInt>, %wt: !hask.thunk>) -> !hask.adt<@SimpleInt> {

      %number = hask.make_i64(0 : i64)

      %reti = hask.case @SimpleInt %v 

           [@SimpleInt -> { ^entry(%ival: !hask.value):

              %w = hask.force(%wt):!hask.adt<@SimpleInt>

               %number43 = hask.make_i64(43 : i64)

               hask.return(%number43):!hask.value

            }]

            [@default -> { 

               ^entry:

                   %number42 = hask.make_i64(42 : i64)

                   hask.return(%number42):!hask.value

            }]

      %two = hask.ref(@two):  !hask.fn<() -> !hask.adt<@SimpleInt>>

      %twot = hask.ap(%two :  !hask.fn<() -> !hask.adt<@SimpleInt>> )

      hask.return(%reti) : !hask.value

  }

}

```

```

module {

  "hask.func"() ( {

    %0 = "hask.make_i64"() {value = 2 : i64} : () -> !hask.value

    %1 = "hask.construct"(%0) {dataconstructor = @SimpleInt} : (!hask.value) -> !hask.adt<@SimpleInt>

    "hask.return"(%1) : (!hask.adt<@SimpleInt>) -> ()

  }) {retty = !hask.adt<@SimpleInt>, sym_name = "two"} : () -> ()

  "hask.func"() ( {

  ^bb0(%arg0: !hask.adt<@SimpleInt>, %arg1: !hask.thunk>):  // no predecessors

    %0 = "hask.make_i64"() {value = 0 : i64} : () -> !hask.value

    %1 = "hask.case"(%arg0) ( {

    ^bb0(%arg2: !hask.value):  // no predecessors

      %4 = "hask.force"(%arg1) : (!hask.thunk>) -> !hask.adt<@SimpleInt>

      %5 = "hask.make_i64"() {value = 43 : i64} : () -> !hask.value

      "hask.return"(%5) : (!hask.value) -> ()

    },  {

      %4 = "hask.make_i64"() {value = 42 : i64} : () -> !hask.value

      "hask.return"(%4) : (!hask.value) -> ()

    }) {alt0 = @SimpleInt, alt1 = @default, constructorName = @SimpleInt} : (!hask.adt<@SimpleInt>) -> !hask.value

    %2 = "hask.ref"() {sym_name = "two"} : () -> !hask.fn<() -> !hask.adt<@SimpleInt>>

    %3 = "hask.ap"(%2) : (!hask.fn<() -> !hask.adt<@SimpleInt>>) -> !hask.thunk>

    "hask.return"(%1) : (!hask.value) -> ()

  }) {retty = !hask.adt<@SimpleInt>, sym_name = "main"} : () -> ()

}

```

# Wednesday, Oct 28th

- `PeelCommonConstructorsInCase` miscompiles `:(`

- OK I am more and more sure that this is an MLIR bug. When I rewrite the IR,

  the type of the result does not change?!

- **Old**:

- **New**: [It thinks the type is still `hask.adt`!]

```

%1 = hask.case @Maybe %0 [@Nothing ->  {

  %3 = hask.make_i64(0 : i64)

  hask.return(%3) : !hask.value

}]

 [@Just ->  {

^bb0(%arg1: !hask.value):  // no predecessors

  %3 = hask.make_i64(1 : i64)

  hask.return(%3) : !hask.value

}]

!hask.adt<@Maybe>

```

- So it seems that `getType()` returns whatever the type was *at construction*.

  So my semantics of the 'return type' of a `case` is actually based on what

  the branches return. MLIR has no notion of this, though. So if you ever have

  anything that returns, you should make the return type some kind of attribute,

  and not *infer* it.

- In fact, I'm not even sure that that suffices. I might have to build an entirely

  new instruction just to fix the result type of the `case`?

- This is beyond fucked. 

- Got some paper writing done!

- Reading the generic parsing code to write a small haskell API for it, I'm done

  gluing the fucking printing together by hand...

  [parseGenericOperation](https://github.com/llvm/llvm-project/blob/735ab4be35695df9f9da7ae8b584cec28eabf1fe/mlir/lib/Parser/Parser.cpp#L727)

```hs

import Data.Vector.Unboxed as V

-- a * x + b

a, x, b :: Vector Int

a = fromList [1, 2, 3, 4, 5, 6, 7, 8, 9]

x = fromList [3, 1, 4, 1, 5, 1, 6, 1, 7]

b = fromList [10, 20, 30, 40, 50, 60, 70]

outv = V.zipWith (+) (V.zipWith (*) a x) b

outf = V.foldl (+) 0 outv

main :: IO ()

main = print outv >> print outf

```

GHC performs no constant folding on this. On the other hand, MLIR should be able

to reduce the above program to a single constant

# Friday, Oct 23rd

- It looks like the dinky pass I wrote, with bugs fixed, can actually eliminate

  all laziness in the toy examples I have.

- Next, I'm going to implement elimnating boxing. So I can 'unwrap' a function

  that uses `SimpleInt int#` (with no laziness, mind you, not `thunk`

  into a function that uses only a `int#`. Let's see how well this does.

- MLIR TODO: Add `arg.getSingleUse()` API

- MLIR TODO: Add `getNumArguments()` and `getArgument(int i)` API to any `callable`.

- Consider making `case` a terminator of a block? Seems to make a lot of rewrites

  way easier. Not sure.

- I think I have a good reason to make a `hask.case` instruction a terminator. I can be sure

  that I can transform :

```

====

hask.func @f {

^bb0(%arg0: !hask.thunk>):  // no predecessors

  %0 = hask.force(%arg0):!hask.adt<@SimpleInt>

  %1 = hask.case @SimpleInt %0 [@SimpleInt ->  {

  ^bb0(%arg1: !hask.value):  // no predecessors

    %2 = hask.caseint %arg1 [0 : i64 ->  {

      %3 = hask.make_i64(5 : i64)

      %4 = hask.construct(@SimpleInt, %3 : !hask.value) : !hask.adt<@SimpleInt>

      hask.return(%4) : !hask.adt<@SimpleInt>

    }]

 [@default ->  {

      %3 = hask.make_i64(1 : i64)

      %4 = hask.primop_sub(%arg1,%3)

      %5 = hask.construct(@SimpleInt, %4 : !hask.value) : !hask.adt<@SimpleInt>

      %6 = hask.thunkify(%5 :!hask.adt<@SimpleInt>):!hask.thunk>

      %7 = hask.ref(@f) : !hask.fn<(!hask.thunk>) -> !hask.adt<@SimpleInt>>

      %8 = hask.apEager(%7 :!hask.fn<(!hask.thunk>) -> !hask.adt<@SimpleInt>>, %6)

      %9 = hask.ap(%7 :!hask.fn<(!hask.thunk>) -> !hask.adt<@SimpleInt>>, %6)

      %10 = hask.case @SimpleInt %8 [@SimpleInt ->  {

      ^bb0(%arg2: !hask.value):  // no predecessors

        %11 = hask.make_i64(1 : i64)

        %12 = hask.primop_add(%arg2,%11)

        %13 = hask.construct(@SimpleInt, %12 : !hask.value) : !hask.adt<@SimpleInt>

        hask.return(%13) : !hask.adt<@SimpleInt>

      }]

      hask.return(%10) : !hask.adt<@SimpleInt>

    }]

    hask.return(%2) : !hask.adt<@SimpleInt>

  }]

  hask.return(%1) : !hask.adt<@SimpleInt>

}

```

easily. If `hask.case` is a terminator, then I can be sure that my transform

# Thursday, Oct 22nd

- [GPU Outlining function](https://github.com/llvm/llvm-project/blob/366d8435b41dcc01013c507681523c65cdee2180/mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp#L234)

- I'm going to write an outlining pass so I can perform my outline rewrites.

- Amazing, so MLIR hangs on trying to print my newly minted outlined function,

and the backtrace is at:

```

0x00005555565c4854 in mlir::Block::getParentOp() ()

(gdb) bt

#0  0x00005555565c4854 in mlir::Block::getParentOp() ()

#1  0x00005555565b5da6 in mlir::Operation::print(llvm::raw_ostream&, mlir::OpPrintingFlags) ()

#2  0x0000555556437dd7 in mlir::OpState::print (this=0x7fffffffd728, os=..., flags=...) at /usr/local/include/mlir/IR/OpDefinition.h:127

#3  0x0000555556437e34 in mlir::operator<< (os=..., op=...) at /usr/local/include/mlir/IR/OpDefinition.h:265

#4  0x0000555556454911 in mlir::standalone::OutlineUknownForcePattern::matchAndRewrite (this=0x555558f95bd0, force=..., rewriter=...)

    at /home/bollu/work/mlir/coremlir/lib/Hask/WorkerWrapperPass.cpp:127

#5  0x00005555564597ec in mlir::OpRewritePattern::matchAndRewrite (this=0x555558f95bd0, op=0x555558f918a0, rewriter=...)

    at /usr/local/include/mlir/IR/PatternMatch.h:213

#6  0x000055555662c27b in mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::RewritePattern const&, mlir::PatternRewriter&, llvm::function_ref, llvm::function_ref, llvm::function_ref) ()

#7  0x000055555662c58f in mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref, llvm::function_ref, llvm::function_ref) ()

#8  0x0000555556785f6c in mlir::applyPatternsAndFoldGreedily(llvm::MutableArrayRef, mlir::OwningRewritePatternList const&) ()

#9  0x000055555645532a in mlir::standalone::WorkerWrapperPass::runOnOperation (this=0x555558f189e0)

    at /home/bollu/work/mlir/coremlir/lib/Hask/WorkerWrapperPass.cpp:223

#10 0x000055555667f0a2 in mlir::Pass::run(mlir::Operation*, mlir::AnalysisManager) ()

#11 0x000055555667f182 in mlir::OpPassManager::run(mlir::Operation*, mlir::AnalysisManager) ()

#12 0x0000555556687a96 in mlir::PassManager::run(mlir::ModuleOp) ()

#13 0x0000555555881eae in main (argc=4, argv=0x7fffffffe4e8) at /home/bollu/work/mlir/coremlir/hask-opt/hask-opt.cpp:157

(gdb) Quit

```

- (1) I'm not even sure anymore that I should be doing this in a `RewritePattern`, because I'm not

  actually going to be deleting the `force`. Rather, I'm going to be replacing stuff

  that follows the `force` with other stuff. So I should really be using an

  MLIR pass

- (2) Alternatively, I should in fact rewrite at the `ApEagerOp` by noticing that

  it is a function call, and then checking if the argument is being forced etc.

- I'm going to try the (2) option, since it seems more local-rewrite-y,

  and it seems too painful to attempt to write a `Pass`.

- Amazing, so I now have outlining that works, but it now crashes inside `PatternRewriter.h`:

```

hask-opt: /usr/local/include/llvm/Support/Casting.h:269: typename llvm::cast_retty::ret_type llvm::cast(Y*) [with X = mlir::standalone::ApEagerOp; Y = mlir::Operation; typename llvm::cast_retty(Val) && \"cast() argument of incompatible

    file=file@entry=0x555557fd6168 "/usr/local/include/llvm/Support/Casting.h", line=line@entry=269,

    function=function@entry=0x555557fd85e0 ::ret_type llvm::cast(mlir::Operation*)::__PRETTY_F

ir::standalone::ApEagerOp; Y = mlir::Operation; typename llvm::cast_retty::ret_type = mlir::standalone::ApEagerOp]") at assert.c:92

#3  0x00007ffff660a4a2 in __GI___assert_fail (assertion=0x555557fd6198 "isa(Val) && \"cast() argument of incompatible type!\"", file=0x555557fd6168 "/usr/local/include/llvm/Support/Casting.h", line

    function=0x555557fd85e0 ::ret_type llvm::cast(mlir::Operation*)::__PRETTY_FUNCTION__> "typ

:ApEagerOp; Y = mlir::Operation; typename llvm::cast_retty::ret_type = mlir::standalone::ApEagerOp]") at assert.c:101

#4  0x0000555556418afd in llvm::cast (Val=0x555558f1adf0) at /usr/local/include/llvm/Support/Casting.h:269

#5  0x0000555556459e0d in mlir::OpRewritePattern::matchAndRewrite (this=0x555558f98a50, op=0x555558f1adf0, rewriter=...) at /usr/local/include/mlir/IR/PatternMatch.h:213

#6  0x000055555662ca4b in mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::RewritePattern const&, mlir::PatternRewriter&, llvm::function_ref, llvm::func

lt (mlir::RewritePattern const&)>) ()

#7  0x000055555662cd5f in mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref, llvm::function_ref) ()

#8  0x000055555678673c in mlir::applyPatternsAndFoldGreedily(llvm::MutableArrayRef, mlir::OwningRewritePatternList const&) ()

#9  0x00005555564558aa in mlir::standalone::WorkerWrapperPass::runOnOperation (this=0x555558f189e0) at /home/bollu/work/mlir/coremlir/lib/Hask/WorkerWrapperPass.cpp:316

#10 0x000055555667f872 in mlir::Pass::run(mlir::Operation*, mlir::AnalysisManager) ()

#11 0x000055555667f952 in mlir::OpPassManager::run(mlir::Operation*, mlir::AnalysisManager) ()

#12 0x0000555556688266 in mlir::PassManager::run(mlir::ModuleOp) ()

#13 0x0000555555881eae in main (argc=4, argv=0x7fffffffe4e8) at /home/bollu/work/mlir/coremlir/hask-opt/hask-opt.cpp:157

```

- [The suspect code is `mlir/IR/PatternMatch.h:213`](https://github.com/llvm/llvm-project/blob/63c58c2b934525c9863e624cf39ec542dd84ca78/mlir/include/mlir/IR/PatternMatch.h#L212):

```

LogicalResult matchAndRewrite(Operation *op,

                              PatternRewriter &rewriter) const final {

  return matchAndRewrite(cast(op), rewriter);

}

```

- I'm at MLIR commit [63c58c2](https://github.com/llvm/llvm-project/commit/63c58c2b934525c9863e624cf39ec542dd84ca78).

- This maybe because of my assumption that `failure()` was supposed to undo *all* intermediate changes.

  Maybe there's a bug in the bail-out infrastructure, because this bug happens when / after

  a bail out in my pattern.

- OK, so it's either me mis-understanding the invariant of `failure()`, or there's an MLIR bug where you can't

  back out with a `failure()` in the middle of a transform.

- I "fixed" the bug by [moving all my checking code to the beginning in commit 7c90bd](7c90bdc66ce8fad833d45061833150d4aa0dca72)

# Wednesday, Oct 21st

- Power stable again, yay!

- [It seems like MLIR's inlining infrastructure isn't "up" yet?](https://github.com/joker-eph/mlir/pull/3#issuecomment-538687366)

- The commut is a year old. We seem to have a `CallInterface` now. It's unclear what the correct way to call the

  thing, though.

- [Seems I need to talk to `DialectInlinerInterface`](https://github.com/llvm/llvm-project/blob/22219cfc6a2a752c53238df4ceea342672392818/mlir/include/mlir/Transforms/InliningUtils.h)

- [Toy Ch4](https://github.com/llvm/llvm-project/blob/1b012a9146b85d30083a47d4929e86f843a5938d/mlir/docs/Tutorials/Toy/Ch-4.md)

- [LEAN header file for runtime](https://github.com/leanprover/lean4/blob/master/src/include/lean/lean.h)

# Friday, Oct 16th

- Can we do demand analysis by phrasing it as a dependence analysis problem (RAW?)

- The workhorse was SCEV, which allows us to recover loops

- The workhorse of that was definition of a natural loop, which told us what

  types of programs we can analyze

- What is the functional equivalent of a natural loop?

- The naive guess is "tail calls". I'm not so sure. Consider the loop:

```cpp

sum = 0; for(int i = n; i > 0; i--) { sum += i*i ; }

```

- versus the haskell program:

```hs

f 0 = 0; f n = n*n + f (n - 1)

```

- The above is 'clearly' a natural loop, while the program below is not. What gives?

- We can transform the above into accumulator style:

```hs

f 0 k = k; f n k = f (n-1) (k + n*n)

```

- When can we convert something into accumulator style? How do we know how to

  convert something into accumulator style?

- Naively, I feel that this involves something about 'destination passing style'.

  We first go from:

```hs

f 0 = 0; f n = n*n + f(n-1)

```

- into destination passing style:

```hs

f 0 slot = write slot 0;

f n slot = do f (n - 1) slot; out <- read slot; write (n*n + out) slot

```

- which is then purified into:

```

f 0 slot = 0

f n slot = f (n - 1) (slot + n*n)

```

- Of course, this is all incohate rambling.

# Thursday, Oct 15th

- Wow, another amazing nit:

``` cpp

Type retty = 

  this->getAttrOfType(HaskFuncOp::getReturnTypeAttributeKey())

    .getType();

// retty will be null!

```

- The correct invocation is:

```cpp

Type retty = 

  this->getAttrOfType(HaskFuncOp::getReturnTypeAttributeKey())

  .getValue();

```

- Because the _value_ of the `TypeAttr` is the type. The `Type` is `none`! It's

  forced to have a `Type` because, well, that's how inheritance works. It should

  just return `Type` so we have `Type : Type` and we're set :)

# Friday Oct 9th 

- [Meeting docs](https://docs.google.com/document/d/1tbeqlwunRKomN8WdfxuCJxVQUMuLd5kRqpIsGxF-w6o/preview)

#### Compiling without continuations

 - [Compiling without continuations video](https://www.youtube.com/watch?v=LMTr8yw0Gk4)

We might intially be tempted to convert

 ```hs

 case (case xs of [] -> T; _ -> F) of

   T -> BIG1; F -> BIG2

 ```

 into:

 

 ```hs

 case xs of

   [] -> case T of T -> BIG1; F -> BIG2

   _ -> case F of T -> BIG1; F -> BIG2

 ```

 of course this involves copying. so we should rather transform

 this into

 ```

let j1 () = BIG1; j2 () = BIG2

in case xs of

     [] -> case T of T -> j1 (); F -> j2 ()

     _ -> case F of T -> j1 (); F -> j2 ()

 ```

 Essentially, they once again outline code into a common

 names called `(j1, j2)` and then convert the rest into

 function invocations.

 Clearly this also works for pattern bound variables. We can transform:

 ```hs

 case (case xs of [] -> Nothing; (p:ps) -> Just p) of

   Nothing -> BIG1; Just x -> BIG2 x

 ```

 into:

 ```

 let j1 () = BIG11; j2 x = BIG2 x

 in case (case xs of [] -> Nothing; (p:ps) -> Just p) of

      Nothing -> j1; Just x -> j2 x

 ```

#### What is a join point?

 - All calls are saturated tail calls, 

 - They are not captured in a thunk/closure, so they can be compiled

   efficiently

#### We don't want to lose join points: A bad transformation example

We case-of-case on this program:

```hs

case (let j x = E1

       in case xs of Just x -> j x; Nothing -> E2) of

  True -> R1; False - R2

```

to get this program:

```hs

let j x = E1

in case xs of 

  Just x ->  case j x of True -> R1; False -> R2 

  Nothing -> case E2 of  True -> R1; False -> R2

```

- Note that this `j x` is now case-scrutinized, and is thus not a tail. The

   R1/R2 case does not actually use `E1`?

- So what we do is to perform this transformation:

#### Join based:

- Original `let` based starting program, deprecated, shown for comparison:

```hs

-- | original `let`

case (let j x = E1

       in case xs of Just x -> j x; Nothing -> E2) of

  True -> R1; False - R2

```

- New starting program with `let` changed to `join!`. We are yet to sink the

  `case` inside.

```hs

-- | original with `join!` instead of `let`

case (join! j x = E1

       in case xs of Just x -> j x; Nothing -> E2) of

  True -> R1; False - R2

```

- Since we have a `Just x -> j x` where `j = join! ... `,

  we're going to try to preserve the tail call `j x`.

  when we push the outer `case` inside, (1) we don't push the `case` *around* the `join`.

  Rather, we push the `case` _into_ the `join!`. (2) we push the `case` around 

  the `Nothing -> E2` as usual. This gives us the program:

```hs

-- | case pushed inside origin with `join!`

join j x = case E1 of  True -> R1; False -> R2

in case xs of 

    Just x -> j x; 

    Nothing -> case E2 of True -> R1; False -> R2

```

- Peyton jones says that "this slide is the **slide to remember**"

- (1) We want to move the outer evaluation context into the body of the join-point.

- (2) For E2, since the body eats `E2`, we push it in.

- Formalize join points as a language construct.

- Add join-point bindings and jumps into the language.

- This has deep relationships to [sequent calculus](https://ncatlab.org/nlab/show/sequent+calculus)

- Infer which `let` bindings are join-points: `contification` is the keyword

  to look for.

- Automagically allows `Stream`s to fuse without needing an extraneous `Skip`

  constructor. Don't know what this is referring to.

# Friday Oct 2nd 2020

- [Meeting documentation](https://docs.google.com/document/d/1JD2RgNbRoztiuSQtN8IskyaYDF4Yd9WWjwecyyVzum4/edit?usp=sharing)

#### What is a loop-breaker?

- [Taken from `mpickering`'s blog](https://mpickering.github.io/posts/2017-03-20-inlining-and-specialisation.html)

> In general, if we were to inline recursive definitions without care we could

> easily cause the simplifier to diverge. However, we still want to inline as

> many functions which appear in mutually recursive blocks as possible. GHC

> statically analyses each recursive groups of bindings and chooses one of them

> as the loop-breaker. Any function which is marked as a loop-breaker will

> never be inlined. Other functions in the recursive group are free to be

> inlined as eventually a loop-breaker will be reached and the inliner will

> stop.

He continues to write:

> Sometimes people ask if GHC is smart enough to unroll a recursive definition

> when given a static argument. For example, if we could define sum using

> direct recursion:

```hs

sum :: [Int] -> Int

sum [] = 0

sum (x:xs) = x + sum xs

```

- I have no idea if this continues to be the case.

- EDIT: I do know! I implemented the above program. GHC still has

  this behaviour, so the above program does not become a single constant.

# Tuesday, Sep 29 2020

- Apparently, I can't print a `mlir::Value` from an `mir::InFlightDiagnostic`.

- `mlir::Value` does not implement a `<`, so you can't use it as a key in a `std::map` for a

  decent interpreter.

# Friday, Sep 25 2020

- [Meeting google doc link](https://docs.google.com/document/d/10cgXbXME0D_SV0VJTrQrz0obhUBa5kdM74crWDXbDgU/edit?usp=sharing)

- [GHC was unable to optimise a top level list!](https://docs.google.com/spreadsheets/d/1YhZlDRGvnCtN8UQf_0ItmgRWI9MhL21HDTlBEKqgWHc/edit?usp=sharing)

- GHC is unable to remove laziness from `data A = B | C | D`: there is no way

  to ask for this to be unboxed.

- https://www.scs.stanford.edu/16wi-cs240h/slides/ghc-compiler.html

# Thursday, Sep 24 2020

- [What optimizations can GHC be expected to perform reliably](https://stackoverflow.com/questions/12653787/what-optimizations-can-ghc-be-expected-to-perform-reliably)

Also, it seems I was wrong. Haskell only guarantees non-strict (call by name),

not lazy (call by need):

> The language spec promises non-strict semantics; it does not promise anything

> about whether or not superfluous work will be performed  ~ Dan Burton

- [sketch of worker wrapper](reading/sep-24-worker-wrapper-sketch.md)

# Wednesday, Sep 23 2020

```

%0 = hask.lambda(%arg0:!hask.value) {

  %1 = hask.transmute(%arg0 :!hask.value):i64

  %2 = hask.caseint %1 [0 : i64 ->  {

  ^bb0(%arg1: i64):  // no predecessors

    %3 = hask.transmute(%1 :i64):!hask.value

    hask.return(%3) : !hask.value

  }]

  ...

running TransmuteOpConversionPattern on: hask.transmute | loc("./case-int-roundtrip.mlir":7:12)

transmute:%0 = hask.transmute(<> :!hask.value):i64

in: 

inRemapped: 

inType:!hask.value

```

- I find this `<>` thing extremely tiresome.

  It makes debugging way harder than it ought to be.

- Strangely, when I try to print the `in`put directly, it says ``

  which is SO MUCH MORE HELPFUL! It would be evern more helpful if it says *which block* argument.

- I also don't understand how to print _regions_ in MLIR. Region can't be `llvm::errs() << region`,

  nor do they have a `dump()` method. This is garbage.

- I also don't understand how to print a basic block correctly. You can't

  `llvm::errs() << *bb`. Fortunately, at least basic block has a `dump()`. 

- Unfortunately, this `dump()` is less than helpful when you are moving BBs around. For exmple,

  on trying to print:

```cpp

Block *bb = new Block();

llvm::errs() << "newBB:"; bb->dump();

``` 

it says:

```

newBB: <>

```

what the hell kind of answer is that? just print the BB! So, if one has a block that's unlinked to a Region, you can't

even _print_ the block! 

- It doesn't [seem like `addTargetMaterialization` is used a lot?](https://github.com/llvm/llvm-project/search?q=addTargetMaterialization)

  only one "real" use in `StandardToLLVM.cpp`. I have strange errors:

  

```

case-int.mlir:10:14: error: failed to materialize conversion for result #0 of operation 'hask.transmute' that remained live after conversion

     %ival = hask.transmute(%ihash : !hask.value): i64

             ^

case-int.mlir:10:14: note: see current operation: %1 = "hask.transmute"(<>) : (!hask.value) -> i64

case-int.mlir:10:14: note: see existing live user here: %6 = llvm.inttoptr %1 : i64 to !llvm.ptr

```

The materialization code is:

```cpp

    addTargetMaterialization([](OpBuilder &builder, LLVM::LLVMIntegerType, ValueRange vals, Location loc) {

      if (vals.size() > 1) {

        assert(false && "trying to lower more than 1 value into an integer");

      }

      Value in = vals[0];

      Value out = builder.create(loc, LLVM::LLVMType::getInt64Ty(builder.getContext()), in).getResult();

      return out;

    });

```

- I'm quite confused abot why the result is live after conversion, isn't the fucking framework supposed to kill the result?

- OK, the sequence of calls is _very weird_. It's as follows:

```

---materialization %0 = hask.make_i64(42 : i64) ->  pointer

running TransmuteOpConversionPattern on: hask.transmute | loc("playground.mlir":9:17)

transmute:%2 = hask.transmute(%0 :!hask.value):i64

in: %0 = hask.make_i64(42 : i64)

inRemapped: %0 = hask.make_i64(42 : i64)

inType:!hask.value

convert(inType):!llvm.ptr

retty:i64

rettyRemapped:!llvm.i64

---materialization %0 = hask.make_i64(42 : i64) ->  int

ret: %2 = llvm.ptrtoint %0 : !hask.value to !llvm.i64

===mod:==

llvm.func @main() -> !llvm.ptr {

  %0 = hask.make_i64(42 : i64)

  %1 = llvm.inttoptr %0 : !hask.value to !llvm.ptr

  %2 = llvm.ptrtoint %0 : !hask.value to !llvm.i64

  %3 = hask.transmute(%0 :!hask.value):i64

  hask.return(%3) : i64

}

playground.mlir:9:17: error: failed to materialize conversion for result #0 of operation 'hask.transmute' that remained live after conversion

        %ival = hask.transmute(%lit_42 : !hask.value): i64

                ^

playground.mlir:9:17: note: see current operation: %3 = "hask.transmute"(%0) : (!hask.value) -> i64

playground.mlir:10:9: note: see existing live user here: hask.return(%3) : i64

        hask.return(%ival) : i64

```

So it: 

1. tries to materialize `make_i64` using the target conversion pattern 

2. THEN asks me to lower transmute

3. where I lower the input using `%2 = llvm.ptrtoint %0 : !hask.value to !llvm.i64`

4. I then call `replaceOp(transmute, ret)`, but for whatever reason, that doesn't take!

5. It complains about `failed to materialize conversion for result #0 of operation 'hask.transmute'`??? what does that

   fucking _mean_? You shouldn't even _have_ a `hask.transmute`! I asked you to _replace_ it! WTF.

So even before I start to lower `transmute`, the target conversion pattern has decided that I need

to lower the `i64`, because I don't have a `makeI64ConversionPattern` enabled? It then complains that the

result 

A backtrace shows:

```

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51

#1  0x00007ffff661a8b1 in __GI_abort () at abort.c:79

#2  0x00007ffff660a42a in __assert_fail_base (fmt=0x7ffff6791a38 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x555557e44738 "false && \"want to see backtrace\"",

    file=file@entry=0x555557e44048 "/home/bollu/work/mlir/coremlir/lib/Hask/HaskOps.cpp", line=line@entry=1182,

    function=function@entry=0x555557e4d200  "mlir::standalone::HaskToLLVMTypeConverter::HaskToLLVMTypeConverter(mlir::MLIRContext*)::") at assert.c:92

#3  0x00007ffff660a4a2 in __GI___assert_fail (assertion=0x555557e44738 "false && \"want to see backtrace\"", file=0x555557e44048 "/home/bollu/work/mlir/coremlir/lib/Hask/HaskOps.cpp",

    line=1182,

    function=0x555557e4d200  "mlir::standalone::HaskToLLVMTypeConverter::HaskToLLVMTypeConverter(mlir::MLIRContext*)::") at assert.c:101

#4  0x0000555556398f10 in mlir::standalone::HaskToLLVMTypeConverter::HaskToLLVMTypeConverter(mlir::MLIRContext*)::{lambda(mlir::OpBuilder&, mlir::LLVM::LLVMPointerType, mlir::ValueRange, mlir::Location)#5}::operator()(mlir::OpBuilder&, mlir::LLVM::LLVMPointerType, mlir::ValueRange, mlir::Location) const (__closure=0x7fffffffd440, builder=..., ptrty=..., vals=..., loc=...)

    at /home/bollu/work/mlir/coremlir/lib/Hask/HaskOps.cpp:1182

#5  0x00005555563a5948 in std::function (mlir::OpBuilder&, mlir::Type, mlir::ValueRange, mlir::Location)> mlir::TypeConverter::wrapMaterialization(mlir::standalone::HaskToLLVMTypeConverter::HaskToLLVMTypeConverter(mlir::MLIRContext*)::{lambda(mlir::OpBuilder&, mlir::LLVM::LLVMPointerType, mlir::ValueRange, mlir::Location)#5}&&)::{lambda(mlir::OpBuilder&, llvm::Optional, mlir::ValueRange, mlir::Location)#1}::operator()(mlir::OpBuilder&, llvm::Optional, mlir::ValueRange, mlir::Location) const

    (__closure=0x7fffffffd440, builder=..., resultType=..., inputs=..., loc=...) at /usr/local/include/mlir/Transforms/DialectConversion.h:288

#6  0x00005555563adfa5 in std::_Function_handler (mlir::OpBuilder&, mlir::Type, mlir::ValueRange, mlir::Location), std::function (mlir::OpBuilder&, mlir::Type, mlir::ValueRange, mlir::Location)> mlir::TypeConverter::wrapMaterialization(mlir::standalone::HaskToLLVMTypeConverter::HaskToLLVMTypeConverter(mlir::MLIRContext*)::{lambda(mlir::OpBuilder&, mlir::LLVM::LLVMPointerType, mlir::ValueRange, mlir::Location)#5}&&)::{lambda(mlir::OpBuilder&, mlir::Type, mlir::ValueRange, mlir::Location)#1}>::_M_invoke(std::_Any_data const&, mlir::OpBuilder&, mlir::Type&&, mlir::ValueRange&&, mlir::Location&&) (__functor=..., __args#0=..., __args#1=..., __args#2=..., __args#3=...)

    at /usr/include/c++/7/bits/std_function.h:302

#7  0x0000555556617c0b in mlir::TypeConverter::materializeConversion(llvm::MutableArrayRef (mlir::OpBuilder&, mlir::Type, mlir::ValueRange, mlir::Location)> >, mlir::OpBuilder&, mlir::Location, mlir::Type, mlir::ValueRange) ()

#8  0x000055555661e482 in mlir::detail::ConversionPatternRewriterImpl::remapValues(mlir::Location, mlir::PatternRewriter&, mlir::TypeConverter*, mlir::OperandRange, llvm::SmallVectorImpl&) ()

#9  0x000055555661e712 in mlir::ConversionPattern::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&) const ()

#10 0x000055555658668b in mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::RewritePattern const&, mlir::PatternRewriter&, llvm::function_ref, llvm::function_ref, llvm::function_ref) ()

#11 0x000055555658699f in mlir::PatternApplicator::matchAndRewrite(mlir::Operation*, mlir::PatternRewriter&, llvm::function_ref, llvm::function_ref, llvm::function_ref) ()

#12 0x0000555556624e54 in (anonymous namespace)::OperationLegalizer::legalize(mlir::Operation*, mlir::ConversionPatternRewriter&) ()

#13 0x0000555556627c3e in (anonymous namespace)::OperationConverter::convertOperations(llvm::ArrayRef) ()

#14 0x000055555662a074 in mlir::applyPartialConversion(llvm::ArrayRef, mlir::ConversionTarget&, mlir::OwningRewritePatternList const&, llvm::DenseSet >*) ()

#15 0x000055555662a1a1 in mlir::applyPartialConversion(mlir::Operation*, mlir::ConversionTarget&, mlir::OwningRewritePatternList const&, llvm::DenseSet >*) ()

#16 0x000055555639409c in mlir::standalone::(anonymous namespace)::LowerHaskToStandardPass::runOnOperation (this=0x555559096370) at /home/bollu/work/mlir/coremlir/lib/Hask/HaskOps.cpp:2251

#17 0x00005555565d9472 in mlir::Pass::run(mlir::Operation*, mlir::AnalysisManager) ()

#18 0x00005555565d9552 in mlir::OpPassManager::run(mlir::Operation*, mlir::AnalysisManager) ()

#19 0x00005555565e1e66 in mlir::PassManager::run(mlir::ModuleOp) ()

#20 0x00005555557ef47e in main (argc=4, argv=0x7fffffffdd18) at /home/bollu/work/mlir/coremlir/hask-opt/hask-opt.cpp:408

```

- The error "failed to materialize conversion for result" is

  [from `DialectConversion.cpp`](https://github.com/llvm/llvm-project/blob/deb99610ab002702f43de79d818c2ccc80371569/mlir/lib/Transforms/DialectConversion.cpp#L2321).

- Reading the sources:

```cpp

LogicalResult OperationConverter::legalizeChangedResultType(

    Operation *op, OpResult result, Value newValue,

    TypeConverter *replConverter, ConversionPatternRewriter &rewriter,

    ConversionPatternRewriterImpl &rewriterImpl) {

  // Walk the users of this value to see if there are any live users that

  // weren't replaced during conversion.

  auto liveUserIt = llvm::find_if_not(result.getUsers(), [&](Operation *user) {

    return rewriterImpl.isOpIgnored(user);

  });

  if (liveUserIt == result.user_end())

    return success();

  // If the replacement has a type converter, attempt to materialize a

  // conversion back to the original type.

  if (!replConverter) {

    // TODO: We should emit an error here, similarly to the case where the

    // result is replaced with null. Unfortunately a lot of existing

    // patterns rely on this behavior, so until those patterns are updated

    // we keep the legacy behavior here of just forwarding the new value.

    return success();

  }

  // Track the number of created operations so that new ones can be legalized.

  size_t numCreatedOps = rewriterImpl.createdOps.size();

  // Materialize a conversion for this live result value.

  Type resultType = result.getType();

  Value convertedValue = replConverter->materializeSourceConversion(

      rewriter, op->getLoc(), resultType, newValue);

  if (!convertedValue) {

    InFlightDiagnostic diag = op->emitError()

                              << "failed to materialize conversion for result #"

                              << result.getResultNumber() << " of operation '"

                              << op->getName()

                              << "' that remained live after conversion";

    diag.attachNote(liveUserIt->getLoc())

        << "see existing live user here: " << *liveUserIt;

    return failure();

  }

```

- I see no implementations of [`materializeSourceConversion`](https://github.com/llvm/llvm-project/search?q=materializeSourceConversion)

  

- [`IsOpIgnored`](https://github.com/llvm/llvm-project/blob/deb99610ab002702f43de79d818c2ccc80371569/mlir/lib/Transforms/DialectConversion.cpp#L1096)

```cpp

bool ConversionPatternRewriterImpl::isOpIgnored(Operation *op) const {

  // Check to see if this operation was replaced or its parent ignored.

  return replacements.count(op) || ignoredOps.count(op->getParentOp());

}

```

- OK, whatever, I give up for today. For whatever reason, it doesn't seem to choose to recursively convert 

  the inner region. 

# Monday, Sep 21 2020

I vote `replaceOpWithNewOp` to be the worst named function in MLIR! 

This fucking thing depnds on the state of the `Rewriter`. I feel

like any sane human being would assume it would create a new

`Op` **at the location of the old `Op`**. FML, I wasted

two hours on trying to debug this!

```cpp

// replace altRhsRet with a BrOp that is created

// **AT THE LOCATION** of the rewriter.

 rewriter.replaceOpWithNewOp(altRhsRet, altRhsRet.getOperand(),

                                              afterCaseBB);

```

Seriously, **fuck the entire MLIR API design.** Why does

everything have to carry so much state? Didn't we learn from

LLVM?

# Friday, Sep 18th 2020

- [Link to google doc](https://docs.google.com/document/d/1nkcM3o3D7G6stkxdCbJIEXdgbqRMdiL38x-oJiA6fJQ/edit?usp=sharing)

# Monday, Sep 14th 2020

- I need to fix functions/globals ASAP. I think it should be like this:

0. Lazy functions are denoted by `a ~> b`. Strict functions by `a => b`.

1. `apLazy(a ~> b)` can peel off arguments, leaving one with finally `() ~> b`.

2. `force` can 'invoke' a `() => b` leaving one with a `b`.

2. `apStrict(a -> b)` can peel off arguments, leaving one with finally `b`.

This is hard to write an example for. But basically, there is a difference

between a value that is expected to be forced and the value itself.

for example, should `plus` be:

```

  hask.func @plus {

    %lam = hask.lambdaSSA(%i : !hask.thunk, %j: !hask.thunk) {

      %icons = hask.force(%i: !hask.thunk): !hask.value

      %reti = hask.caseSSA %icons 

           [@MkSimpleInt -> { ^entry(%ival: !hask.value):

              %jcons = hask.force(%j: !hask.thunk):!hask.value

              %retj = hask.caseSSA %jcons 

                  [@MkSimpleInt -> { ^entry(%jval: !hask.value):

                        %sum_v = hask.primop_add(%ival, %jval)

                        %boxed = hask.construct(@MkSimpleInt, %sum_v)

                        // do we return the box?

                        hask.return(%boxed) :!hask.thunk

                        // or do we return a closure that holds the box?

                        // this matters to callees. In one case, they can

                        // `case` case on the box. In the other case, they need

                        // to `force`, and then `case`.

                        hask.suspend(%boxed) :!hask.thunk

                  }]

              hask.return(%retj):!hask.thunk

           }]

      hask.return(%reti): !hask.thunk

    }

    hask.return(%lam): !hask.fn>

  }

```

# Friday, Sep 11 2020

- [doc to call](https://docs.google.com/document/d/1AMTo9cTpPTVzLrBAnzE9NS5wJcQ6Jo8PeMKO7-foHEg/edit?usp=sharing)

# Wednesday, Sep 9 2020

##### `k-lazy`: MLIR

```

module {

  // k x y = x

  hask.func @k {

    %lambda = hask.lambdaSSA(%x: !hask.thunk, %y: !hask.thunk) {

      hask.return(%x) : !hask.thunk

    }

    hask.return(%lambda) :!hask.fn>

  }

  // loop a = loop a

  hask.func @loop {

    %lambda = hask.lambdaSSA(%a: !hask.thunk) {

      %loop = hask.ref(@loop) : !hask.fn

      %out_t = hask.apSSA(%loop : !hask.fn, %a)

      // HACK! This will emit an `evalClosure` though it is nowhere 

      // reachable from hask.return (%out_t).

      // We need to rework the type system...

      %out_v = hask.force(%out_t)

      hask.return(%out_t) : !hask.thunk

    }

    hask.return(%lambda) : !hask.fn

  }

  hask.adt @X [#hask.data_constructor<@MkX []>]

  // k (x:X) (y:(loop X)) = x

  // main = 

  //     let y = loop x -- builds a closure.

  //     in k x y

  hask.func @main {

    %lambda = hask.lambdaSSA(%_: !hask.thunk) {

      %x = hask.construct(@X)

      %k = hask.ref(@k) : !hask.fn>

      %loop = hask.ref(@loop) :  !hask.fn

      %y = hask.apSSA(%loop : !hask.fn, %x)

      %out_t = hask.apSSA(%k: !hask.fn>, %x, %y)

      %out = hask.force(%out_t)

      hask.return(%out) : !hask.value

    }

    hask.return(%lambda) :!hask.fn

  }

}

```

#### `k-lazy`: LLVM

```

declare i8* @malloc(i64)

declare void @free(i8*)

declare i8* @mkClosure_capture0_args2(i8*, i8*, i8*)

declare i8* @malloc__(i32)

declare i8* @evalClosure(i8*)

declare i8* @mkClosure_capture0_args1(i8*, i8*)

define i64 @k(i64 %0, i64 %1) !dbg !3 {

  ret i64 %0, !dbg !7

}

define i64 @loop(i64 %0) !dbg !9 {

  %2 = inttoptr i64 %0 to i8*, !dbg !10

  %3 = call i8* @mkClosure_capture0_args1(i8* bitcast (i64 (i64)* @loop to i8*), i8* %2), !dbg !10

  %4 = call i8* @evalClosure(i8* %3), !dbg !12

  ret i8* %3, !dbg !13

}

define i64 @main(i64 %0) !dbg !14 {

  %2 = call i8* @malloc__(i32 4200), !dbg !15

  %3 = ptrtoint i8* %2 to i64, !dbg !15

  %4 = inttoptr i64 %3 to i8*, !dbg !17

  %5 = call i8* @mkClosure_capture0_args1(i8* bitcast (i64 (i64)* @loop to i8*), i8* %4), !dbg !17

  %6 = inttoptr i64 %3 to i8*, !dbg !18

  %7 = call i8* @mkClosure_capture0_args2(i8* bitcast (i64 (i64, i64)* @k to i8*), i8* %6, i8* %5), !dbg !18

  %8 = call i8* @evalClosure(i8* %7), !dbg !19

  ret i8* %8, !dbg !20

}

```

- I can reduce the `inttoptr`/`ptrtoint` noise by assuming everything will

  always be `i8*`.

- I need to write some code that prints the final answer. Then I can have

  testing with `FileCheck`. Can steal from `simplexhc-cpp`.

- What's annoying is that we're back to making closures and having saturated

  function applications. I was hoping I could avoid both, but no dice.

- Also, our type system is broken. Note the definition of `loop`:

```

  // loop a = loop a

  hask.func @loop {

    %lambda = hask.lambdaSSA(%a: !hask.thunk) {

      %loop = hask.ref(@loop) : !hask.fn

      %out_t = hask.apSSA(%loop : !hask.fn, %a)

      // HACK! This will emit an `evalClosure` though it is nowhere 

      // reachable from hask.return (%out_t).

      // We need to rework the type system...

      %out_v = hask.force(%out_t)

      hask.return(%out_t) : !hask.thunk

    }

    hask.return(%lambda) : !hask.fn

  }

```

I want to be able to return `%out_v` but I cannot. The *actual* types

that I have are:

- Stuff that is on the heap, which is created by `mkConstructor` [constructors]

  and `apSSA` [closures]

- Stuff that we get 'after forcing', which is going to be either

  constructors or raw values. This is because every time we force, we evaluate

  upto WHNF: the outermost thing must be either a constructor or a raw

  value.

- We are also lucky: in the above example, we don't actually capture

  any variables. If we were capturing things, then we would have had

  to work harder when building closures :(

# Tuesday, Sep 8 2020

### Naive compilation

Consider how we wish to lower

```

f :: Int -> Int -> Int

f = plus x y

```

we lower this to:

```

fn @f = lambda (%x) {

  return lambda (%y) {

    %plus_ref = ref(@plus)

    %x_plus = ap(%plus_ref, %x)

    %x_plus_y = ap(%x_plus, %y)

    return %x_plus_y

  }

}

global @g {

  %f = ref(@f)

  %one = ref(@one)

  %two = ref(@two)

  %fx = ap(%f, %one)

  %fxy = ap(fx, %two)

  %fxy_val = force(%fxy) //value is forced here

  case %fxy_val {

    ...

  }

}

```

Let's compile this:

```

f: 

  x = pop(); y = pop();

  push(y); push(x); enter(plus)

g: 

  push(two); push(one);

  enter(f);

  // assumes control flow returns here: this is another "?". Compiling naively

  // like this may not work, because stack space is too small is the STG wisdom.

  fxy_val = pop();

  case(fxy_val, ... )

```

#### Partial application

Now consider a slightly different program:

```

fn @f = lambda (%x) {

  return lambda (%y) {

    %plus_ref = ref(@plus)

    %x_plus = ap(%plus_ref, %x)

    %x_plus_y = ap(%x_plus, %y)

    return %x_plus_y

  }

}

fn @h = lambda (%x) {

  %f = ref(@f)

  %fortytwo = ref(@fortytwo)

  %fx = ap(%f, %x)

  %fx42 = ap(%x, %x, %fortytwo) // this is a value, not a thunk (?)

  return %fx42

}

global @g2 {

  %h = ref(@h)

  %one = ref(@one)

  %hone = ap(%h, %one)

  %honeval = force(%hone) //value is forced here

  case %honeval {

    ...

  }

}

```

How do we compile this? 

```

f: 

  x = pop(); y = pop();

  push(y); push(x); enter(plus)

h:

  x = pop()

  push(fortytwo)

  push(x)

  enter(f)

g2:

  push(one)

  enter(h)

  honeval = pop()

  case(honeval, ... )

```

#### Strictness

Consider we wish to call `+#`. The difference is that such a function does not

need? want? a 'force' call [in theory]. So, naively, we would want:

```

fn @fstrict = lambda (%x) {

  return lambda (%y) {

    %plus#_ref = ref(@plus#)

    %x_plus# = ap(%plus#_ref, %x)

    %x_plus#_y = ap(%x_plus#, %y) <- VALUE COMPUTED HERE

    return %x_plus_y

  }

}

```

ie, the value is 'computed' at the step of

```

%x_plus#_y = ap(%x_plus#, %y) <- VALUE COMPUTED HERE

```

and does not in fact wait for a `force`. In theory, we should compile such a thing

as:

```

f:

  x = pop(); y = pop();

  z = x + y;

  push(z) 

```

However, this is nonsensical. Before, we knew *when* to generate a sequence of

`pop`s: whenever there was a `force`. Now, however, this is not the case. Consider

the code:

```

fn @h = lambda (%x) {

  %plus# = ref(@plus#)

  %fortytwo = ref(@fortytwo)

  %fx = ap(%plus#, %x)

  %fx42 = ap(%x, %x, %fortytwo) // this is a value, not a thunk (?)

  return %fx42

}

global @g2 {

  %h = ref(@h)

  %one = ref(@one)

  %hone = ap(%h, %one) // <- should the value be computed here? automatically?

  %honeval = force(%hone) // <- or should the value be computed here?

  case %honeval {

    ...

  }

```

If we say that the value should be computed at

```

%hone = ap(%h, %one)

```

then how would we discover such a thing? How do we know that `h` calls `@plus#`?

It's impossible. So we can only compile the code in such a way that

```

%honeval = force(%hone) // <- or should the value be computed here?

```

must return the right value. But this forces us to eschew strict

semantics everywhere, even for seemingly 'strict' operations like

addition of integers? It's unclear to me what this means, and why there's a

difference between STG and our implementation.

#### Compiling lambdas

Inside STG, a `lambda` is not an *expression*. We can only have bindings at particular

*binding sites*. These binding sites create ("lambdas" closures). For this week,

we can assume that none of our lambdas have any free variables, so we don't

need to implement closure capturing immediately. That will come next week ;)

#### How do we compile constructors?

```

hask.func @minus {

%lami = hask.lambdaSSA(%i: !hask.thunk) {

     %lamj = hask.lambdaSSA(%j :!hask.thunk) {

          %icons = hask.force(%i)

          %reti = hask.caseSSA %icons 

               [@SimpleInt -> { ^entry(%ival: !hask.value):

                  %jcons = hask.force(%j)

                  %retj = hask.caseSSA %jcons 

                      [@SimpleInt -> { ^entry(%jval: !hask.value):

                            %minus_hash = hask.ref (@"-#") : !hask.fn>

                            %i_sub = hask.apSSA(%minus_hash : !hask.fn>, %ival)

                            %i_sub_j_thunk = hask.apSSA(%i_sub : !hask.fn, %jval)

                            %i_sub_j = hask.force(%i_sub_j_thunk)

                            %mk_simple_int = hask.ref (@MkSimpleInt) :!hask.fn

                            %boxed = hask.apSSA(%mk_simple_int:!hask.fn, %i_sub_j)

                            hask.return(%boxed) :!hask.thunk

                      }]

                  hask.return(%retj) :!hask.thunk

               }]

          hask.return(%reti):!hask.thunk

      }

      hask.return(%lamj): !hask.fn

}

hask.return(%lami): !hask.fn>

}

```

Note that this is problematic: nowhere do we 'force' the call to `mk_simple_int`.

So why should such a call be compiled?

The only way out that I can see is to actually do the damn thing that

STG does, and always ask for saturated function calls. That way, when

we see an `ap`, we know that it should compile to a `push-enter`. Otherwise,

we seem to get into thorny issues of 'when do we force an `ap`?

All of this seems to force us into considering saturated function calls.

#### What does GRIN do?

GRIN compiles each partial application as a separate function.

True to the GRIN philosophy, also function objects are represented by node

values. Just like the G-machine and most other combinator-based abstract ma-

chines, function objects in GRIN programs exist in the form of curried applica-

tions of functions with too few arguments. Consider again the function upto of

our running example, which takes two arguments. We represent the function ob-

ject of upto by a node Pupto_2 , and an application of upto to one argument by

a node Pupto_1 e . The naming convention we use is that prefix `P` indicates

a partial application, and `_2` etc. is the number of missing arguments.

In analogy with the generic eval procedure, programs which use higher or-

der functions must also have a generic apply procedure, which must cover pos-

sible function nodes that might appear in the program. An example is shown

in Figure. apply returns the value of a function value (node) applied to one

additional argument. Generally, apply just returns the next version of the func-

tion node with one more argument present, except when the final argument is

supplied: then the call of the procedure takes place.

GRIN does not provide a way to do a function application of a variable in

a lazy context directly, e.g., build a representation of f x where f is a variable,

instead a closure must be wrapped around it; this is the purpose of the ap2

procedure.

#### What do we do?

It would have been lovely to have an IR that can automatically deal with partial

applications. For now, I'm switching to having saturated function calls.

# Monday, Sep 7 2020

- I am not sure if I need a new operation called as `haskConstruct(, )`. Intuitively,

  I ought not have such a thing, because of indirection:

```

data X = MkX Int

f :: Int -> X; f = MkX 

o :: Int; o = 1

x :: X; x = f o

```

- we will see an `apSSA(f, o)` with no sight of the `haskConstruct` call.

  However, perhaps we should normalize `apSSA(f, o)` into `haskConstruct(@MkX, 1)`,

  because this will allow us to analyze the idea of a 'constructor application'

  separately from a 'function application'. So we should have a normalization

  rule from:

```

 %cons = hask.ref(@Constructor)

 %result = hask.apSSA(%cons, %v1, ..., %vn)

```

into:

```

 %result = hask.construct(@Constructor, %v1, ..., %vn)

```

- It is very unclear what the type of `lambda`, `ap` ought to be. For now,

  let's say it's all `!hask.value`. This will break once we mix strict and

  non-strict.

- This is correct code:

                                        

```

%mk_simple_int = hask.ref (@MkSimpleInt)

// what do now?

%boxed = hask.apSSA(%mk_simple_int, %i_sub_j)

hask.return(%boxed) :!hask.thunk

```

but if we assume that `hask.apSSA` must always return a `hask.value`, we

are screwed. The only way out I can see is to teaach `apSSA` and my

type system about currying and, well, function types. GG. Let's do this.

Great, so I now have a type system!

```

hask.force: (box: hask.thunk) -> hask.value

hask.case: (scrutinee: hask.value) -> T. All the pattern matches have to return the same value.

hask.ap: (fn: hask.func) * (param: A) -> B

hask.return: (retval: T) -> void

hask.lambda: (param: A) * (region with return: B) -> hask.func

hask.ref: (refname: Symbol) -> T

```


##### Raw git log

```

* a8c43a4 76 seconds ago Siddharth Bhat (HEAD -> master, origin/master) get first cut of type system working

|

|  8 files changed, 201 insertions(+), 125 deletions(-)

* 490c3af 3 hours ago Siddharth Bhat get hask.func to round-trip

|

|  4 files changed, 20 insertions(+), 14 deletions(-)

* ce64e16 4 hours ago Siddharth Bhat get angle bracket based fn type parsing working

|

|  2 files changed, 13 insertions(+), 1 deletion(-)

* 1ce1810 4 hours ago Siddharth Bhat add a HaskFunctionType that's not hooked in anywhere

|

|  2 files changed, 39 insertions(+), 1 deletion(-)

* ad0d367 5 hours ago Siddharth Bhat add appel paper on SSA v/s functional code

|

|  1 file changed, 6515 insertions(+)

* 50a656a 5 hours ago Siddharth Bhat Spring cleaning: rename ops from XSSAOp -> XOp

|

|  3 files changed, 44 insertions(+), 44 deletions(-)

* d4dda1a 6 hours ago Siddharth Bhat need function types. Scott be blessed.

|

|  9 files changed, 213 insertions(+), 216 deletions(-)

* 53bc03c 8 hours ago Siddharth Bhat started migrating to new normalization

|

|  8 files changed, 328 insertions(+), 83 deletions(-)

```

# Wed, Sep 2 2020

- [`A @Class@ corresponds to a Greek kappa in the static semantics:`](https://haskell-code-explorer.mfix.io/package/ghc-8.4.3/show/types/Class.hs#L271)

  --- Gee thanks,                                             

  that tells me where to lookup the static semantics and what `kappa` is...

- We extract out the data from `data ConcreteProd = MkConcreteProd Int# Int#`

  as:

```

//unique:rza

//name: ConcreteProd

//|data constructors|

  dcName: MkConcreteProd

  dcOrigTyCon: ConcreteProd

  dcFieldLabels: []

  dcRepType: Int# -> Int# -> ConcreteProd

  constructor types: [Int#, Int#]

  result type: ConcreteProd

  ---

  dcSig: ([], [], [Int#, Int#], ConcreteProd)

  dcFullSig: ([], [], [], [], [Int#, Int#], ConcreteProd)

  dcUniverseTyVars: []

  dcArgs: [Int#, Int#]

  dcOrigArgTys: [Int#, Int#]

  dcOrigResTy: ConcreteProd

  dcRepArgTys: [Int#, Int#]

```

- Similarly, for an *abstract* product, things are slightl more complicated:

  `data AbstractProd a b = MkAbstractProd a b`. I don't have a good idea for

  how the abstract binders should be serialized. In theory, we can just represent

  them as `lambda`s. In practice...

- For a concrete sum type, we get two data constructors:

```

//unique:rz7

//name: ConcreteSum

//|data constructors|

  dcName: ConcreteLeft

  dcOrigTyCon: ConcreteSum

  dcFieldLabels: []

  dcRepType: Int# -> ConcreteSum

  constructor types: [Int#]

  result type: ConcreteSum

  ...

  dcName: ConcreteRight

  dcOrigTyCon: ConcreteSum

  dcFieldLabels: []

  dcRepType: Int# -> ConcreteSum

  constructor types: [Int#]

  result type: ConcreteSum

  ...

//----

```

- For a concrete recursive type, the data constructor `ConcreteRecSumCons`

  refers to the type constructor `ConcreteRecSum`, which is also the result.

```

//unique:rz2

//name: ConcreteRecSum

//|data constructors|

  dcName: ConcreteRecSumCons

  dcOrigTyCon: ConcreteRecSum

  dcFieldLabels: []

  dcRepType: Int# -> ConcreteRecSum -> ConcreteRecSum

  constructor types: [Int#, ConcreteRecSum]

  result type: ConcreteRecSum

  ...

```

- So, I am unsure how we ought to handle abstract types like `Maybe a = Just a | Nothing`.

  I don't have a good sense of whether we should respect Core or not. I believe that

  what GRIN does is to not *care* about such issues: It doesn't even know what the hell

  a `Maybe` is. To it, it's just two types of boxes: Either `{tag:Just, data: [a]}`,

  `{tag:nothing, data:[]}`. Mh, I wish I had more clarity on any of this.

- Either way, let's say I want to represent these data constructors. I would

  like to have been able to write:

```

data ConcreteSum = ConcreteLeft Int# | ConcreteRight Int#

hask.make_algebraic_data_type @ConcreteSum  -- name of the ADT

  [@ConcreteLeft"[@"Int#"], # constructor1: Int# -> ConcreteSum

   @ConcreteRight[@"Int#"]] # constructor2: Int# -> ConcreteSum

# data ConcreteProd = MkConcreteProd Int# Int#

hask.make_algebraic_data_type @ConcreteProd 

 [@MkConcreteProd [@"Int#", @"Int#"]] 

# data ConcreteRec = MkConcreteRec Int# ConcreteRec

hask.make_algebraic_data_type @ConcreteRec 

 [@MkConcreteRec [@"Int#", @ConcreteRec]] 

```

- However, as far as I understand, such a declaration cannot be done easily

  because MLIR does not support *attribute lists*. It supports *type lists*,

  and *attribute dicts*. What do? One can of course encode a list using a dict

  with judicious use of torture. This seems like  a terrible solution to me

  though. Can we just beg upstram for attribute lists?

- OK, never mind, I am just horrendous at RTFMing. Turns out they call it

  "array attributes":

```

array-attribute ::= `[` (attribute-value (`,` attribute-value)*)? `]`

```

> An array attribute is an attribute that represents a collection of attribute values.

- FWIW, what threw me off is that this list attribute belongs to standard,

  and is not a primitive of the attribute vocabulary. Seems disingenous to me.

I'm trying to figure how to use custom attributes. On providing this input:

```

playground.mlir

module {

  hask.adt @SimpleInt [#hask.data_constructor<@MkSimpleInt, [@"Int#"]>]

}

```

I get the ever-so-helpful error message:

```

Error can't load file ./playground.mlir

```

Gee, thanks.

OK, now I need to find out which part of what I wrote is illegal.

- Fun aside: creating an `Op` derived class with _no traits_ results in an error!

```

/usr/local/include/mlir/IR/OpDefinition.h: In instantiation of ‘static bool mlir::Op::hasTrait(mlir::TypeID) [with ConcreteType = mlir::standalone::HaskADTOp; Traits = {}]’:

/usr/local/include/mlir/IR/OperationSupport.h:156:12:   required from ‘static mlir::AbstractOperation mlir::AbstractOperation::get(mlir::Dialect&) [with T = mlir::standalone::HaskADTOp]’

/usr/local/include/mlir/IR/Dialect.h:154:54:   required from ‘void mlir::Dialect::addOperations() [with Args = {mlir::standalone::HaskADTOp}]’

/home/bollu/mlir/coremlir/lib/Hask/HaskDialect.cpp:40:28:   required from here

/usr/local/include/mlir/IR/OpDefinition.h:1357:49: error: no matching function for call to ‘makeArrayRef()’

     return llvm::is_contained(llvm::makeArrayRef({TypeID::get()...}),

```

- OK, stupid errors are past. I'm now learning the `Attribute` framework. It seems

  to hold data in my class, I need to have an `AttributeStorage` member. I'm

  taking `ArrayAttr` as my prototype. Here's the code, for ease of use:

  ([Github permalink](https://github.com/llvm/llvm-project/blob/deb99610ab002702f43de79d818c2ccc80371569/mlir/include/mlir/IR/Attributes.h#L187))

```cpp

/// Array attributes are lists of other attributes.  They are not necessarily

/// type homogenous given that attributes don't, in general, carry types.

class ArrayAttr : public Attribute::AttrBase {

public:

  using Base::Base;

  using ValueType = ArrayRef;

  static ArrayAttr get(ArrayRef value, MLIRContext *context);

  ArrayRef getValue() const;

  Attribute operator[](unsigned idx) const;

  /// Support range iteration.

  using iterator = llvm::ArrayRef::iterator;

  iterator begin() const { return getValue().begin(); }

  iterator end() const { return getValue().end(); }

  size_t size() const { return getValue().size(); }

  bool empty() const { return size() == 0; }

private:

  /// Class for underlying value iterator support.

  template 

  class attr_value_iterator final

      : public llvm::mapped_iterator {

  public:

    explicit attr_value_iterator(ArrayAttr::iterator it)

        : llvm::mapped_iterator(

              it, [](Attribute attr) { return attr.cast(); }) {}

    AttrTy operator*() const { return (*this->I).template cast(); }

  };

public:

  template 

  iterator_range> getAsRange() {

    return llvm::make_range(attr_value_iterator(begin()),

                            attr_value_iterator(end()));

  }

  template 

  auto getAsValueRange() {

    return llvm::map_range(getAsRange(), [](AttrTy attr) {

      return static_cast(attr.getValue());

    });

  }

};

```

- [Github permalink](https://github.com/llvm/llvm-project/blob/deb99610ab002702f43de79d818c2ccc80371569/mlir/lib/IR/AttributeDetail.h#L49) of storage details

```cpp

struct ArrayAttributeStorage : public AttributeStorage {

  using KeyTy = ArrayRef;

  ArrayAttributeStorage(ArrayRef value) : value(value) {}

  /// Key equality function.

  bool operator==(const KeyTy &key) const { return key == value; }

  /// Construct a new storage instance.

  static ArrayAttributeStorage *construct(AttributeStorageAllocator &allocator,

                                          const KeyTy &key) {

    return new (allocator.allocate())

        ArrayAttributeStorage(allocator.copyInto(key));

  }

  ArrayRef value;

};

```

#### Git log at the end of today:

```

c7370bf 25 minutes ago Siddharth Bhat (HEAD -> master, origin/master) add legalizer data

 1 file changed, 18 insertions(+)

4abfd7a 38 minutes ago Siddharth Bhat It appears my attribute is created correctly. We print:

 2 files changed, 3 insertions(+), 2 deletions(-)

3261975 71 minutes ago Siddharth Bhat [WIP] getting there... can now store the data

 1 file changed, 8 insertions(+), 4 deletions(-)

f803c15 2 hours ago Siddharth Bhat [WIP] Playing with template errors, trying to figure out how to store attributes

 4 files changed, 119 insertions(+), 6 deletions(-)

cc6145a 2 hours ago Siddharth Bhat FUCK ME, I forgot to return x(

 1 file changed, 1 insertion(+), 1 deletion(-)

91d4b4e 3 hours ago Siddharth Bhat FFS, do NOT define classof() unless you know what you're doing

 2 files changed, 36 insertions(+), 6 deletions(-)

ff05162 3 hours ago Siddharth Bhat [WIP] I am literally unable to add an attribute...

 3 files changed, 9 insertions(+), 4 deletions(-)

dc4fec5 4 hours ago Siddharth Bhat [WIP] attr parsing

 6 files changed, 39 insertions(+), 17 deletions(-)

df17693 4 hours ago Siddharth Bhat add current status of getting attributes up

 12 files changed, 235 insertions(+), 272 deletions(-)

6ba2990 4 hours ago Siddharth Bhat add README documenting that we do in fact have attribute lists.

 1 file changed, 40 insertions(+)

4e16ba8 4 hours ago Siddharth Bhat [SIDEQUEST] Fuck this, let's just reboot hask98 from scratch on the weekend

 4 files changed, 14 insertions(+)

266201e 10 hours ago Siddharth Bhat add the exploration of data constructors here

 10 files changed, 2012 insertions(+), 551 deletions(-)

```

# Monday, 24 August 2020

- Nuked `HaskModuleOp`, `HaskDummyFinishOp` since I'm just using the regular `ModuleOp` now. I now understand

  why `ModuleOp` doesn't allow SSA variables in its body: these are not accessible from functions because of the

  `IsolatedFromAbove` constraint. So it only makes sense to have "true global data" in a `ModuleOp`. I really wish

  I didn't have to "learn their design choices" by reinventing the bloody wheel. Oh well, it was at least very

  instructive.

  

- Got full lowering down into LLVM. I now need to lower a program with `Int`, not just `Int#`.

- [Wow the names of data constructors are complicated](https://haskell-code-explorer.mfix.io/package/ghc-8.6.1/show/basicTypes/DataCon.hs#L126)

> Note [Data Constructor Naming]

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

> Each data constructor C has two, and possibly up to four, Names associated with it:

- My god, GHC does love inflicting pain on those who decide to read its sources.

- I'm writing the simplest possible version of `fib` that compiles through the GHC toolchain:

```hs

{-# LANGUAGE MagicHash #-}

{-# LANGUAGE UnboxedTuples #-}

import GHC.Prim

import GHC.Types(IO(..))

data SimpleInt = MkSimpleInt Int#

plus :: SimpleInt -> SimpleInt -> SimpleInt

plus i j = case i of MkSimpleInt ival -> case j of MkSimpleInt jval -> MkSimpleInt (ival +# jval)

minus :: SimpleInt -> SimpleInt -> SimpleInt

minus i j = case i of MkSimpleInt ival -> case j of MkSimpleInt jval -> MkSimpleInt (ival -# jval)

one :: SimpleInt; one = MkSimpleInt 1#

zero :: SimpleInt; zero = MkSimpleInt 0#

fib :: SimpleInt -> SimpleInt

fib i =

    case i of

       MkSimpleInt 0# -> zero

       MkSimpleInt 1# -> one

       n -> plus (fib n) (fib (minus n one))

main = IO (\s -> (# s, ()#))

```

```

/tmp/ghc1433_0/ghc_2.s:194:0: error:

     Error: symbol `Main_MkSimpleInt_info' is already defined

    |

194 | Main_MkSimpleInt_info:

    | ^

/tmp/ghc1433_0/ghc_2.s:214:0: error:

     Error: symbol `Main_MkSimpleInt_closure' is already defined

    |

214 | Main_MkSimpleInt_closure:

    | ^

```

- OK, interesting, my GHC plugin is somehow causing `Int` to be defined twice.

 

- I gave up. It seems to be because I run `CorePrep` myself manually, after which GHC

  also decides to run `CorePrep`. So I came up with the brilliant solution of killing `GHC`

  in a plugin pass after all of my scheduled passes run. This is so fucked up.

- I need to change `apSSA` to be capable of accepting the second parameter as a symbol

  as well.

```

tomlir-fib.pass-0000.mlir:82:39: error: expected SSA operand

        %app_24 = hask.apSSA(%app_23, @one)

```

- OK, no, that's not going to work. I now understand why MLIR needs the `std.constant` instruction. So, consider

two different variations:

1. `apSSA(@f1, %v1)`

2. `apSSA(%v2, @f2)`

Now, note that as MLIR `Op`s, these have the exact same "shape". They both have

one _operand_ (`%v1` / `%v2`) and they both have one _symbol attribute_, 

`@f1 / @f2`. So, there's no way to tell one from the other (easily)!. 

1. Either we do something terrible, like naming the symbol attribute at the `i`th parameter

  location as `param_i`, but, I mean, this is too horrible to even consider.

2. Or, we introduce a `%val = hask.reference(@sym)` just like `std.constant`, which we then

   use to write `%vf1 = hask.reference(@f1); apSSA(%vf1, %v1)` and similarly for the

   other case, we write `%vf2 = hask.reference(@f2); apSSA(%v2, %vf2)`. 

3. This makes me sad. Why can't we have `@var` as a real parameter, rather than some kind of

   stilted "attribute".

   

It seems like I'll be spending today fixing my lowering to learn about this `hask.ref` syntax.

# Log:  [oldest] to [newest]

## Concerns about this `Graph` version of region

The code that looks like below is considered as a non-dominated use. So it

checks **use-sites**, not **def-sites**.

```cpp

standalone.module { 

standalone.dominance_free_scope {

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-Blocking dominance_free_scope

    vvvv-DEF

    %fib = standalone.toplevel_binding {  

        // ... standalone.dominance_free_scope { standalone.return (%fib) } ...

        ... standalone.return (%fib) } ...

                           USE-^^^^

    }

} // end dominance_free_scope

```

On the other hand, the code below is considered a dominated use (well, the domaintion

that is masked by `standalone.dominance_free_scope`:

```cpp

standalone.module { 

// standalone.dominance_free_scope {

    vvvv-DEF

    %fib = standalone.toplevel_binding {  

        ... standalone.dominance_free_scope { standalone.return (%fib) } ...

   BLOCKING-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                 USE-^^^^

        //... standalone.return (%fib) } ...

                                    

    }

//} // end dominance_free_scope

```

So, abstractly, the version below round-trips through MLIR. I will denote

this as `DEF(BLOCKING(USE))`:

```

DEF-MUTUAL-RECURSIVE

   BLOCKING----------->

     | USE-MUTUAL-RECURSIVE

     |

     v

```

The one below (denoted as `BLOCKING(DEF(USE)))`) does not

round-trip through MLIR; It gives domination errors:

```

BLOCKING----------->

|DEF-MUTUAL-RECURSIVE

|   USE-MUTUAL-RECURSIVE

|    

v    

```

- My mental model of the "blocking" was off! I thought it meant that

  everything inside this region can disobey SSA conditions. Rather,

  it means that everything **inside** this region can disobey SSA _with respect to_

  everything **outside** this region. [Maybe the other way round as well, I have not tried,

  nor do I have a good mental model of this].

- Unfortunately, for `Core`, it is the latter version with `BLOCKING (DEF (USE))` that 

  is of more use, since the `Core` encoding encodes the recursion as:

```

rec { -- BLOCKING

  fib :: Int -> Int

  

  {- Core Size{terms=23 types=6 cos=0 vbinds=0 jbinds=0} -}

  fib = -- DEF-MUTUAL-RECURSIVE

    λ i →

       ...

      (APP(Main.fib i)) -- USE-MUTUAL-RECURSIVE

      ...

}

```

So when we translate from `Core` into MLIR, we need to either

figure out which are the bindings that are `use-before-def` and then wrap them.

Or we participate in the discussion and petition for this kind of "lazy-region"

as well. Maybe both.

## Stuff discovered in this process about `ghc-dump`:

##### Fib for reference

```hs

{- Core Size{terms=23 types=6 cos=0 vbinds=0 jbinds=0} -}

fib = λ i → case i of wild {

    I# ds →

      case ds of ds {

        DEFAULT →

          APP(GHC.Num.+ @Int GHC.Num.$fNumInt // fib(i-1) + fib(i)

            (APP(Main.fib i)) // fib(i)

            (APP(Main.fib  -- fib(i - 1)

                    APP(GHC.Num.- @Int GHC.Num.$fNumInt i (APP(GHC.Types.I# 1#))))))) -- (i - 1)

        0# → APP(GHC.Types.I# 0#)

        1# → APP(GHC.Types.I# 1#)

      }

  }

```

I feel that `ghc-dump` does not preserve all the information we want. Hence

I started hand-writing the IR we want. I'll finish the translator after

I sleep and wake up. However, it's unclear to me how much extending `ghc-dump`

makes sense. I should maybe clean-slate from a Core plugin.

- Why? Because `ghc-dump` does not retain enough information. For example,

  it treats both `GHC.Num.$fNumInt` and `GHC.Types.I#` as variables; It has

  erased the fact that one is a typeclass dictionary and the other is

  a data constructor.

- Similarly, there is no way to query from within `ghc-dump` what `GHC.Num.-`

  is, and it's impossible to infer from context.

- In general, this is full of unknown-unknowns for me. I don't know enough 

  of the details of core to forsee what we will may need from GHC. Using

  `ghc-dump` is a bad idea because it's technical debt against a 

  _prettyprinter of core_ (fundamentally).

- Hence, we should really be reusing the

  [code in `ghc-dump` that traverses `Core` from within GHC](https://github.com/bgamari/ghc-dump/blob/master/ghc-dump-core/GhcDump/Convert.hs#L237).

# 1 July 2020

- [Reading GHC core sources paid off, the `CorePrep` invariants are documented here](https://haskell-code-explorer.mfix.io/package/ghc-8.6.1/show/coreSyn/CorePrep.hs#L142)

- In particular we have `case  of`. So nested cases are legal, 

  which is something we need to flatten.

- Outside of nested cases, everything else seems "reasonable": laziness is

  at each point of `let`. We can lower `let var = e` as `%let_var = lazy(e)`

  for example.

- Will first transcribe our `fibstrict` example by hand, then write a small

  Core plugin to do this automagically.

- I don't understand WTF `cabal` is doing. In particular, why `cabal install --library`

  installs the library twice :(

- It _seems_ like the cabal documentation on [how to install a system library](https://downloads.haskell.org/~cabal/Cabal-latest/doc/users-guide/installing-packages.html#building-and-installing-a-system-package)

  should do the trick.

```hs

$ runghc Setup.hs configure --ghc

$ runghc Setup.hs build

$ runghc Setup.hs install

```

- OK, so the problem was that I somehow had `cabal` hidden in my package management.

  It turns that even `ghc-pkg` maintains a local and a global package directory,

  and I was exposing stuff in my _local_ package directory (which is in `~/.ghc/.../package.conf.d`),

  note the global one (which is in `/usr/lib/ghc-6.12.1/package.conf.d`).

- The solution is to ask `ghc-pkg --global expose Cabal` which exposes `cabal`,

  which contains `Distribution.Simple`, which is needed to run `Setup.hs`.

- `runghc` is some kind of wrapper around `ghc` runs a file directly without

  having to compile things.

- Of course, this is disjoint from `cabal`'s `exposed-modules`, which is a layer

  disjoint from `ghc-pkg`. I think cabal commands `ghc-pkg` to expose and hide

  what it needs. This is fucking hilarious if it weren't so complex.

- To quote the GHC manual on `Cabal`'s `Distribution.Simple`:

> This module isn't called "Simple" because it's simple. Far from it. It's

> called "Simple" because it does complicated things to simple software.

> The original idea was that there could be different build systems that all

> presented the same compatible command line interfaces. There is still a

> Distribution.Make system but in practice no packages use it.

> https://hackage.haskell.org/package/Cabal-3.2.0.0/docs/Distribution-Simple.html

Reading GHC sources can sometimes be unpleasant. There are many, many invariants

to be maintained. [This is from CorePrep.hs:1450](https://haskell-code-explorer.mfix.io/package/ghc-8.6.1/show/coreSyn/CorePrep.hs#L1450):

> There is a subtle but important invariant ...

> The solution is CorePrep to have a miniature inlining pass...

> Why does the removal of 'lazy' have to occur in CorePrep? he gory details are in Note [lazyId magic]...

> We decided not to adopt this solution to keep the definition of 'exprIsTrivial' simple....

> There is ONE caveat however...

> the (hacky) non-recursive -- binding for data constructors...

- Brilliant, my tooling suddenly died thanks to https://github.com/well-typed/cborg/issues/242: GHC Prim

  and `cborg` started overlapping an export. 

- [`cabal install --lib` is not idempotent](https://github.com/haskell/cabal/issues/6394).

  Only haskellers would have issue citing a problem about **library installs**,

  while describing the issue as one of **idempotence**.

# 3 July 2020 (Friday)

Got the basic examples converted to SSA. Trying to do this in a GHC plugin.

Most of the translation code works. I'm stuck at a point, though. I need

to rename a variable `GHC.Num.-#` into something that can be named. Otherwise,

I try to create the MLIR:

```

%app_100  =  hask.apSSA(%-#, %i_s1wH)

```

where the `-#` refers to the variable name `GHC.Num.-#`. This is pretty

ludicrous. However, attempting to get a name from GHC seems quite complicated.

There are things like:

- `Id`

- `Var`

- `class NamedThing`

- `data OccName`

it's quite confusing as to what does what.

# 7 July 2020 (Tuesday)

- `mkUniqueGrimily`: great name for a function that creates data.

- OK, good, we now have MLIR that round-trips, in the sense that our

  MLIR gets verified. Now we have undeclared SSA variable problems:

```

tomlir-fibstrict.pass-0000.mlir:12:56: error: use of undeclared SSA value name

                                 %app_0  =  hask.apSSA(%var_minus_hash_99, %var_i_a12E)

                                                       ^

tomlir-fibstrict.pass-0000.mlir:12:76: error: use of undeclared SSA value name

                                 %app_0  =  hask.apSSA(%var_minus_hash_99, %var_i_a12E)

                                                                           ^

tomlir-fibstrict.pass-0000.mlir:25:72: error: use of undeclared SSA value name

                                                 %app_5  =  hask.apSSA(%var_plus_hash_98, %var_wild_X5)

                                                                       ^

tomlir-fibstrict.pass-0000.mlir:49:29: error: use of undeclared SSA value name

      %app_1  =  hask.apSSA(%var_TrNameS_ra, %lit_0)

                            ^

tomlir-fibstrict.pass-0000.mlir:50:29: error: use of undeclared SSA value name

      %app_2  =  hask.apSSA(%var_Module_r7, %app_1)

                            ^

tomlir-fibstrict.pass-0000.mlir:59:29: error: use of undeclared SSA value name

      %app_1  =  hask.apSSA(%var_fib_rwj, %lit_0)

                            ^

tomlir-fibstrict.pass-0000.mlir:65:37: error: use of undeclared SSA value name

              %app_3  =  hask.apSSA(%var_return_02O, %type_2)

                                    ^

tomlir-fibstrict.pass-0000.mlir:66:45: error: use of undeclared SSA value name

              %app_4  =  hask.apSSA(%app_3, %var_$fMonadIO_rob)

                                            ^

tomlir-fibstrict.pass-0000.mlir:69:45: error: use of undeclared SSA value name

              %app_7  =  hask.apSSA(%app_6, %var_unit_tuple_71)

                                            ^

tomlir-fibstrict.pass-0000.mlir:77:29: error: use of undeclared SSA value name

      %app_1  =  hask.apSSA(%var_runMainIO_01E, %type_0)

                            ^

makefile:4: recipe for target 'fibstrict' failed

make: *** [fibstrict] Error 1

```

Note that all of these names are GHC internals. We need to:

- Process all names, figure out what are our 'external' references.

- Code-generate 'extern' stubs for all of these.

There is also going to be the annoying "recursive call does not dominate use"

problem badgering us. We'll have to analyze Core to decide which use site is

recursive. This entire enterprise is messy, messy business.

The GHC sources are confusing. Consider `Util/Bag.hs`. We have `filterBagM` which

seems like an odd operation to have becuse a `Bag` is supposed to be unordered.

Nor does the function have any users at any rate. Spoke to Ben about it,

he said it's fine to delete the function, so I'll send a PR to do that once

I get this up and running...

# Wednesday, 8th july

- change my codegen so that regular variables are not uniqued, only wilds. This

  gives us stable names for things like `fib`, `runMain`, rather than names like `fib_X1` 

  or whatever. That will allow me to hardcode the preamble I need to build a

  vertical proptotype. This is also what Core seems to do:

```hs

Rec {

-- RHS size: {terms: 21, types: 4, coercions: 0, joins: 0/0}

fib [Occ=LoopBreaker] :: Int# -> Int#

[LclId]

fib -- the name fib is not uniqued

  = \ (i_a12E :: Int#) ->  -- this lambda variable is uniqued

      case i_a12E of {

        __DEFAULT ->

          case fib (-# i_a12E 1#) of wild_00 { __DEFAULT ->

          (case fib i_a12E of wild_X5 { __DEFAULT -> +# wild_X5 }) wild_00

          };

        0# -> i_a12E;

        1# -> i_a12E

      }

end Rec }

-- RHS size: {terms: 5, types: 0, coercions: 0, joins: 0/0}

$trModule :: Module

[LclIdX]

$trModule = Module (TrNameS "main"#) (TrNameS "Main"#)

-- RHS size: {terms: 7, types: 3, coercions: 0, joins: 0/0}

main :: IO ()

[LclIdX]

main

  = case fib 10# of { __DEFAULT -> return @ IO $fMonadIO @ () () }

-- RHS size: {terms: 2, types: 1, coercions: 0, joins: 0/0}

main :: IO ()

[LclIdX]

main = runMainIO @ () main

```

Note that only parameters to lambdas and wilds are `unique`d. Toplevel names

are not. I need some sane way in the code to figure out what I should unique

and what I should not by reading the Core pretty printing properly.

- There is a bigger problem. Note that the Core appears to have ** two `main` ** 

  declarations. I have no idea WTF is the semantics of this.

- OK, names are now fixed. I call the underlying `Outputable` instance of `Var` that knows the 

  right thing to do in all contexts. I didn't do this earlier because it prints

  functions as `-#`, `+#`, `()`, etc. So I intercept these. The implementation

  is 4 lines, but figuring it out took half an hour :/. This entire enterprise

  is like this.

```hs

-- use the ppr of Var because it knows whether to print or not.

cvtVar :: Var -> SDoc

cvtVar v = 

	let name = unpackFS $ occNameFS $ getOccName v

	in if name == "-#" then  (text "%minus_hash")

  	   else if name == "+#" then (text "%plus_hash")

  	   else if name == "()" then (text "%unit_tuple")

  	   else text "%" >< ppr v 

```

##### Re-checking the dumps from `fibstrict.hs`

OK, so I decided to view the dump from the horse's mouth:

```hs

-- | fibstrict.hs

{-# LANGUAGE MagicHash #-}

import GHC.Prim

fib :: Int# -> Int#

fib i = case i of

        0# ->  i; 1# ->  i

        _ ->  (fib i) +# (fib (i -# 1#))

main :: IO (); main = let x = fib 10# in return ()

```

```Core

-- | generated from fibstrict.hs

==================== Desugar (after optimization) ====================

2020-07-08 16:31:29.998915479 UTC

...

-- RHS size: {terms: 7, types: 3, coercions: 0, joins: 0/0}

main :: IO ()

[LclIdX]

main

  = case fib 10# of { __DEFAULT ->

    return @ IO GHC.Base.$fMonadIO @ () GHC.Tuple.()

    }

-- | what is this :Main.main?

-- RHS size: {terms: 2, types: 1, coercions: 0, joins: 0/0}

:Main.main :: IO ()

[LclIdX]

:Main.main = GHC.TopHandler.runMainIO @ () main

```

Note that there is `main`, and then there is `:Main.main` [So there is an extra `:Main.`].

This appears to inform the difference. One of them is some kind of top handler

that is added automagically. I might have to strip this from my printing.

I need to see how to deal with this. Will first identify what adds this symbol

and if there's a clean way to disable this.

- TODO: figure out how to get the core dump that I print in my MLIR file

  to contain as much information as the GHC dump. for example,

  the GHC dump says:

```

-- ***GHC file fibstrict.dump-ds***

-- RHS size: {terms: 2, types: 1, coercions: 0, joins: 0/0}

:Main.main :: IO ()

[LclIdX]

:Main.main = GHC.TopHandler.runMainIO @ () main

```

```

-- ***my MLIR file with the Core appended to the end as a comment***

-- RHS size: {terms: 2, types: 1, coercions: 0, joins: 0/0}

main :: IO ()

[LclIdX]

main = runMainIO @ () main

```

- In particular, note that `fibstrict.dump-ds` says `:Main.main = GHC.TopHandler.runMainIO` while my

  MLIR file only says `main = runMainIO ...`. I want that full qualification

  in my dump as well. I will spend some time on this, because the upshot

  is **huge**: accurate debugging and names!

- The GHC codebase is written with misery as a resource, it seems:

```hs

-- compiler/GHC/Rename/Env.hs

        -- We can get built-in syntax showing up here too, sadly.  If you type

        --      data T = (,,,)

        -- the constructor is parsed as a type, and then GHC.Parser.PostProcess.tyConToDataCon

        -- uses setRdrNameSpace to make it into a data constructors.  At that point

        -- the nice Exact name for the TyCon gets swizzled to an Orig name.

        -- Hence the badOrigBinding error message.

        --

        -- Except for the ":Main.main = ..." definition inserted into

        -- the Main module; ugh!

```

Ugh indeed. I have no idea how to check if the binder is `:Main.main`

- What I do know is that this is built here:

```

compiler/GHC/Tc/Module.hs

-- See Note [Root-main Id]

-- Construct the binding

--      :Main.main :: IO res_ty = runMainIO res_ty main

; run_main_id <- tcLookupId runMainIOName

; let { root_main_name =  mkExternalName rootMainKey rOOT_MAIN

                   (mkVarOccFS (fsLit "main"))

                   (getSrcSpan main_name)

; root_main_id = Id.mkExportedVanillaId root_main_name

                                      (mkTyConApp ioTyCon [res_ty])

```

After which I have no _fucking_ clue how to check that the binding

comes from this module.

The note reads:

```

Note [Root-main Id]

~~~~~~~~~~~~~~~~~~~

The function that the RTS invokes is always :Main.main, which we call

root_main_id.  (Because GHC allows the user to have a module not

called Main as the main module, we can't rely on the main function

being called "Main.main".  That's why root_main_id has a fixed module

":Main".)

This is unusual: it's a LocalId whose Name has a Module from another

module. Tiresomely, we must filter it out again in GHC.Iface.Make, less we

get two defns for 'main' in the interface file!

```

# Monday, 13th July 2020

- Added a new type `hask.untyped` to represent all things in my hask dialect.

  This was mostly to future proof and ensure that stuff is not

  accidentally wrecked by my use of `none`.

## how is `FuncOp` implemented?

How the funcOp gets parsed:

- Toplevel: It calls `parseFunctionLikeOp`. They use `PIMPL` style here for whatever

  reason.

- https://github.com/llvm/llvm-project/blob/74145d584126da2ce7a836d9b2240d56442f3ea1/mlir/lib/IR/Function.cpp

```cpp

ParseResult FuncOp::parse(OpAsmParser &parser, OperationState &result) {

  auto buildFuncType = [](Builder &builder, ArrayRef argTypes,

                          ArrayRef results, impl::VariadicFlag,

                          std::string &) {

    return builder.getFunctionType(argTypes, results);

  };

  return impl::parseFunctionLikeOp(parser, result, /*allowVariadic=*/false,

                                   buildFuncType);

}

```

- the call to `parseFunctioLikeOp` does bog-standard stuff. The interesting

  bit is that it parses the function name as a _symbol_ (attribute). so the

  syntax `func foo` has `func` as a keyword, with `foo` being a symbol.

- Now I'm confused as to how this prevents "double declarations" of the same

  function. is this verified by the module after as a separate check, and

  not encoded as SSA? If so, that's fugly.

- https://github.com/llvm/llvm-project/blob/5eae715a3115be2640d0fd37d0bd4771abf2ab9b/mlir/lib/IR/FunctionImplementation.cpp#L160

```cpp

ParseResult

mlir::impl::parseFunctionLikeOp(OpAsmParser &parser, OperationState &result,

                                bool allowVariadic,

                                mlir::impl::FuncTypeBuilder funcTypeBuilder) {

  SmallVector entryArgs;

  SmallVector argAttrs;

  SmallVector resultAttrs;

  SmallVector argTypes;

  SmallVector
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/bollu/lz

Awesome Lists containing this project

README