{"id":19364361,"url":"https://github.com/davidfstr/arf","last_synced_at":"2025-07-19T13:10:49.719Z","repository":{"id":36882846,"uuid":"41189834","full_name":"davidfstr/arf","owner":"davidfstr","description":"Tiny research language for investigating how to type-check programs with recursive function calls.","archived":false,"fork":false,"pushed_at":"2015-09-07T03:52:35.000Z","size":240,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-28T11:54:44.462Z","etag":null,"topics":["type-system"],"latest_commit_sha":null,"homepage":null,"language":"OCaml","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davidfstr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-08-22T04:08:57.000Z","updated_at":"2017-12-16T21:30:50.000Z","dependencies_parsed_at":"2022-09-12T14:04:12.945Z","dependency_job_id":null,"html_url":"https://github.com/davidfstr/arf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/davidfstr/arf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidfstr%2Farf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidfstr%2Farf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidfstr%2Farf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidfstr%2Farf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davidfstr","download_url":"https://codeload.github.com/davidfstr/arf/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/davidfstr%2Farf/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263450010,"owners_count":23468154,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["type-system"],"created_at":"2024-11-10T07:37:11.823Z","updated_at":"2025-07-04T05:09:08.140Z","avatar_url":"https://github.com/davidfstr.png","language":"OCaml","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Assign-Recurse-Flow (ARF)\n\nThis is a tiny language calculus intended to research how to efficiently type-check recursive functions when performing a full-program analysis.\n\nSuch a procedure is important for full-program type checkers such as the [plint] type-checker I am writing for Python source code.\n\nThis research is now complete. Read on for details if you are interested in the theory. 
On the other hand if you just want to try out the ARF type checker, skip down to the \"Try ARF!\" section below.\n\n[plint]: https://github.com/davidfstr/plint\n\n## Abstract\n\nI have an algorithm that can fully type a recursive assignment-based program in **O(m·n)** worst case time, where:\n\n* **m** is the total number of functions in the program and\n* **n** is number of functions in the largest mutually recursive function group in the program.\n\n## ARF Language\n\nBelow is an (inefficient and slightly contrived) Python program that tests whether a specified integer \u003e= 0 is even or odd:\n\n```\ndef main(args):\n    k = int(args[0])\n    result = is_even(k)\n    return result\n\ndef is_even(n):\n    if n == 0:\n        k = True\n        return k\n    else:\n        result = is_odd(n - 1)\n        return result\n\ndef is_odd(n):\n    if n == 0:\n        k = False\n        return k\n    else:\n        result = is_even(n - 1)\n        return result\n\nif __name__ == '__main__':\n    import sys\n    print(main(sys.argv[1:]))\n```\n\nIf you save this as `is_even.py` then you can test whether an integer like `42` is even by running the command `python3 is_even.py 42`.\n\nThis same program in the ARF language would be written as:\n\n```\ndef main(_):\n    k = \u003cint\u003e               # AssignLiteral\n    result = is_even(k)     # AssignCall\n    return result           # Return\n\ndef is_even(n):\n    if:                     # If\n        k = \u003cbool\u003e          # AssignLiteral\n        return k            # Return\n    else:\n        result = is_odd(n)  # AssignCall\n        return result       # Return\n\ndef is_odd(n):\n    if:                     # If\n        k = \u003cbool\u003e          # AssignLiteral\n        return k            # Return\n    else:\n        result = is_even(n) # AssignCall\n        return result       # Return\n```\n\nNotice that various parts of the original Python program are erased in the ARF version, such as the condition 
of the if-statement and the specific values of literals like `True` and `False`. In particular anything that a flow-based type checker wouldn't care about is erased. This keeps the language focused on representing what such a type checker would see and need to evaluate.\n\nIt so happens that every possible ARF statement occurs in the above program, namely:\n\n* **AssignLiteral**(target_var, literal_type)\n* **AssignCall**(target_var, func_name, arg_var)\n* **If**(then_block, else_block)\n* **Return**(result_var)\n\nThere are a few additional constructs a type checker for a general purpose language must consider that ARF does not natively represent, namely loops and multi-parameter functions, but it is straightforward to extend the ARF language and type checker to support such constructs.\n\n## Type Checking\n\nGiven an ARF program, it is the objective of the ARF type checker to determine, for each function and argument type passed, what type the function will return.\n\nIn the example ARF program above, the determined types are:\n\n```\nmain(NoneType) -\u003e bool\nis_even(int) -\u003e bool\nis_odd(int) -\u003e bool\n```\n\nNotice that there is no deduced return type for calls like `is_even(bool)` because no such calls were made during any possible execution of the program.\n\nThe naive strategy for deducing such return types is simply to execute the ARF program along all possible code paths. However if there are any functions that perform a recursive call (to a function earlier on the call stack) then the type checker will go into an infinite loop, descending infinitely into the function call graph.\n\n### Recursive Call Loops\n\nA smarter solution is to detect, when executing a call, whether the call is a recursive call and then suspend execution along that code path if it is. Once the target of the recursive call has an initial approximation of its return type, the suspended execution path can be resumed with the approximate return type that was deduced. 
This may happen repeatedly as the approximate return type of the recursive call's target is refined over time.[^max-refinements-in-loop] This procedure is analogous to the way the fixed point of a loop would be computed.\n\n[^max-refinements-in-loop]: At most there will be **t** refinements, where **t** is the number of types defined in the program. Currently **t** is bounded to exactly 3, since there are 3 builtin types (`NoneType`, `int`, and `bool`) and no user-defined types.\n\n### Infinitely Recursive Call Loops\n\nAnother difficulty that arises is that a function may provably always go into an infinite loop[^provable-infinite-loop], which prevents deducing any specific approximate return type for the function. For example neither function in the following ARF program will ever terminate:\n\n```\ndef infinite_loop_1(_):\n    _ = infinite_loop_2(_)\n    return _\n\ndef infinite_loop_2(_):\n    _ = infinite_loop_1(_)\n    return _\n```\n\nIn the case of such a provably non-terminating function, the type checker can use a special return type that indicates the function can never return. This special type is written as ⊥ and pronounced \"bottom\". When the type checker receives a ⊥ as a result of a call, it suspends execution of the calling statement block.\n\nIf all execution paths in a function are suspended due to a ⊥ then the function itself returns a ⊥.\n\n[^provable-infinite-loop]: A function **f** provably goes into an infinite loop if, after executing all possible paths in the function, all of those paths were suspended due to recursive invocations of **f**. In that situation **f** is waiting on the suspended executions and the suspended executions are waiting on **f**, creating an unresolvable deadlock.\n\n### Avoiding Exponential Time with Non-Recursive Calls\n\nThe above strategy is sufficient for type checking any ARF program in such a way that the type checker itself will never go into an infinite loop when checking a program. 
However the type checker may do a lot of unnecessary work.\n\nConsider the following ARF program:\n\n```\ndef f1(_):\n    if:\n        _ = f2(_)\n    else:\n        _ = f2(_)\n\ndef f2(_):\n    if:\n        _ = f3(_)\n    else:\n        _ = f3(_)\n\n...\n\ndef f31(_):\n    if:\n        _ = f32(_)\n    else:\n        _ = f32(_)\n\ndef f32(_):\n    pass  # no statements in block, returning NoneType\n```\n\nIf you have **n** functions in a program following this pattern then type checking will take **O(2ⁿ)** time, which is clearly unacceptable.\n\nThe performance issue arises because functions **f\u003csub\u003e2\u003c/sub\u003e**..**f\u003csub\u003e32\u003c/sub\u003e** are needlessly type-checked multiple times. After such a function has returned with a final deduced return type, it is unnecessary to recheck it.\n\nTherefore while type checking you can introduce a cache of all functions whose final return type has been deduced. If an attempt is made to call a function whose return type resides in the cache, the cached return type can be used immediately.\n\nSuch a caching strategy reduces the time to type-check a program in this pattern to **O(n)**. Much better.\n\n### Avoiding Exponential Time with Recursive Calls\n\nHowever the preceding caching strategy faces some challenges when working with recursive function calls, since recursive calls can suspend execution and thereby delay the computation of a function's *final* return type.\n\nConsider the following ARF program:\n\n```\ndef f1(_):\n    if:\n        _ = f1(_)\n        return _\n    else:\n        if:\n            _ = f2()\n            return _\n        else:\n            if:\n                _ = f3()\n                return _\n            else\n                ... (up to f32)\n\ndef f2(_):\n    ... (same statements as in f1)\n\n...\n\ndef f32(_):\n    if:\n        ... 
(same statements as in f1)\n    else:\n        pass  # no statements in block, returning NoneType\n```\n\nIn this program, every function calls every other function in the program. The last function additionally may return `NoneType`, which will eventually propagate to every other function's return type.\n\nWhen type-checking function **f\u003csub\u003e2\u003c/sub\u003e**..**f\u003csub\u003e32\u003c/sub\u003e**, it is not possible to immediately deduce a final return type for the function because it depends on the return type of **f\u003csub\u003e1\u003c/sub\u003e**, which doesn't even have a first approximation by the time that **f\u003csub\u003e2\u003c/sub\u003e**..**f\u003csub\u003e32\u003c/sub\u003e** complete their execution the first time around.\n\nIt is the case, however, that any function **f** requires at most two executions of its body to fully determine its exact return type.[^max-two-executions-of-body] In the worst case, as demonstrated above, every function in the recursive function group must execute twice within its caller. In such a case the recursive function group executes in time **O(n²)**, where **n** is the number of functions in the recursive function group.\n\n#### What about Recursive Calls in Series?\n\nConsider the following ARF program:\n\n```\ndef f1(_):\n    if:\n        if:\n            _ = f1(_)\n            ...\n            _ = f32(_)\n            return _\n        else:\n            if:\n                _ = f2()\n                ...\n                _ = f32()\n                _ = f1()\n                return _\n            else:\n                if:\n                    _ = f3()\n                    ...\n                    _ = f32()\n                    _ = f1()\n                    ...\n                    _ = f2()\n                    return _\n                else\n                    ... (up to f32)\n    else:\n        pass  # no statements in block, returning NoneType\n\ndef f2(_):\n    ... 
(same statements as in f1)\n\n...\n\ndef f32(_):\n    ... (same statements as in f1)\n```\n\nThis program still executes in **O(n²)** time\u003csup\u003e†\u003c/sup\u003e, where **n** is the number of functions in the recursive function group, since my previous statements about evaluating recursive function groups still apply.\n\n† However an **O(n²)** total execution time is misleading here since it incorrectly assumes that the average function size is constant and not related to **n**. Here however each function contains **O(n²)** statements, based on the pattern of construction. Therefore the total execution time of a program written in this pattern is actually **O((n²)²) = O(n⁴)**.\n\n### Summarizing the Worst Case Time\n\nConsidering all of the prior discussion, a program is made of a set of functions, with certain function subsets being mutually recursive. These mutually recursive function groups are non-overlapping (because otherwise they would be in the same group). Therefore there can be at most **O(m/n)** recursive function groups, where **m** is the total number of functions in the program and **n** is the size of (i.e. number of functions in) the largest recursive group.\n\nEach recursive function group takes at worst **O(n²)** time to execute, where **n** is the size of the group, as mentioned in prior discussions.\n\nWhen a program with **m** functions contains no recursive function invocations, it takes **O(m)** time to execute, since every function body only needs to be executed once.\n\nWhen a program containing a mix of both recursive and non-recursive functions is executed, it takes (at worst) time:\n\n* (Max # recursive function groups)·(Worst case time for the largest group) + (Worst case time for program if all functions were non-recursive) =\n* O(m/n)·O(n²) + O(m) =\n* O((m/n)(n²) + m) =\n* O(m·n + m) =\n* **O(m·n)**\n\n## Try ARF!\n\nEnough theory. 
Try the ARF type checker yourself on some sample programs!\n\n### Build\n\n* Install Make.\n* Install OPAM and OCaml 4.02.1.\n    * On OS X, run `brew install opam` to get OPAM.\n    * With OPAM, run \u003ctt\u003eopam switch 4.02.1\u003c/tt\u003e and \u003ctt\u003eeval \u0026#x60;opam config env\u0026#x60;\u003c/tt\u003e\n* Run `make deps` to install remaining OCaml dependencies.\n* Run `make build` to build ARF.\n* Run `make test` to run the automated tests.\n\n### Run\n\nThere are a series of sample files in the `samples/` directory of the ARF project. You can run samples using a command like:\n\n```\n./Arf.native samples/is_even.arf\n```\n\nThat will type-check the specified ARF program and output the deduced return types of all functions:\n\n```\nmain(NoneType) -\u003e bool\nis_even(int) -\u003e bool\nis_odd(int) -\u003e bool\n```\n\nIf there are functions that provably never terminate, they will be given a ⊥ (\"bottom\") return type. For example:\n\n```\ninfinite_loop_1(NoneType) -\u003e ⊥\ninfinite_loop_2(NoneType) -\u003e ⊥\n```\n\n### Test\n\nThere are a number of automated unit tests for ARF. These tests contain several interesting programs.\n\nRun the automated tests with:\n\n```\nmake test\n```\n\n## License\n\nCopyright (c) 2015 by David Foster\n\n[^max-two-executions-of-body]: The \"max two executions needed\" property requires a lot of exposition to prove formally, so I will simply assert its truth for the moment.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidfstr%2Farf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidfstr%2Farf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidfstr%2Farf/lists"}