{"id":21028239,"url":"https://github.com/exyi/cisint","last_synced_at":"2025-03-13T19:13:54.459Z","repository":{"id":83321588,"uuid":"135472704","full_name":"exyi/cisint","owner":"exyi","description":".NET CIL symbolic interpreter","archived":false,"fork":false,"pushed_at":"2019-01-24T13:05:29.000Z","size":171,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-20T14:50:38.421Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"F#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/exyi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-05-30T16:54:09.000Z","updated_at":"2019-01-24T13:05:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"2e019fd0-2cf3-499c-a171-a008cf6ffa00","html_url":"https://github.com/exyi/cisint","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exyi%2Fcisint","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exyi%2Fcisint/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exyi%2Fcisint/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exyi%2Fcisint/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/exyi","download_url":"https://codeload.github.com/exyi/cisint/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","rep
ositories_count":243467024,"owners_count":20295309,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-19T11:54:33.041Z","updated_at":"2025-03-13T19:13:54.432Z","avatar_url":"https://github.com/exyi.png","language":"F#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CISINT - .NET CIL symbolic interpreter\n\nAs you could guess from the name, this is a library for symbolic execution of .NET code. It is intended for executing a specific method, not the whole program, and it tries to perform a full analysis of the executed code - not just a few selected paths. When a method cannot be fully understood, it is considered a side effect of any code that invokes it.\n\nFor example, suppose you have a function `let makeTuple (x:int) = (x, x + 1)` (it returns a tuple of `x` and `x + 1`). When you execute it with a generic parameter named `a`, you essentially get the function decompiled (just to a strange language) - it will say that the function returns a new object of type ``System.Tuple`2\u003cSystem.Int32,System.Int32\u003e`` that has field `m_Item1` set to `a` and `m_Item2` set to `a + 1`. 
This is what the output currently looks like:\n\n```\n.heapStuff {\n        let o1 = new System.Tuple`2\u003cSystem.Int32,System.Int32\u003e\n        o1.m_Item1 = a\n        o1.m_Item2 = (a + 1)\n}\n\nreturn [\n        o1\n]\n```\n\n\u003e Note that this language is just a \"debug view\" of the state, and its syntax is pretty arbitrary.\n\nThe fun begins when you use this object in another function -- suppose we have functions `let sumTuple (a: int, b) = a + b` and `let fn a = sumTuple (makeTuple a)`. When we execute `fn` with a generic parameter we get only this:\n\n```\nreturn [\n        (a + (a + 1))\n]\n```\n\nAlthough the tuple object was allocated, the interpreter knew what was inside when it was used, so it just inlined the values. And because the object is not used anywhere else, it's not displayed at all.\n\n\u003e Note that the F# compiler is capable of performing this optimization itself, but only for tuples.\n\nThe main strength of this thing is executing functions when only some parameters are generic. For example, suppose you have a higher-order function that is invoked with a lambda expression (essentially a \"code constant\") like this:\n\n```fsharp\nlet seqMap (fn: 'a -\u003e 'b) (xs: 'a seq) =\n\tseq { for i in xs do yield fn i }\nlet seqCreate count value =\n\tseq { for _ in 1..count do yield value }\n\nlet fn (a: int) (b: int) =\n\tseqCreate a b |\u003e seqMap (fun a -\u003e a * a) |\u003e Seq.sum\n```\n\nWhen you try to execute `fn 3 b` you will get a result like `b*b + b*b + b*b`, although executing just `seqMap` would fail. 
If you'd like to appreciate the capabilities of the interpreter, have a look at [the code generated from the `seq` computation expressions](https://sharplab.io/#v2:DYLgZgzgPg9gDgUwHYAIDKBPCAXBBbAWAChsNF0Y8FsALASyQHMUBeFAY2AEMIJiUBKHF2x12KKngBGCAE7oEARwCyXOCgAUYJCBQByLigC0APn1SAlJoAeEXQaFKrLfoLcQlKAN4owMeXQoDCi2KAAmMCgYdAjAYb6ogQC+rgLCouKSMvJoSgDCsggiCBwwAK5I2CgAblzAZSUuRG7unj5+AUGoAIwAdL3s5ZXhkdGx8bX1JSlEqULYImIS+NkoAGJoNFyycAAyXNJhPACCSGG5ippcugzYznNuaJTU9Ey9FwVFuCiGhlBmTyotAYjHeSlU6i0FR+xjMhgAVD8rP8FIpehAyoQiMgwkA===).\n\n### Limitations\n\nThe interpreter is quite powerful, but there are also a lot of limitations. Specifically, these are not supported:\n\n* Obscure and unsafe IL instructions\n* Delegates (and C# lambdas)\n* Most exception handlers (unless it figures out that the `try` or `finally` section does nothing)\n* Cycles when the condition cannot be computed\n* Functions with too many jumps (by default 100). The limit is there to prevent stalls on complex computations and infinite cycles.\n* Methods that don't have an IL body.\n* When compiled in RELEASE, functions that trigger unexpected exceptions are also considered uninterpretable. In DEBUG, the exception will not be caught and it will crash the entire interpreter.\n\nIn most cases, the interpreter will return the correct result or say that it can't interpret the function. 
The exceptions are a few cases where it deviates from the .NET runtime specification:\n* When the code depends on an exception from a primitive operation (like a cast of an object) but does not use its value.\n\t- When the value is used, the cast is reflected somewhere in the tree, so it should be equivalent to the original code.\n\t- This is probably a necessary compromise since it's pretty rare and checking each overflow, cast or null reference would overwhelm the output with a ton of side-effects and make it practically unusable.\n* Although there is a protection against infinite loops and stuff like that, in some cases the interpreter may end up with a stack overflow exception, an out of memory exception or may simply take too much time to run. It's not great, but at least it does not produce bad results.\n\n### Side effects\n\nWhen something unsupported is encountered, the function is marked as *too complicated* and it's considered to be a side-effect of the computation. All side effects are tracked in the original order (with the conditions under which they can happen). For example, functions that use C#'s `yield return` apply an optimization when the current thread has not changed, which they check by calling `Thread::GetCurrentThreadNative` and `Thread::get_ManagedThreadId`. These invocations are tracked as side effects, their return values introduce new symbolic parameters, and some expressions may depend on them. In the case of iterators, the behavior is the same regardless of the function result, so the symbolic parameter is likely not used anywhere and the result might look like this:\n\n```\n\nse102 := global. System.Threading.Thread System.Threading.Thread::GetCurrentThreadNative()()\n.heapStuff {\n\tlet se102 = new System.Threading.Thread\n\tshared se102\n}\nse103 := global. System.Int32 System.Threading.Thread::get_ManagedThreadId()(se102)\nse104 := global. 
System.Threading.Thread System.Threading.Thread::GetCurrentThreadNative()()\n.heapStuff {\n\tlet se104 = new System.Threading.Thread\n\tshared se104\n}\nse105 := global. System.Int32 System.Threading.Thread::get_ManagedThreadId()(se104)\nif !(se103 = se105) {\n\t * \tse106 := global. System.Threading.Thread System.Threading.Thread::GetCurrentThreadNative()()\n\t * \t.heapStuff {\n\t\t\tlet se106 = new System.Threading.Thread\n\t\t\tshared se106\n\t\t}\n\t\tse107 := global. System.Int32 System.Threading.Thread::get_ManagedThreadId()(se106)\n}\n\nreturn [\n\t-97\n]\n\n```\n\nYou may have noticed the **shared** statement in the code above. It basically means that the object may be in any state and can be shared with another thread, so every read and write to it is considered a side-effect. Fortunately, in this case, the computation was not dependent on the result of these methods and always returns -97.\n\n### Program state\n\nWhen the code is interpreted, the interpreter keeps its state (locals, parameters, IL stack, objects) in the same `ExecutionState` record that is used to communicate the result of the computation. Most of the state info is stored in the form of symbolic expressions -- when you have a value somewhere it can be either a constant, a symbolic parameter, or an expression like `a + 1` or `if (x = 1) { y } else { z }`. These expressions represent values on the IL stack, in local variables, in fields of objects on the heap or in elements of an array. They can contain references to other objects, invocations of pure functions, invocations of instructions, conditions and constants. 
The state may be formatted to the pseudocode you have seen above -- the code has a few important features:\n\n* It always starts with a list of side effects.\n\t- Field reads/writes are represented intuitively as an assignment (`\u003c-`) or a parameter definition (`:=`).\n\t- Method calls are represented as the method's full name (including the full signature) and arguments in brackets. There is usually a parameter defined for the result value.\n\t\tThere may be a `.global` prefix if the method call may have an observable effect on the entire environment and there may be a `.virt` prefix if the call is virtual.\n\t\tFor example: `se92 := global. System.Int32 Cisint.Tests.TestInputs.Something::SideEffect2(System.String)(y)`\n\t- If the side effect is conditioned, it's wrapped in an `if` block.\n* Then there are expressions on the IL stack - usually a result value of a method call\n* `.heapState` blocks - when the side-effect or result depends on some objects, they are introduced and put into the correct state in this block. It may contain the following constructs:\n\t- `let p = new Obj` - declares a new parameter and assigns an uninitialized object to it\n\t- `def p : Obj` - declares a new parameter and says that it has type `Obj`. In this case, it can also contain an object of a derived type.\n\t- `p.f = expr` - assigns `expr` to field `f`\n\t- `shared x [iff condition]` - if the condition holds, the object is in a **shared** state\n\n\n### Expression simplifier\n\nTo figure out what is in the symbolic expressions, we have a simplifier -- it takes an expression and tries to reduce its complexity. Specifically, it should figure out that an expression is always `true`/`false` to prevent redundant branches, it should fold constants (e.g. `1 + 1` -\u003e `2`), and it should transform equivalent expressions into a common shape (`a + b` \u003c--\u003e `b + a`). 
It's quite common for symbolic execution engines to use some SMT solver. This simplifier is very basic compared to that, but it can IMHO handle a lot of practical use cases. Unfortunately, it's not very powerful, so you can't expect it to understand hash-tables or quick-sort. Overall, it does something, but I'm not very satisfied with its capabilities, I'll have to look how to write these things...\n\n### Extensibility\n\nWhen looking at the result of the interpretation is not enough for you (because it was not able to understand something, you don't want it to implement something, ...) there are some extensibility options. The main \"entry point function\" `Interpreter.interpretMethod` takes an argument of type `ExecutionServices` which contains a few functions that you may find useful to override. Specifically, you can provide a custom implementation (or a decorated version of the default one) of the recursive `InterpretMethod`, or custom logic for accessing static fields (they may often contain predictable data, even though in the general case we can't assume much about them). You may also provide information about methods that are considered a side-effect and a dispatcher of computation frames, if you have a smart way of planning them.\n\n### Test/Demo\n\nIf you are looking for some demo, you can run the unit tests (`dotnet test` in `src/Cisint.Tests`) and then have a look in the `bin/Debug/netcoreapp2.0/state_dump` directory. It contains printed-out versions of the execution state after the corresponding methods from `TestInputs.fs` were executed. You can of course add your own ones if you want to see what it does to your piece of code.\n\n\n## Docs\n\nYou can find more detailed info in\n* [API documentation](docs/api.md) - how to use the API and extensibility points\n* [Internals docs](docs/internals.md) - how it works. It may be useful in order to understand what's happening.\n\n## Project status\n\nThis project is more of a proof of concept than something you could really use. 
If you'd like to play with it, you are welcome to: you can post issues, open pull requests or join the discussion at [#cisint-chat:matrix.org](https://riot.im/app/#/room/#cisint-chat:matrix.org)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexyi%2Fcisint","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fexyi%2Fcisint","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexyi%2Fcisint/lists"}