{"id":22645869,"url":"https://github.com/SwitchGDX/clearwing-vm","last_synced_at":"2025-08-06T05:31:12.037Z","repository":{"id":65410581,"uuid":"485258117","full_name":"TheLogicMaster/clearwing-vm","owner":"TheLogicMaster","description":"A Java to C++ transpiler project","archived":false,"fork":false,"pushed_at":"2024-11-17T17:53:22.000Z","size":1880,"stargazers_count":42,"open_issues_count":1,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-11-17T18:44:10.901Z","etag":null,"topics":["cplusplus","cpp","java","transpiler","vm"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TheLogicMaster.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-25T07:01:57.000Z","updated_at":"2024-11-13T13:26:30.000Z","dependencies_parsed_at":"2024-11-18T01:04:17.492Z","dependency_job_id":null,"html_url":"https://github.com/TheLogicMaster/clearwing-vm","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheLogicMaster%2Fclearwing-vm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheLogicMaster%2Fclearwing-vm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheLogicMaster%2Fclearwing-vm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheLogicMaster%2Fclearwing-vm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/
TheLogicMaster","download_url":"https://codeload.github.com/TheLogicMaster/clearwing-vm/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228841595,"owners_count":17980061,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cplusplus","cpp","java","transpiler","vm"],"created_at":"2024-12-09T06:07:41.418Z","updated_at":"2025-08-06T05:31:12.014Z","avatar_url":"https://github.com/TheLogicMaster.png","language":"Java","readme":"# Clearwing VM\n[![Release](https://jitpack.io/v/com.thelogicmaster/clearwing-vm.svg)](https://jitpack.io/#com.thelogicmaster/clearwing-vm)\n\n## About\nThis is a Java to C++ transpiler based on the CodenameOne Parpar VM. It was created for use in \n[SwitchGDX](https://github.com/TheLogicMaster/switch-gdx), the Nintendo Switch Homebrew and Xbox UWP backend for\nthe [LibGDX](https://libgdx.com) game framework. It features a stripped down runtime library and produces native\nexecutables that don't require a VM to run. 
A minimal example based on Gradle and CMake is provided.\n\n```mermaid\nflowchart LR\n    Java --\u003e compiler\n    Kotlin --\u003e compiler\n    compiler(Compiler) --\u003e Bytecode --\u003e Transpiler --\u003e C++ --\u003e cpp_compiler(C++ Compiler) --\u003e Executable\n```\n\n## Features\n- Regex ([RegExodus](https://github.com/tommyettinger/RegExodus))\n- Basic File I/O\n- Buffers (Direct and ByteBuffer wrappers)\n- Basic threading\n- Reflection\n- Java 7 runtime with Java 8 features (lambdas, method references, and default methods)\n- Incremental compilation\n- Plain C++ 20 project output\n- Easy Gradle integration\n- Class-level optimization (Only required classes are built)\n- Instruction-level optimizations (Partially decompiles bytecode into optimizable C++)\n\n## Limitations\n- Limited runtime library\n- Not fully conformant with the Java standard (Regex, string formatting, internal names, Unicode support)\n- Has to be debugged as a large C++ codebase\n- Long initial compilation and linkage times\n\n## Todo\n- Testing suite\n- Further runtime library support and specify which packages are supported\n- Transpiler logging\n- Method trimming\n- Crash logs\n- Add additional safe-points in loops without function calls to prevent freezes on GC/exit\n- Debugger (Possibly via a Python GDB-based debug server)\n- Floating point to int conversion conformance (Undefined in C++, needs to round to zero with special cases for NaN and infinity)\n\n## Dependencies\n- C++ 20\n- ZLib\n- ZZip\n- LibFFI\n- CMake (If using default project files)\n\n## Usage\nThe transpiler is provided as a fat JAR which can be used as a standalone command-line tool or integrated into a \nbuild system. An example is provided for a simple Gradle-based project which produces and builds a CMake C++ \nproject. Command line arguments are used for specifying the input and output directories as well as a JSON config\nfile for further configuration of transpiler options. 
The transpiler is also provided as a normal Java library \nwhich could be used programmatically. Java 17 is what the transpiler has been tested with.\n\nJitPack Maven artifacts:\n```\n'com.thelogicmaster.clearwing-vm:transpiler:{version}'     // Fat JAR\n'com.thelogicmaster.clearwing-vm:transpiler-lib:{version}' // Library JAR\n'com.thelogicmaster.clearwing-vm:annotations:{version}'    // Annotations\n```\n\nThe following command line arguments are provided:\n- __--input, -i__: Input class directories (`build/classes`) or library JARs\n- __--source, -s__: Source directories (`main/java/src`) to use for jnigen style inline native code generation (Optional)\n- __--output, -o__: The output directory to write into (Required)\n- __--main, -m__: The class with the `public static void main` entrypoint method (Optional)\n- __--config, -c__: A JSON config file with further configuration options (Optional)\n- __--project, -p__: Enable generating basic project files (Optional)\n\n## Configuration\nThe JSON file specified by the `config` command line argument is loaded in addition to files ending in \n`clearwing.json` in the input JARs. Any pattern arguments accept wildcards, where `*` expands to any name section\nand `**` expands to anything. 
\n\nThe available options are as follows:\n- __nonOptimized__: Patterns for classes to not optimize out even if unused\n- __intrinsics__: A list of methods to treat as native so that they can be patched (For example, `java.lang.Integer.toString()Ljava/lang/String;`)\n- __sourceIgnores__: Patterns for source files to ignore for jnigen style inlined C++\n- __generateProjectFiles__: Whether to generate basic project files like the CMake config\n- __mainClass__: An optional \"main class\" that contains the entrypoint main function\n- __useLineNumbers__: Enable stack trace line numbers (Requires stack traces, disable for a slight performance increase) (Not yet used)\n- __useValueChecks__: Enable type/NPE checks at runtime, has substantial performance overhead (Not yet used)\n- __useStackCookies__: Enable Java stack cookies (Only needed for debugging VM) (Not yet used)\n\n## Native Code\nNative C++ code can be used with the project in a few ways. The easiest way is to add native functions with trailing\nblock comments, ensuring that the Java source directory has been added to the command line parameters. This will\ngenerate a corresponding C++ function with the inlined code, jnigen style, with helper code for accessing buffer,\nstring, and array data. You can also generate a code block outside of functions by using a block comment that starts\nwith `/*JNI`. Alternatively, you can declare a method as `native` normally and later supply a C++ file implementing the function.\nThis approach can be tedious because of the name mangling that takes place. If you need to replace an existing method with\na native implementation for performance reasons, you can add an entry to the `intrinsics` config option and define a\ncorresponding C++ function in the same way. There's no JNI library loading functionality, so native libraries need to\nhave bindings written for them.\n\n## Building\nThis is a Gradle project with tasks for building the needed components. 
JDK 17 is the version it has been tested with.\nThe runtime and example submodules get built with a language level of 8, and the transpiler uses language level 14.\nWhen building the runtime in IntelliJ IDEA, `Build\u003e\u003eRebuild Project` may be necessary after making changes. \n\n## Technical Details\nThis project was originally just the CodenameOne Parpar VM made to work as a cross-platform C codebase, with \nvarious bug fixes and additional runtime library functionality added to work with LibGDX. As a C codebase using\nstructs and function pointer arrays to represent Java classes, it became quite tedious and challenging to debug,\nespecially for garbage-collector-related memory bugs. The coding practices used in the transpiler side of the\nproject also left a lot to be desired, with the redundancy and lack of organization making it a challenge to \nmaintain. As such, the decision was made to start from scratch with a new transpiler and \"VM\" that would utilize\nmodern C++. The C version is now available in the _legacy_ branch. The original C++ version used more convenient\nand higher level C++ features like smart pointers, inheritance, and type variants, which made it much easier to\ndebug, but the result had performance and memory leak issues that made it unsuitable for large projects. It was an\ninteresting experiment, but the lack of a garbage collector led to reference cycles leaking memory, and the performance\noverhead of storing stack values as variants and using smart pointers for all objects was huge, even after optimizations.\nThe current architecture aims to maintain a C ABI compatible interface for generated code and represents objects as\nplain data structs. 
Since C++ is only used to make the code a bit nicer with templates and such (Especially in\nhand-written code), exceptions use longjmp/setjmp rather than C++ exceptions, so it is generally unsafe to count on RAII\nor non-volatile variable contents when calling code which can throw exceptions.\n\n### VM\nThe \"VM\" part of the project is a handful of C++ files that interface with the host system and provide\nan initialization mechanism and various helper functions. `Clearwing.h` is the main header, providing the \nprimary types and functions, as well as macros for implementing Java bytecode. \n`Array.hpp` is the representation of a Java array. `Object.cpp` is the only\nnon-autogenerated runtime class, since it's the parent type. `Config.h` and `main.cpp` are autogenerated by\nthe transpiler, storing configuration flags and the program entrypoint, respectively. Class registration is \ndone by individual classes calling `registerClass` during global initialization. No \ngenerated code can be called at this point to avoid circular dependencies during VM startup, so the required\narrays, strings, and classes are specially crafted to avoid calling any Java static initializers. After this\npoint, the main program entrypoint is run, the VM gets initialized, then the program runs, lazily initializing\nclasses as normal. Classes are stored into a map for easy retrieval by name. Similarly, strings are pooled and\nplaced into a map by address of the literal when using the literal `_j` suffix. \n\n### Types\nReference types are stored as pointers and primitive values are stored as their corresponding type. When\nin a method local or on the method stack, types are stored as their arithmetic variants, meaning smaller types \nget promoted to int and the remaining ones are stored the same way. Wide types (long and double) normally would\noccupy two stack/local slots, but here, all types are stored in a union which can accommodate any\narithmetic or reference type. 
For common reference types like objects, arrays, strings, classes, and each\nof the primitive values, typedefs are provided in the form of `jobject`, `jint`, `jarray`, and so on. Arrays\nare a bit special in that they don't have variants for different component types. Instead, they contain a\n`void *` to their data along with the name of the type they contain. It's up to the accessor to know the type\nthat a given array contains, though there are type checks for primitive/reference values to avoid memory bugs.\nStrings are normal objects, but get constructed\nfrom C-strings or C++ UTF-8 literals using the custom `_j` suffix. The value is stored as UTF-16 and can be\nretrieved as a C-string using the `getNativeString` function. Classes are also normal objects, but get\nconstructed by the VM when registering object classes or creating classes for array types.\n\n### Garbage Collection\nGarbage collection is triggered at object allocation time based on the amount of memory and \nnumber of objects allocated since the last collection. It is a simple algorithm, but should be sufficient, and\nmacros are available to override the defaults if needed. Any object that is not a static class member, on a Java \nthread's stack, explicitly protected, or referenced by another reachable object will be collected when the\nGC is run. Before running, the thread running the GC waits for all other threads to be at a safe-point. This \nnormally happens during function calls, where threads check for thread suspension and wait for the GC to run. \nCurrently, a thread that is busy-waiting or otherwise blocking without performing any function calls could lead to\na system freeze when the GC tries to run. This issue could be prevented in the future by inserting safe-point calls\nas needed in loops without function calls. 
A thread in native code like `Thread.sleep` can be treated as if it is\nat a safe-point so long as it does not interact with VM objects or return to VM code until the GC is done. To avoid\npremature collection of objects, it is essential to ensure that all objects are stored on the stack before calling\nany functions, since it is the responsibility of the caller to protect arguments. That is the safest way for hand-written\nnative code, but objects can also be marked as protected at allocation time to prevent collection until they are marked as no longer protected.\n\n### Exceptions\nException handling is done using longjmp/setjmp, which has implications as far as memory safety is concerned.\nNeither C++ exceptions nor normal RAII behavior is compatible with this approach, so uncaught native exceptions\nwill crash the program, and a jump from a Java exception will bypass native stack cleanup, so objects can leak and\notherwise cause issues. Variables used between potential jumps also have to be volatile to prevent the compiler from\noptimizing a stack access into a register and the like, causing undefined behavior. Templated helper functions are\nprovided to help mitigate these potential issues.\n\n### Reflection\nReflection metadata is generated automatically for all classes. libFFI is used for method reflection, \nand fields use `offsetof` to get member offsets. The legacy object-oriented C++ version required the generation\nof helper/proxy functions to access fields and call methods, so generation was disabled by default, but that\nbehavior is not needed anymore. 
The only caveat for classes used with reflection is that they must be\nexplicitly listed in the config if they are not directly referenced by other used classes, or they will be optimized\nout at compile time.\n\n### Java 8 Features\nLambda and method reference functionality is accomplished by generating proxy classes which implement\nthe target interface, store captured values, and handle primitive boxing conversions. It only supports\nthe lambda factory InvokeDynamic and string builder targets, so compiling with a version later than Java 8 may introduce \nunsupported calls for string building and such. Default methods are implemented by populating inheriting class\nvtables with default values before populating overriding implementations.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSwitchGDX%2Fclearwing-vm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSwitchGDX%2Fclearwing-vm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSwitchGDX%2Fclearwing-vm/lists"}