{"id":19092412,"url":"https://github.com/p7g/c-bytecode-vm","last_synced_at":"2025-08-21T08:09:05.822Z","repository":{"id":42479791,"uuid":"275463123","full_name":"p7g/c-bytecode-vm","owner":"p7g","description":"A VM implementing a dynamically-typed imperative programming language from scratch.","archived":false,"fork":false,"pushed_at":"2025-03-23T02:21:27.000Z","size":1092,"stargazers_count":9,"open_issues_count":16,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-30T15:51:13.740Z","etag":null,"topics":["interpreter","programming-language"],"latest_commit_sha":null,"homepage":"https://p7g.github.io/c-bytecode-vm","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/p7g.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-27T22:27:07.000Z","updated_at":"2025-03-23T02:16:27.000Z","dependencies_parsed_at":"2025-01-04T20:31:40.295Z","dependency_job_id":"50c7c80c-9167-424e-ac14-b1a4c1d5f41d","html_url":"https://github.com/p7g/c-bytecode-vm","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p7g%2Fc-bytecode-vm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p7g%2Fc-bytecode-vm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p7g%2Fc-bytecode-vm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p7g%2Fc-bytecode-vm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/p7g","download_url":"https://codeload.github.com/p7g/c-bytecode-vm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249500389,"owners_count":21282393,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["interpreter","programming-language"],"created_at":"2024-11-09T03:19:39.576Z","updated_at":"2025-04-18T13:34:38.872Z","avatar_url":"https://github.com/p7g.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- vim: set tw=79: --\u003e\n\n# c-bytecode-vm\n\n[![TODOs](https://img.shields.io/github/search?query=repo%3Ap7g%2Fc-bytecode-vm%20AND%20TODO\u0026label=TODOs)](https://github.com/search?type=code\u0026q=repo%3Ap7g%2Fc-bytecode-vm+TODO)\n[![FIXMEs](https://img.shields.io/github/search?query=repo%3Ap7g%2Fc-bytecode-vm%20AND%20FIXME\u0026label=FIXMEs\u0026color=red)](https://github.com/search?type=code\u0026q=repo%3Ap7g%2Fc-bytecode-vm+FIXME)\n\n[Standard library documentation](https://p7g.github.io/c-bytecode-vm/docs/stdlib)\n|\n[Performance tracking](https://p7g.github.io/c-bytecode-vm/dev/bench)\n\nThis is a small, weakly and dynamically typed, interpreted programming language\nthat doesn't really have a name. My goal is to see how close I can get to a\n\"real\" programming language like Python while implementing everything on my own.\n\nAnother goal I have with this language is to keep the built-in stuff\n(implemented in C) to a minimum. 
You'll probably be able to tell throughout the\nrest of this document.\n\nTo try any of these examples, clone the repository, run `make` (only macOS and\nLinux are supported right now), and start the REPL by running `./cbcvm`. Use the\n`--help` flag to see the other ways to use the CLI.\n\nWithout further ado, here is an overview of the language:\n\n## Comments\n\nThere are only line comments. They start with a `#` and end at the next newline.\n\n## Primitive Types\n\nObjects of a primitive type are immutable and passed around by value; if you\naccess a variable bound to a number, you get a copy of its value. These values\nare stored directly on the (VM's) stack.\n\nThe primitive types are the following:\n\n### Integer\n\nA signed 64-bit integer. Hexadecimal literals are supported (the `x` in the\n`0x` prefix must be lowercase).\n\nExamples:\n```js\n1234\n0xabCdef\n-6\n```\n\n### Double\n\nA double-precision floating-point value.\n\nExamples:\n```js\n1.1\n0.123\n123.0\n-156.3\n```\n\n### Char\n\nA single Unicode codepoint.\n\nExamples:\n```c\n'a'\n'\\n'\n'\\''\n```\n\n### Bool\n\nTrue or false.\n\nExamples:\n```js\ntrue\nfalse\n```\n\n### Null\n\nA sentinel value used to represent the absence of a value.\n\nExample: `null`\n\n### String\n\nA UTF-8-encoded string. Can be manipulated using functions in the `string`\nmodule.\n\nExamples:\n```js\n\"hello\"\n\"\\\"\"\n\"this\nspans\nmultiple\nlines\"\n```\n\n## Composite Data Types\n\nThere are three native types for aggregating values:\n\n### Array\n\nA fixed-length, heap-allocated, mutable series of values. Array elements are\naccessed using brackets (`[]`).\n\nExamples:\n```js\n[1, \"hello\", null]\n[]\n[1,]\n\nsome_array[3]\n```\n\nAccessing an index that is out of range will throw.\n\n### Struct\n\nEssentially an array with named indices. 
A struct has a \"struct spec\" associated\nwith it, but it has no \"identity\"; two structs of the same spec with identical\nvalues are considered equal.\n\nIf a field is not specified when instantiating a struct spec, it is set to\n`null`.\n\nStruct fields are accessed using a `.`.\n\nExamples:\n\n```c\nstruct test { a, b }\nlet t = test { a = 1 };\n\nt.a\n```\n\nAccessing a struct field that does not exist throws at run time.\n\nAn anonymous struct can be declared and instantiated in one expression:\n```c\nprintln(struct { a = 123 });\n```\n\nNote that each anonymous struct declaration has its own struct spec. This means\nthat two anonymous structs declared in different places will never be equal.\nFor example:\n\n```c\nfunction makestruct() { return struct {}; }\n\nprintln(makestruct() == makestruct());  # true\nprintln(makestruct() == struct {});  # false\n```\n\n### Function\n\nFunctions are closures. This means they can capture variables in the scope where\nthey are defined. As such I think it's (somewhat) reasonable to consider them a\ncomposite data type.\n\nA function is defined like so:\n```js\nfunction my_function(a, b, c) {\n  return a + b * c;\n}\n```\n\nA function declaration can also appear as an expression, in which case its name\nis not bound in the scope where it's defined. The name is still visible inside\nthe function itself, which allows it to call itself recursively.\n\nCalling a function looks like this:\n```js\nmy_function(1, 2, 3)\n```\n\nYou can pass more arguments than the function declares, but not fewer, unless\nthe missing parameters have default values (e.g. `function f(a, b=null)`).\n\nA function can return a single value. A bare `return` statement will return the\nvalue `null`.\n\n## The Userdata Type\n\nUserdata values are opaque and can store anything at the C level. These are only\nuseful for C extensions. 
Code in the hosted language cannot do anything with the\nvalue other than move it around.\n\nAn example use case is storing `FILE *` values, as seen in the `fs` standard\nlibrary module.\n\n## Variable Declarations and Scoping Rules\n\n### Syntax\n\nA variable declaration uses the `let` keyword, like this:\n\n```js\nlet a = 123;\nlet b;\nlet c = 3, d;\n```\n\nIf no initializer is provided, the variable is initialized with `null`.\n\nStructs and arrays can be destructured like so:\n\n```js\nlet { a } = struct { a = 123, b = 234 };\nlet [b, c] = [1, 2, 3];\n```\n\nIf a destructured field doesn't exist on the struct, or if more elements are\ndestructured from an array than it contains, the program will throw.\n\nAny fields or elements that aren't matched are ignored.\n\n### Scoping Rules\n\nVariables are lexically scoped.\n\nIf a variable exists in the current scope with a given name, references to that\nname will refer to that variable. If not, the compiler will check for the\nvariable in the parent function scopes one by one. If it finds the variable in\na parent scope, that variable will be closed over by the current function. If\nit is not found, it is assumed to be a global variable.\n\nGlobal variables are local to the module in which they're defined.\n\nVariables in a function's closure are references to a variable elsewhere. This\nmeans that if multiple functions close over the same variable, they will see the\nupdates made by the others. Those changes are also visible in the scope where\nthe variable was originally defined.\n\n## Flow Control\n\nA selection of the usual constructs is available. In all cases the braces\nsurrounding the block are required.\n\n### If Statement\n\nWorks as you would expect:\n```js\nif (true) {\n} else if (false) {\n} else {\n}\n```\n\n### While Loop\n\nWorks as you would expect:\n```js\nwhile (true) {}\n```\n\n### For Loop\n\nThe `for` loop is like a C for loop. 
The language has no native iteration\nprotocol, but see the \"Iteration\" section below for more on that.\n\n```js\nfor (let a = 0; a \u003c 10; a++) {}\n```\n\nAny of the initializer, condition, or increment expression can be omitted.\n\n## Errors and Error Handling\n\nThere is a pretty rudimentary exception system. It works much like exceptions\nin JavaScript, using try/catch and throw. You can throw any value, and\na try/catch block will catch all errors regardless of type.\n\nWhen an error is thrown, the VM unwinds the call stack until the error is\ncaught by a try/catch block or the unwinding reaches the top of the stack, at\nwhich point the program exits. Here's an example:\n\n```js\ntry {\n  this_might_fail();\n  return \"success\";\n} catch (e) {\n  return \"failed\";\n}\n```\n\n## Module System\n\nA module is imported using an `import` statement:\n\n```js\nimport test;\n```\n\nUpon encountering this statement, the compiler will search for a module with\nthat name. If a built-in module exists with that name, it's imported. If not,\nthe compiler will search for a file called (in this case) `test.cb` in each\nof the paths defined in the `CBCVM_PATH` environment variable. This variable\nshould be a colon-separated list of directories. Once a module with that name\nis found, any later imports of that name resolve to the same module.\n\nOnce a module is imported, its exports can be accessed by prefixing their name\nwith the module name and `::`, like `test::assert`. There is currently no way\nto alias an imported module, nor is there a way to create individual bindings\nfor its exports while importing.\n\nNote that the module name in an expression like `test::assert` is not resolved\nlike a variable; it is looked up in the list of imported modules directly. 
This\nmeans that the following will not work:\n\n```js\nimport test;\nlet t = test;\n```\n\n## Garbage Collection\n\nc-bytecode-vm has a tracing (mark-and-sweep) garbage collector.\n\nTo allow C extensions to safely hold onto values without them being collected,\nuse `cb_gc_hold` to add a value to the GC roots, and `cb_gc_release` to remove\nit. Once released, the value becomes eligible for collection the next time the\nGC runs, unless it is still reachable some other way.\n\n## Intrinsic Functions\n\nHere are some of the intrinsic functions that likely won't be going anywhere, at\nleast for now:\n- `print`: Write a string representation of some values to stdout without a\n  trailing newline.\n- `println`: Write a string representation of some values to stdout _with_ a\n  trailing newline.\n- `tostring`: Get a string representation of a given value.\n- `typeof`: Get the type of a value as a string.\n- `ord`: Convert a character to an integer.\n- `chr`: Convert an integer to a character.\n- `tofloat`: Convert an integer to a double. This should be renamed.\n- `__upvalues`: Return the values of the current function's closure as an array.\n- `apply`: Pass an array of values to a function as individual arguments.\n- `toint`: Convert a double to an integer, or a character to an integer (like\n  `ord`).\n- `__gc_collect`: Have the garbage collector run immediately. It has a scary\n  name because it seems like a scary thing to do.\n- `__dis`: Print the disassembly of the given function directly to stdout.\n- `id`: Get the ID of the given value if it has one.\n\n## Standard Library\n\nStandard library documentation lives in the source code. The generated Markdown\nfile can be found\n[here](https://github.com/p7g/c-bytecode-vm/blob/master/docs/stdlib.md).\n\nHere are some patterns that have appeared in the standard library:\n\n### Iteration\n\nUsing [traits], the [iter] module defines an iteration protocol. 
This protocol\ninvolves two traits: `Iterable` and `Iterator`.\n\n[traits]: /blob/main/lib/trait.cb\n[iter]: /blob/main/lib/iter.cb\n\nThe `Iterable` trait makes an object iterable (as the name suggests). It has a\nsingle method called `iter(self)` that must return an `Iterator`. Objects of any\ntype that implements this trait can be passed as the iterable argument to any\nfunction in the `iter` module.\n\nAn `Iterator` object is a stateful iterator with a `next(self)` method, which\nmust return the next item in the sequence, or `iter::STOP` once the sequence is\nexhausted.\n\nHere's an example of both:\n\n```c++\nimport iter;\nimport trait;\n\nstruct Range { start, stop, step }\nstruct RangeIterator { range, i }\n\ntrait::impl(iter::Iterable, Range, struct {\n  function iter(self) {\n    return RangeIterator { range = self };\n  }\n});\n\ntrait::impl(iter::Iterator, RangeIterator, struct {\n  function next(self) {\n    # Lazily start at the range's first value.\n    if (self.i == null) {\n      self.i = self.range.start;\n    }\n\n    # Signal exhaustion once we reach (or pass) the end of the range.\n    if (self.i \u003e= self.range.stop) {\n      return iter::STOP;\n    }\n\n    let { i } = self;\n    self.i += self.range.step;\n    return i;\n  }\n});\n\nfunction range(start, stop=null, step=null) {\n  if (stop == null) {\n    return Range { start = 0, stop = start, step = 1 };\n  } else {\n    if (step == null) {\n      step = 1;\n    }\n    return Range { start = start, stop = stop, step = step };\n  }\n}\n```\n\nIt's common to implement `iter::Iterable` for any type that also implements\n`iter::Iterator`; that way you can use existing iterators as arguments for\n`iter` functions.\n\nOnce you have an object that implements the iteration protocol, you can use it\nlike this:\n```c++\niter::foreach(range(10), function (n) {\n  println(n);\n});\n```\n\nTo \"break\" out of `foreach`, return `iter::STOP` (the same sentinel used to\nsignal that an iterator is exhausted).\n\nModules that declare \"types\" that can be iterated over should export an `iter`\nfunction that returns this kind of 
iterator.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fp7g%2Fc-bytecode-vm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fp7g%2Fc-bytecode-vm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fp7g%2Fc-bytecode-vm/lists"}