{"id":21024867,"url":"https://github.com/rzubek/simlang","last_synced_at":"2025-10-12T21:33:43.633Z","repository":{"id":201842109,"uuid":"707403875","full_name":"rzubek/SimLang","owner":"rzubek","description":"SimLang is a serialization language by SomaSim. We use it to serialize all configuration files and save game files in all our simulation and strategy games.","archived":false,"fork":false,"pushed_at":"2023-10-20T06:51:05.000Z","size":48,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-26T06:23:09.550Z","etag":null,"topics":["csharp","dotnet","json","serialization","serialization-library","unity","unity3d","xml","yaml"],"latest_commit_sha":null,"homepage":"https://somasim.com","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rzubek.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-10-19T20:33:31.000Z","updated_at":"2024-03-05T13:56:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"aa13e893-2c94-4101-83c3-674e462d70b0","html_url":"https://github.com/rzubek/SimLang","commit_stats":null,"previous_names":["rzubek/simlang"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rzubek%2FSimLang","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rzubek%2FSimLang/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rzubek%2FSimLang/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rzubek%2FSimLang/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rzubek","download_url":"https://codeload.github.com/rzubek/SimLang/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248560808,"owners_count":21124724,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csharp","dotnet","json","serialization","serialization-library","unity","unity3d","xml","yaml"],"created_at":"2024-11-19T11:29:02.345Z","updated_at":"2025-10-12T21:33:38.601Z","avatar_url":"https://github.com/rzubek.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SimLang - sim file markup language\n\n**SimLang** is a serialization language by SomaSim. We use it to serialize all \nconfiguration files and save game files in all our simulation and strategy games.\n\n**SimLang** is intentionally similar to JSON, but easier for humans to read\nand easier to write out by hand. 
Specifically:\n\n- It's optimized to be human-readable and -writable\n    - Minimal punctuation and reduced visual noise\n    - As similar to Lisp s-expressions as possible\n- Easy de/serialization into C# class instances\n- Easy to read \"like English\" if the classes are set up correctly\n\nWe've been using it in production since 2014.\n\n\n## Example \n\nA very quick example of **SimLang** syntax:\n\n```\n{    \n    ;; this is a sample game character\n    ;;\n    #type   \"Person\"\n    id      person-123\n    info    { name \"Bob the Wizard\" age 100 }\n    stats   { wisdom 30 strength 3 dexterity 12 }\n    inventory [\n        { id item-234 name \"Magic Staff\" magical #true }\n        { id item-345 name Bow }\n        { id item-456 name Quiver contents [\n            { id item-567 name Arrows quantity 3 }\n        ] }\n        { id item-678 name \"Pointy Hat\" }\n    ]\n    inuse {\n        head item-678\n        righthand item-234\n    }\n}\n```\n\nSome highlights:\n\n- Compared to JSON, there are no extraneous double-quotes, colons, or commas.\n  A JSON object like `{ \"name\": \"Bob\", \"age\": 100 }` is represented \n  more compactly as `{ name Bob age 100 }`. \n- Strings need no double quotes when they contain only \n  ASCII letters, digits, and dashes/underscores. 
So, `Bob` instead of `\"Bob\"`.\n  This greatly improves human readability - and writability.\n- Comments are supported\n\nThe **SimLang** library can then deserialize this into a plain C# class instance (POCO),\nor serialize POCOs back out to text files.\n\n***Note:*** SimLang files have the extension `.sim` and\nwe refer to them in this document as _sim files_.\n\n\n\n## Design\n\n**SimLang** design goals and constraints, in no particular order:\n\n- **Human friendly**\n  - Removes visual noise, such as unnecessary quotation of short strings\n  - Removes spurious delimiters between collection elements which \n    are prone to merge conflicts, typos, etc.\n  - Allows for comments in the data file\n  - Whitespace is used as a token separator, but otherwise it's not meaningful \n    (unlike in some other markup languages - looking at you, YAML and make!)\n  - Key/value collections are alpha-sorted by default to help with merging\n    (but this can be turned off if desired, e.g. for very large data)\n\n- **Clear and simple semantics**\n  - Supports value types similar to JSON's\n    - primitives (numbers, strings, booleans)\n    - lists of values\n    - maps from strings to values\n  - Special syntax for null values, which are supported\n  - Special syntax for boolean values so that they can't be mistaken for strings\n  - Allows for explicit type annotation when needed\n\n- **Fast**\n  - Parsing a sim file into data structures must be possible with a single pass over the character stream,\n    without backtracking or an overly complex state machine\n  - A value serialized into a sim file can have only one canonical representation\n\n- **Extensible**\n  - Users can specify custom de/serializers for chosen data types\n  - Users can intercept the data stream in the middle of serialization, e.g.\n    to modify it or send it to a different printing backend\n  - Users can add custom _reader macros_, e.g. 
to send text to an external\n    language interpreter before it's parsed (we use it to embed Lisp expressions\n    that get evaluated at parse time)\n\n**SimLang** design borrows a lot from S-expressions in Scheme and Lisp, but presents a\nmore modern syntax that has special forms for maps vs arrays.\n\n\n### Why?\n\nOur games are strongly data-driven. Game configuration gets loaded up from data files,\nwhich are created and maintained by human designers - entirely by hand. \n\nWe used JSON in the past, and it worked... fine. Not great. JSON is simple and \nubiquitous, which is good. But some of the rough edges make it particularly \nannoying for human writers who have to write it by hand: things like having to \ndouble-quote even the shortest strings everywhere, having to add commas and colons \neverywhere and make sure there's the right number of them, lack of comments,\nor merge problems caused by stray commas in lists of objects. \n\n**SimLang** is a data serialization format with a syntax designed to be human-friendly, \nwhile maintaining the simplicity of JSON. 
\n\nWe've now used it since 2014, and shipped several commercial titles with it.\n\n\n# Detailed Examples\n\nSome common primitive value types - here's how they look serialized:\n```\n    Strings: \n        \"I'm a string\", thisIsAStringToo, another_string, also-a-string\n    Numbers: \n        42, -1, +1, 3.14159\n    Booleans: \n        #true, #false\n    Null: \n        #null\n```\nThis list contains four primitive values, and another embedded list:\n```\n    [ some-string 42 #true #null \n        [ \"I'm inside a second list\" ] ]\n```\nThis dictionary maps from string names, to lists that contain numbers:\n```\n    { \n      \"John Smith\" [ 312 555 1212 ]\n      \"Jane Doe\" [ 415 555 1212 ]\n    }\n```\nStrings including escaped characters, or using single-quotes for convenience:\n```\n    \"This is how you add a quote in a string: \\\"Hello\\\" \"\n    \"To escape the escape character, double it: \\\\\"\n    \"This is the first line \\n ... and this is the second line\"\n    'Single-quote syntax disables escaping, eg. 
\"c:\\windows\\system\"'\n```\nWhitespace is used to separate values, but otherwise not significant:\n```\n    [this is a list of    seven     strings]\n    {    name \"Dennis\" age 37}\n```\nComments start with ; and go until EOL\n```\n    [here is a  ; this is comment that goes to the end of the line\n     list of strings]\n```\nLonger example of a made-up data object\n```\n    {\n        ;; this is a sample sprite definition\n        identity { template npc }\n        navigation {\n            animations {\n                directions [ left right up down ]\n                sprites [ \"walk-l\" \"walk-r\" \"stand-l\" \"stand-r\" ]\n            }\n            path '\\images\\npc'\n        }\n        placement {\n            size { x 1 y 1 }\n            snap #false\n            center #true\n        }\n    }\n```\n\n# Technical details\n\n## Syntax Definition\n\nThis is a BNF-ish description of the syntax:\n\n```\n    Compound values:\n\n    \u003cdocument\u003e        :: \u003cignored\u003e* \u003cvalue\u003e \u003cignored\u003e*\n    \u003cvalue\u003e           :: \u003cprimitive\u003e | \u003clist\u003e | \u003cdictionary\u003e | \u003cmacro\u003e\n    \u003clist\u003e            :: \"[\" \u003cignored\u003e* \u003cvalue\u003e? (\u003cignored\u003e+ \u003cvalue\u003e)* \u003cignored\u003e* \"]\"\n    \u003cdictionary\u003e      :: \"{\" \u003cignored\u003e* \u003centry\u003e? 
(\u003cignored\u003e+ \u003centry\u003e)* \u003cignored\u003e* \"}\"\n    \u003centry\u003e           :: \u003cvalue\u003e \u003cignored\u003e+ \u003cvalue\u003e\n    \u003cignored\u003e         :: \u003ccomment\u003e | \u003cwhitespace\u003e\n    \u003cmacro\u003e           :: \"(\" \u003cmacro-char\u003e+ \")\"\n    \u003ccomment\u003e         :: \";\" and all subsequent characters up to \u003cnewline\u003e\n\n    Primitive values:\n\n    \u003cprimitive\u003e       :: \u003cnull\u003e | \u003cboolean\u003e | \u003cnumber\u003e | \u003cstring\u003e\n    \u003cnull\u003e            :: \"#null\"\n    \u003cboolean\u003e         :: \"#true\" | \"#false\"\n    \u003cnumber\u003e          :: (\"+\" | \"-\")? \u003cdigit\u003e+ (\".\" \u003cdigit\u003e+)?\n    \u003cstring\u003e          :: \u003cverbatim-string\u003e | \u003cescaped-string\u003e | \u003csimple-string\u003e\n    \u003cverbatim-string\u003e :: \"\\'\" \u003cverbatim-char\u003e* \"\\'\"\n    \u003cescaped-string\u003e  :: \"\\\"\" (\u003cescaped-char\u003e | \u003cbasic-char\u003e)* \"\\\"\"\n    \u003csimple-string\u003e   :: \u003csimple-char\u003e (\u003csimple-char\u003e | \u003cdigit\u003e | \"-\" | \".\" )*\n\n    Tokenization elements:\n\n    \u003cdigit\u003e           :: \"0\" | .. | \"9\"\n    \u003csimple-char\u003e     :: \"a\" | .. | \"z\" | \"A\" .. \"Z\" | \"_\"\n    \u003cverbatim-char\u003e   :: utf8 character except \"\\'\" or control character\n    \u003cescaped-char\u003e    :: \"\\n\" | \"\\r\"\n    \u003cbasic-char\u003e      :: utf8 character except \"\\\\\", \"\\\"\", or control character\n    \u003cmacro-char\u003e      :: utf8 character except \"(\"\n    \u003cwhitespace\u003e      :: white space characters and all control codes, i.e. 
0x00 - 0x20\n    \u003cnot-newline\u003e     :: utf8 character except for newline characters\n```\n\n## Serialization Example \n\n**Note: there's a full serialization example under [Sim Unit Tests/SerializerDemo.cs](Sim%20Unit%20Tests/SerializerDemo.cs).**\n\nExample **SimLang** serializer output for the sample game character from the introduction:\n\n```\n{ \n  id person-123\n  info { age 100 name \"Bob the Wizard\" }\n  inuse { head item-678 righthand item-234 }\n  inventory [ \n    { id item-234 magical #true name \"Magic Staff\" }\n    { id item-345 name Bow }\n    { \n      contents [ { id item-567 name Arrows quantity 3 } ]\n      id item-456\n      name Quiver\n    }\n    { id item-678 name \"Pointy Hat\" }\n  ]\n  stats { dexterity 12 strength 3 wisdom 30 }\n}\n```\n\nThis example will naturally serialize from / deserialize into the following classes:\n\n```csharp\nusing System.Collections.Generic;\n\npublic class Person {\n    public string id;\n    public Info info;\n    public Dictionary\u003cStat, int\u003e stats;\n    public List\u003cEntry\u003e inventory;\n    public Dictionary\u003cstring, string\u003e inuse;\n\n    public class Info { \n        public string name; \n        public int age; \n    } \n\n    public enum Stat {\n        Wisdom, Strength, Dexterity\n    }\n\n    public class Entry { \n        public string id;\n        public string name; \n        public int quantity = 1; \n        public bool magical = false;\n        public List\u003cEntry\u003e contents;\n    }\n}\n```\n\nWhat this example shows:\n- Most strings are serialized without double quotes, unless they contain spaces or other special characters\n- Dictionaries and classes are serialized using curly braces and don't use colons or commas; entries are separated with whitespace\n- Lists and arrays are serialized using square brackets and don't use commas; elements are separated with whitespace\n- Enums are supported as either names or values\n- All classes and data structures can be arbitrarily nested, as long as 
they're trees (so no cycles)\n- One-line comments are supported, they start after the first unescaped \";\" token \n\n\n### C# API\n\nSerialization:\n\n```csharp\n   var Bob = new Person { ......... };\n   var simfile = SimFile.Serialize(Bob);\n```\n\nDeserialization:\n\n```csharp\n    var newBob = SimFile.Deserialize\u003cPerson\u003e(simfile);\n\n    // is it the same Bob? yes, yes it is\n    Assert.IsTrue(DeepCompare.DeepEquals(Bob, newBob));\n```\n\n\n## Comparison with JSON\n\nHere is the verbatim output as SimLang file (note that keys are alphabetically sorted by default):\n\n```\n{ \n  id person-123\n  info { age 100 name \"Bob the Wizard\" }\n  inuse { head item-678 righthand item-234 }\n  inventory [ \n    { id item-234 magical #true name \"Magic Staff\" }\n    { id item-345 name Bow }\n    { \n      contents [ { id item-567 name Arrows quantity 3 } ]\n      id item-456\n      name Quiver\n    }\n    { id item-678 name \"Pointy Hat\" }\n  ]\n  stats { dexterity 12 strength 3 wisdom 30 }\n}\n```\n \nCompare to output as JSON:\n\n```\n{\n    \"id\": \"person-123\",\n    \"info\": {\n        \"name\": \"Bob the Wizard\",\n        \"age\": 100\n    },\n    \"stats\": {\n        \"Wisdom\": 30,\n        \"Strength\": 3,\n        \"Dexterity\": 12\n    },\n    \"inventory\": [\n    {\n        \"id\": \"item-234\",\n        \"name\": \"Magic Staff\",\n        \"quantity\": 1,\n        \"magical\": true\n    },\n    {\n        \"id\": \"item-345\",\n        \"name\": \"Bow\",\n        \"quantity\": 1\n    },\n    {\n        \"id\": \"item-456\",\n        \"name\": \"Quiver\",\n        \"quantity\": 1,\n        \"contents\": [\n        {\n            \"id\": \"item-567\",\n            \"name\": \"Arrows\",\n            \"quantity\": 3\n        }\n        ]\n    },\n    {\n        \"id\": \"item-678\",\n        \"name\": \"Pointy Hat\",\n        \"quantity\": 1\n    }\n    ],\n    \"inuse\": {\n        \"head\": \"item-678\",\n        \"righthand\": \"item-234\"\n   
 }\n}\n```\n\n## Serialization Details\n\n### Two-stage serialization / deserialization\n\nAll de/serialization happens in two stages, going through an intermediate representation (IR).\n\nFor example, serialization:\n1. Serialize strongly typed instance into IR\n2. Print IR as text\n\nSimilarly, deserialization:\n1. Parse text into IR\n2. Deserialize IR into a strongly typed class instance\n\nOr schematically:\n\n```\n   Plain C# object  --(serialize)---\u003e   IR   --(print)-\u003e   Sim file\n\n   Plain C# object  \u003c-(deserialize)--   IR   \u003c-(parse)--   Sim file\n```\n\nThe IR is an untyped collection of primitives. It allows only the following types: \n**String, U/Int64, Double, ArrayList, Hashtable.** The primitive types are the\nwidest available for each category, and data structures are untyped.\n\nOnce the string is parsed into those types, the serializer's job is to\nconvert hashtables to class instances (or dictionaries), untyped array lists to \nstrongly typed collections, and wide primitives into appropriate specific ones.\n\nThis has several benefits:\n- The job of tokenizing/parsing/printing is different from the job of reflection-based serialization,\n  so this allows for much cleaner implementation\n- This lets us swap out front-ends, and use e.g. 
JSON printer instead of a **SimLang** printer if desired\n  (which in fact we did on one project)\n\nOn the other hand, this means the serializer is not a streaming serializer, meaning\nthat it has to allocate the entire parse tree in memory.\nHowever, nothing about the language design prevents a streaming serializer implementation in the future.\n\n\n### Custom serialization handlers\n\nTo extend the serializer to process custom types, you can add custom serializer/deserializer \nfunctions under `SerializationSettings.CustomSerializers`/`Deserializers`.\n\nThese will be called whenever accessing an object of that type.\n\n\n### Type information\n\nAs you can see above, type information is inferred from class member definition,\nand if there is no ambiguity, class name will not be serialized out, or needed during deserialization.\n\nHowever in some cases you need type information, for example when dealing with a generic list\nwhere the generic type is an interface, or an ArrayList which is completely untyped.\n\nIn this case, the serializer will write out type name, e.g. `{ #type \"Person\" ... }`.\nNamespace information may be included or omitted, based on adding or removing\nthat namespace from `SerializationSettings.ImplicitNamespaces`.\n\n### Reader macros\n\nText inside raw parentheses, e.g. `{ strength (* 2 base-strength) }` is a reader macro\nand during the _parsing_ step it will be sent over to the registered macro processor.\nThe processor should then transform this expression into a new value, which will\nreplace the old one and parsing will resume.\n\nIf a macro processor has not been registered, a macro will result in a syntax error  \nsince expressions of the form `( ... 
)` are not part of the grammar.\n\n\n\n## Limitations\n\n### Limitations around public / protected / private access modifiers\n\nAs a design principle, only public information gets serialized.\nFor classes, this means fully-public fields and properties.\n\nSpecifically the following **do** get serialized:\n- All public, writable class fields\n- All properties which have a public setter and a public getter\n\nWhile the following **do not** get serialized:\n- Private and protected class fields\n- Readonly fields\n- Properties without a public getter\n- Properties without a public setter\n\n\n### Limitations around collections\n\nIn the current implementation, the following collections are supported by the serializer:\n\n```\nNon-generic:\n- T[] array (e.g. int[], MyClass[])\n- ArrayList\n- Hashtable\n\nGeneric:\n- List\u003cT\u003e\n- Dictionary\u003cK,V\u003e\n- HashSet\u003cT\u003e\n(Including recursively, e.g. Dictionary\u003cK, List\u003cT\u003e\u003e)\n```\n\nFor generic types, the serializer _should_ work with all `ICollection\u003cT\u003e` types, \nbut hasn't been extensively tested because there are a lot of them. :)\n\nWhile other collection types may be added in the future, users can also use\nthe custom serialization functionality to provide their own in the meantime.\n\n### Other limitations\n\nSome known issues:\n- Anonymous tuples fail during serialization, e.g. `List\u003c(string, string)\u003e`.\n  It is recommended to convert those into named tuple structs first.\n- Circular references are neither supported nor detected.\n  Attempts to serialize data structures with circular references\n  will result in a stack overflow.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frzubek%2Fsimlang","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frzubek%2Fsimlang","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frzubek%2Fsimlang/lists"}