{"id":19672049,"url":"https://github.com/michelp/hillisp","last_synced_at":"2025-04-29T01:30:40.593Z","repository":{"id":145541019,"uuid":"46208561","full_name":"michelp/hillisp","owner":"michelp","description":"CUDA parallel lisp toy inspired by Connection Machines","archived":false,"fork":false,"pushed_at":"2024-04-04T15:09:32.000Z","size":129,"stargazers_count":18,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-05T12:11:27.313Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michelp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-11-15T07:22:02.000Z","updated_at":"2025-02-21T18:01:25.000Z","dependencies_parsed_at":"2024-04-04T16:39:00.165Z","dependency_job_id":"90296a4b-7e66-44f6-baa7-faf870f6a75d","html_url":"https://github.com/michelp/hillisp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelp%2Fhillisp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelp%2Fhillisp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelp%2Fhillisp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michelp%2Fhillisp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/michelp","download_url":"https://codeload.github.com/michelp/hillisp/tar.gz/refs/h
eads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251415586,"owners_count":21585857,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T17:10:42.084Z","updated_at":"2025-04-29T01:30:40.332Z","avatar_url":"https://github.com/michelp.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"# hillisp\n\n[CUDA](https://en.wikipedia.org/wiki/CUDA) parallel lisp inspired by\n[Connection\nMachines](https://en.wikipedia.org/wiki/Connection_Machine).\n\nhillisp CUDA arrays are called \"xectors\" and can be operated on in\n[CUDA\nSIMT](https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads)\nfashion using a parallel lisp syntax.  Inspirations for this syntax\nwere described in [The Connection Machine (link to book on\nAmazon)](http://www.amazon.com/The-Connection-Machine-Artificial-Intelligence/dp/0262580977)\nby [Daniel Hillis](https://en.wikipedia.org/wiki/Danny_Hillis) and the\npaper [Connection Machine Lisp: fine-grained parallel symbolic\nprocessing](http://dl.acm.org/citation.cfm?id=319870) by Hillis and\n[Guy L. Steele, Jr.](https://en.wikipedia.org/wiki/Guy_L._Steele,_Jr.)\n\n## install\n\nJust type 'make'.  You will need a GPU with compute capability of 3.0\nor better and CUDA 7.0+ installed.  The interpreter is the 'lisp'\nbinary.  
Install the 'rlwrap' program to get readline support with the\n'hillisp' script.\n\n## lisp\n\nhillisp is an extremely tiny Lisp implementation written in CUDA C++.\nIts primary purpose is to drive the GPU as efficiently as possible.\nThe language itself is not designed to be especially performant or\nfeatureful: any heavy computation your program needs should be done\nin-kernel on the CUDA device and should suit CUDA workloads.\n\nTo that end, the interpreter is very simple, has few general-purpose\nprogramming features, and is designed to perform its interpretation\nduties (i.e., scheduling and garbage collection) asynchronously while\nthe GPU is running CUDA kernels.  In this way it aims to take as close\nto \"zero time\" as possible.\n\nhillisp is not a general-purpose programming language, but a language\nfor exploring parallel algorithms, using the high-level language\ndeveloped for Connection Machines, on extremely powerful, modern GPU\nhardware.\n\n## xectors\n\nA xector is constructed using bracket syntax.  Currently only int64_t\nand double xectors are supported.  Lisp functions operate on\ntraditional arguments like numbers, but can also operate on xectors\nentirely on the GPU.  For example, the '*' function can multiply two\nintegers together (this is done on the CUDA \"host\", the CPU) or it can\nmultiply two xectors together (this is done on the CUDA \"device\", the\nGPU):\n\n    ? (* 3 4)  ; multiply on host\n    : 12\n\n    ? (* [1 2 3] [4 5 6]) ; parallel multiply on device\n    : [4 10 18]\n\nLarge xectors can be created and initialized entirely on-device:\n\n    ? (+ (fill 3 1000000) (fill 4 1000000))\n    : [7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 ... 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7]\n    ?\n\nThe 'fill' function takes a value and a size, creates a xector of\nthe specified size, and fills it, in parallel, with that value.  
Thus,\nthe expression above creates two xectors of one million integers\neach, fills them with the values 3 and 4, respectively, then adds them\ntogether, yielding a xector containing one million \"7\" values.\n\nThis is conceptually very similar to the following NumPy code:\n\n    \u003e\u003e\u003e import numpy as np\n    \u003e\u003e\u003e 3 + 4\n    7\n    \u003e\u003e\u003e a = np.empty(1000000)\n    \u003e\u003e\u003e b = np.empty(1000000)\n    \u003e\u003e\u003e a.fill(3)\n    \u003e\u003e\u003e b.fill(4)\n    \u003e\u003e\u003e a + b\n    array([ 7.,  7.,  7., ...,  7.,  7.,  7.])\n    \u003e\u003e\u003e\n\n## CUDA kernels\n\nInternally, '+' and 'fill' cause CUDA kernels to be queued for launch\nasynchronously into a CUDA stream: first the two 'fill' kernels, then\nthe '+' kernel.  The 'fill' kernels don't depend on each other, so they\ncould in principle run concurrently, although kernels queued into the\nsame stream execute in order.  While the 'fill' kernels run, the\ninterpreter performs garbage collection and queues the next kernel,\n'+', which waits until both 'fill' kernels complete before adding the\ntwo xectors together.  The result is a third xector of one million\nintegers, each set to 7.\n\nXectors are allocated using CUDA [Unified\nMemory](http://devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6/).\nPointers to xector data are valid on both the host and the device, and\nCUDA migrates the data between them so that copying to and from the\ndevice is kept to a minimum.\n\n## TODO\n\n  - N-dimensional xectors.\n\n  - Unicode strings.\n\n  - Multi-device support.\n\n  - Currently only 64-bit integer, double, and double complex xectors\n    are supported, but code is in place to support all the main CUDA\n    numeric types and nested xectors.\n\n  - Data-loading functions to fill xectors from data in files.\n\n  - Implement loadable modules and wrap libraries like CUB, cuBLAS,\n    cuFFT, cuSPARSE, etc.  
Make CUDA library reuse as trivial as\n    possible.\n\n  - Inline CUDA C kernels for hand-tuning performance.\n\n  - \"Xappings\": CUDA distributed hash tables that can be indexed by\n    key as well as by position.\n\n  - Native graph types.\n\n  - Compile S-expressions directly to CUDA PTX assembly.\n\n## Alpha, Dot, and Beta\n\nThe book and paper cited above express parallelism using a Lisp\nmacro-like parallel expression syntax with three operators: alpha,\ndot, and beta.  Implementing these operators in hillisp is certainly a\ngoal, but I'm not yet sure it can be done efficiently without CUDA\ndynamic parallelism, a newer feature that requires a higher compute\ncapability than any device I currently have available.  Feel free to\nsend me a dual-Maxwell system and I'll get it done. :)\n\n\n## Reference\n\nCheck out [core tests](test/test.lsp) and [xector tests](test/xector.lsp).\n\n## Core\n\n### (is x y)\n\nAre 'x' and 'y' the same symbol?\n\n### (isinstance x y)\n\nIs 'x' an instance of type 'y'?\n\n### (type x)\n\nReturn the type of 'x'.\n\n### (quote x)\n\nReturn 'x' without evaluating it.\n\n### (eval x)\n\nEvaluate the list 'x'.\n\n### (apply x args)\n\nApply the lambda expression 'x' to 'args'.\n\n### (assert x)\n\nAssert 'x' is true.\n\n### (asserteq x y)\n\nAssert 'x' equals 'y' by comparison ('==').\n\n### (assertall x)\n\nAssert all elements in 'x' are true.\n\n### (assertany x)\n\nAssert at least one element in 'x' is true.\n\n\n## List\n\n### (car x)\n\nReturn the first element of the list 'x'.\n\n### (cdr x)\n\nReturn all elements of the list 'x' after the first.\n\n### (cons x y)\n\nReturn a pair of 'x' and 'y'.\n\n### (list ...)\n\nCons all arguments into a list.\n\n\n## IO\n\n### (print x)\n\nPrint 'x'.\n\n### (println x)\n\nPrint 'x' on its own line.\n\n### (printsp x)\n\nPrint 'x' followed by a space.\n\n\n## Math\n\n### (+ x y)\n\nAdd 'x' and 'y', may be numbers or xectors.\n\n### (+= x y)\n\nIn-place add 'x' and 'y' storing the 
result in 'x', may be numbers or\nxectors.\n\n### (- x y)\n\nSubtract 'y' from 'x', may be numbers or xectors.\n\n### (-= x y)\n\nIn-place subtract 'y' from 'x' storing the result in 'x', may be\nintegers or xectors.\n\n### (* x y)\n\nMultiply 'x' and 'y', may be numbers or xectors.\n\n### (*= x y)\n\nIn-place multiply 'x' by 'y' storing the result in 'x', may be\nintegers or xectors.\n\n### (/ x y)\n\nDivide 'x' by 'y', may be numbers or xectors.\n\n### (/= x y)\n\nIn-place divide 'x' by 'y' storing the result in 'x', may be\nnumbers or xectors.\n\n### (fma x y z)\n\nFused multiply-add 'x * y + z', may be numbers or xectors.\n\n### (fma= x y z)\n\nFused multiply-add 'x * y + z' storing the result in 'x', may be\nnumbers or xectors.\n\n\n## Comparison\n\n### (== x y)\n\nCompare 'x' and 'y' for equality.\n\n### !=\n\n### \u003e\n\n### \u003c\n\n### min\n\n### max\n\n### sum\n\n\n## Logic\n\n### not\n\n### and\n\n### all\n\n### any\n\n### or\n\n\n## Names\n\n### (set name value)\n\nBind 'value' to 'name' in the current scope.\n\n### (len l)\n\nReturn the length of the list or xector 'l'; for any other value, return nil.\n\n### (range start stop step)\n\nCons a list of integers in the given range.\n\n### (def name (args) (body ...))\n\nBind the function taking 'args', defined by 'body', to 'name'.\n\n\n## Flow Control\n\n### (if cond (exp ...) [(exp ...)])\n\n### (while cond (exp ...))\n\n### (do start end (exp ...))\n\n### (for i start end (exp ...))\n\n### (collect value)\n\n\n## Xectors\n\n### (fill value size)\n\nCreate a new xector of size 'size' and fill each element with\n'value'.  The type of 'value' determines the type of the xector.\n'size' can be an integer (a one-dimensional xector) or a list of\ndimensions, e.g. '(3 3)' creates a 3x3 two-dimensional xector.\n\n### (empty type size)\n\nLike 'fill', but returns an uninitialized xector: no value is\nprovided or written into the new xector. 
'type' determines the type of\nthe xector.\n\n### (copy x y)\n\nCopy the contents of xector 'x' into xector 'y'.  The two xectors must\nhave the same shape.\n\n### (slice x shape)\n\nSlice the xector 'x' to the specific 'shape'.  'shape' must match the\nleading dimensions of 'x'.\n\n### (swap x y)\n\nSwap the contents of 'x' and 'y'.\n\n## Misc\n\n### (dir)\n\nShow all the bound names.\n\n### (time)\n\nReturn the time in microseconds since the epoch.\n\n### (gc)\n\nTrigger garbage collection.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichelp%2Fhillisp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichelp%2Fhillisp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichelp%2Fhillisp/lists"}