{"id":21339044,"url":"https://github.com/capsadmin/luajit-llama3","last_synced_at":"2025-04-15T02:53:19.840Z","repository":{"id":241598887,"uuid":"807198510","full_name":"CapsAdmin/luajit-llama3","owner":"CapsAdmin","description":"llama3 inference in pure luajit ffi","archived":false,"fork":false,"pushed_at":"2024-12-27T01:37:15.000Z","size":245,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-15T02:53:14.987Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CapsAdmin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-28T16:38:13.000Z","updated_at":"2025-03-17T05:06:57.000Z","dependencies_parsed_at":"2024-05-29T07:59:48.445Z","dependency_job_id":null,"html_url":"https://github.com/CapsAdmin/luajit-llama3","commit_stats":null,"previous_names":["capsadmin/luajit-llama3"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CapsAdmin%2Fluajit-llama3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CapsAdmin%2Fluajit-llama3/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CapsAdmin%2Fluajit-llama3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CapsAdmin%2Fluajit-llama3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CapsAdmin","download_url":"https://codeload.github.com/CapsAdmin/luajit-llama3/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248997088,"owners_count":21195797,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-22T00:41:55.257Z","updated_at":"2025-04-15T02:53:19.823Z","avatar_url":"https://github.com/CapsAdmin.png","language":"Lua","funding_links":[],"categories":[],"sub_categories":[],"readme":"Llama3 inference with LuaJIT using the Q4_0 model variant.\n\nIt can run inference with CUDA, with CPU threads on Windows and Linux, or with plain LuaJIT (single thread), though the plain LuaJIT variant is painfully slow. Most of the time is spent in `Tensor.MatrixVectorMultiply`.\n\nThe CUDA version uses a kernel to do the multiplication, while the threaded version just spreads the calculation across multiple Lua states running in threads. All other calculations are done in LuaJIT.\n\nIt would be cool to make the pure LuaJIT version faster, but I'm not really sure how. Using SIMD can speed it up quite a bit, but SIMD is not directly available in LuaJIT ([though it's on the roadmap](https://github.com/LuaJIT/LuaJIT/pull/116)), so the only option is to compile specialized C code and load it with the FFI.\n\n\n```\nluajit llama.lua gpu \"Meta-Llama-3-8B-Instruct-Q4_0.gguf\" \"write a luajit haiku\"\ncuda driver version: 12.04\nusing device: NVIDIA GeForce RTX 4090\nreading gguf metadata took 0.10612 seconds\nreading gguf tensors took 2.02863 seconds\nuploading tensors to gpu took 0.50892 seconds\n4.21gb tensors allocated on GPU\n3.35 / 23.64 gb vram in use\n4.33gb tensors allocated on CPU\n\n\n\n\n\u003c|start_header_id|\u003euser\u003c|end_header_id|\u003e\nwrite a luajit haiku\u003c|eot_id|\u003e\u003c|start_header_id|\u003eassistant\u003c|end_header_id|\u003e\nA haiku in Lua!\n\nFuzzy math whispers\nGlowing pixels unfold\nCode's gentle hum\u003c|eot_id|\u003e\n\n\n\n3.36 / 23.64 gb vram in use\ntoken count: 38\nelapsed: 1.86s\n20.40 tokens/s\n```\n\nI mostly used https://github.com/mukel/llama3.java as a source reference. You can find instructions on how to download \"Meta-Llama-3-8B-Instruct-Q4_0.gguf\" there.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcapsadmin%2Fluajit-llama3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcapsadmin%2Fluajit-llama3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcapsadmin%2Fluajit-llama3/lists"}