{"id":31714681,"url":"https://github.com/docusealco/rllama","last_synced_at":"2026-01-20T17:59:51.368Z","repository":{"id":318132155,"uuid":"1070075931","full_name":"docusealco/rllama","owner":"docusealco","description":"Ruby FFI bindings for llama.cpp to run open-source LLMs such as GPT-OSS, Qwen 3, Gemma 3, and Llama 3 locally with Ruby.","archived":false,"fork":false,"pushed_at":"2025-10-07T13:37:13.000Z","size":41,"stargazers_count":66,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-10-12T16:11:31.111Z","etag":null,"topics":["ai","embeddings","ffi","gguf","inference","llamacpp","llm","ruby"],"latest_commit_sha":null,"homepage":"https://www.docuseal.com/blog/run-open-source-llms-locally-with-ruby","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/docusealco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-05T08:02:20.000Z","updated_at":"2025-10-11T22:11:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"4dc01e07-1f59-42fd-899f-25c800e41341","html_url":"https://github.com/docusealco/rllama","commit_stats":null,"previous_names":["docusealco/rllama"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/docusealco/rllama","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docusealco%2Frllama","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docusealco%2Frllama/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docusealco%2Frllama/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docusealco%2Frllama/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/docusealco","download_url":"https://codeload.github.com/docusealco/rllama/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docusealco%2Frllama/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280317282,"owners_count":26309998,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-21T02:00:06.614Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","embeddings","ffi","gguf","inference","llamacpp","llm","ruby"],"created_at":"2025-10-09T01:45:00.167Z","updated_at":"2025-10-23T02:43:48.706Z","avatar_url":"https://github.com/docusealco.png","language":"Ruby","readme":"\u003cimg width=\"336\" height=\"212.0\" alt=\"Logo\" src=\"https://github.com/user-attachments/assets/e27442fb-22d1-44cf-ba3d-f10b24c13652\" /\u003e\n\n# Rllama\n\nRuby bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) to run open-source language models locally. 
Run models like GPT-OSS, Qwen 3, Gemma 3, Llama 3, and many others directly in your Ruby application code.\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'rllama'\n```\n\nAnd then execute:\n\n```bash\nbundle install\n```\n\nOr install it yourself as:\n\n```bash\ngem install rllama\n```\n\n## CLI Chat\n\nThe `rllama` command-line utility provides an interactive chat interface for conversing with language models. After installing the gem, you can start chatting immediately:\n\n```bash\nrllama\n```\n\nWhen you run `rllama` without arguments, it will display:\n\n- **Downloaded models**: Any models you've already downloaded to `~/.rllama/models/`\n- **Popular models**: A curated list of popular models available for download, including:\n  - Gemma 3 1B\n  - Llama 3.2 3B\n  - Phi-4\n  - Qwen3 30B\n  - GPT-OSS\n\nSimply enter the number of the model you want to use. If you select a model that hasn't been downloaded yet, it will be automatically downloaded from Hugging Face.\n\nYou can also specify a model path or URL directly:\n\n```bash\nrllama path/to/your/model.gguf\n```\n\n```bash\nrllama https://huggingface.co/microsoft/phi-4-gguf/resolve/main/phi-4-Q3_K_S.gguf\n```\n\nOnce the model has loaded, you can start chatting.\n\n## Usage\n\n### Text Generation\n\nGenerate text completions using local language models:\n\n```ruby\nrequire 'rllama'\n\n# Load a model\nmodel = Rllama.load_model('lmstudio-community/gemma-3-1B-it-QAT-GGUF/gemma-3-1B-it-QAT-Q4_0.gguf')\n\n# Generate text\nresult = model.generate('What is the capital of France?')\nputs result.text\n# =\u003e \"The capital of France is Paris.\"\n\n# Access generation statistics\nputs \"Tokens generated: #{result.stats[:tokens_generated]}\"\nputs \"Tokens per second: #{result.stats[:tps]}\"\nputs \"Duration: #{result.stats[:duration]} seconds\"\n\n# Don't forget to close the model when done\nmodel.close\n```\n\n#### Generation parameters\n\nAdjust the generation with 
parameters:\n\n```ruby\nresult = model.generate(\n  'Write a short poem about Ruby programming',\n  max_tokens: 2024,\n  temperature: 0.8,\n  top_k: 40,\n  top_p: 0.95,\n  min_p: 0.05\n)\n```\n\n#### Streaming generation\n\nStream generated text token-by-token:\n\n```ruby\nmodel.generate('Explain quantum computing') do |token|\n  print token\nend\n```\n\n#### System prompt\n\nInclude a system prompt to guide the model's behavior:\n\n```ruby\nresult = model.generate(\n  'What are best practices for Ruby development?',\n  system: 'You are an expert Ruby developer with 10 years of experience.'\n)\n```\n\n#### Messages list\n\nPass multiple messages with roles for more complex interactions:\n\n```ruby\nresult = model.generate([\n  { role: 'system', content: 'You are a helpful assistant.' },\n  { role: 'user', content: 'What is the capital of France?' },\n  { role: 'assistant', content: 'The capital of France is Paris.' },\n  { role: 'user', content: 'What is its population?' }\n])\nputs result.text\n```\n\n### Chat\n\nFor ongoing conversations, use a context object that maintains the conversation history:\n\n```ruby\n# Initialize a chat context\ncontext = model.init_context\n\n# Send messages and maintain conversation history\nresponse1 = context.message('What is the capital of France?')\nputs response1.text\n# =\u003e \"The capital of France is Paris.\"\n\nresponse2 = context.message('What is the population of that city?')\nputs response2.text\n# =\u003e \"Paris has a population of approximately 2.1 million people...\"\n\nresponse3 = context.message('What was my first message?')\nputs response3.text\n# =\u003e \"Your first message was asking about the capital of France.\"\n\n# The context remembers all previous messages in the conversation\n\n# Close context when done\ncontext.close\n```\n\n### Embeddings\n\nGenerate vector embeddings for text using embedding models:\n\n```ruby\nrequire 'rllama'\n\n# Load an embedding model\nmodel = 
Rllama.load_model('lmstudio-community/embeddinggemma-300m-qat-GGUF/embeddinggemma-300m-qat-Q4_0.gguf')\n\n# Generate embedding for a single text\nembedding = model.embed('Hello, world!')\nputs embedding.length\n# =\u003e 768 (depending on your model)\n\n# Generate embeddings for multiple sentences\nembeddings = model.embed([\n  'roses are red',\n  'violets are blue',\n  'sugar is sweet'\n])\n\nputs embeddings.length\n# =\u003e 3\nputs embeddings[0].length\n# =\u003e 768\n\nmodel.close\n```\n\n#### Vector parameters\n\nBy default, embedding vectors are normalized. You can disable normalization with `normalize: false`:\n\n```ruby\n# Generate unnormalized embeddings\nembedding = model.embed('Sample text', normalize: false)\n```\n\n## Finding Models\n\nYou can download GGUF format models from various sources:\n\n- [Hugging Face](https://huggingface.co/models?library=gguf) - Search for models in GGUF format\n\n## License\n\nMIT\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/docusealco/rllama.\n","funding_links":[],"categories":["Ruby"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocusealco%2Frllama","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdocusealco%2Frllama","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocusealco%2Frllama/lists"}