{"id":15169604,"url":"https://github.com/llukas22/llm-rs-python","last_synced_at":"2025-06-27T03:08:05.564Z","repository":{"id":153587036,"uuid":"629016968","full_name":"LLukas22/llm-rs-python","owner":"LLukas22","description":"Unofficial python bindings for the rust llm library. 🐍❤️🦀","archived":false,"fork":false,"pushed_at":"2023-08-19T14:24:39.000Z","size":493,"stargazers_count":73,"open_issues_count":7,"forks_count":4,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-12-10T15:36:19.965Z","etag":null,"topics":["llama","llm","python","rust"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LLukas22.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-17T13:00:57.000Z","updated_at":"2024-09-24T05:01:15.000Z","dependencies_parsed_at":"2024-08-03T07:49:20.910Z","dependency_job_id":"d7b29177-4312-46aa-ba32-55cfc9bb7104","html_url":"https://github.com/LLukas22/llm-rs-python","commit_stats":{"total_commits":109,"total_committers":4,"mean_commits":27.25,"dds":0.1834862385321101,"last_synced_commit":"3bc82ba9254b12fa169145236c491883e0006b69"},"previous_names":["llukas22/llama-rs-python"],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLukas22%2Fllm-rs-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLukas22%2Fllm-rs-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLukas22%2Fllm-rs-python/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLukas22%2Fllm-rs-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LLukas22","download_url":"https://codeload.github.com/LLukas22/llm-rs-python/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230345782,"owners_count":18211997,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llama","llm","python","rust"],"created_at":"2024-09-27T07:04:05.748Z","updated_at":"2024-12-18T22:07:36.480Z","avatar_url":"https://github.com/LLukas22.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# llm-rs-python: Python Bindings for Rust's llm Library\n\n[![PyPI](https://img.shields.io/pypi/v/llm-rs)](https://pypi.org/project/llm-rs/)\n[![PyPI - License](https://img.shields.io/pypi/l/llm-rs)](https://pypi.org/project/llm-rs/)\n[![Downloads](https://static.pepy.tech/badge/llm-rs)](https://pepy.tech/project/llm-rs)\n\nWelcome to `llm-rs`, an unofficial Python interface for the Rust-based [llm](https://github.com/rustformers/llm) library, made possible through [PyO3](https://github.com/PyO3/pyo3). Our package combines the convenience of Python with the performance of Rust to offer an efficient tool for your machine learning projects. 🐍❤️🦀\n\nWith `llm-rs`, you can operate a variety of Large Language Models (LLMs) including LLama and GPT-NeoX directly on your CPU or GPU. 
\n\nFor a detailed overview of all the supported architectures, visit the [llm](https://github.com/rustformers/llm) project page. \n\n### Integrations:\n* 🦜️🔗 [LangChain](https://github.com/hwchase17/langchain)\n* 🌾🔱 [Haystack](https://github.com/deepset-ai/haystack)\n\n## Installation\n\nSimply install it via pip: `pip install llm-rs`\n\n\u003cdetails\u003e\n\u003csummary\u003eInstallation with GPU Acceleration Support\u003c/summary\u003e\n\u003cbr\u003e\n\n`llm-rs` supports several GPU-accelerated backends to speed up inference. To enable GPU acceleration, set the `use_gpu` parameter of your `SessionConfig` to `True`. The [llm documentation](https://github.com/rustformers/llm/blob/main/doc/acceleration-support.md#supported-accelerated-models) lists the model architectures that are currently accelerated. We distribute prebuilt binaries for the following operating systems and graphics APIs:\n\n### macOS (Using Metal)\nFor macOS users, the Metal-supported version of `llm-rs` can be installed via pip:\n\n`\npip install llm-rs-metal\n`\n\n### Windows/Linux (Using CUDA for Nvidia GPUs)\nDue to their large file size, CUDA-supported packages cannot be uploaded to PyPI. To install them, download the appropriate `*.whl` file from the latest [Release](https://github.com/LLukas22/llm-rs-python/releases/latest) and install it using pip as follows:\n\n`\npip install [wheelname].whl\n`\n\n### Windows/Linux (Using OpenCL for All GPUs)\n\nFor universal GPU support on Windows and Linux, we offer an OpenCL-supported version. 
It can be installed via pip:\n\n`\npip install llm-rs-opencl\n`\n\u003c/details\u003e\n\n\n## Usage\n### Running local GGML models:\nModels can be loaded via the `AutoModel` interface.\n\n```python \nfrom llm_rs import AutoModel, KnownModels\n\n#load the model\nmodel = AutoModel.from_pretrained(\"path/to/model.bin\",model_type=KnownModels.Llama)\n\n#generate\nprint(model.generate(\"The meaning of life is\"))\n```\n\n### Streaming Text\nText can be yielded from a generator via the `stream` function:\n```python \nfrom llm_rs import AutoModel, KnownModels\n\n#load the model\nmodel = AutoModel.from_pretrained(\"path/to/model.bin\",model_type=KnownModels.Llama)\n\n#generate\nfor token in model.stream(\"The meaning of life is\"):\n    print(token)\n```\n\n### Running GGML models from the Hugging Face Hub\nGGML-converted models can be downloaded and run directly from the Hub.\n```python \nfrom llm_rs import AutoModel\n\nmodel = AutoModel.from_pretrained(\"rustformers/mpt-7b-ggml\",model_file=\"mpt-7b-q4_0-ggjt.bin\")\n```\nIf a repository contains multiple models, the `model_file` must be specified.\nTo load repositories that were not created through this library, you must also specify the `model_type` parameter, because the metadata files needed to infer the architecture are missing.\n\n### Running PyTorch Transformer models from the Hugging Face Hub\n`llm-rs` supports automatic conversion of all supported transformer architectures on the Hugging Face Hub. 
\n\nRunning conversions requires additional dependencies, which can be installed via `pip install llm-rs[convert]`.\n\nThe models can then be loaded and automatically converted via the `from_pretrained` function.\n\n```python\nfrom llm_rs import AutoModel\n\nmodel = AutoModel.from_pretrained(\"mosaicml/mpt-7b\")\n```\n\n### Convert Hugging Face Hub Models\n\nThe following example shows how a [Pythia](https://huggingface.co/EleutherAI/pythia-410m) model can be converted, quantized, and run.\n\n```python\nfrom llm_rs.convert import AutoConverter\nfrom llm_rs import AutoModel, AutoQuantizer\nimport sys\n\n#define the model which should be converted and an output directory\nexport_directory = \"path/to/directory\" \nbase_model = \"EleutherAI/pythia-410m\"\n\n#convert the model\nconverted_model = AutoConverter.convert(base_model, export_directory)\n\n#quantize the model (this step is optional)\nquantized_model = AutoQuantizer.quantize(converted_model)\n\n#load the quantized model\nmodel = AutoModel.load(quantized_model,verbose=True)\n\n#generate text\ndef callback(text):\n    print(text,end=\"\")\n    sys.stdout.flush()\n\nmodel.generate(\"The meaning of life is\",callback=callback)\n```\n## 🦜️🔗 LangChain Usage\nUsing `llm-rs-python` through LangChain requires additional dependencies. You can install them with `pip install llm-rs[langchain]`. Once installed, the `RustformersLLM` model is available through the `llm_rs.langchain` module. It supports text generation and embeddings.\n\nThe example below demonstrates a simple `LLMChain` with MPT-Instruct:\n\n```python\nfrom llm_rs.langchain import RustformersLLM\nfrom langchain import PromptTemplate\nfrom langchain.chains import LLMChain\nfrom langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n\ntemplate=\"\"\"Below is an instruction that describes a task. 
Write a response that appropriately completes the request.\n### Instruction:\n{instruction}\n### Response:\nAnswer:\"\"\"\n\nprompt = PromptTemplate(input_variables=[\"instruction\"],template=template,)\n\nllm = RustformersLLM(model_path_or_repo_id=\"rustformers/mpt-7b-ggml\",model_file=\"mpt-7b-instruct-q5_1-ggjt.bin\",callbacks=[StreamingStdOutCallbackHandler()])\n\nchain = LLMChain(llm=llm, prompt=prompt)\n\nchain.run(\"Write a short post congratulating rustformers on their new release of their langchain integration.\")\n```\n\n\n## 🌾🔱 Haystack Usage\nUsing `llm-rs-python` through Haystack requires additional dependencies. You can install them with `pip install llm-rs[haystack]`. Once installed, the `RustformersInvocationLayer` is available through the `llm_rs.haystack` module. It supports text generation.\n\nThe example below demonstrates a simple Haystack `PromptNode` setup with OpenLLama-3B:\n\n```python\nfrom haystack.nodes import PromptNode, PromptModel\nfrom llm_rs.haystack import RustformersInvocationLayer\n\nmodel = PromptModel(\"rustformers/open-llama-ggml\",\n                    max_length=1024,\n                    invocation_layer_class=RustformersInvocationLayer,\n                    model_kwargs={\"model_file\":\"open_llama_3b-q5_1-ggjt.bin\"})\n\npn = PromptNode(\n    model,\n    max_length=1024\n)\n\npn(\"Write me a short story about a lama riding a crab.\",stream=True)\n```\n\n\n## Documentation\n\nFor in-depth information on customizing the loading and generation processes, refer to our detailed [documentation](https://llukas22.github.io/llm-rs-python/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllukas22%2Fllm-rs-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fllukas22%2Fllm-rs-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllukas22%2Fllm-rs-python/lists"}