https://github.com/tinybiggames/infero
An easy-to-use, high-performance, CUDA-powered LLM inference library.
- Host: GitHub
- URL: https://github.com/tinybiggames/infero
- Owner: tinyBigGAMES
- License: bsd-3-clause
- Created: 2024-06-05T21:15:14.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-06T20:04:20.000Z (7 months ago)
- Last Synced: 2024-09-22T19:02:10.746Z (3 months ago)
- Topics: cuda, llamacpp, llm-inference, win64, windows-10, windows-11
- Language: Pascal
- Homepage:
- Size: 2.24 MB
- Stars: 12
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
![Infero](media/Infero.jpg)
[![Chat on Discord](https://img.shields.io/discord/754884471324672040.svg?logo=discord)](https://discord.gg/tPWjMwK) [![Twitter Follow](https://img.shields.io/twitter/follow/tinyBigGAMES?style=social)](https://twitter.com/tinyBigGAMES)
# Infero
### Overview
A streamlined and user-friendly library for performing local LLM inference directly from your preferred programming language. It loads LLMs in [GGUF format](https://huggingface.co/docs/hub/gguf) into CPU or GPU memory and uses a [CUDA backend](https://blogs.nvidia.com/blog/what-is-cuda-2/) for faster processing.
### Installation
- Download the [Infero](https://github.com/tinyBigGAMES/Infero/archive/refs/heads/main.zip) repo.
- Download the Infero [Runtime](https://github.com/tinyBigGAMES/Infero/releases/tag/v1.0.0) dependencies, `CUDA` and `llama`. These DLLs must be present on your target device for Infero to function properly. Please ensure they are placed in the same directory as your Infero executable file.
- Acquire a GGUF model. All vetted models compatible with Infero can be downloaded from our Hugging Face account.
- The application uses CUDA for enhanced performance on supported [GPUs](docs/gpu.md). Make sure the model fits within the memory available on your system (GPU VRAM and/or system RAM, depending on where it is loaded).
- Consult the `installdir\examples` directory for demonstrations on integrating **Infero** with your programming language.
- Include the following DLLs in your project distribution: `CUDA runtime`, `llama runtime`, and `Infero.dll`.
- The Infero API can be used from any programming language that supports Win64 and Unicode, with out-of-the-box support for Pascal and C/C++.
- Ship-ready DLLs are included in the repository; however, rebuilding `Infero.dll` requires Delphi 12.1.
- This project is developed using RAD Studio 12.1 on Windows 11, on an Intel Core i5-12400F (2500 MHz, 6 cores / 12 logical) with 36GB RAM and an NVIDIA RTX 3060 GPU with 12GB VRAM.
- We encourage testing and welcome pull requests.
- If you find this project beneficial, please consider starring the repository, sponsoring, or promoting it. Your support is invaluable and highly appreciated.
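Before calling `LoadModel`, it can help to confirm that a file really is GGUF and that it plausibly fits in memory. The C++ sketch below is not part of the Infero API. It reads the ASCII magic `GGUF` and the 32-bit format version that the GGUF format places at the start of every file, and compares the file size against a memory budget you supply. The helper names `IsGgufFile` and `FitsInBudget`, and the 20% headroom for the context and runtime buffers, are hypothetical choices for illustration, not Infero defaults.

```CPP
#include <cstdint>
#include <filesystem>
#include <fstream>

// Hypothetical helper (not part of Infero): returns true if the file starts with
// the ASCII magic "GGUF" followed by a uint32 format version. The version is read
// as little-endian, which is the common case for GGUF files.
bool IsGgufFile(const std::filesystem::path& path, std::uint32_t* version_out = nullptr)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;

    unsigned char header[8] = {};
    if (!in.read(reinterpret_cast<char*>(header), sizeof(header))) return false;

    if (header[0] != 'G' || header[1] != 'G' || header[2] != 'U' || header[3] != 'F')
        return false;

    // Assemble the version byte by byte so the result does not depend on host byte order.
    const std::uint32_t version = static_cast<std::uint32_t>(header[4])
                                | (static_cast<std::uint32_t>(header[5]) << 8)
                                | (static_cast<std::uint32_t>(header[6]) << 16)
                                | (static_cast<std::uint32_t>(header[7]) << 24);
    if (version_out) *version_out = version;
    return true;
}

// Hypothetical helper: the weights need roughly the file size in memory, and the
// context (KV cache) plus runtime buffers need extra. That extra is modelled as a
// fixed percentage of the file size, an assumption rather than a measured value.
bool FitsInBudget(std::uintmax_t model_bytes, std::uintmax_t budget_bytes,
                  unsigned headroom_percent = 20)
{
    const std::uintmax_t required = model_bytes + model_bytes / 100 * headroom_percent;
    return required <= budget_bytes;
}
```

A caller would pass `std::filesystem::file_size(path)` as `model_bytes`. For example, a hypothetical 4 GiB model file needs about 4.8 GiB under the default 20% headroom, which fits within the 12GB of VRAM on the RTX 3060 mentioned above. Actual usage also depends on the context size passed to `DefineModel` and on how much of the model is placed on the GPU, so treat this as a rough pre-flight check only.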
### Examples
Pascal example:
```Delphi
program InferoExample;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Infero;

begin
  // init config
  InitConfig('C:\LLM\gguf', -1);

  // define model
  DefineModel('phi-3-mini-4k-instruct.Q4_K_M.gguf',
    'phi-3-mini-4k-instruct.Q4_K_M', 4000,
    '<|{role}|>{content}<|end|>', '<|assistant|>');

  // add messages
  AddMessage(ROLE_SYSTEM, 'You are a helpful AI assistant.');
  AddMessage(ROLE_USER, 'What is AI?');

  // load model
  if not LoadModel('phi-3-mini-4k-instruct.Q4_K_M') then Exit;

  // run inference
  if RunInference('phi-3-mini-4k-instruct.Q4_K_M', 1024) then
  begin
    // success
  end
  else
  begin
    // error
  end;

  // unload model
  UnloadModel();
end.
```
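The `DefineModel` call above passes a per-message template (`<|{role}|>{content}<|end|>`) and a generation prompt (`<|assistant|>`). The README does not spell out how Infero applies them, so the C++ sketch below assumes the usual approach for chat templates: each message is rendered by substituting its role and text for `{role}` and `{content}`, the rendered messages are concatenated, and the generation prompt is appended so the model continues as the assistant. `Message`, `RenderTemplate` and `BuildPrompt` are hypothetical names, and the lowercase role strings are an assumption about what `ROLE_SYSTEM` and `ROLE_USER` correspond to.

```CPP
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical message type; Infero itself takes roles via ROLE_* constants.
struct Message {
    std::wstring role;     // assumed to be e.g. L"system" or L"user"
    std::wstring content;
};

// Replace every occurrence of `key` in `text` with `value`, resuming the search
// after each inserted value so the replacement is never re-scanned.
static void ReplaceAll(std::wstring& text, const std::wstring& key, const std::wstring& value)
{
    for (std::size_t pos = text.find(key); pos != std::wstring::npos;
         pos = text.find(key, pos + value.size())) {
        text.replace(pos, key.size(), value);
    }
}

// Render one message through the per-message template.
std::wstring RenderTemplate(const std::wstring& tmpl, const Message& msg)
{
    std::wstring out = tmpl;
    ReplaceAll(out, L"{role}", msg.role);
    ReplaceAll(out, L"{content}", msg.content);
    return out;
}

// Concatenate all rendered messages, then append the generation prompt.
std::wstring BuildPrompt(const std::wstring& tmpl, const std::wstring& generationPrompt,
                         const std::vector<Message>& messages)
{
    std::wstring prompt;
    for (const Message& m : messages) prompt += RenderTemplate(tmpl, m);
    return prompt + generationPrompt;
}
```

Under these assumptions, the two messages from the example produce `<|system|>You are a helpful AI assistant.<|end|><|user|>What is AI?<|end|><|assistant|>`, which is the text the model would then be asked to continue.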
C/C++ example:
```CPP
#include

int main()
{
    // init config
    InitConfig(L"C:/LLM/gguf", -1);

    // define model
    DefineModel(L"phi-3-mini-4k-instruct.Q4_K_M.gguf",
        L"phi-3-mini-4k-instruct.Q4_K_M", 4000,
        L"<|{role}|>{content}<|end|>", L"<|assistant|>");

    // add messages
    AddMessage(ROLE_SYSTEM, L"You are a helpful AI assistant.");
    AddMessage(ROLE_USER, L"What is AI?");

    // load model
    if (!LoadModel(L"phi-3-mini-4k-instruct.Q4_K_M")) return 1;

    // run inference
    if (RunInference(L"phi-3-mini-4k-instruct.Q4_K_M", 1024))
    {
        // success
    }
    else
    {
        // error
    }

    // unload model
    UnloadModel();
    return 0;
}
```
### Support
Our development motto:
- We will not release products that are buggy or incomplete, nor add new features instead of fixing underlying issues.
- We will strive to fix issues found with our products in a timely manner.
- We will maintain an attitude of quality over quantity for our products.
- We will establish a great rapport with users/customers through communication, transparency and respect, always encouraging feedback to help shape the direction of our products.
- We will be decent and fair, and remain humble and committed to the craft.
### Links
- Issues
- Discussions
- Discord
- Facebook Group
- YouTube
- X (Twitter)
- tinyBigGAMES
### License
Infero is a community-driven project created by tinyBigGAMES LLC and released under the BSD-3-Clause license.
Core developers:
- Jarrod Davis
### Acknowledgments
Infero couldn't have been built without the help of wonderful people and great software already available from the community. **Thank you!**
Software:
- [llama.cpp](https://github.com/ggerganov/llama.cpp)
People:
- John Claw
- Robert Jalarvo
Contributors: