Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tinybiggames/infero
An easy-to-use, high-performance, CUDA-powered LLM inference library.
Last synced: about 1 month ago
JSON representation
An easy-to-use, high-performance, CUDA-powered LLM inference library.
- Host: GitHub
- URL: https://github.com/tinybiggames/infero
- Owner: tinyBigGAMES
- License: bsd-3-clause
- Created: 2024-06-05T21:15:14.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-06T20:04:20.000Z (6 months ago)
- Last Synced: 2024-09-22T19:02:10.746Z (about 2 months ago)
- Topics: cuda, llamacpp, llm-inference, win64, windows-10, windows-11
- Language: Pascal
- Homepage:
- Size: 2.24 MB
- Stars: 12
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
![Infero](media/Infero.jpg)
[![Chat on Discord](https://img.shields.io/discord/754884471324672040.svg?logo=discord)](https://discord.gg/tPWjMwK) [![Twitter Follow](https://img.shields.io/twitter/follow/tinyBigGAMES?style=social)](https://twitter.com/tinyBigGAMES)
# Infero
### Overview
A streamlined and user-friendly library designed for performing local LLM inference directly through your preferred programming language. This library efficiently loads LLMs in [GGUF format](https://huggingface.co/docs/hub/gguf) into CPU or GPU memory, utilizing a [CUDA backend](https://blogs.nvidia.com/blog/what-is-cuda-2/) for enhanced processing speed.

### Installation
- Download the [Infero](https://github.com/tinyBigGAMES/Infero/archive/refs/heads/main.zip) repo.
- Download the Infero [Runtime](https://github.com/tinyBigGAMES/Infero/releases/tag/v1.0.0) dependencies, `CUDA` and `llama`. These DLLs must be present on your target device for Infero to function properly. Please ensure they are placed in the same directory as your Infero executable file.
- Acquire a GGUF model. All vetted models compatible with Infero can be downloaded from our Hugging Face account.
- The library utilizes CUDA for enhanced performance on supported [GPUs](docs/gpu.md). Ensure the model size does not exceed your available system memory (RAM, or VRAM when running on GPU).
- Consult the `installdir\examples` directory for demonstrations on integrating **Infero** with your programming language.
- Include the following DLLs in your project distribution: `CUDA runtime`, `llama runtime`, and `Infero.dll` (a startup dependency check is sketched after this list).
- The Infero API can be integrated from any programming language that supports Win64 and Unicode, with out-of-the-box support for Pascal and C/C++.
- Ship-ready DLLs are included in the repository; however, if there is a need to rebuild the `Infero.dll`, Delphi 12.1 is required.
- This project is developed using RAD Studio 12.1 on Windows 11, powered by an Intel Core i5-12400F at 2500 MHz with 6 cores (12 logical), equipped with 36GB RAM and an NVIDIA RTX 3060 GPU with 12GB VRAM.
- We encourage testing and welcome pull requests.
- If you find this project beneficial, please consider starring the repository, sponsoring, or promoting it. Your support is invaluable and highly appreciated.
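Since a missing runtime DLL otherwise only surfaces as a load failure at startup, it can be worth verifying the dependencies explicitly before calling into Infero. The following is a minimal, hypothetical sketch using standard `SysUtils` routines; `Infero.dll` is the documented file name, while the other file name below is a placeholder (use the names shipped in the Infero runtime release).

```Delphi
uses
  SysUtils;

// Hypothetical startup check: confirms the required DLLs sit in the
// same directory as the executable. 'Infero.dll' is documented; the
// other name below is a placeholder for the llama/CUDA runtime DLLs
// shipped with the Infero release.
function DependenciesPresent: Boolean;
const
  CDlls: array[0..1] of string = ('Infero.dll', 'llama.dll');
var
  LDir, LDll: string;
begin
  Result := True;
  LDir := ExtractFilePath(ParamStr(0));
  for LDll in CDlls do
    if not FileExists(LDir + LDll) then
    begin
      WriteLn('Missing dependency: ', LDll);
      Result := False;
    end;
end;

begin
  if not DependenciesPresent() then
    Halt(1);
  // ...continue with InitConfig, DefineModel, etc.
end.
```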
### Examples
Pascal example:
```Delphi
uses
  SysUtils,
  Infero;

begin
  // init config
  InitConfig('C:\LLM\gguf', -1);

  // define model
  DefineModel('phi-3-mini-4k-instruct.Q4_K_M.gguf',
    'phi-3-mini-4k-instruct.Q4_K_M', 4000,
    '<|{role}|>{content}<|end|>', '<|assistant|>');

  // add messages
  AddMessage(ROLE_SYSTEM, 'You are a helpful AI assistant.');
  AddMessage(ROLE_USER, 'What is AI?');

  // load model
  if not LoadModel('phi-3-mini-4k-instruct.Q4_K_M') then Exit;

  // run inference
  if RunInference('phi-3-mini-4k-instruct.Q4_K_M', 1024) then
  begin
    // success
  end
  else
  begin
    // error
  end;

  // unload model
  UnloadModel();
end.
```
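The same handful of calls extends naturally to a multi-turn conversation: keep the model loaded, append one user message per turn, and re-run inference. Below is a speculative sketch built only from the calls shown above; how the assistant's reply is surfaced (console streaming, a retrieval call, etc.) is not shown in this excerpt, so that step is left as a comment.

```Delphi
uses
  SysUtils,
  Infero;

var
  LQuestion: string;
begin
  InitConfig('C:\LLM\gguf', -1);

  DefineModel('phi-3-mini-4k-instruct.Q4_K_M.gguf',
    'phi-3-mini-4k-instruct.Q4_K_M', 4000,
    '<|{role}|>{content}<|end|>', '<|assistant|>');

  AddMessage(ROLE_SYSTEM, 'You are a helpful AI assistant.');

  if not LoadModel('phi-3-mini-4k-instruct.Q4_K_M') then Exit;

  // simple chat loop: one user turn per iteration
  repeat
    Write('You: ');
    ReadLn(LQuestion);
    if LQuestion = '' then Break;

    AddMessage(ROLE_USER, LQuestion);

    if not RunInference('phi-3-mini-4k-instruct.Q4_K_M', 1024) then
      Break; // inference failed
    // NOTE: retrieving or streaming the assistant's reply depends on
    // parts of the Infero API not shown in this excerpt.
  until False;

  UnloadModel();
end.
```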
C/C++ example:
```CPP
#include <Infero.h>

int main()
{
    // init config
    InitConfig(L"C:/LLM/gguf", -1);

    // define model
    DefineModel(L"phi-3-mini-4k-instruct.Q4_K_M.gguf",
        L"phi-3-mini-4k-instruct.Q4_K_M", 4000,
        L"<|{role}|>{content}<|end|>", L"<|assistant|>");

    // add messages
    AddMessage(ROLE_SYSTEM, L"You are a helpful AI assistant.");
    AddMessage(ROLE_USER, L"What is AI?");

    // load model
    if (!LoadModel(L"phi-3-mini-4k-instruct.Q4_K_M")) return 1;

    // run inference
    if (RunInference(L"phi-3-mini-4k-instruct.Q4_K_M", 1024))
    {
        // success
    }
    else
    {
        // error
    }

    // unload model
    UnloadModel();
    return 0;
}
```

### Media
### Support
Our development motto:
- We will not release products that are buggy or incomplete, and we will not prioritize new features over fixing underlying issues.
- We will strive to fix issues found with our products in a timely manner.
- We will maintain an attitude of quality over quantity for our products.
- We will establish a great rapport with users/customers, with communication, transparency and respect, always encouraging feedback to help shape the direction of our products.
- We will be decent, fair, remain humble and committed to the craft.

### Links
- [Issues](https://github.com/tinyBigGAMES/Infero/issues)
- [Discussions](https://github.com/tinyBigGAMES/Infero/discussions)
- [Discord](https://discord.gg/tPWjMwK)
- Facebook Group
- YouTube
- [X (Twitter)](https://twitter.com/tinyBigGAMES)
- [tinyBigGAMES](https://github.com/tinyBigGAMES)

### License
Infero is a community-driven project created by tinyBigGAMES LLC and released under the BSD-3-Clause license.

Core developers:
- Jarrod Davis

### Acknowledgments
Infero couldn't have been built without the help of wonderful people and great software already available from the community. **Thank you!**

Software
- [llama.cpp](https://github.com/ggerganov/llama.cpp)

People
- John Claw
- Robert Jalarvo