![JetInfero](media/jetinfero.png)
[![Chat on Discord](https://img.shields.io/discord/754884471324672040?style=for-the-badge)](https://discord.gg/tPWjMwK)
[![Follow on Bluesky](https://img.shields.io/badge/Bluesky-tinyBigGAMES-blue?style=for-the-badge&logo=bluesky)](https://bsky.app/profile/tinybiggames.com)

## 🌟 Fast, Flexible Local LLM Inference for Developers 🚀

JetInfero is a nimble, high-performance library that lets developers integrate local Large Language Models (LLMs) into their applications with minimal effort. Powered by **llama.cpp** 🕊️, JetInfero prioritizes speed, flexibility, and ease of use 🌍. It is compatible with any language that supports **Win64**, **Unicode**, and dynamic-link libraries (DLLs).

## 💡 Why Choose JetInfero?

- **Optimized for Speed** ⚡️: Built on llama.cpp, JetInfero delivers fast inference with minimal overhead.
- **Cross-Language Support** 🌍: Integrates seamlessly with Delphi, C++, C#, Java, and any other Win64-compatible environment.
- **Intuitive API** 🔬: A clean procedural API simplifies model management, inference execution, and callback handling.
- **Customizable Templates** 🖋️: Tailor input prompts to different use cases with ease.
- **Scalable Performance** 🚀: Leverage GPU acceleration, token streaming, and multi-threaded execution for demanding workloads.

## 🛠️ Key Features

### 🤖 Advanced AI Integration

JetInfero expands your toolkit with capabilities such as:

- Dynamic chatbot creation 🗣️.
- Automated text generation 🔄 and summarization.
- Context-aware content creation 🌍.
- Real-time token streaming for adaptive applications ⌚.

### 🔒 Privacy-Centric Local Execution

- Operates entirely offline 🔐, ensuring sensitive data remains secure.
- GPU acceleration supported via Vulkan for enhanced performance.

### ⚙️ Performance Optimization

- Configure GPU utilization with `AGPULayers` 🔄.
- Allocate threads dynamically using `AMaxThreads` 🌍.
- Access performance metrics to monitor throughput and efficiency 📊.

### 🔀 Flexible Prompt Templates

JetInfero's template system simplifies input customization. Templates include placeholders such as:

- **`{role}`**: Denotes the sender's role (e.g., `user`, `assistant`).
- **`{content}`**: Represents the message content.

For example:

```pascal
jiDefineModel(
  // Model Filename
  'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',

  // Model Refname
  'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',

  // Model Template
  '<|im_start|>{role}\n{content}<|im_end|>',

  // Model Template End
  '<|im_start|>assistant',

  // Capitalize Role
  False,

  // Max Context
  8192,

  // Main GPU, -1 for best, 0..N for a specific GPU
  -1,

  // GPU Layers, -1 for max, 0 for CPU only, 1..N for a specific layer count
  -1,

  // Max threads, default 4; the maximum is the physical CPU core count
  4
);
```
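
With the ChatML-style template above, a single user message would presumably be rendered into the following prompt text before being sent to the model (placeholders substituted verbatim, with the template-end marker cueing the assistant's reply; this expansion is inferred from the placeholder descriptions, not taken from JetInfero's documentation):

```
<|im_start|>user
What is AI?<|im_end|>
<|im_start|>assistant
```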

#### Template Benefits

- **Adaptability** 🌍: Customize prompts for various LLMs and use cases.
- **Consistency** 🔄: Ensure predictable inputs for reliable results.
- **Flexibility** 🌈: Modify prompt formats for tasks such as JSON or Markdown generation.
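
The same definition call can target other model families simply by swapping the template string. As an illustration, a Llama 3 Instruct-style GGUF uses a different chat format; the filename and template below are hypothetical examples taken from the Llama 3 prompt format, not from JetInfero itself, so always verify against the model card of the GGUF you actually download:

```pascal
// Illustrative definition for a Llama 3 Instruct-style model.
// Filename, refname, and template are assumptions for this sketch.
jiDefineModel(
  'C:/LLM/GGUF/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf',     // model filename (example)
  'Llama-3.1-8B-Instruct',                                   // refname used by jiLoadModel/jiRunInference
  '<|start_header_id|>{role}<|end_header_id|>\n{content}<|eot_id|>',  // per-message template
  '<|start_header_id|>assistant<|end_header_id|>',           // template end (generation cue)
  False, 8192, -1, -1, 4);
```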

### Streamlined Model Management

- Define models with `jiDefineModel` 🔨.
- Load and unload models dynamically with `jiLoadModel` and `jiUnloadModel` 🔀.
- Save and load model configurations with `jiSaveModelDefines` and `jiLoadModelDefines` 🗃️.
- Clear all model definitions with `jiClearModelDefines` 🧹.
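
Taken together, these calls suggest a simple persistence workflow: define once, save, then restore on later runs. The sketch below is illustrative only; the exact parameters of the save/load functions (e.g., whether they take a filename such as `models.json`) are assumptions, so check the `JetInfero` unit for the real signatures:

```pascal
// Sketch of the define-persistence workflow (assumed signatures).
if jiInit() then
begin
  // Define one or more models once...
  jiDefineModel('C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
    'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
    '<|im_start|>{role}\n{content}<|im_end|>',
    '<|im_start|>assistant', False, 8192, -1, -1, 4);

  // ...persist the definitions so later runs can skip redefining them...
  jiSaveModelDefines('models.json');

  // ...and on a later run, restore instead of redefining:
  jiClearModelDefines();
  jiLoadModelDefines('models.json');

  jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');
  // ... run inference here ...
  jiUnloadModel();
  jiQuit();
end;
```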

### 🔍 Inference Execution

- Perform inference tasks with `jiRunInference` ⚙️.
- Stream tokens in real time via `InferenceTokenCallback` ⌚.
- Retrieve responses with `jiGetInferenceResponse` 🖊️.
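
A typical call sequence combines these pieces: queue a message, run inference against a defined model refname, then read the accumulated text. This sketch assumes a model is already defined and loaded, and that `jiGetInferenceResponse` returns the complete response as a string (an assumed return type; verify in the `JetInfero` unit):

```pascal
// Minimal blocking inference sketch (model already defined and loaded).
jiAddMessage('user', 'Summarize llama.cpp in one sentence.');
if jiRunInference('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf') then
  WriteLn(jiGetInferenceResponse())   // assumed to return the full response text
else
  WriteLn('Error: ', jiGetLastError());
```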

### 📊 Performance Monitoring

- Retrieve detailed metrics such as tokens per second, input/output token counts, and execution time via `jiGetPerformanceResult` 📊.

## 🛠️ Installation

1. **Download the Repository** 📦
   - [Download here](https://github.com/tinyBigGAMES/JetInfero/archive/refs/heads/main.zip) and extract the files to your preferred directory 📂.
   - Ensure `JetInfero.dll` is accessible in your project directory.

2. **Acquire a GGUF Model** 🧠
   - Obtain a model from [Hugging Face](https://huggingface.co), such as [Dolphin3.0-Llama3.1-8B-Q4_K_M-GGUF](https://huggingface.co/tinybiggames/Dolphin3.0-Llama3.1-8B-Q4_K_M-GGUF/resolve/main/dolphin3.0-llama3.1-8b-q4_k_m.gguf?download=true), a good general-purpose model that can be downloaded directly from our Hugging Face account. See the [model card](https://huggingface.co/tinybiggames/Dolphin3.0-Llama3.1-8B-Q4_K_M-GGUF) for more information.
   - Save it to a directory accessible to your application (e.g., `C:/LLM/GGUF`) 💾.

3. **Add JetInfero to Your Project** 🔨
   - Include the `JetInfero` unit in your Delphi project.

4. **Ensure GPU Compatibility** 🎮
   - Verify Vulkan compatibility for enhanced performance ⚡. Adjust `AGPULayers` as needed to accommodate VRAM limitations 📉.

5. **Build the JetInfero DLL** 🛠️
   - Open and compile the `JetInfero.dproj` project 📂. This generates the 64-bit `JetInfero.dll` in the `lib` folder 🗂️.
   - The project was created and tested using Delphi 12.2 on Windows 11 24H2 🖥️.

6. **Use JetInfero** 🚀
   - JetInfero works with any programming language that supports Win64 and Unicode bindings 💻.
   - Ensure `JetInfero.dll` is included in your distribution and accessible at runtime 📦.

**Note: JetInfero requires direct access to the GPU/CPU and is not recommended for use inside a virtual machine.**

## 📈 Quick Start

### ⚙️ Basic Setup

Integrate JetInfero into your Delphi project:
```pascal
uses
  JetInfero;

var
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  if jiInit() then
  begin
    jiDefineModel(
      'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      '<|im_start|>{role}\n{content}<|im_end|>',
      '<|im_start|>assistant', False, 8192, -1, -1, 4);

    jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');

    jiAddMessage('user', 'What is AI?');

    if jiRunInference('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf') then
    begin
      jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
      WriteLn('Input Tokens : ', LTotalInputTokens);
      WriteLn('Output Tokens: ', LTotalOutputTokens);
      WriteLn('Speed        : ', LTokensPerSec:3:2, ' t/s');
    end
    else
      WriteLn('Error: ', jiGetLastError());

    jiUnloadModel();
    jiQuit();
  end;
end.
```

### 🔍 Using Callbacks

Define a custom callback to handle token streaming:
```pascal
procedure InferenceCallback(const Token: string; const UserData: Pointer);
begin
  Write(Token);
end;

// Register the callback (and optional user data) before running inference:
jiSetInferenceTokenCallback(@InferenceCallback, nil);
```

### 📊 Retrieve Performance Metrics

Access performance results to monitor efficiency:

```pascal
var
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
  WriteLn('Tokens/Sec   : ', LTokensPerSec:3:2);
  WriteLn('Input Tokens : ', LTotalInputTokens);
  WriteLn('Output Tokens: ', LTotalOutputTokens);
end;
```

### 🛠️ Support and Resources

- Report issues via the [Issue Tracker](https://github.com/tinyBigGAMES/jetinfero/issues) 🐞.
- Join the discussion on the [Forum](https://github.com/tinyBigGAMES/jetinfero/discussions) and [Discord](https://discord.gg/tPWjMwK) 💬.
- Learn more at [Learn Delphi](https://learndelphi.org) 📚.

### 🤝 Contributing

Contributions to **✨ JetInfero** are highly encouraged! 🌟

- 🐛 **Report Issues:** Submit issues if you encounter bugs or need help.
- 💡 **Suggest Features:** Share your ideas to make **JetInfero** even better.
- 🔧 **Create Pull Requests:** Help expand the capabilities and robustness of the library.

Your contributions make a difference! 🙌✨

### 📜 Licensing

**JetInfero** is distributed under the 🆓 **BSD-3-Clause License**, which allows redistribution and use in source and binary forms, with or without modification, under specific conditions. See the [LICENSE](https://github.com/tinyBigGAMES/JetInfero?tab=BSD-3-Clause-1-ov-file#BSD-3-Clause-1-ov-file) file for details.

---
**Elevate your Delphi projects with JetInfero 🚀 – your bridge to seamless local generative AI integration 🤖.**


Made with :heart: in Delphi