# JetInfero

Local LLM Inference Library (https://github.com/tinybiggames/jetinfero)
[Discord](https://discord.gg/tPWjMwK) | [Bluesky](https://bsky.app/profile/tinybiggames.com)

## Fast, Flexible Local LLM Inference for Developers
JetInfero is a nimble and high-performance library that enables developers to integrate local Large Language Models (LLMs) effortlessly into their applications. Powered by **llama.cpp**, JetInfero prioritizes speed, flexibility, and ease of use. It's compatible with any language supporting **Win64**, **Unicode**, and dynamic-link libraries (DLLs).
## Why Choose JetInfero?
- **Optimized for Speed**: Built on llama.cpp, JetInfero offers lightning-fast inference capabilities with minimal overhead.
- **Cross-Language Support**: Seamlessly integrates with Delphi, C++, C#, Java, and other Win64-compatible environments.
- **Intuitive API**: A clean procedural API simplifies model management, inference execution, and callback handling.
- **Customizable Templates**: Tailor input prompts to suit different use cases with ease.
- **Scalable Performance**: Leverage GPU acceleration, token streaming, and multi-threaded execution for demanding workloads.

## Key Features

### Advanced AI Integration
JetInfero expands your toolkit with capabilities such as:
- Dynamic chatbot creation.
- Automated text generation and summarization.
- Context-aware content creation.
- Real-time token streaming for adaptive applications.
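As a taste of the chatbot use case, here is a minimal sketch of a multi-turn console chat. It assumes the `jiAddMessage`, `jiRunInference`, `jiGetInferenceResponse`, and `jiGetLastError` routines shown later in this README, a model already defined and loaded under the refname used below, and that messages added with `jiAddMessage` accumulate as conversation context; treat it as an illustration rather than a verified program.

```pascal
program ChatDemo;

uses
  JetInfero;

var
  LPrompt: string;
begin
  // Assumes jiInit() succeeded and the model below is defined and loaded.
  while True do
  begin
    Write('You: ');
    ReadLn(LPrompt);
    if LPrompt = '' then Break; // empty line ends the chat

    // Queue the user turn; the PWideChar cast assumes a PWideChar parameter.
    jiAddMessage('user', PWideChar(LPrompt));

    if jiRunInference('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf') then
      WriteLn('AI : ', jiGetInferenceResponse())
    else
      WriteLn('Error: ', jiGetLastError());
  end;
end.
```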
### Privacy-Centric Local Execution

- Operates entirely offline, ensuring sensitive data remains secure.
- GPU acceleration supported via Vulkan for enhanced performance.
### Performance Optimization

- Configure GPU utilization with `AGPULayers`.
- Allocate threads dynamically using `AMaxThreads`.
- Access performance metrics to monitor throughput and efficiency.
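As a concrete illustration, the two profiles below differ only in the GPU and thread arguments to `jiDefineModel` (the full parameter list is documented in the next section). The refnames and values are illustrative assumptions, not recommended settings:

```pascal
// CPU-only profile: no layers offloaded, thread count raised.
jiDefineModel(
  'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
  'Dolphin3.0-CPU',                            // illustrative refname
  '<|im_start|>{role}\n{content}<|im_end|>',
  '<|im_start|>assistant',
  False, 8192,
  -1,  // main GPU: -1 lets JetInfero pick the best device
  0,   // AGPULayers: 0 = CPU only
  8);  // AMaxThreads: raise toward your physical core count

// GPU-heavy profile: offload as many layers as VRAM allows.
jiDefineModel(
  'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
  'Dolphin3.0-GPU',                            // illustrative refname
  '<|im_start|>{role}\n{content}<|im_end|>',
  '<|im_start|>assistant',
  False, 8192,
  -1,  // main GPU: -1 for best
  -1,  // AGPULayers: -1 = offload the maximum number of layers
  4);  // AMaxThreads
```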
### Flexible Prompt Templates

JetInfero's template system simplifies input customization. Templates include placeholders such as:

- **`{role}`**: Denotes the sender's role (e.g., `user`, `assistant`).
- **`{content}`**: Represents the message content.

For example:
```pascal
jiDefineModel(
  // Model Filename
  'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
  // Model Refname
  'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
  // Model Template
  '<|im_start|>{role}\n{content}<|im_end|>',
  // Model Template End
  '<|im_start|>assistant',
  // Capitalize Role
  False,
  // Max Context
  8192,
  // Main GPU, -1 for best, 0..N for a specific GPU
  -1,
  // GPU Layers, -1 for max, 0 for CPU only, 1..N for a specific count
  -1,
  // Max threads, default 4; the maximum is the physical CPU core count
  4
);
```

#### Template Benefits

- **Adaptability**: Customize prompts for various LLMs and use cases.
- **Consistency**: Ensure predictable inputs for reliable results.
- **Flexibility**: Modify prompt formats for tasks like JSON or markdown generation.
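To illustrate the adaptability point, here is a hedged sketch that defines a model with a Llama 3 style chat format instead of the ChatML tags used above. The filename, refname, and tags are hypothetical; use the template your GGUF model was actually trained with.

```pascal
// Hypothetical model using Llama 3 style chat tags instead of ChatML.
jiDefineModel(
  'C:/LLM/GGUF/MyLlama3Model.gguf',  // hypothetical filename
  'MyLlama3Model',                   // hypothetical refname
  '<|start_header_id|>{role}<|end_header_id|>\n{content}<|eot_id|>',
  '<|start_header_id|>assistant<|end_header_id|>',
  False, 8192, -1, -1, 4);
```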
### Streamlined Model Management

- Define models with `jiDefineModel`.
- Load/unload models dynamically using `jiLoadModel` and `jiUnloadModel`.
- Save/load model configurations with `jiSaveModelDefines` and `jiLoadModelDefines`.
- Clear all model definitions using `jiClearModelDefines`.
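Putting these together, persisting definitions between runs might look like the sketch below. It assumes `jiSaveModelDefines` and `jiLoadModelDefines` take a filename; `models.json` is a placeholder, not a documented format.

```pascal
// After defining your models (see jiDefineModel above), persist them:
jiSaveModelDefines('models.json');   // filename parameter is an assumption

// On a later run, restore instead of redefining:
jiClearModelDefines();               // drop any stale definitions
jiLoadModelDefines('models.json');
jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');
```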
### Inference Execution

- Perform inference tasks with `jiRunInference`.
- Stream real-time tokens via `InferenceTokenCallback`.
- Retrieve responses using `jiGetInferenceResponse`.
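For example, when no token callback is registered, the complete response can be read back once `jiRunInference` returns. This sketch assumes `jiGetInferenceResponse` returns the generated text directly:

```pascal
jiAddMessage('user', 'Summarize the benefits of local inference.');
if jiRunInference('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf') then
  WriteLn(jiGetInferenceResponse())   // full response text
else
  WriteLn('Error: ', jiGetLastError());
```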
### Performance Monitoring

- Retrieve detailed metrics like tokens/second, input/output token counts, and execution time via `jiGetPerformanceResult`.
## Installation
1. **Download the Repository**
   - [Download here](https://github.com/tinyBigGAMES/JetInfero/archive/refs/heads/main.zip) and extract the files to your preferred directory.
   - Ensure `JetInfero.dll` is accessible in your project directory.
2. **Acquire a GGUF Model**
   - Obtain a model from [Hugging Face](https://huggingface.co), such as [Dolphin3.0-Llama3.1-8B-Q4_K_M-GGUF](https://huggingface.co/tinybiggames/Dolphin3.0-Llama3.1-8B-Q4_K_M-GGUF/resolve/main/dolphin3.0-llama3.1-8b-q4_k_m.gguf?download=true), a good general-purpose model. You can download it directly from our Hugging Face account; see the [model card](https://huggingface.co/tinybiggames/Dolphin3.0-Llama3.1-8B-Q4_K_M-GGUF) for more information.
   - Save it to a directory accessible to your application (e.g., `C:/LLM/GGUF`).
3. **Add JetInfero to Your Project**
   - Include the `JetInfero` unit in your Delphi project.
4. **Ensure GPU Compatibility**
   - Verify Vulkan compatibility for enhanced performance. Adjust `AGPULayers` as needed to accommodate VRAM limitations.
5. **Building JetInfero DLL**
   - Open and compile the `JetInfero.dproj` project. This process will generate the 64-bit `JetInfero.dll` in the `lib` folder.
   - The project was created and tested using Delphi 12.2 on Windows 11 24H2.
6. **Using JetInfero**
   - JetInfero can be used with any programming language that supports Win64 and Unicode bindings.
   - Ensure `JetInfero.dll` is included in your distribution and accessible at runtime.

**Note: JetInfero requires direct access to the GPU/CPU and is not recommended for use inside a virtual machine.**
## Quick Start

### Basic Setup
Integrate JetInfero into your Delphi project:
```pascal
uses
  JetInfero;

var
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  if jiInit() then
  begin
    jiDefineModel(
      'C:/LLM/GGUF/Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      'Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf',
      '<|im_start|>{role}\n{content}<|im_end|>',
      '<|im_start|>assistant', False, 8192, -1, -1, 4);

    jiLoadModel('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf');

    jiAddMessage('user', 'What is AI?');

    if jiRunInference('Dolphin3.0-Llama3.1-8B-Q4_K_M.gguf') then
    begin
      jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
      WriteLn('Input Tokens : ', LTotalInputTokens);
      WriteLn('Output Tokens: ', LTotalOutputTokens);
      WriteLn('Speed        : ', LTokensPerSec:3:2, ' t/s');
    end
    else
    begin
      WriteLn('Error: ', jiGetLastError());
    end;

    jiUnloadModel();
    jiQuit();
  end;
end.
```

### Using Callbacks
Define a custom callback to handle token streaming:
```pascal
procedure InferenceCallback(const Token: string; const UserData: Pointer);
begin
  // Print each token as it streams in.
  Write(Token);
end;

// Register the callback before running inference.
jiSetInferenceTokenCallback(@InferenceCallback, nil);
```

### Retrieve Performance Metrics
Access performance results to monitor efficiency:
```pascal
var
  LTokensPerSec: Double;
  LTotalInputTokens: Int32;
  LTotalOutputTokens: Int32;
begin
  jiGetPerformanceResult(@LTokensPerSec, @LTotalInputTokens, @LTotalOutputTokens);
  WriteLn('Tokens/Sec   : ', LTokensPerSec:3:2);
  WriteLn('Input Tokens : ', LTotalInputTokens);
  WriteLn('Output Tokens: ', LTotalOutputTokens);
end;
```

### Support and Resources
- Report issues via the [Issue Tracker](https://github.com/tinyBigGAMES/jetinfero/issues).
- Engage in discussions on the [Forum](https://github.com/tinyBigGAMES/jetinfero/discussions) and [Discord](https://discord.gg/tPWjMwK).
- Learn more at [Learn Delphi](https://learndelphi.org).

### Contributing
Contributions to **JetInfero** are highly encouraged!

- **Report Issues:** Submit issues if you encounter bugs or need help.
- **Suggest Features:** Share your ideas to make **JetInfero** even better.
- **Create Pull Requests:** Help expand the capabilities and robustness of the library.

Your contributions make a difference!

#### Contributors
### Licensing

**JetInfero** is distributed under the **BSD-3-Clause License**, allowing for redistribution and use in both source and binary forms, with or without modification, under specific conditions. See the [LICENSE](https://github.com/tinyBigGAMES/JetInfero?tab=BSD-3-Clause-1-ov-file#BSD-3-Clause-1-ov-file) file for more details.
---
**Elevate your Delphi projects with JetInfero, your bridge to seamless local generative AI integration.**
Made with :heart: in Delphi