https://github.com/developer239/llama-chat
- Host: GitHub
- URL: https://github.com/developer239/llama-chat
- Owner: developer239
- Created: 2024-07-26T11:12:15.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-09-26T23:56:38.000Z (over 1 year ago)
- Last Synced: 2025-01-04T21:33:01.932Z (about 1 year ago)
- Language: C++
- Size: 47.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# LlamaChat 🦙🦙🦙
LlamaChat is a C++ library designed for running language models using the [llama.cpp](https://github.com/ggerganov/llama.cpp) framework. It provides an easy-to-use interface for loading models, querying them, and streaming responses in C++ applications.
**Supported Systems:**
- macOS
- Windows
- Linux
## Installation
### Add LlamaChat as a Submodule
First, add this library as a submodule in your project:
```bash
$ git submodule add https://github.com/developer239/llama-chat externals/llama-chat
```
Load the module's dependencies:
```bash
$ git submodule update --init --recursive
```
### Update Your CMake
In your project's `CMakeLists.txt`, add the following lines to include and link the LlamaChat library:
```cmake
add_subdirectory(externals/llama-chat)
target_link_libraries(your_target PRIVATE LlamaChat) # replace your_target with your executable or library target name
```
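For orientation, a minimal `CMakeLists.txt` tying these pieces together might look like the sketch below; the project name, target name, and source path are placeholder assumptions:

```cmake
cmake_minimum_required(VERSION 3.14)
project(my_llama_app CXX)

# Build LlamaChat from the submodule added above
add_subdirectory(externals/llama-chat)

# Your application target; replace the name and sources with your own
add_executable(my_llama_app src/main.cpp)
target_link_libraries(my_llama_app PRIVATE LlamaChat)
```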
## Usage
### Basic Usage
To use the LlamaChat library, include the header and create an instance of the `LlamaChat` class. You can initialize the model and context separately, then run queries or stream responses.
```cpp
#include "llama-chat.h"
#include <iostream>

int main() {
  LlamaChat llama;

  // Configure the model; offload layers to the GPU if available.
  ModelParams modelParams;
  modelParams.nGpuLayers = 32;  // Adjust based on your GPU capabilities

  if (!llama.InitializeModel("path/to/model", modelParams)) {
    std::cerr << "Failed to initialize the model." << std::endl;
    return 1;
  }

  // Configure the context window size (in tokens).
  ContextParams ctxParams;
  ctxParams.nContext = 2048;

  if (!llama.InitializeContext(ctxParams)) {
    std::cerr << "Failed to initialize the context." << std::endl;
    return 1;
  }

  std::string systemPrompt = "You are a helpful AI assistant.";
  llama.SetSystemPrompt(systemPrompt);

  std::string userMessage = "How do I write hello world in C++?";

  // Stream the response; the callback receives each piece as it is generated.
  llama.Prompt(userMessage, [](const std::string& piece) {
    std::cout << piece << std::flush;
  });

  return 0;
}
```
### Streaming Responses
The `Prompt` method streams responses through the callback you provide, invoking it with each generated piece as it becomes available. This is useful for long outputs.
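For example, you can both display the streamed pieces and accumulate them into a complete response. The sketch below builds on the API shown above; `AskAndCollect` and the pre-initialized `llama` instance are illustrative assumptions:

```cpp
#include "llama-chat.h"
#include <iostream>
#include <string>

// Assumes `llama` has already been initialized as in Basic Usage.
void AskAndCollect(LlamaChat& llama, const std::string& question) {
  std::string fullResponse;

  llama.Prompt(question, [&fullResponse](const std::string& piece) {
    std::cout << piece << std::flush;  // print pieces as they arrive
    fullResponse += piece;             // keep the complete answer for later use
  });

  std::cout << "\n\nCollected " << fullResponse.size() << " characters." << std::endl;
}
```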
## API Reference
### LlamaChat Class
The `LlamaChat` class provides methods to interact with language models loaded through llama.cpp.
#### Public Methods
- `LlamaChat()`: Constructor. Initializes the LlamaChat object.
- `~LlamaChat()`: Destructor. Cleans up resources.
- `bool InitializeModel(const std::string& modelPath, const ModelParams& params)`: Initializes the model with the specified path and parameters.
- `bool InitializeContext(const ContextParams& params)`: Initializes the context with the specified parameters.
- `void SetSystemPrompt(const std::string& systemPrompt)`: Sets the system prompt for the conversation.
- `void ResetConversation()`: Resets the conversation history (see the multi-turn sketch after this list).
- `void Prompt(const std::string& userMessage, const std::function<void(const std::string&)>& callback)`: Processes the user message and streams the response, invoking the callback with each piece of the response.
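A typical multi-turn exchange combines these methods as in the sketch below; the model and context are assumed to be initialized as in Basic Usage, and the helper function is illustrative:

```cpp
#include "llama-chat.h"
#include <iostream>
#include <string>

// Assumes InitializeModel and InitializeContext have already succeeded.
void RunTwoConversations(LlamaChat& llama) {
  llama.SetSystemPrompt("You are a concise coding assistant.");

  auto printPiece = [](const std::string& piece) {
    std::cout << piece << std::flush;
  };

  // Turns within one conversation share the accumulated history.
  llama.Prompt("What is RAII in C++?", printPiece);
  llama.Prompt("Show a short example of it.", printPiece);

  // Clear the history before switching to an unrelated topic.
  llama.ResetConversation();
  llama.Prompt("Explain move semantics in one paragraph.", printPiece);
}
```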
#### Structs
- `LlamaToken`: Represents a token in the model's vocabulary.
- `tokenId` (int): The unique identifier of the token.
- `ModelParams`: Parameters for model initialization.
- `nGpuLayers` (int): Number of layers to offload to GPU. Set to 0 for CPU-only.
- `vocabularyOnly` (bool): Only load the vocabulary, no weights.
- `useMemoryMapping` (bool): Use memory mapping for faster loading.
- `useModelLock` (bool): Force system to keep model in RAM.
- `ContextParams`: Parameters for context initialization.
- `nContext` (size_t): Size of the context window (in tokens).
- `nThreads` (int): Number of threads to use for computation.
- `nBatch` (int): Number of tokens to process in parallel.
- `SamplingParams`: Parameters for text generation sampling (see the configuration sketch after this list).
- `maxTokens` (size_t): Maximum number of tokens to generate.
- `temperature` (float): Controls randomness in generation.
- `topK` (int32_t): Limits sampling to the k most likely tokens.
- `topP` (float): Limits sampling to a cumulative probability.
- `repeatPenalty` (float): Penalty for repeating tokens.
- `frequencyPenalty` (float): Penalty based on token frequency in generated text.
- `presencePenalty` (float): Penalty for tokens already present in generated text.
- `repeatPenaltyTokens` (std::vector&lt;LlamaToken&gt;): Tokens to consider for repeat penalty.
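`SamplingParams` can be populated field by field like the other structs. The sketch below uses illustrative values only; they are assumptions, not recommended defaults:

```cpp
#include "llama-chat.h"

// Illustrative values only; tune them for your model and use case.
SamplingParams MakeSamplingParams() {
  SamplingParams params;
  params.maxTokens = 256;          // cap the length of the generated reply
  params.temperature = 0.8f;       // higher values increase randomness
  params.topK = 40;                // sample only from the 40 most likely tokens
  params.topP = 0.95f;             // cumulative-probability (nucleus) cutoff
  params.repeatPenalty = 1.1f;     // discourage verbatim repetition
  params.frequencyPenalty = 0.0f;  // extra penalty by token frequency
  params.presencePenalty = 0.0f;   // extra penalty for tokens already present
  return params;
}
```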