Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tib0/llama3-wrapper
Node Llama Cpp wrapper for Node JS
- Host: GitHub
- URL: https://github.com/tib0/llama3-wrapper
- Owner: tib0
- License: mit
- Created: 2024-04-27T17:17:19.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-22T12:25:18.000Z (5 months ago)
- Last Synced: 2024-09-30T03:47:38.420Z (about 1 month ago)
- Topics: llama3, llamacpp, node, wrapper
- Language: TypeScript
- Homepage:
- Size: 1.04 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
![logo](./.github/llama3-wrapper-banner.png)
# Llama3-Wrapper
A wrapper class for interacting with a LLaMA model instance loaded locally.
Use any GGUF model; you can find models on [huggingface.co](https://huggingface.co/models?search=gguf).
This project is based on llama.cpp by [ggerganov](https://github.com/ggerganov), used via node-llama-cpp by [withcatai](https://github.com/withcatai/).

## METHODS
### Constructor
Creates a new instance of the LlamaWrapper class, assigning it a unique ID and initializing its internal state.
### loadModule()
Loads the LLaMA module required for interacting with the model. Throws an error if the module cannot be loaded.
### loadLlama(gpu?: Gpu)
Loads the LLaMA library instance, optionally specifying a GPU device to use. Throws an error if the module isn't initialized.
### loadModel(modelPath: string)
Loads a specific LLaMA model from the specified modelPath. Throws an error if Llama isn't loaded.
### initSession(systemPrompt: string)
Initializes a new chat session with the wrapped LLaMA model using the specified systemPrompt. Throws an error if the model isn't initialized.
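Taken together, the loading methods above are meant to be called in sequence. A minimal initialization sketch (the model path and system prompt are placeholders, not values shipped with the package):

```ts
import { LlamaWrapper } from 'llama3-wrapper';

// Placeholder path; point this at any local GGUF model file.
const modelPath = '/path/to/my-model-file.gguf';

const llama = new LlamaWrapper();
await llama.loadModule();          // load the underlying node-llama-cpp module
await llama.loadLlama();           // GPU selection defaults to automatic
await llama.loadModel(modelPath);  // load the GGUF model from disk
await llama.initSession('You are a concise assistant.');
```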
### getStatus()
Returns the current status of the wrapper as an object containing the status and optional message. The status can be one of LlamaStatusType (Uninitialized, Ready, Loading, Generating or Error).
### isReady()
Returns a boolean indicating whether the wrapper is in ready state.
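Continuing the sketch above, callers can guard prompting on the wrapper state. The `status` and `message` field names are assumed from the description of getStatus():

```ts
// Assumed shape: { status: LlamaStatusType, message?: string }
const { status, message } = llama.getStatus();
if (!llama.isReady()) {
  console.warn(`Wrapper not ready (status: ${status})`, message ?? '');
}
```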
### prompt(message: string, onToken?: (chunk: string) => void)
Generates an answer to the specified message and calls the optional onToken callback function with each generated chunk. Throws an error if the session isn't initialized.
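For example, streaming tokens to stdout as they are generated (continuing the sketch above; the question text is illustrative):

```ts
// The optional callback receives each generated chunk as it arrives.
const answer = await llama.prompt('Summarise GGUF in one sentence.', (chunk) => {
  process.stdout.write(chunk);
});
console.log(`\nFull answer: ${answer}`);
```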
### getHistory()
Retrieves the chat history associated with this wrapper's current session. Throws an error if no session is found.
### setHistory(chatHistoryItem: ChatHistoryItem[])
Sets the chat history associated with this wrapper's current session to the specified chatHistoryItem array and returns a promise. Throws an error if no session is found.
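A sketch of saving a conversation and restoring it later, assuming setHistory() accepts the array returned by getHistory():

```ts
// Save the current conversation...
const saved = await llama.getHistory();

// ...and, after initSession() on a fresh wrapper, restore it.
await llama.setHistory(saved);
```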
### getId()
Returns the unique ID assigned to this wrapper instance.
### getInfos()
Returns an object containing various information about the wrapped LLaMA model, including its ID, model filename, train context size, and more.
### disposeSession()
Disposes of the current chat session associated with this wrapper. Throws an error if no session is found.
### clearHistory()
Clears the history of the current chat sequence associated with this wrapper. Throws an error if no session or sequence is found.
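A teardown sketch using the bookkeeping methods above (continuing the earlier sketch):

```ts
console.log(llama.getId());     // unique wrapper ID
console.log(llama.getInfos());  // model filename, train context size, etc.

await llama.clearHistory();     // drop the conversation but keep the session
await llama.disposeSession();   // then release the session entirely
```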
## TYPES
### Gpu
The `Gpu` type represents a GPU device to use.
- **auto**: The default value, which means the library will automatically select the best available GPU.
- **cuda**: A CUDA-enabled GPU.
- **vulkan**: A Vulkan-enabled GPU.
- **metal**: A Metal-enabled GPU.
- **false**: Skip GPU usage.
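For instance, to request a specific backend instead of automatic selection (a sketch continuing the earlier example):

```ts
// Pass one of 'auto' | 'cuda' | 'vulkan' | 'metal' | false to loadLlama().
await llama.loadLlama('metal');
// await llama.loadLlama(false); // CPU only
```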
### LlamaStatusType
The `LlamaStatusType` type defines the possible status values for a llama wrapper instance.
- **uninitialized**: The initial state of a new llama session.
- **loading**: The session is loading a model or initializing a session.
- **ready**: The session is ready to use.
- **error**: An error occurred during initialization or loading.
- **generating**: The session is currently generating a response to a message.
### ChatHistoryItem
The `ChatHistoryItem` type represents a single item in the chat history, holding either a message or a model response. It can be a ChatSystemMessage, a ChatUserMessage or a ChatModelResponse.
- **type**: The kind of history item; can be 'system', 'user' or 'model'.
- **text**: Represents the stored user text or system prompt. This property comes from the ChatSystemMessage and ChatUserMessage types.
- **response**: Represents the model answer. This property comes from the ChatModelResponse type.
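A short typed history built from these shapes (the message contents are illustrative):

```ts
import { type ChatHistoryItem } from 'llama3-wrapper';

const history: ChatHistoryItem[] = [
  { type: 'system', text: 'You are a concise assistant.' },
  { type: 'user', text: 'Hey.' },
  { type: 'model', response: ['Hello!'] },
];
```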
## BUILDING FROM SOURCE
By following the steps below, you can build and install the module from source.
1. Clone the repository:
```sh
git clone https://github.com/tib0/llama3-wrapper.git
```

2. Install dependencies:
```sh
cd ./llama3-wrapper
pnpm i
```

3. Build the module:
```sh
pnpm build
```

4. Link the module globally:
```sh
pnpm link -g
```

5. In the target project folder, link the module:
```sh
cd /path/to/target-project
pnpm link -g llama3-wrapper
```

## CONFIGURATION
Add your GGUF model path in a .env file at the root of your project:
```sh
LLAMA_MODELS_PATH=/Users/me/example/LLM/Models/my-model-file.gguf
```
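The example below reads the path from `process.env.LLAMA_MODELS_PATH`, so the .env file has to be loaded by your project. One option (an assumption, not something this package provides) is the dotenv package:

```ts
// Assumption: dotenv is installed in the host project; it loads .env into process.env.
import 'dotenv/config';

const modelPath = process.env.LLAMA_MODELS_PATH; // pass this to loadModel()
```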
## EXAMPLE

Sample chat-like usage in a terminal:
```ts
import { type ChatHistoryItem, LlamaWrapper } from 'llama3-wrapper';
import readline from 'node:readline';
import { spawn } from 'node:child_process';

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
});

const run = async () => {
  console.log(`# START LLAMA CHAT`);
  console.log(`\n`);

  console.log(`# Feeding history traces`);
  const history: ChatHistoryItem[] = [
    { type: 'user', text: 'Hey.' },
    { type: 'model', response: ['Hello !'] },
  ];

  console.log(`# Waiting seat allocation`);
  const llamaNodeCPP = new LlamaWrapper();
  await llamaNodeCPP.loadModule();
  await llamaNodeCPP.loadLlama();

  // Fail early if the model path is missing from the environment.
  const modelPath = process.env.LLAMA_MODELS_PATH;
  if (!modelPath) throw new Error('LLAMA_MODELS_PATH is not set');
  await llamaNodeCPP.loadModel(modelPath);

  // System prompt used to initialize the session (placeholder text).
  const promptSystem = 'You are a helpful, concise assistant.';
  await llamaNodeCPP.initSession(promptSystem);

  // Seed the session with the prepared history.
  await llamaNodeCPP.setHistory(history);

  console.log(`# Prompt ready`);
  console.log(`# Activated TTS (voice)`);
  console.log(`\n`);

  rl.setPrompt('1 > ');
  rl.prompt();

  let i = 1;
  rl.on('line', async (q) => {
    if (!q || q === 'exit' || q === 'quit' || q === 'q') {
      rl.close();
      return;
    }
    const a = await llamaNodeCPP.prompt(q);
    console.log(`${i} @ ${a}`);
    spawn('say', [a]); // 'say' is the macOS text-to-speech command
    console.log(`\n`);
    i++;
    rl.setPrompt(`${i} > `);
    rl.prompt();
  }).on('close', async () => {
    console.log(`\n`);
    // Grab the history before disposing: getHistory() throws once the session is gone.
    const a = await llamaNodeCPP.getHistory();
    console.log(`Disposing session...`);
    await llamaNodeCPP.disposeSession();
    console.log(`\n`);
    console.log(`History:`);
    console.log(JSON.stringify(a));
    console.log(`\n`);
    console.log('# END LLAMA CHAT');
    process.exit(0);
  });
};

run();
```

## RESOURCES
- [https://github.com/withcatai/node-llama-cpp](https://github.com/withcatai/node-llama-cpp): Run AI models locally on your machine with Node.js bindings for llama.cpp; it can also enforce a JSON schema on the model output at the generation level.
- [https://github.com/ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp): inference of the LLaMA model family (and others) in C/C++.
- [https://huggingface.co](https://huggingface.co): lets you explore the models and datasets available on the Hub.
- [https://github.com/facebookresearch/llama](https://github.com/facebookresearch/llama): official implementation of LLaMA.