Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/duck4i/node-llama
LLM inside your Node.JS
- Host: GitHub
- URL: https://github.com/duck4i/node-llama
- Owner: duck4i
- Created: 2025-01-03T18:38:16.000Z (17 days ago)
- Default Branch: main
- Last Pushed: 2025-01-04T11:31:11.000Z (16 days ago)
- Last Synced: 2025-01-04T11:37:31.227Z (16 days ago)
- Topics: llamacpp, llm, llm-inference, local, nodejs
- Language: C++
- Homepage:
- Size: 1.51 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 4
- Metadata Files:
  - Readme: README.md
README
# NODE-LLAMA
Run llama.cpp locally inside your Node.js environment.
# Build Status
| OS | Node 18 | Node 20 | Node 22 |
|---------|---------|---------|---------|
| Ubuntu | [![Ubuntu Node 18](https://github.com/duck4i/node-llama/actions/workflows/build.yml/badge.svg?branch=main&jobName=build%20(ubuntu-latest%2C%2018.x))](https://github.com/duck4i/node-llama/actions/workflows/build.yml) | [![Ubuntu Node 20](https://github.com/duck4i/node-llama/actions/workflows/build.yml/badge.svg?branch=main&jobName=build%20(ubuntu-latest%2C%2020.x))](https://github.com/duck4i/node-llama/actions/workflows/build.yml) | [![Ubuntu Node 22](https://github.com/duck4i/node-llama/actions/workflows/build.yml/badge.svg?branch=main&jobName=build%20(ubuntu-latest%2C%2022.x))](https://github.com/duck4i/node-llama/actions/workflows/build.yml) |
| macOS | [![macOS Node 18](https://github.com/duck4i/node-llama/actions/workflows/build.yml/badge.svg?branch=main&jobName=build%20(macos-latest%2C%2018.x))](https://github.com/duck4i/node-llama/actions/workflows/build.yml) | [![macOS Node 20](https://github.com/duck4i/node-llama/actions/workflows/build.yml/badge.svg?branch=main&jobName=build%20(macos-latest%2C%2020.x))](https://github.com/duck4i/node-llama/actions/workflows/build.yml) | [![macOS Node 22](https://github.com/duck4i/node-llama/actions/workflows/build.yml/badge.svg?branch=main&jobName=build%20(macos-latest%2C%2022.x))](https://github.com/duck4i/node-llama/actions/workflows/build.yml) |
| Windows | [![Windows Node 18](https://github.com/duck4i/node-llama/actions/workflows/build.yml/badge.svg?branch=main&jobName=build%20(windows-latest%2C%2018.x))](https://github.com/duck4i/node-llama/actions/workflows/build.yml) | [![Windows Node 20](https://github.com/duck4i/node-llama/actions/workflows/build.yml/badge.svg?branch=main&jobName=build%20(windows-latest%2C%2020.x))](https://github.com/duck4i/node-llama/actions/workflows/build.yml) | [![Windows Node 22](https://github.com/duck4i/node-llama/actions/workflows/build.yml/badge.svg?branch=main&jobName=build%20(windows-latest%2C%2022.x))](https://github.com/duck4i/node-llama/actions/workflows/build.yml) |

# Package Info
[![npm version](https://badge.fury.io/js/@duck4i%2Fllama.svg)](https://badge.fury.io/js/@duck4i%2Fllama)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Node Version](https://img.shields.io/node/v/@duck4i/llama)](https://www.npmjs.com/package/@duck4i/llama)

## Reasoning
Sometimes you just need a **small model** that can run anywhere and can't be bothered with making REST calls to services like OpenRouter or Ollama.
This project is super simple: Node.js-native inference based on the `llamacpp` project, with no need for external services. Install the npm package, download a model, and run it. Simple as that.
## Features
- Minimal dependencies (mostly CMake and GCC) and no need for external services
- High performance: the full speed of `llamacpp` with only a thin Node layer on top
- Supports most LLM models
- Easy to use API
- Command line for direct inference and model download

## Installation
```sh
npm install @duck4i/llama
```

Please note that you need CMake and GCC installed if you don't have them already, as the plugin is C++ based.
```sh
sudo apt-get install -y build-essential cmake g++
```

## Usage
```javascript
const { RunInference } = require('@duck4i/llama');
const system_prompt = "The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.";
const user_prompt = "What is the life expectancy of a duck?";

const inference = RunInference("model.gguf", user_prompt, system_prompt, /*optional max tokens*/ 512);
console.log("Answer", inference);
```
It is likely you will want async functions for better memory management with multiple prompts, which is done like this:
```javascript
const { LoadModelAsync, CreateContextAsync, RunInferenceAsync, ReleaseContextAsync, ReleaseModelAsync } = require('@duck4i/llama');

const system_prompt = "The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.";
const prompts = [
    "How old can ducks get?",
    "Why are ducks so cool?",
    "Is there a limit on number of ducks I can own?"
];

const model = await LoadModelAsync("model.gguf");
const ctx = await CreateContextAsync(model, /*optional n_ctx*/ 0, /*optional flash_att*/ true);
console.log("Model loaded", model);for (const prompt of prompts) {
const inference = await RunInferenceAsync(model, ctx, prompt, system_prompt, /*optional max tokens*/ 512);
console.log("Answer:", inference);
}

await ReleaseContextAsync(ctx);
await ReleaseModelAsync(model);
```
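Because the model and context are native handles, you may want to guarantee they are released even when inference throws. Below is a minimal sketch using only the functions shown above; the try/finally structure and the `askAll` helper are suggestions for illustration, not part of the library:

```javascript
const { LoadModelAsync, CreateContextAsync, RunInferenceAsync, ReleaseContextAsync, ReleaseModelAsync } = require('@duck4i/llama');

// Sketch: the same lifecycle as above, wrapped in try/finally so the native
// model and context handles are released even if an inference call fails.
async function askAll(prompts, system_prompt) {
    const model = await LoadModelAsync("model.gguf");
    const ctx = await CreateContextAsync(model);
    try {
        const answers = [];
        for (const prompt of prompts) {
            answers.push(await RunInferenceAsync(model, ctx, prompt, system_prompt, 512));
        }
        return answers;
    } finally {
        await ReleaseContextAsync(ctx);
        await ReleaseModelAsync(model);
    }
}
```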
### Model format
It's likely you will want more control over the model, so you can push the complete formatted prompt to it with the `!#` prefix, like this:
```javascript
const system = "You are ...";
const user = "...";// QWEN example (prefix !# will get removed before reaching the llm)
const prompt = `"!#<|im_start|>system ${system}<|im_end|><|im_start|>user ${user}<|im_end|><|im_start|>assistant"`;const reply = await RunInferenceAsync(modelHandle, prompt, /*optional max token*/ 128)
```
Getting special tokens from the model is done with the `GetModelToken` method.
```javascript
const eos = GetModelToken(modelHandle, "EOS");
const bos = GetModelToken(modelHandle, "BOS");
const eot = GetModelToken(modelHandle, "EOT");
const sep = GetModelToken(modelHandle, "SEP");
const cls = GetModelToken(modelHandle, "CLS");
const nl = GetModelToken(modelHandle, "NL");
```
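As a follow-up, here is a hedged sketch of combining `GetModelToken` with the raw `!#` prompt format from the previous section. It assumes `GetModelToken` returns the token text as a string and reuses the `RunInferenceAsync(model, prompt, maxTokens)` call shape shown above; the exact template a given model expects depends on that model.

```javascript
const { LoadModelAsync, GetModelToken, RunInferenceAsync, ReleaseModelAsync } = require('@duck4i/llama');

// Sketch only: assumes GetModelToken returns the token text as a string.
async function rawPromptExample() {
    const model = await LoadModelAsync("model.gguf");

    const bos = GetModelToken(model, "BOS");
    const eos = GetModelToken(model, "EOS");

    // The !# prefix marks the prompt as fully formatted; the special tokens
    // are spliced in manually instead of relying on a built-in template.
    const prompt = `!#${bos}How old can ducks get?${eos}`;
    const reply = await RunInferenceAsync(model, prompt, /*max tokens*/ 128);
    console.log(reply);

    await ReleaseModelAsync(model);
}
```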
### Logging control
You can control log levels coming from llamacpp like this:
```javascript
const { SetLogLevel } = require('@duck4i/llama');
// 0 - none, 1 - debug, 2 - info, 3 - warn, 4 - error
SetLogLevel(1);
```
## Command line
```bash
# Download model
npx llama-download -u https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-fp16.gguf?download=true -p model.gguf

# Run inference
npx llama-run -m model.gguf -p "How old can ducks get?"

# Run with system prompt
npx llama-run -m model.gguf -p "How old can ducks get?" -s "[System prompt...]"
```
## Supported Models
All models supported by `llamacpp` natively are supported here too, so do check their [repository](https://github.com/ggerganov/llama.cpp).
Please keep in mind that CUDA is not enabled yet due to its complex dependencies, so keep the model size in check.
On macOS, the Metal backend should come included.
## Contributing
Contributions are welcome! Please open an issue or submit a pull request.
## License
This project is licensed under the MIT License.