https://github.com/atome-fe/llama-node

Believe in AI democratization. LLaMA for Node.js, backed by llama-rs, llama.cpp and rwkv.cpp, running locally on your laptop CPU. Supports llama/alpaca/gpt4all/vicuna/rwkv models.

Host: GitHub
URL: https://github.com/atome-fe/llama-node
Owner: Atome-FE
License: apache-2.0
Archived: true
Created: 2023-03-20T10:47:19.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-08-03T02:46:18.000Z (almost 3 years ago)
Last Synced: 2025-03-28T15:05:02.240Z (over 1 year ago)
Topics: ai, embeddings, gpt, langchain, large-language-models, llama, llama-node, llama-rs, llamacpp, llm, napi, napi-rs, nodejs, rwkv
Language: Rust
Homepage: https://llama-node.vercel.app/
Size: 30.4 MB
Stars: 869
Watchers: 16
Forks: 64
Open Issues: 47
Metadata Files:
- Readme: README-zh-CN.md
- Funding: .github/FUNDING.yml
- License: LICENSE-APACHE.MD
- Code of conduct: CODE_OF_CONDUCT.md


# llama-node

The LLaMA large language model, running on Node.js.

This project is at an early stage. The Node.js API may change in the future, so use it with caution.



_Image generated by Stable Diffusion._

![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/hlhr202/llama-node/llama-build.yml)

![NPM](https://img.shields.io/npm/l/llama-node)

[llama-node on npm](https://www.npmjs.com/package/llama-node)

![npm type definitions](https://img.shields.io/npm/types/llama-node)

[hlhr202 on Twitter](https://twitter.com/hlhr202)

---

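- [llama-node](#llama-node)
  - [Introduction](#introduction)
  - [Install](#install)
  - [Getting the model](#getting-the-model)
    - [Model versions](#model-versions)
      - [llama.cpp](#llamacpp)
      - [llama-rs](#llama-rs)
  - [Usage (llama.cpp backend)](#usage-llamacpp-backend)
    - [Inference](#inference)
    - [Tokenize](#tokenize)
    - [Embedding](#embedding)
  - [Usage (llama-rs backend)](#usage-llama-rs-backend)
    - [Inference](#inference-1)
    - [Tokenize](#tokenize-1)
    - [Embedding](#embedding-1)
  - [LangChain.js extension!](#langchainjs-extension)
  - [About performance](#about-performance)
    - [Manual compilation (from node\_modules)](#manual-compilation-from-node_modules)
    - [Manual compilation (from source)](#manual-compilation-from-source)
  - [Future plan](#future-plan)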

---

## Introduction

This is a Node.js client library for LLaMA (and some related models), built on top of [llama-rs](https://github.com/rustformers/llama-rs) and [llm-chain-llama-sys](https://github.com/sobelio/llm-chain/tree/main/llm-chain-llama/sys) (Rust bindings generated for [llama.cpp](https://github.com/ggerganov/llama.cpp)). It uses [napi-rs](https://github.com/napi-rs/napi-rs) to pass messages between Node.js and the llama thread.

Since v0.0.21, both the llama-rs and llama.cpp backends are supported.

Currently supported platforms:

- darwin-x64

- darwin-arm64

- linux-x64-gnu (glibc >= 2.31)

- linux-x64-musl

- win32-x64-msvc

Minimum Node.js version: 16
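As a rough illustration of the platform list and Node.js requirement above, the following sketch checks a host description against them. `checkHost`, `SUPPORTED_TARGETS` and `MIN_NODE_MAJOR` are hypothetical names and not part of llama-node. The check is deliberately coarse: it cannot tell glibc from musl and does not verify the glibc version.

```typescript
// Platform/architecture pairs taken from the supported-platform list above.
const SUPPORTED_TARGETS: string[] = [
    "darwin-x64",
    "darwin-arm64",
    "linux-x64",
    "win32-x64",
];

const MIN_NODE_MAJOR = 16;

// Hypothetical helper: returns a list of problems, empty when the host looks supported.
function checkHost(platform: string, arch: string, nodeVersion: string): string[] {
    const problems: string[] = [];
    const target = `${platform}-${arch}`;
    if (SUPPORTED_TARGETS.indexOf(target) === -1) {
        problems.push(`no prebuilt binary listed for ${target}`);
    }
    // nodeVersion uses the format of process.version, e.g. "v18.16.0".
    const major = parseInt(nodeVersion.replace(/^v/, ""), 10);
    if (isNaN(major) || major < MIN_NODE_MAJOR) {
        problems.push(`Node.js ${nodeVersion} is older than ${MIN_NODE_MAJOR}`);
    }
    return problems;
}

console.log(checkHost("linux", "x64", "v18.16.0")); // []
console.log(checkHost("linux", "arm64", "v14.21.3"));
```

In a real script the arguments would typically be `process.platform`, `process.arch` and `process.version`.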

I do not have the hardware to test 13B or larger models, but I have successfully tested ggml llama and ggml alpaca models based on LLaMA 7B.

---

## Install

- Install the core package

```bash
npm install llama-node
```

- Install the llama-rs backend

```bash
npm install @llama-node/core
```

- Install the llama.cpp backend

```bash
npm install @llama-node/llama-cpp
```

---

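## Getting the model

llama-node calls llama-rs under the hood, and the model format it uses derives from llama.cpp. Because Meta released the models only for testing by research institutions, this project does not provide model downloads. If you have obtained the original **.pth** model, read the [Getting the weights](https://github.com/rustformers/llama-rs#getting-the-weights) document and use the convert tool provided by llama-rs to convert it.

### Model versions

#### llama.cpp

The following model types are supported by llama.cpp. They can be found in the ggml.h source: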

```c
enum ggml_type {
    // explicitly numbered values are used in llama.cpp files
    GGML_TYPE_F32  = 0,
    GGML_TYPE_F16  = 1,
    GGML_TYPE_Q4_0 = 2,
    GGML_TYPE_Q4_1 = 3,
    GGML_TYPE_Q4_2 = 4,
    GGML_TYPE_Q4_3 = 5,
    GGML_TYPE_Q8_0 = 6,
    GGML_TYPE_I8,
    GGML_TYPE_I16,
    GGML_TYPE_I32,
    GGML_TYPE_COUNT,
};
```

#### llama-rs

The following model types are supported by llama-rs. They can be found in the ggml bindings of llama-rs:

```rust
pub enum Type {
    /// Quantized 4-bit (type 0).
    #[default]
    Q4_0,
    /// Quantized 4-bit (type 1); used by GPTQ.
    Q4_1,
    /// Integer 32-bit.
    I32,
    /// Float 16-bit.
    F16,
    /// Float 32-bit.
    F32,
}
```

llama-rs also supports the legacy ggml/ggmf model formats.
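To compare the two lists above, here is a small sketch that maps the numeric ids of the C enum to names and reports whether the same name appears among the llama-rs variants. The names `GGML_TYPE_NAMES`, `LLAMA_RS_TYPES` and `describeGgmlType` are hypothetical. Matching the two enums by name is an assumption made for illustration only.

```typescript
// Numeric ids follow the C enum above: 0..6 are explicit, the rest increment by one.
const GGML_TYPE_NAMES: { [id: number]: string } = {
    0: "F32",
    1: "F16",
    2: "Q4_0",
    3: "Q4_1",
    4: "Q4_2",
    5: "Q4_3",
    6: "Q8_0",
    7: "I8",
    8: "I16",
    9: "I32",
};

// Variant names copied from the llama-rs enum above.
const LLAMA_RS_TYPES: string[] = ["Q4_0", "Q4_1", "I32", "F16", "F32"];

// Hypothetical helper: name a type id and say whether the llama-rs list contains that name.
function describeGgmlType(id: number): { label: string; inLlamaRsList: boolean } {
    const label = GGML_TYPE_NAMES[id];
    if (label === undefined) {
        return { label: `unknown(${id})`, inLlamaRsList: false };
    }
    return { label, inLlamaRsList: LLAMA_RS_TYPES.indexOf(label) !== -1 };
}

console.log(describeGgmlType(2)); // { label: 'Q4_0', inLlamaRsList: true }
console.log(describeGgmlType(6)); // { label: 'Q8_0', inLlamaRsList: false }
```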

---

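## Usage (llama.cpp backend)

The current version supports only one inference session at a time on a single LLama instance.

If you want to run multiple inference sessions concurrently, you need to create multiple LLama instances.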
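When only one instance is available, one way to respect the single-session limit is to serialize requests in application code. The sketch below is a generic promise queue, not a llama-node API. `SerialQueue` and `fakeCompletion` are hypothetical names, and the completion call is simulated with a timer.

```typescript
// Hypothetical helper: runs async jobs one at a time, in submission order.
class SerialQueue {
    private tail: Promise<void> = Promise.resolve();

    run<T>(job: () => Promise<T>): Promise<T> {
        const result = this.tail.then(job);
        // Keep the chain usable even if a job rejects.
        this.tail = result.then(() => undefined, () => undefined);
        return result;
    }
}

// Stand-in for a real completion call; it only simulates latency.
const fakeCompletion = (prompt: string, ms: number): Promise<string> =>
    new Promise((resolve) => setTimeout(() => resolve(`done: ${prompt}`), ms));

const queue = new SerialQueue();
const finished: string[] = [];
const first = queue.run(() => fakeCompletion("a", 30)).then((r) => finished.push(r));
const second = queue.run(() => fakeCompletion("b", 5)).then((r) => finished.push(r));

// "b" is shorter but starts only after "a" has finished.
Promise.all([first, second]).then(() => console.log(finished)); // [ 'done: a', 'done: b' ]
```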

### Inference

```typescript
import { LLama } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

// Path to a ggml model file in the current working directory
const model = path.resolve(process.cwd(), "./ggml-vic7b-q5_1.bin");

const llama = new LLama(LLamaCpp);

const config: LoadConfig = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
};

llama.load(config);

const template = `How are you`;

const prompt = `### Human:

${template}

### Assistant:`;

llama.createCompletion(
    {
        nThreads: 4,
        nTokPredict: 2048,
        topK: 40,
        topP: 0.1,
        temp: 0.2,
        repeatPenalty: 1,
        stopSequence: "### Human",
        prompt,
    },
    // Write each token to stdout as it arrives
    (response) => {
        process.stdout.write(response.token);
    }
);
```
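The callback in the example above writes each token straight to stdout. If the full text is needed afterwards, the tokens can be accumulated instead. The following is a minimal sketch: `createTokenCollector` and `TokenResponse` are hypothetical names, and the only thing assumed about the streamed value is the `token` field used above. The stream itself is simulated.

```typescript
// Shape of the streamed value as used in the example above; only `token` is assumed.
interface TokenResponse {
    token: string;
}

// Hypothetical helper: returns a callback that accumulates tokens,
// plus a getter for the text collected so far.
function createTokenCollector(onToken?: (token: string) => void) {
    const parts: string[] = [];
    return {
        callback(response: TokenResponse): void {
            parts.push(response.token);
            if (onToken) onToken(response.token);
        },
        text(): string {
            return parts.join("");
        },
    };
}

const collector = createTokenCollector();
// Simulated stream; with the real library the callback would be handed to createCompletion.
for (const token of ["I", " am", " fine", "."]) {
    collector.callback({ token });
}
console.log(collector.text()); // I am fine.
```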

### Tokenize

```typescript
import { LLama } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

// Path to a ggml model file in the current working directory
const model = path.resolve(process.cwd(), "./ggml-vic7b-q5_1.bin");

const llama = new LLama(LLamaCpp);

const config: LoadConfig = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
};

llama.load(config);

const content = "how are you?";

// Resolves with the tokens for `content`
llama.tokenize({ content, nCtx: 2048 }).then(console.log);
```

### Embedding

```typescript
import { LLama } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

// Path to a ggml model file in the current working directory
const model = path.resolve(process.cwd(), "./ggml-vic7b-q5_1.bin");

const llama = new LLama(LLamaCpp);

const config: LoadConfig = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: true, // enabled here, unlike the inference example
    useMmap: true,
};

llama.load(config);

const prompt = `Who is the president of the United States?`;

const params = {
    nThreads: 4,
    nTokPredict: 2048,
    topK: 40,
    topP: 0.1,
    temp: 0.2,
    repeatPenalty: 1,
    prompt,
};

// Resolves with the embedding for `prompt`
llama.getEmbedding(params).then(console.log);
```

---

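## Usage (llama-rs backend)

The current version supports only one inference session at a time on a single LLama instance.

If you want to run multiple inference sessions concurrently, you need to create multiple LLama instances.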
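When several instances have been created for concurrent sessions, application code still has to make sure each instance serves one session at a time. One possible approach is a small pool, sketched below. `InstancePool` is a hypothetical name and not part of llama-node, and the instances are stand-in strings rather than loaded models.

```typescript
// Hypothetical pool: lends out one of N instances at a time and queues callers when all are busy.
class InstancePool<T> {
    private idle: T[];
    private waiters: Array<(instance: T) => void> = [];

    constructor(instances: T[]) {
        this.idle = instances.slice();
    }

    private acquire(): Promise<T> {
        const instance = this.idle.pop();
        if (instance !== undefined) return Promise.resolve(instance);
        // Nothing idle: wait until release() hands an instance over.
        return new Promise<T>((resolve) => this.waiters.push(resolve));
    }

    private release(instance: T): void {
        const next = this.waiters.shift();
        if (next) next(instance);
        else this.idle.push(instance);
    }

    // Run `job` with exclusive use of one instance, releasing it even if the job throws.
    async use<R>(job: (instance: T) => Promise<R>): Promise<R> {
        const instance = await this.acquire();
        try {
            return await job(instance);
        } finally {
            this.release(instance);
        }
    }
}

// Stand-ins for two loaded model instances.
const pool = new InstancePool(["llama-A", "llama-B"]);
const simulateSession = (id: string) =>
    new Promise<string>((resolve) => setTimeout(() => resolve(id), 10));

// Three sessions share two instances; the third waits for a free one.
Promise.all([
    pool.use(simulateSession),
    pool.use(simulateSession),
    pool.use(simulateSession),
]).then((used) => console.log(used));
```

Memory use grows with the number of loaded instances, so the pool size is a trade-off between concurrency and RAM.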

### Inference

```typescript
import { LLama } from "llama-node";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import path from "path";

// Path to a ggml model file in the current working directory
const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");

const llama = new LLama(LLamaRS);

llama.load({ path: model });

const template = `how are you`;

const prompt = `Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

${template}

### Response:`;

llama.createCompletion(
    {
        prompt,
        numPredict: 128,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
        feedPrompt: true,
    },
    // Write each token to stdout as it arrives
    (response) => {
        process.stdout.write(response.token);
    }
);
```

### Tokenize

Get the tokenization result from llama-rs.

```typescript
import { LLama } from "llama-node";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import path from "path";

// Path to a ggml model file in the current working directory
const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");

const llama = new LLama(LLamaRS);

llama.load({ path: model });

const content = "how are you?";

// Resolves with the tokens for `content`
llama.tokenize(content).then(console.log);
```

### Embedding

This is preview code. The end token used for embeddings may change in the future. Do not use it in production!

```typescript
import { LLama } from "llama-node";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import path from "path";
import fs from "fs";

// Path to a ggml model file in the current working directory
const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");

const llama = new LLama(LLamaRS);

llama.load({ path: model });

// Compute the embedding for `prompt` and save it as JSON to `file`
const getWordEmbeddings = async (prompt: string, file: string) => {
    const data = await llama.getEmbedding({
        prompt,
        numPredict: 128,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
    });

    console.log(prompt, data);

    await fs.promises.writeFile(
        path.resolve(process.cwd(), file),
        JSON.stringify(data)
    );
};

const run = async () => {
    const dog1 = `My favourite animal is the dog`;
    await getWordEmbeddings(dog1, "./example/semantic-compare/dog1.json");

    const dog2 = `I have just adopted a cute dog`;
    await getWordEmbeddings(dog2, "./example/semantic-compare/dog2.json");

    const cat1 = `My favourite animal is the cat`;
    await getWordEmbeddings(cat1, "./example/semantic-compare/cat1.json");
};

run();
```
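The example above saves three embeddings for comparison. A common way to compare such vectors is cosine similarity, sketched below. `cosineSimilarity` is a hypothetical helper, not a llama-node API. The sketch assumes each saved JSON file holds a flat array of numbers, and the vectors shown are made-up toy values rather than real model output.

```typescript
// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length || a.length === 0) {
        throw new Error("vectors must be non-empty and of equal length");
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // Convention chosen for this sketch: a zero vector is similar to nothing.
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for the contents of dog1.json, dog2.json and cat1.json.
const toyDog1 = [0.9, 0.1, 0.3];
const toyDog2 = [0.8, 0.2, 0.4];
const toyCat1 = [0.1, 0.9, 0.3];

console.log("dog1 vs dog2:", cosineSimilarity(toyDog1, toyDog2).toFixed(3));
console.log("dog1 vs cat1:", cosineSimilarity(toyDog1, toyCat1).toFixed(3));
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.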

---

## LangChain.js extension!

Since v0.0.28 we have added support for LangChain.js! We have not tested its accuracy, but we hope this approach works.

```typescript
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { LLamaEmbeddings } from "llama-node/dist/extensions/langchain.js";
import { LLama } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

// Path to a ggml model file one level above the current working directory
const model = path.resolve(process.cwd(), "../ggml-vic7b-q5_1.bin");

const llama = new LLama(LLamaCpp);

const config: LoadConfig = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: true,
    useMmap: true,
};

llama.load(config);

const run = async () => {
    // Load the docs into the vector store
    const vectorStore = await MemoryVectorStore.fromTexts(
        ["Hello world", "Bye bye", "hello nice world"],
        [{ id: 2 }, { id: 1 }, { id: 3 }],
        new LLamaEmbeddings({ maxConcurrency: 1 }, llama)
    );

    // Search for the most similar document
    const resultOne = await vectorStore.similaritySearch("hello world", 1);

    console.log(resultOne);
};

run();
```

---

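## About performance

We provide prebuilt binaries for linux-x64, win32-x64, apple-x64, and apple-silicon. For other platforms, install a Rust environment for building from source before installing the npm package.

Because cross-platform compilation is complex, it is hard to prebuild a binary that fits every platform and delivers the best performance.

If you run into poor performance, manual compilation is strongly recommended. Otherwise, you will have to wait for us to provide better prebuilt bindings. I am currently investigating the cross-compilation issue.

### Manual compilation (from node_modules)

- Install a Rust environment first

- Go to node_modules/@llama-node/core

    ```shell
    npm run build
    ```

### Manual compilation (from source)

- Install a Rust environment first

- After cloning, run the following in the project root

    ```shell
    npm install && npm run build
    ```

- Run the following in the packages/core directory

    ```shell
    npm run build
    ```

- You can now use the JS entry files in the dist directory under the project root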

---

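## Future plan

- [ ] Prompt extensions
- [ ] More platforms and processor architectures (with the best possible performance)
- [ ] Improve the embedding API and provide an option to configure the end token
- [ ] Command-line tool
- [ ] Update llama-rs to support more models https://github.com/rustformers/llama-rs/pull/141
- [ ] Support for more native inference backends (such as rwkv)!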
