Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/atome-fe/llama-node
Believe in AI democratization. llama for Node.js, backed by llama-rs, llama.cpp and rwkv.cpp, that works locally on your laptop CPU. Supports llama/alpaca/gpt4all/vicuna/rwkv models.
- Host: GitHub
- URL: https://github.com/atome-fe/llama-node
- Owner: Atome-FE
- License: apache-2.0
- Archived: true
- Created: 2023-03-20T10:47:19.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-08-03T02:46:18.000Z (over 1 year ago)
- Last Synced: 2024-10-29T08:02:39.175Z (3 months ago)
- Topics: ai, embeddings, gpt, langchain, large-language-models, llama, llama-node, llama-rs, llamacpp, llm, napi, napi-rs, nodejs, rwkv
- Language: Rust
- Homepage: https://llama-node.vercel.app/
- Size: 30.4 MB
- Stars: 862
- Watchers: 15
- Forks: 62
- Open Issues: 47
Metadata Files:
- Readme: README-zh-CN.md
- Funding: .github/FUNDING.yml
- License: LICENSE-APACHE.MD
- Code of conduct: CODE_OF_CONDUCT.md
README
# llama-node
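The LLaMA large language model, running on Node.js.

This project is at an early stage. The Node.js API may change in the future, so use it with caution.

Image generated by Stable Diffusion.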
![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/hlhr202/llama-node/llama-build.yml)
![NPM](https://img.shields.io/npm/l/llama-node)
[npm](https://www.npmjs.com/package/llama-node)
![npm type definitions](https://img.shields.io/npm/types/llama-node)
[Twitter](https://twitter.com/hlhr202)

---
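- [llama-node](#llama-node)
  - [Introduction](#introduction)
  - [Installation](#installation)
  - [Getting the model](#getting-the-model)
    - [Model versions](#model-versions)
      - [llama.cpp](#llamacpp)
      - [llama-rs](#llama-rs)
  - [Usage (llama.cpp backend)](#usage-llamacpp-backend)
    - [Inference](#inference)
    - [Tokenization](#tokenization)
    - [Embedding](#embedding)
  - [Usage (llama-rs backend)](#usage-llama-rs-backend)
    - [Inference](#inference-1)
    - [Tokenization](#tokenization-1)
    - [Embedding](#embedding-1)
  - [LangChain.js extension!](#langchainjs-extension)
  - [About performance](#about-performance)
    - [Manual compilation (from node\_modules)](#manual-compilation-from-node_modules)
    - [Manual compilation (from source)](#manual-compilation-from-source)
  - [Future plans](#future-plans)

---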
## Introduction

This is a Node.js client library for LLaMA (and some related) LLMs, built on [llama-rs](https://github.com/rustformers/llama-rs) and [llm-chain-llama-sys](https://github.com/sobelio/llm-chain/tree/main/llm-chain-llama/sys) (Rust bindings generated for [llama.cpp](https://github.com/ggerganov/llama.cpp)). It uses [napi-rs](https://github.com/napi-rs/napi-rs) to pass messages between Node.js and the llama thread.

Since v0.0.21, both the llama-rs and llama.cpp backends are supported.
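Because generated tokens reach JavaScript through a per-token callback (the `(response) => { ... }` argument in the usage examples), it can be convenient to consume them with `for await`. The following is a minimal sketch of a generic adapter, assuming a producer that accepts a token callback and returns a promise that settles when generation ends. `streamTokens`, `TokenResponse` and `CallbackProducer` are hypothetical names, not part of llama-node.

```typescript
// Shape of one streamed response, modeled on the `response.token` field used
// in this README's examples (assumed, not taken from llama-node's typings).
interface TokenResponse {
    token: string;
}

// Hypothetical producer: receives a per-token callback and returns a promise
// that settles when generation is finished.
type CallbackProducer = (onToken: (response: TokenResponse) => void) => Promise<unknown>;

// Adapts a callback-based token stream into an async iterator.
// Tokens are buffered until the consumer asks for them.
async function* streamTokens(produce: CallbackProducer): AsyncGenerator<string> {
    const buffer: string[] = [];
    const state: { done: boolean; error: unknown; wake: (() => void) | null } = {
        done: false,
        error: undefined,
        wake: null,
    };

    // Resume the consumer if it is waiting for the next token.
    const notify = (): void => {
        const wake = state.wake;
        state.wake = null;
        if (wake) wake();
    };

    produce((response) => {
        buffer.push(response.token);
        notify();
    }).then(
        () => {
            state.done = true;
            notify();
        },
        (error: unknown) => {
            state.error = error;
            state.done = true;
            notify();
        }
    );

    while (true) {
        if (buffer.length > 0) {
            yield buffer.shift() as string;
        } else if (state.done) {
            break;
        } else {
            // Wait until the producer pushes a token or finishes.
            await new Promise<void>((resolve) => {
                state.wake = resolve;
            });
        }
    }
    if (state.error !== undefined) throw state.error;
}
```

With llama-node this might be called as `streamTokens((onToken) => llama.createCompletion(params, onToken))`, but only if `createCompletion` returns a promise that settles at the end of generation in your version; verify that before relying on it.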
Currently supported platforms:

- darwin-x64
- darwin-arm64
- linux-x64-gnu (glibc >= 2.31)
- linux-x64-musl
- win32-x64-msvc

Minimum Node.js version: 16
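To check ahead of time whether a machine matches one of these prebuilt targets, you could compare its platform, architecture, ABI and versions against the list above. This is a hypothetical helper sketch, not something llama-node exports. The caller supplies the values (in Node.js, `process.platform`, `process.arch` and `process.version`); detecting the libc flavour and the glibc version is left out.

```typescript
// Prebuilt targets listed in this README, in `platform-arch[-abi]` form.
const PREBUILT_TARGETS: ReadonlySet<string> = new Set([
    "darwin-x64",
    "darwin-arm64",
    "linux-x64-gnu",
    "linux-x64-musl",
    "win32-x64-msvc",
]);

// Hypothetical helper: does a prebuilt binary exist for this combination?
// `abi` is "gnu", "musl" or "msvc" where the list uses one; macOS targets have none.
function hasPrebuilt(platform: string, arch: string, abi?: string): boolean {
    const target = abi ? `${platform}-${arch}-${abi}` : `${platform}-${arch}`;
    return PREBUILT_TARGETS.has(target);
}

// Checks a Node.js version string such as "v18.16.0" against the minimum major (16).
function meetsMinNodeVersion(version: string, minMajor = 16): boolean {
    const match = /^v?(\d+)\./.exec(version);
    return match !== null && Number(match[1]) >= minMajor;
}

// Checks a glibc version string such as "2.35" against the 2.31 minimum
// stated for linux-x64-gnu.
function glibcAtLeast(version: string, minMajor = 2, minMinor = 31): boolean {
    const match = /^(\d+)\.(\d+)/.exec(version);
    if (match === null) return false;
    const major = Number(match[1]);
    const minor = Number(match[2]);
    return major > minMajor || (major === minMajor && minor >= minMinor);
}
```

If `hasPrebuilt` returns false for your machine, the "About performance" section applies: install a Rust toolchain before installing the npm package.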
I don't have the hardware to test 13B or larger models, but I have successfully tested the 7B ggml llama and ggml alpaca models.
---
## Installation

- Install the core package

```bash
npm install llama-node
```

- Install the llama-rs backend

```bash
npm install @llama-node/core
```

- Install the llama.cpp backend

```bash
npm install @llama-node/llama-cpp
```

---
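## Getting the model

Under the hood, llama-node calls llama-rs, whose model format is derived from llama.cpp. Because Meta released the models for testing by research institutions only, this project does not provide model downloads. If you have obtained the original **.pth** model, read the [Getting the weights](https://github.com/rustformers/llama-rs#getting-the-weights) document and convert it with the convert tool provided by llama-rs.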
### Model versions

#### llama.cpp

The following model types are supported by llama.cpp, as defined in the ggml.h source:
```c
enum ggml_type {
    // explicitly numbered values are used in llama.cpp files
    GGML_TYPE_F32  = 0,
    GGML_TYPE_F16  = 1,
    GGML_TYPE_Q4_0 = 2,
    GGML_TYPE_Q4_1 = 3,
    GGML_TYPE_Q4_2 = 4,
    GGML_TYPE_Q4_3 = 5,
    GGML_TYPE_Q8_0 = 6,
    GGML_TYPE_I8,
    GGML_TYPE_I16,
    GGML_TYPE_I32,
    GGML_TYPE_COUNT,
};
```

#### llama-rs
The following model types are supported by llama-rs, as defined in llama-rs's ggml bindings:
```rust
#[derive(Default)]
pub enum Type {
    /// Quantized 4-bit (type 0).
    #[default]
    Q4_0,
    /// Quantized 4-bit (type 1); used by GPTQ.
    Q4_1,
    /// Integer 32-bit.
    I32,
    /// Float 16-bit.
    F16,
    /// Float 32-bit.
    F32,
}
```

llama-rs also supports the legacy ggml/ggmf model formats.
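The llama.cpp enum mixes explicitly numbered and implicit values, so `GGML_TYPE_I8` through `GGML_TYPE_I32` continue the count at 7, 8 and 9, and `GGML_TYPE_COUNT` (10) is a sentinel rather than a type. As a sketch, the hypothetical helpers below mirror those numeric ids in TypeScript and report which of the two enums lists each type by name. They only restate the two enums shown in this section; actual support depends on the backend versions you install.

```typescript
// ggml type names indexed by their numeric id in the llama.cpp `ggml_type`
// enum: the array position equals the C enum value. GGML_TYPE_COUNT (10) is
// a sentinel and deliberately not included.
const GGML_TYPES = [
    "F32",  // 0
    "F16",  // 1
    "Q4_0", // 2
    "Q4_1", // 3
    "Q4_2", // 4
    "Q4_3", // 5
    "Q8_0", // 6
    "I8",   // 7 (implicit in C)
    "I16",  // 8 (implicit in C)
    "I32",  // 9 (implicit in C)
] as const;

// Names listed in the llama-rs `Type` enum.
const LLAMA_RS_TYPES: ReadonlySet<string> = new Set(["Q4_0", "Q4_1", "I32", "F16", "F32"]);

// Hypothetical helper: numeric ggml type id -> name, or undefined if out of range.
function ggmlTypeName(id: number): string | undefined {
    return Number.isInteger(id) && id >= 0 && id < GGML_TYPES.length ? GGML_TYPES[id] : undefined;
}

// Hypothetical helper: which of the two enums list this type id.
function backendsListing(id: number): string[] {
    const name = ggmlTypeName(id);
    if (name === undefined) return [];
    return LLAMA_RS_TYPES.has(name) ? ["llama.cpp", "llama-rs"] : ["llama.cpp"];
}
```

For example, `backendsListing(6)` (Q8_0) names only llama.cpp, because Q8_0 does not appear in the llama-rs enum.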
---
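## Usage (llama.cpp backend)

The current version supports only one inference session per LLama instance. To run multiple inference sessions concurrently, create multiple LLama instances.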
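One way to share a single instance among concurrent callers is to queue requests so that each session starts only after the previous one has settled. `SerialQueue` is a hypothetical helper, sketched under the assumption that each job returns a promise that settles when its session ends; it is not part of llama-node.

```typescript
// Hypothetical helper: runs async jobs one at a time, in submission order.
// Each job must return a promise that settles when its work (for example,
// one inference session) is finished.
class SerialQueue {
    // Promise for the most recently queued job; never rejects.
    private tail: Promise<void> = Promise.resolve();

    run<T>(job: () => Promise<T>): Promise<T> {
        // Start this job only after every previously queued job has settled.
        const result = this.tail.then(() => job());
        // Keep the chain usable even if this job rejects.
        this.tail = result.then(
            () => undefined,
            () => undefined
        );
        return result;
    }
}
```

Keeping one queue per LLama instance serializes the requests to that instance, while separate instances, each with its own queue, can still run side by side.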
### Inference
```typescript
import { LLama } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "./ggml-vic7b-q5_1.bin");
const llama = new LLama(LLamaCpp);

const config: LoadConfig = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
};

llama.load(config);

const template = `How are you`;
const prompt = `### Human:
${template}
### Assistant:`;

llama.createCompletion(
    {
        nThreads: 4,
        nTokPredict: 2048,
        topK: 40,
        topP: 0.1,
        temp: 0.2,
        repeatPenalty: 1,
        stopSequence: "### Human",
        prompt,
    },
    (response) => {
        process.stdout.write(response.token);
    }
);
```
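The Vicuna-style prompt above wraps the user message between `### Human:` and `### Assistant:`, and `stopSequence` asks the backend to stop at the next `### Human`. The sketch below shows the same idea on the JavaScript side: a hypothetical prompt builder plus a collector that accumulates streamed `response.token` strings and trims the output at the stop sequence, even when the sequence is split across several tokens. It is an illustration only; llama-node's own `stopSequence` handling may differ.

```typescript
// Hypothetical helper: builds the Vicuna-style prompt used in the example above.
function vicunaPrompt(message: string): string {
    return `### Human:\n${message}\n### Assistant:`;
}

// Hypothetical helper: accumulates streamed tokens and stops at a stop
// sequence, even when the sequence is split across several tokens.
class StopAwareCollector {
    private text = "";
    private stopped = false;
    private readonly stopSequence: string;

    constructor(stopSequence: string) {
        this.stopSequence = stopSequence;
    }

    // Adds one token; returns true once the stop sequence has been seen.
    push(token: string): boolean {
        if (this.stopped) return true;
        this.text += token;
        const index = this.text.indexOf(this.stopSequence);
        if (index !== -1) {
            // Drop the stop sequence and anything after it.
            this.text = this.text.slice(0, index);
            this.stopped = true;
        }
        return this.stopped;
    }

    // Text collected so far, without the stop sequence.
    get result(): string {
        return this.text;
    }
}
```

In the completion callback you would call `collector.push(response.token)` and read `collector.result.trim()` once generation ends.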
### Tokenization
```typescript
import { LLama } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "./ggml-vic7b-q5_1.bin");
const llama = new LLama(LLamaCpp);

const config: LoadConfig = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true,
};

llama.load(config);

const content = "how are you?";
llama.tokenize({ content, nCtx: 2048 }).then(console.log);
```
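A tokenizer result is mostly useful for budgeting the context window. The sketch below assumes `tokenize` resolves to an array of numeric token ids (not confirmed here) and shows two hypothetical helpers: one splits ids into windows of at most `size` tokens, and one performs a conservative check that the prompt tokens plus `nTokPredict` fit within `nCtx`. Treating `nCtx` as a hard budget is an assumption; the backend may handle overflow differently.

```typescript
// Hypothetical helper: splits token ids into consecutive windows of at most `size`.
function chunkTokens(tokens: readonly number[], size: number): number[][] {
    if (!Number.isInteger(size) || size <= 0) {
        throw new RangeError("size must be a positive integer");
    }
    const chunks: number[][] = [];
    for (let start = 0; start < tokens.length; start += size) {
        chunks.push(tokens.slice(start, start + size));
    }
    return chunks;
}

// Hypothetical helper: conservative check that the prompt plus the requested
// number of new tokens fits in the context window. Parameter names mirror the
// `nTokPredict` and `nCtx` options used in the examples.
function fitsInContext(promptTokens: number, nTokPredict: number, nCtx: number): boolean {
    return promptTokens + nTokPredict <= nCtx;
}
```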
### Embedding
```typescript
import { LLama } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "./ggml-vic7b-q5_1.bin");
const llama = new LLama(LLamaCpp);

const config: LoadConfig = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: true,
    useMmap: true,
};

llama.load(config);

const prompt = `Who is the president of the United States?`;

const params = {
    nThreads: 4,
    nTokPredict: 2048,
    topK: 40,
    topP: 0.1,
    temp: 0.2,
    repeatPenalty: 1,
    prompt,
};

llama.getEmbedding(params).then(console.log);
```
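Embedding vectors are commonly compared with cosine similarity, which is also one way to compare the `dog1`/`dog2`/`cat1` files written by the llama-rs embedding example. This is a minimal sketch assuming `getEmbedding` resolves to a plain `number[]`; `cosineSimilarity` is a hypothetical helper, not a llama-node export.

```typescript
// Hypothetical helper: cosine similarity of two equal-length vectors.
// Returns a value in [-1, 1]; 0 if either vector is all zeros.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
    if (a.length !== b.length || a.length === 0) {
        throw new Error("vectors must be non-empty and of the same length");
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Higher values mean the two texts point in more similar directions in embedding space; whether the dog sentences actually score closer to each other than to the cat sentence depends on the model.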
---
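## Usage (llama-rs backend)

The current version supports only one inference session per LLama instance. To run multiple inference sessions concurrently, create multiple LLama instances.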
### Inference
```typescript
import { LLama } from "llama-node";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import path from "path";

const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");
const llama = new LLama(LLamaRS);
llama.load({ path: model });

const template = `how are you`;
const prompt = `Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
${template}
### Response:`;

llama.createCompletion(
    {
        prompt,
        numPredict: 128,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
        feedPrompt: true,
    },
    (response) => {
        process.stdout.write(response.token);
    }
);
```

### Tokenization
Get the tokenization from llama-rs:
```typescript
import { LLama } from "llama-node";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import path from "path";

const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");
const llama = new LLama(LLamaRS);
llama.load({ path: model });

const content = "how are you?";
llama.tokenize(content).then(console.log);
```

### Embedding
This is preview code: the trailing words used for embedding may change in the future. Do not use it in production!
```typescript
import { LLama } from "llama-node";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import path from "path";
import fs from "fs";

const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");
const llama = new LLama(LLamaRS);
llama.load({ path: model });

const getWordEmbeddings = async (prompt: string, file: string) => {
    const data = await llama.getEmbedding({
        prompt,
        numPredict: 128,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
    });

    console.log(prompt, data);

    await fs.promises.writeFile(
        path.resolve(process.cwd(), file),
        JSON.stringify(data)
    );
};

const run = async () => {
    const dog1 = `My favourite animal is the dog`;
    await getWordEmbeddings(dog1, "./example/semantic-compare/dog1.json");

    const dog2 = `I have just adopted a cute dog`;
    await getWordEmbeddings(dog2, "./example/semantic-compare/dog2.json");

    const cat1 = `My favourite animal is the cat`;
    await getWordEmbeddings(cat1, "./example/semantic-compare/cat1.json");
};

run();
```

---
## LangChain.js extension!

Since v0.0.28 we support LangChain.js! We have not tested its accuracy, but we hope it works!
```typescript
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { LLamaEmbeddings } from "llama-node/dist/extensions/langchain.js";
import { LLama } from "llama-node";
import { LLamaCpp, LoadConfig } from "llama-node/dist/llm/llama-cpp.js";
import path from "path";

const model = path.resolve(process.cwd(), "../ggml-vic7b-q5_1.bin");
const llama = new LLama(LLamaCpp);

const config: LoadConfig = {
    path: model,
    enableLogging: true,
    nCtx: 1024,
    nParts: -1,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: true,
    useMmap: true,
};

llama.load(config);

const run = async () => {
    // Load the docs into the vector store
    const vectorStore = await MemoryVectorStore.fromTexts(
        ["Hello world", "Bye bye", "hello nice world"],
        [{ id: 2 }, { id: 1 }, { id: 3 }],
        new LLamaEmbeddings({ maxConcurrency: 1 }, llama)
    );

    // Search for the most similar document
    const resultOne = await vectorStore.similaritySearch("hello world", 1);
    console.log(resultOne);
};

run();
```
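Conceptually, `MemoryVectorStore.fromTexts` embeds every text once, and `similaritySearch` returns the stored documents whose vectors are closest to the query's. The sketch below reproduces that flow without LangChain to make the mechanics visible. `TinyVectorStore` and the letter-counting `letterEmbed` function are hypothetical stand-ins (a real setup would call the model), and this is not LangChain's implementation. Texts are embedded one at a time, similar in spirit to `maxConcurrency: 1`.

```typescript
// Hypothetical embedding function type: text -> vector.
type Embed = (text: string) => Promise<number[]>;

interface StoredDoc {
    text: string;
    metadata: Record<string, unknown>;
    vector: number[];
}

// Cosine similarity; returns 0 if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
    let dot = 0;
    let na = 0;
    let nb = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Toy stand-in for a model: counts the letters a-z. Illustration only.
const letterEmbed: Embed = async (text) => {
    const vector = new Array<number>(26).fill(0);
    for (const ch of text.toLowerCase()) {
        const index = ch.charCodeAt(0) - 97;
        if (index >= 0 && index < 26) vector[index] += 1;
    }
    return vector;
};

// Hypothetical in-memory store: embed texts once, rank by cosine similarity.
class TinyVectorStore {
    private readonly docs: StoredDoc[] = [];
    private readonly embed: Embed;

    constructor(embed: Embed) {
        this.embed = embed;
    }

    async addTexts(texts: string[], metadatas: Record<string, unknown>[]): Promise<void> {
        // Embed one text at a time, similar in spirit to maxConcurrency: 1.
        for (let i = 0; i < texts.length; i++) {
            const vector = await this.embed(texts[i]);
            this.docs.push({ text: texts[i], metadata: metadatas[i] ?? {}, vector });
        }
    }

    async similaritySearch(query: string, k: number): Promise<StoredDoc[]> {
        const queryVector = await this.embed(query);
        return this.docs
            .map((doc) => ({ doc, score: cosine(queryVector, doc.vector) }))
            .sort((x, y) => y.score - x.score)
            .slice(0, k)
            .map((entry) => entry.doc);
    }
}
```

With the same three texts and metadata as in the LangChain example, searching for "hello world" with `k = 1` returns the "Hello world" document (`id: 2`) under this toy embedding.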
---
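## About performance

We provide prebuilt binaries for linux-x64, win32-x64, apple-x64 and apple-silicon. For other platforms, install a Rust toolchain before installing the npm package so that the native module can be built locally.

Because cross-platform compilation is complex, it is hard to prebuild a single binary that suits every platform and also delivers the best performance.

If you experience poor performance, we strongly recommend compiling manually. Otherwise, you will have to wait for us to ship better prebuilt bindings; I am currently investigating the cross-build issues.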
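To tell whether a manual build actually helps, measure throughput the same way before and after compiling. This sketch assumes you call `onToken()` from the token callback of `createCompletion`; `ThroughputMeter` is a hypothetical helper, and it measures tokens per second from the first to the last token, so prompt processing time is excluded.

```typescript
// Hypothetical helper: measures generation speed from token callbacks.
class ThroughputMeter {
    private first: number | null = null;
    private last: number | null = null;
    private count = 0;
    private readonly now: () => number;

    // `now` returns milliseconds; injectable so the meter can be tested.
    constructor(now: () => number = () => Date.now()) {
        this.now = now;
    }

    // Call once per generated token, e.g. from the createCompletion callback.
    onToken(): void {
        const t = this.now();
        if (this.first === null) this.first = t;
        this.last = t;
        this.count += 1;
    }

    // Tokens per second between the first and the last token. The time spent
    // before the first token (prompt processing) is not included.
    tokensPerSecond(): number {
        if (this.first === null || this.last === null || this.count < 2) return 0;
        const seconds = (this.last - this.first) / 1000;
        return seconds > 0 ? (this.count - 1) / seconds : 0;
    }
}
```

Keep the prompt, `nThreads` and sampling parameters fixed between runs, since they also affect speed.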
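### Manual compilation (from node_modules)

- Install the Rust toolchain first
- Go to `node_modules/@llama-node/core` and run

```shell
npm run build
```

### Manual compilation (from source)

- Install the Rust toolchain first
- After cloning, run in the project root

```shell
npm install && npm run build
```

- Then run in the `packages/core` directory

```shell
npm run build
```

- You can now use the JS entry files in the `dist` directory at the project root.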
---
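## Future plans

- [ ] Prompt extensions
- [ ] More platforms and processor architectures (with the best possible performance)
- [ ] Optimize the embedding API and add an option to configure the trailing words
- [ ] Command-line tool
- [ ] Update llama-rs to support more models https://github.com/rustformers/llama-rs/pull/141
- [ ] Support for more native inference backends (such as rwkv)!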