{"id":13620646,"url":"https://github.com/Atome-FE/llama-node","last_synced_at":"2025-04-14T22:32:08.297Z","repository":{"id":143634825,"uuid":"616414170","full_name":"Atome-FE/llama-node","owner":"Atome-FE","description":"Believe in AI democratization. llama for nodejs backed by llama-rs, llama.cpp and rwkv.cpp, work locally on your laptop CPU. support llama/alpaca/gpt4all/vicuna/rwkv model.","archived":true,"fork":false,"pushed_at":"2023-08-03T02:46:18.000Z","size":31857,"stargazers_count":870,"open_issues_count":47,"forks_count":64,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-04-08T09:06:21.358Z","etag":null,"topics":["ai","embeddings","gpt","langchain","large-language-models","llama","llama-node","llama-rs","llamacpp","llm","napi","napi-rs","nodejs","rwkv"],"latest_commit_sha":null,"homepage":"https://llama-node.vercel.app/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Atome-FE.png","metadata":{"files":{"readme":"README-zh-CN.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE-APACHE.MD","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":["hlhr202"],"patreon":null,"open_collective":"hlhr202","ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2023-03-20T10:47:19.000Z","updated_at":"2025-04-05T09:48:57.000Z","dependencies_parsed_at":"2023-09-23T11:14:08.544Z","dependency_job_id":null,"html_url":"https://github.com/Atome-FE/llama-node","commit_stats":{"total_commits":211,"total_committers":6,"mean_commits":"35.166666666666664","dds":"0.033175355450236976","last_synced_commit":"3ad64d0af36ea4763e87ec97b4547a1c05d39029"},"previous_names":["hlhr202/llama-node"],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Atome-FE%2Fllama-node","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Atome-FE%2Fllama-node/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Atome-FE%2Fllama-node/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Atome-FE%2Fllama-node/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Atome-FE","download_url":"https://codeload.github.com/Atome-FE/llama-node/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248971951,"owners_count":21191696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","embeddings","gpt","langchain","large-language-models","llama","llama-node","llama-rs","llamacpp","llm","napi","napi-rs","nodejs","rwkv"],"created_at":"2024-08-01T21:00:58.039Z","updated_at":"2025-04-14T22:32:05.877Z","avatar_url":"https://github.com/Atome-FE.png","language":"Rust","readme":"# llama-node\n\nLarge language model LLaMA for Node.js.\n\nThis project is in an early stage; the Node.js API may change in the future, so use it with caution.\n\n\u003cimg src=\"./doc/assets/llama.png\" width=\"300px\" height=\"300px\" alt=\"LLaMA generated by Stable diffusion\"/\u003e\n\n\u003csub\u003eImage generated by Stable diffusion\u003c/sub\u003e\n\n\n![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/hlhr202/llama-node/llama-build.yml)\n![NPM](https://img.shields.io/npm/l/llama-node)\n[\u003cimg alt=\"npm\" src=\"https://img.shields.io/npm/v/llama-node\"\u003e](https://www.npmjs.com/package/llama-node)\n![npm type definitions](https://img.shields.io/npm/types/llama-node)\n[\u003cimg alt=\"twitter\" src=\"https://img.shields.io/twitter/url?url=https%3A%2F%2Ftwitter.com%2Fhlhr202\"\u003e](https://twitter.com/hlhr202)\n\n---\n\n- [llama-node](#llama-node)\n  - [Introduction](#introduction)\n  - [Installation](#installation)\n  - [Getting the model](#getting-the-model)\n    - [Model versions](#model-versions)\n      - [llama.cpp](#llamacpp)\n      - [llama-rs](#llama-rs)\n  - [Usage (llama.cpp backend)](#usage-llamacpp-backend)\n    - [Inference](#inference)\n    - [Tokenization](#tokenization)\n    - [Embedding](#embedding)\n  - [Usage (llama-rs backend)](#usage-llama-rs-backend)\n    - [Inference](#inference-1)\n    - [Tokenization](#tokenization-1)\n    - [Embedding](#embedding-1)\n  - [LangChain.js extension!](#langchainjs-extension)\n  - [About performance](#about-performance)\n    - [Manual compilation (from node\\_modules)](#manual-compilation-from-node_modules)\n    - [Manual compilation (from source)](#manual-compilation-from-source)\n  - [Future plans](#future-plans)\n\n---\n\n## Introduction\n\nThis is a Node.js client library for the LLaMA LLM (and several related models), built on [llama-rs](https://github.com/rustformers/llama-rs) and [llm-chain-llama-sys](https://github.com/sobelio/llm-chain/tree/main/llm-chain-llama/sys) (the Rust bindings generated for [llama.cpp](https://github.com/ggerganov/llama.cpp)). It uses [napi-rs](https://github.com/napi-rs/napi-rs) to pass messages between Node.js and the llama thread.\n\nSince v0.0.21, both the llama-rs and llama.cpp backends are supported.\n\nCurrently supported platforms:\n- darwin-x64\n- darwin-arm64\n- linux-x64-gnu (glibc \u003e= 2.31)\n- linux-x64-musl\n- win32-x64-msvc\n\nMinimum Node.js version: 16\n\n\nI do not have the hardware to test 13B or larger models, but I have successfully tested the ggml llama and ggml alpaca 7B models.\n\n\u003c!-- Download one of the llama ggml models from the following links:\n- [llama 7B int4 (old model for llama.cpp)](https://huggingface.co/hlhr202/llama-7B-ggml-int4/blob/main/ggml-model-q4_0.bin)\n- [alpaca 7B int4](https://huggingface.co/hlhr202/alpaca-7B-ggml-int4/blob/main/ggml-alpaca-7b-q4.bin) --\u003e\n\n---\n\n## Installation\n\n- Install the core package\n```bash\nnpm install llama-node\n```\n\n- Install the llama-rs backend\n```bash\nnpm install @llama-node/core\n```\n\n- Install the llama.cpp backend\n```bash\nnpm install @llama-node/llama-cpp\n```\n\n---\n\n## Getting the model\n\nUnder the hood, llama-node calls llama-rs, whose model format originates from llama.cpp. Since Meta released the model for testing by research institutions only, this project does not provide model downloads. If you have obtained the original **.pth** model, please read the [Getting the weights](https://github.com/rustformers/llama-rs#getting-the-weights) documentation and use the convert tool provided by llama-rs to convert it.\n\n### Model versions\n\n#### llama.cpp\n\nThe following model types are supported by llama.cpp and can be found in the ggml.h source:\n\n```c\nenum ggml_type {\n    // explicitly numbered values are used in llama.cpp files\n    GGML_TYPE_F32  = 0,\n    GGML_TYPE_F16  = 1,\n    GGML_TYPE_Q4_0 = 2,\n    GGML_TYPE_Q4_1 = 3,\n    GGML_TYPE_Q4_2 = 4,\n    GGML_TYPE_Q4_3 = 5,\n    GGML_TYPE_Q8_0 = 6,\n    GGML_TYPE_I8,\n    GGML_TYPE_I16,\n    GGML_TYPE_I32,\n    GGML_TYPE_COUNT,\n};\n```\n\n#### llama-rs\n\nThe following model types are supported by llama-rs and can be found in llama-rs's ggml bindings:\n\n```rust\npub enum Type {\n    /// Quantized 4-bit (type 0).\n    #[default]\n    Q4_0,\n    /// Quantized 4-bit (type 1); used by GPTQ.\n    Q4_1,\n    /// Integer 32-bit.\n    I32,\n    /// Float 16-bit.\n    F16,\n    /// Float 32-bit.\n    F32,\n}\n```\n\nllama-rs also supports the legacy ggml/ggmf model formats.\n\n---\n\n## Usage (llama.cpp backend)\n\nThe current version only supports a single inference session on one LLama instance.\n\nIf you want to run multiple inference sessions concurrently, you need to create multiple LLama instances.\n\n### Inference\n\n```typescript\nimport { LLama } from \"llama-node\";\nimport { LLamaCpp, LoadConfig } from \"llama-node/dist/llm/llama-cpp.js\";\nimport path from \"path\";\n\nconst model = path.resolve(process.cwd(), \"./ggml-vic7b-q5_1.bin\");\n\nconst llama = new LLama(LLamaCpp);\n\nconst config: LoadConfig = {\n    path: model,\n    enableLogging: true,\n    nCtx: 1024,\n    nParts: -1,\n    seed: 0,\n    f16Kv: false,\n    logitsAll: false,\n    vocabOnly: false,\n    useMlock: false,\n    embedding: false,\n    useMmap: true,\n};\n\nllama.load(config);\n\nconst template = `How are you`;\n\nconst prompt = `### Human:\n\n${template}\n\n### Assistant:`;\n\nllama.createCompletion(\n    {\n        nThreads: 4,\n        nTokPredict: 2048,\n        topK: 40,\n        topP: 0.1,\n        temp: 0.2,\n        repeatPenalty: 1,\n        stopSequence: \"### Human\",\n        prompt,\n    },\n    (response) =\u003e {\n        process.stdout.write(response.token);\n    }\n);\n\n```\n\n### Tokenization\n\n```typescript\nimport { LLama } from \"llama-node\";\nimport { LLamaCpp, LoadConfig } from \"llama-node/dist/llm/llama-cpp.js\";\nimport path from \"path\";\n\nconst model = path.resolve(process.cwd(), \"./ggml-vic7b-q5_1.bin\");\n\nconst llama = new LLama(LLamaCpp);\n\nconst config: LoadConfig = {\n    path: model,\n    enableLogging: true,\n    nCtx: 1024,\n    nParts: -1,\n    seed: 0,\n    f16Kv: false,\n    logitsAll: false,\n    vocabOnly: false,\n    useMlock: false,\n    embedding: false,\n    useMmap: true,\n};\n\nllama.load(config);\n\nconst content = \"how are you?\";\n\nllama.tokenize({ content, nCtx: 2048 }).then(console.log);\n\n```\n\n### Embedding\n\n```typescript\nimport { LLama } from \"llama-node\";\nimport { LLamaCpp, LoadConfig } from \"llama-node/dist/llm/llama-cpp.js\";\nimport path from \"path\";\n\nconst model = path.resolve(process.cwd(), \"./ggml-vic7b-q5_1.bin\");\n\nconst llama = new LLama(LLamaCpp);\n\nconst config: LoadConfig = {\n    path: model,\n    enableLogging: true,\n    nCtx: 1024,\n    nParts: -1,\n    seed: 0,\n    f16Kv: false,\n    logitsAll: false,\n    vocabOnly: false,\n    useMlock: false,\n    embedding: true,\n    useMmap: true,\n};\n\nllama.load(config);\n\nconst prompt = `Who is the president of the United States?`;\n\nconst params = {\n    nThreads: 4,\n    nTokPredict: 2048,\n    topK: 40,\n    topP: 0.1,\n    temp: 0.2,\n    repeatPenalty: 1,\n    prompt,\n};\n\nllama.getEmbedding(params).then(console.log);\n\n```\n\n---\n\n## Usage (llama-rs backend)\n\nThe current version only supports a single inference session on one LLama instance.\n\nIf you want to run multiple inference sessions concurrently, you need to create multiple LLama instances.\n\n### Inference\n\n```typescript\nimport { LLama } from \"llama-node\";\nimport { LLamaRS } from \"llama-node/dist/llm/llama-rs.js\";\nimport path from \"path\";\n\nconst model = path.resolve(process.cwd(), \"./ggml-alpaca-7b-q4.bin\");\n\nconst llama = new LLama(LLamaRS);\n\nllama.load({ path: model });\n\nconst template = `how are you`;\n\nconst prompt = `Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n\n${template}\n\n### Response:`;\n\nllama.createCompletion(\n    {\n        prompt,\n        numPredict: 128,\n        temp: 0.2,\n        topP: 1,\n        topK: 40,\n        repeatPenalty: 1,\n        repeatLastN: 64,\n        seed: 0,\n        feedPrompt: true,\n    },\n    (response) =\u003e {\n        process.stdout.write(response.token);\n    }\n);\n```\n\n### Tokenization\n\nGet the tokenization from llama-rs.\n\n```typescript\nimport { LLama } from \"llama-node\";\nimport { LLamaRS } from \"llama-node/dist/llm/llama-rs.js\";\nimport path from \"path\";\n\nconst model = path.resolve(process.cwd(), \"./ggml-alpaca-7b-q4.bin\");\n\nconst llama = new LLama(LLamaRS);\n\nllama.load({ path: model });\n\nconst content = \"how are you?\";\n\nllama.tokenize(content).then(console.log);\n```\n\n### Embedding\n\nThis is a preview version of the code; the end token used for the embedding may change in the future. Do not use it in production!\n\n```typescript\nimport { LLama } from \"llama-node\";\nimport { LLamaRS } from \"llama-node/dist/llm/llama-rs.js\";\nimport path from \"path\";\nimport fs from \"fs\";\n\nconst model = path.resolve(process.cwd(), \"./ggml-alpaca-7b-q4.bin\");\n\nconst llama = new LLama(LLamaRS);\n\nllama.load({ path: model });\n\nconst getWordEmbeddings = async (prompt: string, file: string) =\u003e {\n    const data = await llama.getEmbedding({\n        prompt,\n        numPredict: 128,\n        temp: 0.2,\n        topP: 1,\n        topK: 40,\n        repeatPenalty: 1,\n        repeatLastN: 64,\n        seed: 0,\n    });\n\n    console.log(prompt, data);\n\n    await fs.promises.writeFile(\n        path.resolve(process.cwd(), file),\n        JSON.stringify(data)\n    );\n};\n\nconst run = async () =\u003e {\n    const dog1 = `My favourite animal is the dog`;\n    await getWordEmbeddings(dog1, \"./example/semantic-compare/dog1.json\");\n\n    const dog2 = `I have just adopted a cute dog`;\n    await getWordEmbeddings(dog2, \"./example/semantic-compare/dog2.json\");\n\n    const cat1 = `My favourite animal is the cat`;\n    await getWordEmbeddings(cat1, \"./example/semantic-compare/cat1.json\");\n};\n\nrun();\n```\n\n---\n\n## LangChain.js extension!\n\nSince v0.0.28 we have added support for LangChain.js! We have not tested its accuracy ourselves, but we hope this approach works!\n\n```typescript\nimport { MemoryVectorStore } from \"langchain/vectorstores/memory\";\nimport { LLamaEmbeddings } from \"llama-node/dist/extensions/langchain.js\";\nimport { LLama } from \"llama-node\";\nimport { LLamaCpp, LoadConfig } from \"llama-node/dist/llm/llama-cpp.js\";\nimport path from \"path\";\n\nconst model = path.resolve(process.cwd(), \"../ggml-vic7b-q5_1.bin\");\n\nconst llama = new LLama(LLamaCpp);\n\nconst config: LoadConfig = {\n    path: model,\n    enableLogging: true,\n    nCtx: 1024,\n    nParts: -1,\n    seed: 0,\n    f16Kv: false,\n    logitsAll: false,\n    vocabOnly: false,\n    useMlock: false,\n    embedding: true,\n    useMmap: true,\n};\n\nllama.load(config);\n\nconst run = async () =\u003e {\n    // Load the docs into the vector store\n    const vectorStore = await MemoryVectorStore.fromTexts(\n        [\"Hello world\", \"Bye bye\", \"hello nice world\"],\n        [{ id: 2 }, { id: 1 }, { id: 3 }],\n        new LLamaEmbeddings({ maxConcurrency: 1 }, llama)\n    );\n\n    // Search for the most similar document\n    const resultOne = await vectorStore.similaritySearch(\"hello world\", 1);\n\n    console.log(resultOne);\n};\n\nrun();\n\n```\n\n---\n\n## About performance\n\nWe provide prebuilt binaries for linux-x64, win32-x64, apple-x64 and apple-silicon. For other platforms, please install the Rust environment for building from source before installing the npm package.\n\nDue to the complexity of cross-platform compilation, it is hard to prebuild binaries that fit every platform's needs with the best performance.\n\nIf you run into low-performance issues, we strongly recommend compiling manually. Otherwise, you will have to wait for us to provide better precompiled bindings. I am investigating the cross-build problem.\n\n\n### Manual compilation (from node_modules)\n\n- Install the Rust environment first\n\n- Go to node_modules/@llama-node/core\n\n    ```shell\n    npm run build\n    ```\n\n### Manual compilation (from source)\n\n- Install the Rust environment first\n\n- After cloning, run in the project root\n\n    ```shell\n    npm install \u0026\u0026 npm run build\n    ```\n\n- Run in the packages/core directory\n    ```shell\n    npm run build\n    ```\n\n- You can now use the js entry files in the dist directory under the project root\n\n---\n\n## Future plans\n- [ ] Prompt extensions\n- [ ] More platforms and processor architectures (with the best performance)\n- [ ] Optimize the embedding API, providing an option to configure the end token\n- [ ] Command line tool\n- [ ] Update llama-rs to support more models https://github.com/rustformers/llama-rs/pull/141\n- [ ] Support for more native inference backends (such as rwkv)!","funding_links":["https://github.com/sponsors/hlhr202","https://opencollective.com/hlhr202"],"categories":["Rust","ai","Summary","Frameworks","Machine Learning","nodejs"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAtome-FE%2Fllama-node","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAtome-FE%2Fllama-node","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAtome-FE%2Fllama-node/lists"}