{"id":24765940,"url":"https://github.com/infinitensor/infinilm","last_synced_at":"2026-04-02T13:49:37.374Z","repository":{"id":222048675,"uuid":"754533466","full_name":"InfiniTensor/InfiniLM","owner":"InfiniTensor","description":null,"archived":false,"fork":false,"pushed_at":"2024-05-20T05:50:16.000Z","size":799,"stargazers_count":19,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-05-20T06:43:37.148Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InfiniTensor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-08T09:03:38.000Z","updated_at":"2024-05-20T06:44:06.511Z","dependencies_parsed_at":"2024-05-20T06:43:53.990Z","dependency_job_id":"14371871-3dbd-460b-9e77-617c9720ef76","html_url":"https://github.com/InfiniTensor/InfiniLM","commit_stats":null,"previous_names":["ydrmaster/transformer","infinitensor/transformer-rs","infinitensor/infinilm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2FInfiniLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2FInfiniLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2FInfiniLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2FInfiniLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Infini
Tensor","download_url":"https://codeload.github.com/InfiniTensor/InfiniLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236102622,"owners_count":19095208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-28T23:17:48.796Z","updated_at":"2026-04-02T13:49:37.364Z","avatar_url":"https://github.com/InfiniTensor.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# InfiniLM\n\nThis project is an inference engine built on [`InfiniCore`](https://github.com/InfiniTensor/InfiniCore).\n\n## Usage\n\n- Build and install `InfiniCore`. Set the `INFINI_ROOT` environment variable as prompted (default: `$HOME/.infini`).\n\n- Build and install `InfiniLM`\n\n```bash\nxmake \u0026\u0026 xmake install\n```\n\n- Run the model inference test\n\n```bash\npython scripts/jiuge.py [--cpu | --nvidia | --qy | --cambricon | --ascend | --metax | --moore | --iluvatar | --kunlun | --hygon | --ali] path/to/model_dir [n_device]\n```\n\n- Deploy the model inference service\n\n```bash\npython scripts/launch_server.py --model-path MODEL_PATH [-h] [--dev {cpu,nvidia,qy,cambricon,ascend,metax,moore,iluvatar,kunlun,hygon}] [--ndev NDEV] [--max-batch MAX_BATCH] [--max-tokens MAX_TOKENS]\n```\n\n- Test the performance of the model inference service\n\n```bash\npython scripts/test_perf.py\n```\n\n- Measure model perplexity through the inference service\n\n```bash\npython scripts/test_ppl.py --model-path MODEL_PATH [--ndev NDEV] [--max-batch MAX_BATCH] [--max-tokens MAX_TOKENS]\n```\n\n## Usage (new version)\n#### 1. Build and install `InfiniCore`\nBuild and install `InfiniCore`. See the InfiniCore [`README`](https://github.com/InfiniTensor/InfiniCore) for details:\n\n- Set the `INFINI_ROOT` environment variable as prompted (default: `$HOME/.infini`)\n- Choose the xmake build configuration for your hardware platform\n- 
Build and install InfiniCore\n- Install the C++ library\n- Install the Python package\n\n\n#### 2. Build and install `InfiniLM`\n  - Clone the project\n\n    The repository contains submodules, so add `--recursive` or `--recurse-submodules` when cloning, for example:\n\n    ```shell\n    git clone --recursive https://github.com/InfiniTensor/InfiniLM.git\n    ```\n\n    Or update the submodules after a plain clone:\n\n    ```shell\n    git submodule update --init --recursive\n    ```\n\n\n  - Choose whether to use kv caching (default: false). It is available on platforms that support this operator (NVIDIA, Ali, Iluvatar, MetaX, Hygon, QY)\n    ```bash\n      xmake f --use-kv-caching=[true | false] -cv\n    ```\n\n\n  - Install the InfiniLM Python package\n    ```bash\n      pip install -e .\n    ```\n\n  - Single inference test\n    - llama example\n    ```bash\n    python examples/jiuge.py [--cpu | --nvidia | --qy | --metax | --moore | --iluvatar | --ali | --cambricon | --hygon] --model_path=\u003cpath/to/model_dir\u003e\n    ```\n    - For example:\n    ```bash\n    python examples/jiuge.py --nvidia --model_path=/models/TinyLlama-1.1B-Chat-v1.0\n    ```\n  - Distributed inference test\n    - 9G example\n    ```bash\n    python examples/jiuge.py [--nvidia] --model_path=\u003cpath/to/model_dir\u003e --backend=cpp --tp=NDEV --batch_size=MAX_BATCH\n    ```\n\n    - For example, the 9G7B model with the cpp backend, batch_size 16, distributed across 4 devices:\n    ```bash\n    python examples/jiuge.py --nvidia --model_path=/models/9G7B_MHA/ --backend=cpp --tp=4 --batch_size=16\n    ```\n\n\n  - Inference service test\n    - Start the inference service\n      ```bash\n      python python/infinilm/server/inference_server.py [--cpu | --nvidia | --metax | --moore | --iluvatar | --cambricon] --model_path=\u003cpath/to/model_dir\u003e --max_tokens=MAX_TOKENS --max_batch_size=MAX_BATCH --tp=NDEV --temperature=TEMP --top_p=TOP_P --top_k=TOP_K --host=HOST --port=PORT\n      ```\n    \n    - Single-device example:\n      ```bash\n      CUDA_VISIBLE_DEVICES=0 python python/infinilm/server/inference_server.py --nvidia --model_path=/models/9G7B_MHA/ --max_tokens=100 --max_batch_size=32 --tp=1 --temperature=1.0 --top_p=0.8 --top_k=1\n      ```\n    \n    - Multi-device distributed example:\n      ```bash\n      CUDA_VISIBLE_DEVICES=0,1,2,3 python python/infinilm/server/inference_server.py --nvidia 
--model_path=/models/9G7B_MHA/ --max_tokens=100 --max_batch_size=32 --tp=4 --temperature=1.0 --top_p=0.8 --top_k=1\n      ```\n    \n    - Test the performance of the inference service:\n      ```bash\n      python scripts/test_perf.py --verbose\n      ```\n\n  - Run the inference benchmarks (C-Eval/MMLU)\n\n    ```bash\n    python test/bench/test_benchmark.py [--cpu | --nvidia | --cambricon | --ascend | --metax | --moore | --iluvatar | --kunlun | --hygon | --ali] \u003cpath/to/model_dir\u003e --bench {ceval|mmlu} [--backend cpp] [--ndev N] [--subject SUBJECT] [--num_samples N] [--max_new_tokens N] [--output_csv PATH] [--cache_dir PATH]\n    ```\n\n    - Parameters:\n      - `--subject`: the subject to evaluate. Accepts a single subject, multiple subjects (comma-separated), or `all` (the default, which loads every subject)\n      - `--output_csv`: optional path of the CSV output file. If omitted, no CSV file is written. The CSV contains the result for each subject and the overall result\n      - `--cache_dir`: optional parent directory of the dataset cache. It should point to the parent directory that contains dataset subdirectories such as `ceval___ceval-exam` and `cais___mmlu` (for example `~/.cache/huggingface/datasets/`). When set, the script prefers loading data offline from local CSV files (`pandas.read_csv`), avoiding the network requests made by `load_dataset`\n\n    - C-Eval examples:\n      - Single subject:\n        ```bash\n        python test/bench/test_benchmark.py --nvidia /models/9G7B_MHA --bench ceval --subject middle_school_mathematics --num_samples 100 --backend cpp --ndev 1\n        ```\n      - Multiple subjects (comma-separated):\n        ```bash\n        python test/bench/test_benchmark.py --nvidia /models/9G7B_MHA --bench ceval --subject middle_school_mathematics,high_school_physics --backend cpp --ndev 1 --output_csv results.csv\n        ```\n      - All subjects with CSV output:\n        ```bash\n        python test/bench/test_benchmark.py --nvidia /models/9G7B_MHA --bench ceval --subject all --backend cpp --ndev 1 --output_csv results.csv\n        ```\n      - Use a cache directory to speed up loading:\n        ```bash\n        python test/bench/test_benchmark.py --nvidia /models/9G7B_MHA --bench ceval --subject middle_school_mathematics --backend cpp --ndev 1 --cache_dir ~/.cache/huggingface/datasets/\n        ```\n        \u003e Note: `--cache_dir` should point to the parent directory that contains dataset subdirectories such as `ceval___ceval-exam` and `cais___mmlu`, not to those subdirectories themselves\n\n    - MMLU examples:\n      - Single subject:\n        ```bash\n  
      python test/bench/test_benchmark.py --nvidia /models/9G7B_MHA --bench mmlu --subject abstract_algebra --backend cpp --ndev 1\n        ```\n      - Multiple subjects (comma-separated):\n        ```bash\n        python test/bench/test_benchmark.py --nvidia /models/9G7B_MHA --bench mmlu --subject abstract_algebra,anatomy,astronomy --backend cpp --ndev 1 --output_csv results.csv\n        ```\n      - Use a cache directory to speed up loading:\n        ```bash\n        python test/bench/test_benchmark.py --nvidia /models/9G7B_MHA --bench mmlu --subject abstract_algebra --backend cpp --ndev 1 --cache_dir ~/.cache/huggingface/datasets/\n        ```\n        \u003e Note: `--cache_dir` should point to the parent directory that contains dataset subdirectories such as `ceval___ceval-exam` and `cais___mmlu`, not to those subdirectories themselves\n\n  - Experimental features\n    - Warm Up\n      ```bash\n      python examples/bench.py --nvidia --model=\u003cmodel-path\u003e --warmup\n      ```\n    - Paged Attention\n      ```bash\n      python examples/bench.py --nvidia --model=\u003cmodel-path\u003e --enable-paged-attn\n      ```\n    - CUDA Graph\n      ```bash\n      python examples/bench.py --nvidia --model=\u003cmodel-path\u003e --enable-paged-attn --enable-graph\n      ```\n    - Select the attention backend (the flash attention backend requires the corresponding configuration and build in InfiniCore first)\n      ```bash\n      python examples/bench.py --nvidia --model=\u003cmodel-path\u003e --enable-paged-attn [--attn=default | --attn=flash-attn]\n      ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitensor%2Finfinilm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfinitensor%2Finfinilm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitensor%2Finfinilm/lists"}