{"id":30577704,"url":"https://github.com/infinitensor/llaisys","last_synced_at":"2025-08-29T02:42:17.761Z","repository":{"id":308271736,"uuid":"1031749055","full_name":"InfiniTensor/llaisys","owner":"InfiniTensor","description":"Let's Learn AI SYStem","archived":false,"fork":false,"pushed_at":"2025-08-18T08:10:58.000Z","size":52,"stargazers_count":5,"open_issues_count":0,"forks_count":29,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-18T09:21:09.542Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/InfiniTensor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-04T09:18:34.000Z","updated_at":"2025-08-18T08:11:02.000Z","dependencies_parsed_at":"2025-08-05T04:19:36.577Z","dependency_job_id":"f8f65e11-504f-4f3e-bbb4-c572cde2300b","html_url":"https://github.com/InfiniTensor/llaisys","commit_stats":null,"previous_names":["infinitensor/llaisys"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/InfiniTensor/llaisys","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2Fllaisys","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2Fllaisys/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2Fllaisys/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2Fllaisys/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners/InfiniTensor","download_url":"https://codeload.github.com/InfiniTensor/llaisys/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/InfiniTensor%2Fllaisys/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272612283,"owners_count":24964391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-29T02:00:10.610Z","response_time":87,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-29T02:42:08.908Z","updated_at":"2025-08-29T02:42:17.713Z","avatar_url":"https://github.com/InfiniTensor.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Welcome to LLAISYS\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"README.md\" target=\"README.md\"\u003eEnglish\u003c/a\u003e ｜\n\u003ca href=\"README_ZN.md\" target=\"README_ZN.md\"\u003e中文\u003c/a\u003e\n\u003c/p\u003e\n\n## Introduction\n\nLLAISYS (Let's Learn AI SYStem) is an educational project that aims to provide a platform for new and future AI engineers to learn how to build AI systems from scratch. LLAISYS consists of several assignments, which help students learn and build the basic modules, and projects that challenge them to add more fancy features to their systems. 
LLAISYS uses C++ as the primary programming language for the system backend, which is compiled into shared libraries exposing C language APIs. Frontend code is written in Python, which calls these APIs to provide more convenient testing and interaction with other frameworks such as PyTorch.\n\n### Project Structure Overview\n\n- `\\include`: directory that contains the header files, which define all the C APIs exposed by the shared library. (Function declarations start with `__export`)\n\n- `\\src`: C++ source files.\n  - `\\src\\llaisys` contains the direct implementations of what is declared in the header files and follows the same directory structure as `\\include`. This is also as far as C++ code can go.\n  - other directories contain the actual implementation of different modules.\n\n- `xmake.lua`: build rules for the llaisys backend. The `\\xmake` directory contains the sub-xmake files for different devices. For instance, you may add `nvidia.lua` to this directory in the future to support CUDA.\n\n- `\\python`: Python source files.\n  - `\\python\\llaisys\\libllaisys` contains all the ctypes wrapper functions of the llaisys APIs. It basically matches the structure of the C header files.\n  - `\\python\\llaisys` contains Python wrappers of the ctypes functions to make the package more Python-like.\n\n- `\\test`: Python test files that import the llaisys Python package.\n\n## Assignment #0: Getting Started\n\n### Task-0.1 Install Prerequisites\n\n- Compile Tool: [Xmake](https://xmake.io/)\n- C++ Compiler: MSVC (Windows) or Clang or GCC\n- Python \u003e= 3.9 (PyTorch, Transformers, etc.)\n- Clang-Format-16 (Optional): for formatting C++ code.\n\n### Task-0.2 Fork and Build LLAISYS\n\n- Fork the LLAISYS repository and clone it to your local machine. 
Both Windows and Linux are supported.\n\n- Compile and Install\n\n  ```bash\n  # compile C++ code\n  xmake\n  # install llaisys shared library\n  xmake install\n  # install llaisys python package\n  pip install ./python/\n  ```\n\n- GitHub Auto Tests\n\n  LLAISYS uses GitHub Actions to run automated tests on every push and pull request. You can see the test results on your repo page. All tests should pass once you have finished all assignment tasks.\n\n### Task-0.3 Run LLAISYS for the First Time\n\n- Run cpu runtime tests\n\n  ```bash\n  python test/test_runtime.py --device cpu\n  ```\n\n  You should see the test pass.\n\n### Task-0.4 Download test model\n\n- The model we use for assignments is [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).\n\n- Run an inference test with the model using PyTorch\n\n  ```bash\n  python test/test_infer.py --model [dir_path/to/model]\n  ```\n\n  You can see that PyTorch is able to load the model and perform inference with the sample input. You can debug into the `transformers` library code to see what is going on behind the scenes. Right now, your code cannot do anything yet, but you are going to build a system that can achieve the same functionality in the assignments.\n\n## Assignment #1: Tensor\n\nA tensor is a data structure that represents multi-dimensional data. It is the basic building block of LLAISYS and of most AI frameworks, such as PyTorch. In this assignment, you will learn how to implement a basic tensor class.\n\nA Tensor object has the following fields:\n\n- `storage`: a shared pointer to a memory block that stores the tensor's data. It can be shared by multiple tensors. 
Check the storage class for more details.\n- `offset`:  the starting index (in bytes) of the tensor in the storage.\n- `meta`: metadata that describes the tensor's shape, data type, and strides.\n\nImplement the following functions defined in `src/tensor/tensor.hpp`:\n\n### Task-1.1\n\n```c++\nvoid load(const void *src);\n```\n\nLoad host (cpu) data to the tensor (can be on device). Check the constructor to see how to get the runtime APIs of the current device context, and do a memcpy from host to device.\n\n### Task-1.2\n\n```c++\nbool isContiguous() const; \n```\n\nCheck the shape and strides of the tensor, and tell whether it is contiguous in memory.\n\n### Task-1.3\n\n```c++\ntensor_t view(const std::vector\u003csize_t\u003e \u0026shape) const;\n```\n\nCreate a new tensor which reshapes the original tensor to the given shape by splitting or merging the original dimensions. No data transfer is involved. For example, change a tensor of shape (2, 3, 5) to (2, 15) by merging the last two dimensions.\n\nThis function is not as easy as simply changing the shape of the tensor, although an implementation that only does so will still pass the test. It should raise an error if the new view is not compatible with the original tensor. Think about a tensor of shape (2, 3, 5) and strides (30, 10, 1). Can you still reshape it to (2, 15) without data transfer?\n\n### Task-1.4\n\n```c++\ntensor_t permute(const std::vector\u003csize_t\u003e \u0026order) const;\n```\n\nCreate a new tensor which changes the order of the dimensions of the original tensor. Transpose can be achieved by this function without moving data around.\n\n### Task-1.5\n\n```c++\ntensor_t slice(size_t dim, size_t start, size_t end) const;\n```\n\nCreate a new tensor which slices the original tensor along the given dimension, from the `start` index (inclusive) to the `end` index (exclusive).\n\n### Task-1.6\n\nRun tensor tests.\n\n```bash\npython test/test_tensor.py\n```\n\nYou should see all tests pass. Commit and push your changes. 
You should see the auto tests for assignment #1 pass.\n\n## Assignment #2: Operators\n\nIn this assignment, you will implement the CPU version of the following operators:\n\n- argmax\n- embedding\n- linear\n- rms_norm\n- rope\n- self_attention\n- swiglu\n\nRead the code in `src/ops/add/` to see how the \"add\" operator is implemented. Make sure you understand how the operator code is organized, compiled, linked, and exposed to the Python frontend. **Your operators should at least support Float32, Float16 and BFloat16 data types**. A helper function for naive type casting is provided in `src/utils/`. All Python tests are in `test/ops`; your implementation should at least pass these tests. Try running the test script for the \"add\" operator to get started.\n\n### Task-2.1 argmax\n\n```c++\nvoid argmax(tensor_t max_idx, tensor_t max_val, tensor_t vals);\n```\n\nGet the max value of tensor `vals` and its index, and store them in `max_val` and `max_idx` respectively. You can assume that `vals` is a 1D tensor for now, and `max_idx` and `max_val` are both 1D tensors with a single element (which means the dimension of `vals` is kept).\n\nYou should be able to pass the test cases in `test/ops/argmax.py` after you finish the implementation.\n\n### Task-2.2 embedding\n\n```c++\nvoid embedding(tensor_t out, tensor_t index, tensor_t weight);\n```\n\nCopy the rows of `weight` (2-D) selected by `index` (1-D) to `out` (2-D). `index` must be of type Int64 (the default integer data type in PyTorch).\n\nYou should be able to pass the test cases in `test/ops/embedding.py` after you finish the implementation.\n\n### Task-2.3 linear\n\n```c++\nvoid linear(tensor_t out, tensor_t in, tensor_t weight, tensor_t bias);\n```\n\nCompute the following:\n\n$$\nY = XW^T + b\n$$\n\n- `out`: output $Y$ . You can assume output is a 2D contiguous tensor  and no broadcasting is involved for now.\n- `input`: input $X$ . 
You can assume input is a 2D contiguous tensor  and no broadcasting is involved for now.\n- `weight`: weight $W$ . 2D contiguous tensor. Note that weight tensor is not transposed. You need to deal with this during your calculation.\n- `bias` (optional): bias $b$ . 1D tensor. You need to support the situation where bias is not provided.\n\nYou should be able to pass the test cases in `test/ops/linear.py` after you finish the implementation.\n\n### Task-2.4 rms normalization\n\n```c++\nvoid rms_norm(tensor_t out, tensor_t in, tensor_t weight, float eps);\n```\n\nCompute the following for each row:\n\n$$\nY_i = \\frac{W_i \\times  X_i}{\\sqrt{\\frac{1}{d}(\\sum_{j=1}^d X_j^2) + \\epsilon}}\n$$\n\n- `out`: output $Y$ . You can assume output is a 2D contiguous tensor and no broadcasting is involved for now.\n- `input`: input $X$ . You can assume input is a 2D contiguous tensor and no broadcasting is involved for now. The normalization is performed along the last dimension (a.k.a. each row of length $d$ ) of the input tensor.\n- `weight`: weight $W$ . 1D tensor, same length as a row of input tensor.\n- `eps`: small value $\\epsilon$ to avoid division by zero.\n\nYou should be able to pass the test cases in `test/ops/rms_norm.py` after you finish the implementation.\n\n### Task-2.5 rope\n\n```c++\nvoid rope(tensor_t out, tensor_t in, tensor_t pos_ids, float theta);\n```\n\nCompute the following for each vector of input tensor `in`, corresponding to a position id in `pos_ids`:\n\nLet $\\mathbf{x}_i = [\\mathbf{a}_i, \\mathbf{b}_i] \\in \\mathbb{R}^d$ be the input vector and $\\mathbf{y}_i = [\\mathbf{a}'_i, \\mathbf{b}'_i] \\in \\mathbb{R}^d$ be the output vector at index $i$, where $\\mathbf{a}_i, \\mathbf{b}_i,\\mathbf{a}'_i, \\mathbf{b}'_i \\in \\mathbb{R}^{d/2}$ .\n\nLet $\\theta$ be a fixed base (e.g. 
$\\theta = 10000$) and $j = 0, 1, \\ldots, d/2 - 1$.\n\nLet $p_i \\in \\mathbb{N}$ be the position id for the token at input index $i$.\n\nThen the angle for RoPE is $\\phi_{i,j} = \\frac{p_i}{\\theta^{2j/d}}$.\n\nThe output vector $\\mathbf{y}_i = [\\mathbf{a}'_i, \\mathbf{b}'_i]$ is computed as follows:\n\n$$a_{i,j}' = a_{i,j} \\cos(\\phi_{i,j}) - b_{i,j} \\sin(\\phi_{i,j})$$\n\n$$b_{i,j}' = b_{i,j} \\cos(\\phi_{i,j}) + a_{i,j} \\sin(\\phi_{i,j})$$\n\n- `out`: the resulting **q** or **k** tensor. Shape should be [seqlen, nhead, d] or [seqlen, nkvhead, d]. You can assume that the tensor is contiguous for now.\n- `in`: the original **q** or **k** tensor. Shape should be [seqlen, nhead, d] or [seqlen, nkvhead, d]. You can assume that the tensor is contiguous for now.\n- `pos_ids`: the position id (index in the whole context) for each token in the input sequence. Shape should be [seqlen,], dtype should be int64.\n- `theta`: the base value for the frequency vector.\n\nYou should be able to pass the test cases in `test/ops/rope.py` after you finish the implementation.\n\n### Task-2.6 self-attention\n\n```c++\nvoid self_attention(tensor_t attn_val, tensor_t q, tensor_t k, tensor_t v, float scale);\n```\n\nCompute the self-attention for query tensor `q`, key tensor `k`, and value tensor `v`. You should concatenate the kvcache tensors, if needed, before doing this calculation.\n\n$$\nA = Q K^\\top * scale \\\\\n$$\n\n$$\nY = \\mathrm{causalsoftmax}(A) \\cdot V \\\\\n$$\n\n- `attn_val`: the resulting attention value tensor. Shape should be [seqlen, nhead, dv]. You can assume that the tensor is contiguous for now.\n- `q`: the query tensor. Shape should be [seqlen, nhead, d]. You can assume that the tensor is contiguous for now.\n- `k`: the key tensor. Shape should be [total_len, nkvhead, d]. You can assume that the tensor is contiguous for now.\n- `v`: the value tensor. Shape should be [total_len, nkvhead, dv]. 
You can assume that the tensor is contiguous for now.\n- `scale`: a scaling factor. It is set to $\\frac{1}{\\sqrt{d}}$ in most cases.\n\nYou should be able to pass the test cases in `test/ops/self_attention.py` after you finish the implementation.\n\n### Task-2.7 swiglu\n\n```c++\nvoid swiglu(tensor_t out, tensor_t gate, tensor_t up);\n```\n\nThis is an element-wise function that computes the following:\n\n$$\nout_{i} = up_{i} \\circ \\frac { gate_{i}}{1 + e^{-gate_{i}}}\n$$\n\n`out`, `up` and `gate` are 2D contiguous tensors with the same shape [seqlen, intermediate_size].\n\nYou should be able to pass the test cases in `test/ops/swiglu.py` after you finish the implementation.\n\n### Task-2.8\n\nRun operator tests.\n\n```bash\npython test/test_ops.py\n```\n\nYou should see all tests passed. Commit and push your changes. You should see the auto tests for assignment #2 passed.\n\n### Task-2.9 (Optional) rearrange\n\nThis is a bonus task. You may or may not need it for model inference.\n\n```c++\nvoid rearrange(tensor_t out, tensor_t in);\n```\n\nThis operator is used to copy data from a tensor to another tensor with the same shape but different strides. With this, you can easily implement `contiguous` functionality for tensors.\n\n## Assignment #3: Large Language Model Inference\n\nFinally, it is the time for you to achieve text generation with LLAISYS.\n\n- In `test/test_infer.py`, your implementation should be able to generate the same texts as PyTorch, using argmax sampling. The model we use for this assignment is [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).\n\n- The python wrapper of your implementation is in `python/llaisys/models/qwen2.py`. You are NOT allowed to implement your model infer logic here using any python based frameworks, such as PyTorch. Instead, you need to implement the model with C/C++ in LLAISYS backend. 
The script loads each tensor in the safetensors file, and you will need to load data from them into your model backend.\n\n- In `include/llaisys/models/qwen2.h`, a prototype is defined for you. Feel free to modify the code as you want, but you should at least provide basic APIs for model creation, destruction, data loading, and inference. Implement your C APIs in `src/llaisys/` and organize your C++ code like the other modules in `src/`. Remember to define the compiling procedures in `xmake.lua`.\n\n- In `python/llaisys/libllaisys/`, define the ctypes wrapper functions for your C APIs. Implement `python/llaisys/models/qwen2.py` with your wrapper functions.\n\n- Debug until your model works. Take advantage of the tensor's `debug` function, which prints the tensor data. It allows you to compare the data of any tensor during model inference with PyTorch.\n\nAfter you finish the implementation, you can run the following command to test your model:\n\n```bash\npython test/test_infer.py --model [dir_path/to/model] --test\n```\n\nCommit and push your changes. You should see the auto tests for assignment #3 pass.\n\n## Project #1: Build an AI chatbot\n\ncoming soon...\n\n## Project #2: Integrate CUDA into LLAISYS\n\ncoming soon...\n\n## Project #3: Serving Multiple Users\n\ncoming soon...\n\n## Bonus Project: Optimize Your System\n\ncoming soon...\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitensor%2Fllaisys","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finfinitensor%2Fllaisys","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finfinitensor%2Fllaisys/lists"}