{"id":18688225,"url":"https://github.com/flagopen/flaggems","last_synced_at":"2025-05-15T11:09:06.648Z","repository":{"id":242363328,"uuid":"775232941","full_name":"FlagOpen/FlagGems","owner":"FlagOpen","description":"FlagGems is an operator library for large language models implemented in the Triton Language.","archived":false,"fork":false,"pushed_at":"2025-05-14T03:53:36.000Z","size":7080,"stargazers_count":531,"open_issues_count":73,"forks_count":92,"subscribers_count":20,"default_branch":"master","last_synced_at":"2025-05-14T04:04:22.235Z","etag":null,"topics":["pytorch","triton","triton-kernels"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FlagOpen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-03-21T02:04:53.000Z","updated_at":"2025-05-13T20:03:10.000Z","dependencies_parsed_at":"2024-07-28T07:31:43.714Z","dependency_job_id":"dbf65953-6b2f-42db-af95-45d6e21f0687","html_url":"https://github.com/FlagOpen/FlagGems","commit_stats":{"total_commits":297,"total_committers":34,"mean_commits":8.735294117647058,"dds":0.7946127946127945,"last_synced_commit":"c126ba8f1ecf391704be8e5408315d92d70147f7"},"previous_names":["flagopen/flaggems"],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlagOpen%2FFlagGems","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlagOpen%2FFlagGems/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/FlagOpen%2FFlagGems/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FlagOpen%2FFlagGems/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FlagOpen","download_url":"https://codeload.github.com/FlagOpen/FlagGems/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254328386,"owners_count":22052632,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pytorch","triton","triton-kernels"],"created_at":"2024-11-07T10:35:59.087Z","updated_at":"2025-05-15T11:09:06.642Z","avatar_url":"https://github.com/FlagOpen.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[中文版](./README_cn.md)\n\n![img_v3_02gp_8115f603-cc89-4e96-ae9d-f01b4fef796g](https://github.com/user-attachments/assets/97950fc6-62bb-4b6a-b8d5-5751c14492fa)\n\n## Introduction\n\nFlagGems is a high-performance general operator library implemented in [OpenAI Triton](https://github.com/openai/triton). It aims to provide a suite of kernel functions to accelerate LLM training and inference.\n\nBy registering with the ATen backend of PyTorch, FlagGems facilitates a seamless transition, allowing users to switch to the Triton function library without the need to modify their model code. Users can still utilize the ATen backend as usual while experiencing significant performance enhancement. The Triton language offers benefits in readability, user-friendliness and performance comparable to CUDA. 
This lowers the barrier to entry, allowing developers to contribute to FlagGems with minimal learning investment.\n\nWe have created a WeChat group for FlagGems. Scan the QR code to join the group chat! Join us to get first-hand news about our updates and new releases, or to share any questions or ideas.\n\n\u003cp align=\"center\"\u003e\n \u003cimg src=\"https://github.com/user-attachments/assets/69019a23-0550-44b1-ac42-e73f06cb55d6\" alt=\"bge_wechat_group\" class=\"center\" width=\"200\"\u003e\n\u003c/p\u003e\n\n## Features\n\n### Multi-Backend Hardware Support\nFlagGems supports a wide range of hardware platforms and has been extensively tested across different hardware configurations.\n\n### Automatic Codegen\nFlagGems provides an automatic code generation mechanism that enables developers to easily generate both pointwise and fused operators.\nThe auto-generation system supports a variety of needs, including standard element-wise computations, non-tensor parameters, and explicitly specified output types.\nFor more details, please refer to [pointwise_dynamic](docs/pointwise_dynamic.md).\n\n### LibEntry\nFlagGems introduces `LibEntry`, which independently manages the kernel cache and bypasses the runtime of `Autotuner`, `Heuristics`, and `JitFunction`. To use it, simply decorate the Triton kernel with `LibEntry`.\n\n`LibEntry` also supports direct wrapping of `Autotuner`, `Heuristics`, and `JitFunction`, preserving full tuning functionality. However, it avoids nested runtime type invocations, eliminating redundant parameter processing. This means there is no need for binding or type wrapping, which simplifies the cache key format and reduces unnecessary key computation.\n\n### C++ Runtime\nFlagGems can be installed either as a pure Python package or as a package with C++ extensions. 
The C++ runtime is designed to address the overhead of the Python runtime and improve end-to-end performance.\nFor more details, please refer to [c++ extensions](docs/build_flaggems_with_c_extensions.md).\n\n## Changelog\n\n### v1.0\n- support BLAS operators: addmm, bmm, mm\n- support pointwise operators: abs, add, div, dropout, exp, gelu, mul, pow, reciprocal, relu, rsqrt, silu, sub, triu\n- support reduction operators: cumsum, layernorm, mean, softmax\n\n### v2.0\n- support BLAS operators: mv, outer\n- support pointwise operators: bitwise_and, bitwise_not, bitwise_or, cos, clamp, eq, ge, gt, isinf, isnan, le, lt, ne, neg, or, sin, tanh, sigmoid\n- support reduction operators: all, any, amax, argmax, max, min, prod, sum, var_mean, vector_norm, cross_entropy_loss, group_norm, log_softmax, rms_norm\n- support fused operators: skip_rms_norm, skip_layer_norm, gelu_and_mul, silu_and_mul, apply_rotary_position_embedding\n\n### v2.1\n- support Tensor operators: where, arange, repeat, masked_fill, tile, unique, index_select, masked_select, ones, ones_like, zeros, zeros_like, full, full_like, flip, pad\n- support neural network operator: embedding\n- support basic math operators: allclose, isclose, isfinite, floor_divide, trunc_divide, maximum, minimum\n- support distribution operators: normal, uniform_, exponential_, multinomial, nonzero, topk, rand, randn, rand_like, randn_like\n- support science operators: erf, resolve_conj, resolve_neg\n\n## Get Start\n\nFor a quick start with installing and using flag_gems, please refer to the documentation [GetStart](docs/get_start_with_flaggems.md).\n\n## Supported Operators\n\nOperators will be implemented according to [OperatorList](docs/operator_list.md).\n\n## Supported Models\n\n- Bert-base-uncased\n- Llama-2-7b\n- Llava-1.5-7b\n\n## Supported Platforms\n\n| Platform | float16 | float32 | bfloat16 |\n| :---: | :---: | :---: | :---: |\n| Nvidia GPU | ✓ | ✓ | ✓ |\n\n## Performance\n\nThe following chart shows the speedup of 
FlagGems compared with the PyTorch ATen library in eager mode. The speedup for each operator is the average of its speedups across all tested shapes, representing that operator's overall performance.\n\n![Operator Speedup](./docs/assets/speedup-20250423.png)\n\n## Contributions\n\nIf you are interested in contributing to the FlagGems project, please refer to [CONTRIBUTING.md](./CONTRIBUTING.md). Any contributions would be highly appreciated.\n\n## Citation\n\nIf you find our work useful, please consider citing our project:\n\n```bibtex\n@misc{flaggems2024,\n    title={FlagOpen/FlagGems: FlagGems is an operator library for large language models implemented in the Triton language.},\n    url={https://github.com/FlagOpen/FlagGems},\n    journal={GitHub},\n    author={BAAI FlagOpen team},\n    year={2024}\n}\n```\n\n## Contact us\n\nIf you have any questions about our project, please submit an issue, or contact us at \u003ca href=\"mailto:flaggems@baai.ac.cn\"\u003eflaggems@baai.ac.cn\u003c/a\u003e.\n\n## License\n\nThe FlagGems project is licensed under [Apache 2.0](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflagopen%2Fflaggems","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflagopen%2Fflaggems","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflagopen%2Fflaggems/lists"}