Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with flash-attention
A curated list of projects in awesome lists tagged with flash-attention.
https://github.com/QwenLM/Qwen
The official repo of Qwen (通义千问), the chat & pretrained large language model proposed by Alibaba Cloud.
chinese flash-attention large-language-models llm natural-language-processing pretrained-models
Last synced: 27 Oct 2024
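A minimal sketch of how a Qwen chat checkpoint is typically loaded with FlashAttention enabled through Hugging Face Transformers; the model id and flags are illustrative (older Qwen checkpoints rely on trust_remote_code and their own flash-attn switch), and the flash-attn package must be installed:

```python
# Hedged sketch: loading a Qwen chat model with FlashAttention via Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-7B-Chat"  # illustrative checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # flash kernels need fp16/bf16
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("Briefly introduce yourself.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```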
https://github.com/ymcui/Chinese-LLaMA-Alpaca-2
Chinese LLaMA-2 & Alpaca-2 LLMs (phase-2 project), with 64K long-context models.
64k alpaca alpaca-2 alpaca2 flash-attention large-language-models llama llama-2 llama2 llm nlp rlhf yarn
Last synced: 29 Oct 2024
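As a generic illustration of long-context RoPE scaling in Transformers (this project ships its own 64K checkpoints built on YaRN, so the model id, scaling type, and factor below are placeholders, not its exact recipe):

```python
# Hedged sketch: extending a LLaMA-style model's context window with RoPE scaling.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",                       # placeholder base model
    rope_scaling={"type": "linear", "factor": 16.0},  # e.g. 4K -> 64K positions
    attn_implementation="flash_attention_2",          # pairs with the flash-attention tag
    torch_dtype="auto",
)
```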
https://github.com/InternLM/InternLM
Official release of InternLM2 7B and 20B base and chat models, with 200K context support.
chatbot chinese fine-tuning-llm flash-attention gpt large-language-model llm long-context pretrained-models rlhf
Last synced: 27 Oct 2024
https://github.com/flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
cuda flash-attention gpu jit large-large-models llm-inference pytorch
Last synced: 19 Dec 2024
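FlashInfer's own API is not reproduced here; as a library-agnostic illustration of the fused attention primitive such serving kernels provide, PyTorch's built-in scaled-dot-product attention dispatches to a FlashAttention-style kernel when shapes and dtypes allow:

```python
# Illustrative fused attention call in the decode regime (one query token).
import torch
import torch.nn.functional as F

batch, heads, q_len, kv_len, head_dim = 1, 8, 1, 1024, 128
q = torch.randn(batch, heads, q_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn(batch, heads, kv_len, head_dim, device="cuda", dtype=torch.float16)
v = torch.randn(batch, heads, kv_len, head_dim, device="cuda", dtype=torch.float16)

# Dispatches to a fused (FlashAttention-style) kernel on supported GPUs.
out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
print(out.shape)  # torch.Size([1, 8, 1, 128])
```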
https://github.com/DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
block-reduce cuda cuda-programming elementwise flash-attention flash-attention-2 flash-attention-3 gemm gemv layernorm pytorch rmsnorm softmax triton warp-reduce
Last synced: 27 Oct 2024
https://github.com/InternLM/InternEvo
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.
910b deepspeed-ulysses flash-attention gemma internlm internlm2 llama3 llava llm-framework llm-training multi-modal pipeline-parallelism pytorch ring-attention sequence-parallelism tensor-parallelism transformers-models zero3
Last synced: 30 Oct 2024
https://github.com/DAMO-NLP-SG/Inf-CLIP
💣💣 The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
clip contrastive-learning flash-attention infinite-batch-size memory-efficient ring-attention
Last synced: 31 Oct 2024
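For context, a minimal sketch of the standard CLIP contrastive (InfoNCE) loss that Inf-CL makes memory-efficient; the naive version below materializes the full batch-by-batch similarity matrix, which is exactly what limits batch size:

```python
# Naive symmetric CLIP loss; Inf-CL avoids building the full (B, B) logits matrix.
import torch
import torch.nn.functional as F

def clip_loss(image_emb: torch.Tensor, text_emb: torch.Tensor, temperature: float = 0.07):
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

loss = clip_loss(torch.randn(32, 512), torch.randn(32, 512))
```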
https://github.com/kklemon/flashperceiver
Fast and memory-efficient PyTorch implementation of the Perceiver with FlashAttention.
attention-mechanism deep-learning flash-attention nlp perceiver transformer
Last synced: 19 Nov 2024
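A rough sketch of the Perceiver pattern this repo accelerates: a small set of learned latents cross-attends to a long input sequence, so cost scales with latents × inputs rather than inputs²; class and parameter names here are illustrative, not this repo's API:

```python
# Illustrative Perceiver-style cross-attention from learned latents to inputs.
import torch
import torch.nn as nn

class PerceiverCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, num_latents: int = 64, heads: int = 8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, dim)
        q = self.latents.unsqueeze(0).expand(x.size(0), -1, -1)
        out, _ = self.attn(q, x, x)   # latents attend to the inputs
        return out                    # (B, num_latents, dim)

module = PerceiverCrossAttention()
print(module(torch.randn(2, 4096, 256)).shape)  # torch.Size([2, 64, 256])
```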
https://github.com/bruce-lee-ly/flash_attention_inference
Benchmarks the performance of the C++ interface of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core
Last synced: 15 Nov 2024
https://github.com/bruce-lee-ly/decoding_attention
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.
cuda cuda-core decoding-attention flash-attention flashinfer gpu inference large-language-model llm mha multi-head-attention nvidia
Last synced: 23 Oct 2024
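A pure-PyTorch sketch of what decode-stage attention computes: at each step one new query token attends over the cached keys and values, so the kernel is bandwidth-bound rather than compute-bound, which is the regime this project targets; shapes and names are illustrative:

```python
# One autoregressive decode step with a KV cache (single new token per step).
import torch

def decode_step(q, k_cache, v_cache, k_new, v_new):
    # q: (B, H, 1, D); caches: (B, H, T, D); new k/v: (B, H, 1, D)
    k_cache = torch.cat([k_cache, k_new], dim=2)
    v_cache = torch.cat([v_cache, v_new], dim=2)
    scores = (q @ k_cache.transpose(-2, -1)) / q.size(-1) ** 0.5  # (B, H, 1, T+1)
    out = torch.softmax(scores, dim=-1) @ v_cache                 # (B, H, 1, D)
    return out, k_cache, v_cache
```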
https://github.com/kyegomez/flashmha
A simple PyTorch implementation of flash multi-head attention.
artificial-intelligence artificial-neural-networks attention attention-mechanisms attentionisallyouneed flash-attention gpt4 transformer
Last synced: 09 Nov 2024
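An educational sketch of the streaming (online-softmax) trick at the heart of FlashAttention, which implementations like this one build on: keys and values are processed in tiles while a running row max and normalizer are maintained, so the full attention matrix is never materialized; this is a reference illustration, not a tuned kernel:

```python
# Tiled attention with online softmax; single head, no masking, for clarity.
import torch

def flash_attention_reference(q, k, v, tile: int = 128):
    # q: (T_q, D); k, v: (T_k, D)
    scale = q.size(-1) ** -0.5
    out = torch.zeros_like(q)
    m = torch.full((q.size(0), 1), float("-inf"))   # running row max
    l = torch.zeros(q.size(0), 1)                   # running softmax normalizer
    for start in range(0, k.size(0), tile):
        k_t, v_t = k[start:start + tile], v[start:start + tile]
        s = q @ k_t.t() * scale                     # scores for this tile
        m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
        p = torch.exp(s - m_new)
        correction = torch.exp(m - m_new)           # rescale previous partials
        l = l * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ v_t
        m = m_new
    return out / l

q, k, v = torch.randn(64, 32), torch.randn(1024, 32), torch.randn(1024, 32)
ref = torch.softmax(q @ k.t() / 32 ** 0.5, dim=-1) @ v
assert torch.allclose(flash_attention_reference(q, k, v), ref, atol=1e-5)
```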
https://github.com/erfanzar/jax-flash-attn2
Flash Attention implementation with multiple backend support and sharding. This module provides a flexible implementation of Flash Attention with support for different backends (GPU, TPU, CPU) and platforms (Triton, Pallas, JAX).
flash-attention flash-attention-2 jax pallas
Last synced: 07 Nov 2024
https://github.com/kreasof-ai/homunculus-project
Long-term project about a custom AI architecture, consisting of cutting-edge machine-learning techniques such as Flash-Attention, Grouped-Query Attention, ZeRO-Infinity, and BitNet.
bitnet deep-learning flash-attention jupyter-notebook large-language-models low-rank-adaptation machine-learning python pytorch pytorch-lightning transformer vision-transformer
Last synced: 17 Dec 2024
https://github.com/lukasdrews97/dumblellm
Decoder-only LLM trained on the Harry Potter books.
byte-pair-encoding flash-attention grouped-query-attention large-language-model rotary-position-embedding transformer
Last synced: 18 Dec 2024
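A hedged sketch of grouped-query attention (GQA), one of the techniques this model lists: several query heads share each key/value head, shrinking the KV cache; head counts and shapes below are illustrative:

```python
# GQA by expanding each KV head to serve its group of query heads.
import torch
import torch.nn.functional as F

batch, seq, head_dim = 2, 128, 64
num_q_heads, num_kv_heads = 8, 2                    # 4 query heads per KV head
q = torch.randn(batch, num_q_heads, seq, head_dim)
k = torch.randn(batch, num_kv_heads, seq, head_dim)
v = torch.randn(batch, num_kv_heads, seq, head_dim)

group = num_q_heads // num_kv_heads
k = k.repeat_interleave(group, dim=1)               # (B, 8, S, D)
v = v.repeat_interleave(group, dim=1)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```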