Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with flash-attention

A curated list of projects in awesome lists tagged with flash-attention.
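
Most of the projects below build on FlashAttention-style fused attention kernels. As a frame of reference, here is a minimal, hedged PyTorch sketch (not taken from any project listed here) of reaching such a kernel through `torch.nn.functional.scaled_dot_product_attention`, which may dispatch to a FlashAttention backend on supported GPUs:

```python
# Minimal sketch (not from any project below): PyTorch's built-in SDPA entry point,
# which may dispatch to a FlashAttention kernel on supported CUDA GPUs.
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# (batch, heads, seq_len, head_dim) layout expected by scaled_dot_product_attention
q = torch.randn(1, 8, 1024, 64, device=device, dtype=dtype)
k = torch.randn(1, 8, 1024, 64, device=device, dtype=dtype)
v = torch.randn(1, 8, 1024, 64, device=device, dtype=dtype)

# Causal self-attention; PyTorch picks the backend (flash / memory-efficient / math).
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```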

https://github.com/QwenLM/Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

chinese flash-attention large-language-models llm natural-language-processing pretrained-models

Last synced: 27 Oct 2024

https://github.com/ymcui/Chinese-LLaMA-Alpaca-2

Phase 2 of the Chinese LLaMA-2 & Alpaca-2 large model project, plus 64K ultra-long-context models (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)

64k alpaca alpaca-2 alpaca2 flash-attention large-language-models llama llama-2 llama2 llm nlp rlhf yarn

Last synced: 29 Oct 2024

https://github.com/InternLM/InternLM

Official release of the InternLM2 7B and 20B base and chat models, with 200K context support.

chatbot chinese fine-tuning-llm flash-attention gpt large-language-model llm long-context pretrained-models rlhf

Last synced: 27 Oct 2024

https://github.com/flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

cuda flash-attention gpu jit large-large-models llm-inference pytorch

Last synced: 19 Dec 2024

https://github.com/DefTruth/CUDA-Learn-Notes

🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

block-reduce cuda cuda-programming elementwise flash-attention flash-attention-2 flash-attention-3 gemm gemv layernorm pytorch rmsnorm softmax triton warp-reduce

Last synced: 27 Oct 2024
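
The kernels catalogued in CUDA-Learn-Notes (flash_attn, softmax, warp/block reduce, and so on) revolve around the online-softmax trick that flash-attention builds on: a running max and running normalizer are accumulated tile by tile in a single streaming pass. The following is an illustrative PyTorch sketch of that idea, not code from the repository:

```python
# Illustrative sketch of online (streaming) softmax over tiles -- the numerical trick
# underlying flash-attention kernels. Not code from CUDA-Learn-Notes.
import torch

def online_softmax(x: torch.Tensor, tile: int = 256) -> torch.Tensor:
    """Accumulate the max and normalizer tile by tile, then apply them."""
    m = torch.full(x.shape[:-1], float("-inf"))   # running max
    s = torch.zeros(x.shape[:-1])                 # running sum of exp(x - m)
    for start in range(0, x.shape[-1], tile):
        chunk = x[..., start:start + tile]
        m_new = torch.maximum(m, chunk.max(dim=-1).values)
        # Rescale the old normalizer to the new max, then add this tile's contribution.
        s = s * torch.exp(m - m_new) + torch.exp(chunk - m_new.unsqueeze(-1)).sum(dim=-1)
        m = m_new
    return torch.exp(x - m.unsqueeze(-1)) / s.unsqueeze(-1)

x = torch.randn(4, 1000)
assert torch.allclose(online_softmax(x), torch.softmax(x, dim=-1), atol=1e-6)
```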

https://github.com/InternLM/InternEvo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

910b deepspeed-ulysses flash-attention gemma internlm internlm2 llama3 llava llm-framework llm-training multi-modal pipeline-parallelism pytorch ring-attention sequence-parallelism tensor-parallelism transformers-models zero3

Last synced: 30 Oct 2024

https://github.com/DAMO-NLP-SG/Inf-CLIP

💣💣 The official CLIP training codebase for Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.

clip contrastive-learning flash-attention infinite-batch-size memory-efficient ring-attention

Last synced: 31 Oct 2024
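
Inf-CL's core idea is to avoid materializing the full batch-by-batch similarity matrix that a CLIP contrastive loss normally requires. As a rough illustration of the memory problem it attacks (not the Inf-CL algorithm itself, which also restructures the backward pass and tiles the work across GPUs), a simple row-chunked image-to-text InfoNCE loss might look like this:

```python
# Illustrative sketch only: a row-chunked CLIP-style (image-to-text) InfoNCE loss.
# This is NOT the Inf-CL algorithm from the repository; Inf-CL additionally avoids
# keeping all chunks alive for the backward pass and distributes tiles across GPUs.
import torch
import torch.nn.functional as F

def chunked_clip_loss(img: torch.Tensor, txt: torch.Tensor,
                      temperature: float = 0.07, chunk: int = 1024) -> torch.Tensor:
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    losses = []
    for start in range(0, img.shape[0], chunk):
        # Only a (chunk x B) slice of the logit matrix exists at a time in the forward pass.
        logits = img[start:start + chunk] @ txt.T / temperature
        targets = torch.arange(start, start + logits.shape[0], device=img.device)
        losses.append(F.cross_entropy(logits, targets, reduction="sum"))
    return torch.stack(losses).sum() / img.shape[0]

loss = chunked_clip_loss(torch.randn(4096, 512), torch.randn(4096, 512), chunk=512)
print(loss)
```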

https://github.com/kklemon/flashperceiver

Fast and memory-efficient PyTorch implementation of the Perceiver with FlashAttention.

attention-mechanism deep-learning flash-attention nlp perceiver transformer

Last synced: 19 Nov 2024
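
The Perceiver's central pattern is a small array of learned latent queries cross-attending to a much longer input sequence, which maps directly onto a fused attention call. A minimal sketch of that pattern using PyTorch's SDPA (illustrative only, not the flashperceiver API):

```python
# Illustrative sketch of Perceiver-style cross-attention through a fused SDPA call.
# Not the flashperceiver API; just the pattern: few latent queries attend to many inputs.
import torch
import torch.nn.functional as F

batch, n_latents, n_inputs, heads, dim = 2, 64, 8192, 8, 64

latents = torch.randn(batch, heads, n_latents, dim)   # learned query array
inputs  = torch.randn(batch, heads, n_inputs, dim)    # long input sequence (keys/values)

# Cross-attention cost scales with n_latents * n_inputs instead of n_inputs^2.
out = F.scaled_dot_product_attention(latents, inputs, inputs)
print(out.shape)  # torch.Size([2, 8, 64, 64])
```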

https://github.com/bruce-lee-ly/flash_attention_inference

Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.

cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core

Last synced: 15 Nov 2024

https://github.com/bruce-lee-ly/decoding_attention

Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.

cuda cuda-core decoding-attention flash-attention flashinfer gpu inference large-language-model llm mha multi-head-attention nvidia

Last synced: 23 Oct 2024
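
During decoding, each step attends with a single new query token over a cached key/value prefix, so the workload is memory-bound rather than compute-bound. A minimal PyTorch sketch of that workload (illustrative only, not code from decoding_attention):

```python
# Illustrative sketch of decode-stage attention: one new query token attends over a
# cached K/V prefix. Not code from decoding_attention; it only shows the workload
# that decode-stage kernels target.
import torch
import torch.nn.functional as F

heads, head_dim, cached_len = 32, 128, 4096

k_cache = torch.randn(1, heads, cached_len, head_dim)
v_cache = torch.randn(1, heads, cached_len, head_dim)
q_new   = torch.randn(1, heads, 1, head_dim)           # single decode-step query

# seq_len(q) == 1, so the score matrix is only (1 x cached_len) per head.
out = F.scaled_dot_product_attention(q_new, k_cache, v_cache)
print(out.shape)  # torch.Size([1, 32, 1, 128])
```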

https://github.com/erfanzar/jax-flash-attn2

Flash Attention implementation with multiple backend support and sharding. This module provides a flexible implementation of Flash Attention with support for different backends (GPU, TPU, CPU) and platforms (Triton, Pallas, JAX).

flash-attention flash-attention-2 jax pallas

Last synced: 07 Nov 2024

https://github.com/kreasof-ai/homunculus-project

Long-term project on a custom AI architecture. Consists of cutting-edge machine learning techniques such as Flash-Attention, Group-Query-Attention, ZeRO-Infinity, BitNet, etc.

bitnet deep-learning flash-attention jupyter-notebook large-language-models low-rank-adaptation machine-learning python pytorch pytorch-lightning transformer vision-transformer

Last synced: 17 Dec 2024
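
Among the techniques listed, Grouped-Query Attention is easy to show in isolation: several query heads share each key/value head, shrinking the KV cache. A minimal PyTorch sketch (illustrative only, not code from this project):

```python
# Illustrative Grouped-Query Attention sketch (not code from homunculus-project):
# several query heads share each key/value head, shrinking the KV cache.
import torch
import torch.nn.functional as F

batch, seq, head_dim = 2, 256, 64
q_heads, kv_heads = 16, 4                  # 4 query heads per KV head
group = q_heads // kv_heads

q = torch.randn(batch, q_heads, seq, head_dim)
k = torch.randn(batch, kv_heads, seq, head_dim)
v = torch.randn(batch, kv_heads, seq, head_dim)

# Expand each KV head to serve its group of query heads.
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 16, 256, 64])
```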