Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/DefTruth/Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

awesome-llm awq flash-attention flash-attention-2 flash-decoding inferflow kv-quant mamba paged-attention sora streaming-llm streamingllm tensorrt-llm vllm

Last synced: 06 Jul 2024

https://github.com/DefTruth/CUDA-Learn-Notes

🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

block-reduce cuda cuda-kernels cuda-programming elementwise flash-attention flash-attention-2 gemm gemv layernorm rmsnorm softmax warp-reduce

Last synced: 20 Apr 2024

https://github.com/arihanv/Shush

Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and sends requests to it from a NextJS front end.

flash-attention-2 huggingface-transformers machine-learning modal shadcn-ui transcription whisper

Last synced: 10 Apr 2024