Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
awesome-llm awq flash-attention flash-attention-2 flash-decoding inferflow kv-quant mamba paged-attention sora streaming-llm streamingllm tensorrt-llm vllm
Last synced: 06 Jul 2024
https://github.com/DefTruth/CUDA-Learn-Notes
🎉CUDA notes / hand-written CUDA for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
block-reduce cuda cuda-kernels cuda-programming elementwise flash-attention flash-attention-2 gemm gemv layernorm rmsnorm softmax warp-reduce
Last synced: 20 Apr 2024
https://github.com/arihanv/Shush
Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a NextJS app
flash-attention-2 huggingface-transformers machine-learning modal shadcn-ui transcription whisper
Last synced: 10 Apr 2024