Projects in Awesome Lists tagged with multi-head-attention
A curated list of projects in awesome lists tagged with multi-head-attention.
https://github.com/sooftware/attentions
PyTorch implementations of several attention mechanisms for deep learning researchers. A minimal scaled dot-product attention sketch follows this entry.
additive-attention attention dot-product-attention location-aware-attention location-sensitive-attension multi-head-attention pytorch relative-multi-head-attention relative-positional-encoding
Last synced: 05 Apr 2025
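For reference, here is a minimal sketch of scaled dot-product attention, the building block that most of the attention variants listed in this repository share. Shapes and naming are illustrative assumptions, not code taken from the repository.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value, mask=None):
    # query, key, value: (batch, heads, seq_len, head_dim)
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / d_k ** 0.5
    if mask is not None:
        # positions where mask == 0 are excluded from attention
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention distribution over keys
    return torch.matmul(weights, value), weights
```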
https://github.com/poloclub/dodrio
Exploring attention weights in transformer-based models with linguistic knowledge.
attention-mechanism deep-learning interactive-visualizations multi-head-attention nlp transformer visualization
Last synced: 13 May 2025
https://github.com/monk1337/various-attention-mechanisms
This repository contains various attention mechanisms, such as Bahdanau, soft, additive, and hierarchical attention, implemented in PyTorch, TensorFlow, and Keras. A Bahdanau-style additive attention sketch follows this entry.
attention attention-lstm attention-mechanism attention-mechanisms attention-model attention-network bahdanau-attention hierarchical-attention keras luong-attention multi-head-attention pytorch scaled-dot-product-attention self-attention sentence-attention
Last synced: 10 Apr 2025
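As a rough illustration of the additive (Bahdanau-style) scoring mentioned above, here is a minimal PyTorch sketch; the dimensions and module names are assumptions for illustration, not code from the repository.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score(s, h) = v^T tanh(W_q s + W_k h)."""
    def __init__(self, query_dim, key_dim, hidden_dim):
        super().__init__()
        self.w_q = nn.Linear(query_dim, hidden_dim, bias=False)
        self.w_k = nn.Linear(key_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, query, keys):
        # query: (batch, query_dim), keys: (batch, src_len, key_dim)
        scores = self.v(torch.tanh(self.w_q(query).unsqueeze(1) + self.w_k(keys)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)            # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), keys).squeeze(1)     # (batch, key_dim)
        return context, weights
```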
https://github.com/zhaocq-nlp/attention-visualization
Visualization for simple attention and Google's multi-head attention.
attention attention-visualization machine-translation multi-head-attention neural-machine-translation visualization
Last synced: 13 Apr 2025
https://github.com/bruce-lee-ly/decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference. A sketch of decoding-stage attention follows this entry.
cuda cuda-core decoding-attention flash-attention flashinfer flashmla gpu gqa inference large-language-model llm mha mla mqa multi-head-attention nvidia
Last synced: 05 May 2025
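A minimal PyTorch sketch of what "decoding-stage" attention means here: a single new query token attends over a cached key/value sequence, with grouped-query (GQA) head sharing. Tensor shapes and the function name are assumptions for illustration, not the repository's CUDA implementation.

```python
import torch
import torch.nn.functional as F

def gqa_decode_step(q, k_cache, v_cache):
    # q:       (batch, num_q_heads, 1, head_dim), the single token being decoded
    # k_cache: (batch, num_kv_heads, cache_len, head_dim)
    # v_cache: (batch, num_kv_heads, cache_len, head_dim)
    num_q_heads, num_kv_heads = q.size(1), k_cache.size(1)
    group = num_q_heads // num_kv_heads
    # Expand each KV head to serve a group of query heads
    # (MHA: group == 1; MQA: num_kv_heads == 1).
    k = k_cache.repeat_interleave(group, dim=1)
    v = v_cache.repeat_interleave(group, dim=1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / q.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v)  # (batch, num_q_heads, 1, head_dim)
```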
https://github.com/bruce-lee-ly/flash_attention_inference
Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core
Last synced: 13 Apr 2025
https://github.com/shreyas-kowshik/nlp4if
Code for the runner-up entry in the English subtask of the Shared Task on Fighting the COVID-19 Infodemic, NLP4IF workshop, NAACL 2021.
deep-learning multi-head-attention multi-task-learning naacl2021 natural-language-processing
Last synced: 20 Nov 2024
https://github.com/liaoyanqing666/transformer_pytorch
A complete implementation of the original Transformer. A minimal positional-encoding sketch follows this entry.
beginner multi-head-attention positional-encoding python pytorch transformer
Last synced: 28 Jan 2025
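To illustrate the positional-encoding component of the original Transformer that this repository reimplements, here is a minimal sinusoidal encoding sketch based on "Attention Is All You Need"; it is an assumption for illustration, not this repository's code, and it assumes an even model dimension.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    # Returns (max_len, d_model): sine on even dimensions, cosine on odd ones.
    position = torch.arange(max_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe
```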
https://github.com/junfanz1/minigpt-and-deepseek-mla-multi-head-latent-attention
An efficient, scalable attention module that reduces memory usage and improves inference speed in large language models: an implementation of Multi-Head Latent Attention (MLA) designed as a drop-in replacement for traditional multi-head attention (MHA). A rough sketch of the latent key/value projection follows this entry.
attention-mechanism deepseek llm mla multi-head-attention pytorch
Last synced: 13 Apr 2025
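A very rough sketch of the core idea behind Multi-Head Latent Attention: keys and values are derived from a low-rank latent that is cached instead of full per-head K/V, shrinking the KV cache during decoding. The dimensions and layer names below are illustrative assumptions, not the repository's implementation (which also handles details such as positional encoding that are omitted here).

```python
import torch
import torch.nn as nn

class LatentKVProjection(nn.Module):
    """Compress hidden states into a small latent, then expand to per-head K/V."""
    def __init__(self, d_model, d_latent, num_heads, head_dim):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)               # latent is what gets cached
        self.up_k = nn.Linear(d_latent, num_heads * head_dim, bias=False)
        self.up_v = nn.Linear(d_latent, num_heads * head_dim, bias=False)
        self.num_heads, self.head_dim = num_heads, head_dim

    def forward(self, hidden):
        # hidden: (batch, seq_len, d_model); only the latent needs to live in the KV cache.
        latent = self.down(hidden)                                          # (batch, seq_len, d_latent)
        b, t, _ = latent.shape
        k = self.up_k(latent).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.up_v(latent).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        return k, v, latent
```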
https://github.com/ashishbodhankar/transformer_nmt
Attention Is All You Need: discovering the Transformer model. A causal-mask sketch follows this entry.
attention machine-translation mask multi-head-attention natural-language-processing transformer vaswani
Last synced: 20 Mar 2025
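Since this entry highlights the decoder mask, here is a minimal sketch of the causal (look-ahead) mask used in Transformer decoding; the function name and shape conventions are illustrative assumptions, not code from the repository.

```python
import torch

def causal_mask(seq_len):
    # Lower-triangular boolean mask: position i may attend to positions <= i.
    # Shape (1, 1, seq_len, seq_len) so it broadcasts over batch and heads.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return mask.unsqueeze(0).unsqueeze(0)
```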