Projects in Awesome Lists tagged with multi-head-attention
A curated list of projects in awesome lists tagged with multi-head-attention.
https://github.com/sooftware/attentions
PyTorch implementations of several attention mechanisms for deep learning researchers. A minimal scaled dot-product attention sketch follows this entry.
additive-attention attention dot-product-attention location-aware-attention location-sensitive-attension multi-head-attention pytorch relative-multi-head-attention relative-positional-encoding
Last synced: 05 Apr 2025
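For reference, here is a minimal sketch of scaled dot-product attention, the building block that most of the attention variants listed in this repository share. Shapes and naming are illustrative assumptions, not code taken from the repository.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value, mask=None):
    # query, key, value: (batch, heads, seq_len, head_dim)
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / d_k ** 0.5
    if mask is not None:
        # positions where mask == 0 are excluded from attention
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention distribution over keys
    return torch.matmul(weights, value), weights
```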
https://github.com/poloclub/dodrio
Exploring attention weights in transformer-based models with linguistic knowledge.
attention-mechanism deep-learning interactive-visualizations multi-head-attention nlp transformer visualization
Last synced: 13 May 2025
https://github.com/monk1337/various-attention-mechanisms
This repository contains various attention mechanisms, such as Bahdanau, soft, additive, and hierarchical attention, implemented in PyTorch, TensorFlow, and Keras. A Bahdanau-style additive attention sketch follows this entry.
attention attention-lstm attention-mechanism attention-mechanisms attention-model attention-network bahdanau-attention hierarchical-attention keras luong-attention multi-head-attention pytorch scaled-dot-product-attention self-attention sentence-attention
Last synced: 10 Apr 2025
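As a rough illustration of the additive (Bahdanau-style) scoring mentioned above, here is a minimal PyTorch sketch; the dimensions and module names are assumptions for illustration, not code from the repository.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score(s, h) = v^T tanh(W_q s + W_k h)."""
    def __init__(self, query_dim, key_dim, hidden_dim):
        super().__init__()
        self.w_q = nn.Linear(query_dim, hidden_dim, bias=False)
        self.w_k = nn.Linear(key_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, query, keys):
        # query: (batch, query_dim), keys: (batch, src_len, key_dim)
        scores = self.v(torch.tanh(self.w_q(query).unsqueeze(1) + self.w_k(keys)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)            # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), keys).squeeze(1)     # (batch, key_dim)
        return context, weights
```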
https://github.com/zhaocq-nlp/attention-visualization
Visualization for simple attention and Google's multi-head attention.
attention attention-visualization machine-translation multi-head-attention neural-machine-translation visualization
Last synced: 13 Apr 2025
https://github.com/bruce-lee-ly/decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference. A sketch of decoding-stage attention follows this entry.
cuda cuda-core decoding-attention flash-attention flashinfer flashmla gpu gqa inference large-language-model llm mha mla mqa multi-head-attention nvidia
Last synced: 05 May 2025
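A minimal PyTorch sketch of what "decoding-stage" attention means here: a single new query token attends over a cached key/value sequence, with grouped-query (GQA) head sharing. Tensor shapes and the function name are assumptions for illustration, not the repository's CUDA implementation.

```python
import torch
import torch.nn.functional as F

def gqa_decode_step(q, k_cache, v_cache):
    # q:       (batch, num_q_heads, 1, head_dim), the single token being decoded
    # k_cache: (batch, num_kv_heads, cache_len, head_dim)
    # v_cache: (batch, num_kv_heads, cache_len, head_dim)
    num_q_heads, num_kv_heads = q.size(1), k_cache.size(1)
    group = num_q_heads // num_kv_heads
    # Expand each KV head to serve a group of query heads
    # (MHA: group == 1; MQA: num_kv_heads == 1).
    k = k_cache.repeat_interleave(group, dim=1)
    v = v_cache.repeat_interleave(group, dim=1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / q.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v)  # (batch, num_q_heads, 1, head_dim)
```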
https://github.com/bruce-lee-ly/flash_attention_inference
Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core
Last synced: 13 Apr 2025
https://github.com/shreyas-kowshik/nlp4if
Code for the runner-up entry in the English subtask of the Shared Task on Fighting the COVID-19 Infodemic, NLP4IF workshop, NAACL 2021.
deep-learning multi-head-attention multi-task-learning naacl2021 natural-language-processing
Last synced: 20 Nov 2024
https://github.com/liaoyanqing666/transformer_pytorch
A complete implementation of the original Transformer. A minimal positional-encoding sketch follows this entry.
beginner multi-head-attention positional-encoding python pytorch transformer
Last synced: 28 Jan 2025
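To illustrate the positional-encoding component of the original Transformer that this repository reimplements, here is a minimal sinusoidal encoding sketch based on "Attention Is All You Need"; it is an assumption for illustration, not this repository's code, and it assumes an even model dimension.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    # Returns (max_len, d_model): sine on even dimensions, cosine on odd ones.
    position = torch.arange(max_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe
```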
https://github.com/junfanz1/minigpt-and-deepseek-mla-multi-head-latent-attention
An efficient, scalable attention module that reduces memory usage and improves inference speed in large language models: an implementation of Multi-Head Latent Attention (MLA) designed as a drop-in replacement for traditional multi-head attention (MHA). A rough sketch of the latent key/value projection follows this entry.
attention-mechanism deepseek llm mla multi-head-attention pytorch
Last synced: 13 Apr 2025
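A very rough sketch of the core idea behind Multi-Head Latent Attention: keys and values are derived from a low-rank latent that is cached instead of full per-head K/V, shrinking the KV cache during decoding. The dimensions and layer names below are illustrative assumptions, not the repository's implementation (which also handles details such as positional encoding that are omitted here).

```python
import torch
import torch.nn as nn

class LatentKVProjection(nn.Module):
    """Compress hidden states into a small latent, then expand to per-head K/V."""
    def __init__(self, d_model, d_latent, num_heads, head_dim):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)               # latent is what gets cached
        self.up_k = nn.Linear(d_latent, num_heads * head_dim, bias=False)
        self.up_v = nn.Linear(d_latent, num_heads * head_dim, bias=False)
        self.num_heads, self.head_dim = num_heads, head_dim

    def forward(self, hidden):
        # hidden: (batch, seq_len, d_model); only the latent needs to live in the KV cache.
        latent = self.down(hidden)                                          # (batch, seq_len, d_latent)
        b, t, _ = latent.shape
        k = self.up_k(latent).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.up_v(latent).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        return k, v, latent
```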
https://github.com/ashishbodhankar/transformer_nmt
Attention Is All You Need: discovering the Transformer model. A causal-mask sketch follows this entry.
attention machine-translation mask multi-head-attention natural-language-processing transformer vaswani
Last synced: 20 Mar 2025
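Since this entry highlights the decoder mask, here is a minimal sketch of the causal (look-ahead) mask used in Transformer decoding; the function name and shape conventions are illustrative assumptions, not code from the repository.

```python
import torch

def causal_mask(seq_len):
    # Lower-triangular boolean mask: position i may attend to positions <= i.
    # Shape (1, 1, seq_len, seq_len) so it broadcasts over batch and heads.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return mask.unsqueeze(0).unsqueeze(0)
```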