{"id":13718715,"url":"https://github.com/philipperemy/keras-attention","last_synced_at":"2026-01-03T17:12:37.449Z","repository":{"id":38808864,"uuid":"92132849","full_name":"philipperemy/keras-attention","owner":"philipperemy","description":"Keras Attention Layer (Luong and Bahdanau scores).","archived":false,"fork":false,"pushed_at":"2023-11-17T10:37:02.000Z","size":4270,"stargazers_count":2809,"open_issues_count":3,"forks_count":670,"subscribers_count":76,"default_branch":"master","last_synced_at":"2025-05-06T23:03:15.997Z","etag":null,"topics":["attention-mechanism","attention-model","deep-learning","keras","keras-neural-networks"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philipperemy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["philipperemy"]}},"created_at":"2017-05-23T05:28:01.000Z","updated_at":"2025-04-30T16:11:49.000Z","dependencies_parsed_at":"2024-04-09T09:47:30.878Z","dependency_job_id":null,"html_url":"https://github.com/philipperemy/keras-attention","commit_stats":null,"previous_names":["philipperemy/keras-attention-mechanism"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipperemy%2Fkeras-attention","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipperemy%2Fkeras-attention/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipperemy%2Fkeras-attention/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipperemy%2Fkeras-attention/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/philipperemy","download_url":"https://codeload.github.com/philipperemy/keras-attention/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253919628,"owners_count":21984263,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention-mechanism","attention-model","deep-learning","keras","keras-neural-networks"],"created_at":"2024-08-03T01:00:36.426Z","updated_at":"2026-01-03T17:12:37.408Z","avatar_url":"https://github.com/philipperemy.png","language":"Python","funding_links":["https://github.com/sponsors/philipperemy"],"categories":["Examples/Notebooks"],"sub_categories":[],"readme":"# Keras Attention Layer\n\n[![Downloads](https://pepy.tech/badge/attention)](https://pepy.tech/project/attention)\n[![Downloads](https://pepy.tech/badge/attention/month)](https://pepy.tech/project/attention)\n[![license](https://img.shields.io/badge/License-Apache_2.0-brightgreen.svg)](https://github.com/philipperemy/keras-attention-mechanism/blob/master/LICENSE) [![dep1](https://img.shields.io/badge/Tensorflow-2.0+-brightgreen.svg)](https://www.tensorflow.org/)\n\nAttention Layer for Keras. 
Supports the score functions of Luong and Bahdanau.\n\nTested with TensorFlow 2.8, 2.9, 2.10, 2.11, 2.12, 2.13 and 2.14 (Sep 26, 2023).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"examples/equations.png\" width=\"600\"\u003e\n\u003c/p\u003e\n\n\n## Installation\n\n*PyPI*\n\n```bash\npip install attention\n```\n\n## Attention Layer\n\n```python\nAttention(\n    units=128,\n    score='luong',\n    **kwargs\n)\n```\n\n### Arguments\n\n- `units`: Integer. The number of (output) units in the attention vector ($a_t$).\n- `score`: String. The score function $score(h_t, \\bar{h_s})$. Possible values are `luong` or `bahdanau`.\n\n\n   - Luong's multiplicative style. Link to [paper](https://arxiv.org/abs/1508.04025).\n   - Bahdanau's additive style. Link to [paper](https://arxiv.org/abs/1409.0473).\n\n\n### Input shape\n\n3D tensor with shape `(batch_size, timesteps, input_dim)`.\n\n### Output shape\n\n- 2D tensor with shape `(batch_size, units)` ($a_t$).\n\nIf you want to visualize the attention weights, refer to this example [examples/add_two_numbers.py](examples/add_two_numbers.py).\n\n\n## Example\n\n```python\nimport numpy as np\nfrom tensorflow.keras import Input\nfrom tensorflow.keras.layers import Dense, LSTM\nfrom tensorflow.keras.models import load_model, Model\n\nfrom attention import Attention\n\n\ndef main():\n    # Dummy data. 
There is nothing to learn in this example.\n    num_samples, time_steps, input_dim, output_dim = 100, 10, 1, 1\n    data_x = np.random.uniform(size=(num_samples, time_steps, input_dim))\n    data_y = np.random.uniform(size=(num_samples, output_dim))\n\n    # Define/compile the model.\n    model_input = Input(shape=(time_steps, input_dim))\n    x = LSTM(64, return_sequences=True)(model_input)\n    x = Attention(units=32)(x)\n    x = Dense(1)(x)\n    model = Model(model_input, x)\n    model.compile(loss='mae', optimizer='adam')\n    model.summary()\n\n    # train.\n    model.fit(data_x, data_y, epochs=10)\n\n    # test save/reload model.\n    pred1 = model.predict(data_x)\n    model.save('test_model.h5')\n    model_h5 = load_model('test_model.h5', custom_objects={'Attention': Attention})\n    pred2 = model_h5.predict(data_x)\n    np.testing.assert_almost_equal(pred1, pred2)\n    print('Success.')\n\n\nif __name__ == '__main__':\n    main()\n```\n\n## Other Examples\n\nBrowse [examples](examples).\n\nInstall the requirements before running the examples: `pip install -r examples/examples-requirements.txt`.\n\n\n### IMDB Dataset\n\nIn this experiment, we demonstrate that using attention yields a higher accuracy on the IMDB dataset. We consider two\nLSTM networks: one with this attention layer and the other one with a fully connected layer. Both have the same number\nof parameters for a fair comparison (250K).\n\nHere are the results on 10 runs. For every run, we record the max accuracy on the test set for 10 epochs.\n\n\n| Measure  | No Attention (250K params) | Attention (250K params) |\n| ------------- | ------------- | ------------- |\n| MAX Accuracy | 88.22 | 88.76 |\n| AVG Accuracy | 87.02 | 87.62 |\n| STDDEV Accuracy | 0.18 | 0.14 |\n\nAs expected, there is a boost in accuracy for the model with attention. 
It also reduces the variability between the runs, which is desirable.\n\n\n### Adding two numbers\n\nLet's consider the task of adding two numbers that come right after some delimiters (0 in this case):\n\n`x = [1, 2, 3, 0, 4, 5, 6, 0, 7, 8]`. The result is `y = 4 + 7 = 11`.\n\nThe attention is expected to be the highest after the delimiters. An overview of the training is shown below, where the\ntop represents the attention map and the bottom the ground truth. As the training progresses, the model learns the\ntask and the attention map converges to the ground truth.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"examples/attention.gif\" width=\"320\"\u003e\n\u003c/p\u003e\n\n### Finding max of a sequence\n\nWe consider many 1D sequences of the same length. The task is to find the maximum of each sequence.\n\nWe give the full sequence processed by the RNN layer to the attention layer. We expect the attention layer to focus on the maximum of each sequence.\n\nAfter a few epochs, the attention layer converges perfectly to what we expected.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"examples/readme/example.png\" width=\"320\"\u003e\n\u003c/p\u003e\n\n## References\n\n- [Hierarchical Attention Networks for Document Classification](https://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf)\n- [Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/abs/1508.04025)\n- [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilipperemy%2Fkeras-attention","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilipperemy%2Fkeras-attention","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilipperemy%2Fkeras-attention/lists"}