{"id":16487327,"url":"https://github.com/andremaz/dnn-attention","last_synced_at":"2026-05-05T07:31:44.808Z","repository":{"id":39730149,"uuid":"241880607","full_name":"AndreMaz/dnn-attention","owner":"AndreMaz","description":"Sequence 2 Sequence with Attention Mechanisms in Tensorflow v2","archived":false,"fork":false,"pushed_at":"2022-12-08T03:47:06.000Z","size":241,"stargazers_count":1,"open_issues_count":6,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-04T17:50:28.707Z","etag":null,"topics":["attention-decoder","attention-lstm","attention-mechanism","attention-network","bahdanau-attention","luong-attention","pointer-networks","ptr-net","seq2seq","tensorflow","tensorflow2","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AndreMaz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-20T12:39:00.000Z","updated_at":"2020-11-29T21:33:14.000Z","dependencies_parsed_at":"2023-01-24T09:00:06.585Z","dependency_job_id":null,"html_url":"https://github.com/AndreMaz/dnn-attention","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AndreMaz/dnn-attention","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreMaz%2Fdnn-attention","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreMaz%2Fdnn-attention/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreMaz%2Fdnn-attention/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreMaz%2Fdn
n-attention/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AndreMaz","download_url":"https://codeload.github.com/AndreMaz/dnn-attention/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AndreMaz%2Fdnn-attention/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32640533,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-04T10:08:07.713Z","status":"online","status_checked_at":"2026-05-05T02:00:06.033Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention-decoder","attention-lstm","attention-mechanism","attention-network","bahdanau-attention","luong-attention","pointer-networks","ptr-net","seq2seq","tensorflow","tensorflow2","transformer"],"created_at":"2024-10-11T13:33:44.647Z","updated_at":"2026-05-05T07:31:44.792Z","avatar_url":"https://github.com/AndreMaz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sequence 2 Sequence with Attention Mechanisms\nThis repo contains implementation of:\n- Classical Sequence 2 Sequence model without attention. Used in [Date Conversion Problem](#date-conversion-problem)\n- Luong's Dot Attention. Used in [Date Conversion Problem](#date-conversion-problem)\n- Bahdanau's Attention. Used in [Date Conversion Problem](#date-conversion-problem)\n- Pointer Networks a.k.a. Ptr-Net. 
Used in [Sorting Numbers](#sorting-numbers)\n- Pointer Networks with Masking. Used in [Sorting Numbers](#sorting-numbers)\n\nI've tried, as much as possible, to avoid building custom layers in order to keep the code readable. Also, note that I deliberately didn't create generic layers. This means that the code in the `/model` folder contains repeated elements (e.g., `Encoder` is almost the same for all the models). However, I think that structuring things this way makes the code easier to read and helps to understand how the data flows through the layers.\n\n## Date Conversion Problem\nConvert dates in different formats (e.g., `\"08/30/21\"`, `\"080120\"`, `\"AUG 01, 2020\"`) into the ISO standard format (e.g., `\"2021-08-30\"`, `\"2020-08-01\"`). For more info check the [useful links section](#useful-links).\n\n### Problem Stats\n- Input vocabulary size: 35\n- Input length: 12\n- Output vocabulary size: 13\n- Output length: 10\n\n### Configs\nConfigs are located at `date-conversion/config.json`\n```js\n{\n    \"num_epochs\": 3, // Number of epochs for training\n    \"batch_size\": 128, // Batch size during training\n    // Range of dates that will be used to generate the dataset\n    \"min_year\": \"1950-01-01\",\n    \"max_year\": \"2050-01-01\",\n    \"embedding_dims\": 64, // Encoder's and Decoder's embedding dims\n    \"lstm_units\": 64, // Encoder's and Decoder's LSTM units\n    \"num_tests\": 10 // Number of dates that will be used for testing. 
Each date will be converted into all possible formats (there are 20 formats).\n}\n```\n\n**Input Vocabulary Mappings**\n```\nChar \"\\n\": Index 0\nChar \"0\": Index 1\nChar \"1\": Index 2\nChar \"2\": Index 3\nChar \"3\": Index 4\nChar \"4\": Index 5\nChar \"5\": Index 6\nChar \"6\": Index 7\nChar \"7\": Index 8\nChar \"8\": Index 9\nChar \"9\": Index 10\nChar \"/\": Index 11\nChar \"-\": Index 12\nChar \".\": Index 13\nChar \",\": Index 14\nChar \" \": Index 15\nChar \"J\": Index 16\nChar \"A\": Index 17\nChar \"N\": Index 18\nChar \"F\": Index 19\nChar \"E\": Index 20\nChar \"B\": Index 21\nChar \"M\": Index 22\nChar \"R\": Index 23\nChar \"P\": Index 24\nChar \"Y\": Index 25\nChar \"U\": Index 26\nChar \"L\": Index 27\nChar \"G\": Index 28\nChar \"S\": Index 29\nChar \"O\": Index 30\nChar \"C\": Index 31\nChar \"T\": Index 32\nChar \"V\": Index 33\nChar \"D\": Index 34\n```\n\n**Output Vocabulary Mappings**\n```\nChar \"\\n\": Index 0\nChar \"\\t\": Index 1\nChar \"0\": Index 2\nChar \"1\": Index 3\nChar \"2\": Index 4\nChar \"3\": Index 5\nChar \"4\": Index 6\nChar \"5\": Index 7\nChar \"6\": Index 8\nChar \"7\": Index 9\nChar \"8\": Index 10\nChar \"9\": Index 11\nChar \"-\": Index 12\n```\n\n**Possible Input Formats**\n\nNumber of input formats: 20\n```\n1 - 01OCT2019\n2 - 100119\n3 - 10/01/19\n4 - 10/01/2019\n5 - 10/1/2019\n6 - 01-10-2019\n7 - 1-10-2019\n8 - OCT 01 19\n9 - 10/1/19\n10 - OCT 01 2019\n11 - OCT 01, 19\n12 - OCT 01, 2019\n13 - 01.10.2019\n14 - 1.10.2019\n15 - 2019.10.01\n16 - 2019.10.1\n17 - 20191001\n18 - 2019-10-1\n19 - 1 OCT 2019\n20 - 2019-10-01\n```\n\n### Input/Output Example\n\n**Encoder's input example for `01.10.2019`**\n```bash\nTensor(\n     [[1, 2, 13, 2, 1, 13, 3, 1, 2, 10, 0, 0],], shape=(1, 12))\n```\n\n**Decoder's input example for `2019-10-01`**\n\nDecoder is fed with the encoded ISO date a.k.a. [teacher forcing](https://machinelearningmastery.com/teacher-forcing-for-recurrent-neural-networks/). 
The number `1` at the first position is the start-of-sequence (SOS).\n\n```bash\nTensor(\n     [[1, 4, 2, 3, 11, 12, 3, 2, 12, 2],], shape=(1, 10))\n```\n\n**Decoder's expected output example for `2019-10-01`**\n\nShape is [10, 13]. 10 is the output length and 13 is the output vocabulary size.\n\n```bash\nTensor(\n    [[[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],\n      [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],\n      [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],\n      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],\n      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],\n      [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],\n      [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],\n      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],\n      [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],\n      [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]], shape=(1, 10, 13))\n```\n\n### Attention Plots\n\n#### Luong Attention \n\nInput: `NOV 01, 98` Output: `1998-11-01`\n\n![image](./media/luong-attention.png)\n\n#### Bahdanau Attention\n\nInput: `APR 04. 1953` Output: `1953-04-04`\n\n![image](./media/bahdanau-attention.png)\n\n### Running \n```bash\npython date-conversion/main.py \u003cmodel-name\u003e \u003c0/1\u003e \n# \u003cmodel-name\u003e -  One of \"seq2seq\", \"luong\" or \"bahdanau\". If not provided \"luong\" will be used.\n# \u003c0/1\u003e - 1 to plot the attention. If not provided won't plot the attention.\n```\n\n### Run Unit Tests\n```bash\npython date-conversion/tests/runner.py\n```\n\n## Sorting Numbers\nSorts numbers in ascending order with Pointer Networks. For more info check the [useful links section](#useful-links).\n\n### Problem Stats\n- Input vocabulary size: 100\n- Input length: 10\n- Output vocabulary size: 100\n- Output length: 10\n\n\u003e Note: Pointer Networks are capable of dealing with inputs of variable length. For example, we can train the model with sequence size equal to 10 but during testing we can feed sequences longer than 10 and it will still be able to sort them. 
However, after using `model.compile()` the model becomes \"static\" and it no longer accepts input sequences of variable length. I think the only way of solving this is by not using `model.compile()` and doing the training (computing the loss and gradients) manually. This is a `ToDo`...\n\n### Configs\nConfigs are located at `sorting-numbers/config.json`\n```js\n{\n    \"num_epochs\": 10, // Number of epochs for training\n    \"batch_size\": 128, // Batch size during training\n    \"embedding_dims\": 64, // Encoder's and Decoder's embedding dims\n    \"lstm_units\": 64, // Encoder's and Decoder's LSTM units\n    \"num_samples_training\": 50000, // Training sample size\n    \"num_samples_validation\": 5000, // Validation sample size\n    \"num_samples_tests\": 200, // Testing sample size\n    \"sample_length\": 10, // Number of numbers to be sorted\n    // Numbers will be sampled between min_value and max_value\n    \"min_value\": 1, // Min value is included\n    \"max_value\": 100 // Max value is included\n}\n```\n\n### Input/Output Example\n\nSorting numbers between `0` and `9`. The number `10` at the first position is the end-of-sequence (EOS). During the decoding process the last pointer will point to EOS.\n\n**Encoder's Input**\n```bash\ntf.Tensor([[10.  2.  9.  3.  0.  5.  1.  8.  6.  4.  7.]], shape=(1, 11), dtype=float32)\n```\n\n**Decoder's Input**\n\nDecoder is fed with the sorted sequence a.k.a. [teacher forcing](https://machinelearningmastery.com/teacher-forcing-for-recurrent-neural-networks/). The number `11` at the first position is the start-of-sequence (SOS).\n\n```bash\ntf.Tensor([[11.  0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]], shape=(1, 11), dtype=float32)\n```\n\n**Decoder's Expected Output**\n\nOne-hot encoding where each row represents a time-step and the location to which the `pointer` should point. 
The last row should point to the first position of encoder's input, which is the EOS symbol.\n\n```bash\ntf.Tensor(\n[[[0 0 0 0 1 0 0 0 0 0 0]\n  [0 0 0 0 0 0 1 0 0 0 0]\n  [0 1 0 0 0 0 0 0 0 0 0]\n  [0 0 0 1 0 0 0 0 0 0 0]\n  [0 0 0 0 0 0 0 0 0 1 0]\n  [0 0 0 0 0 1 0 0 0 0 0]\n  [0 0 0 0 0 0 0 0 1 0 0]\n  [0 0 0 0 0 0 0 0 0 0 1]\n  [0 0 0 0 0 0 0 1 0 0 0]\n  [0 0 1 0 0 0 0 0 0 0 0]\n  [1 0 0 0 0 0 0 0 0 0 0]]], shape=(1, 11, 11), dtype=int32)\n```\n\n### Pointers Plots\n\n#### Vanilla Pointer Network\nInteresting behavior happens at `step 1` and `step 2` with the numbers `18` and `19`. At these steps the network is not sure where to point (either to `18` or `19`) because these numbers are close to each other. However, at `step 1` it gives more \"attention\" to the number `18` so it is selected (correct choice). The downside of vanilla pointer networks can be seen at `step 2`. Number `18` was selected at `step 1` but the network still considers it a valid option at `step 2`. In this problem in particular, the pointer shouldn't point twice at the same place. This can be solved with masking, i.e., after selecting an element at `step t` it should be masked out so that the network ignores it at the next step `step t+1`.\n\n**Vanilla Pointers**\n\n![image](./media/pointer-attention.png)\n\u003e Note that in this plot number `50` represents the EOS\n\n#### Pointer with Mask\n\nLooking at `step 1` and `step 2` and the numbers `10` and `11` it is possible to see that at `step 1` the network is unsure between the two numbers but it selects the number `10`. 
However, contrary to [Pointer Nets without masking](#vanilla-pointer-network), at `step 2` the network doesn't even consider the possibility of pointing to the number `10` because it was already selected at `step 1`.\n\n**Pointers with Mask**\n\n![image](./media/pointer-masking.png)\n\u003e Note that in this plot number `100` represents the EOS\n\n### Running \n```bash\npython sorting-numbers/main.py \u003cmodel-name\u003e # One of \"pointer\" or \"pointer-masking\". If not provided \"pointer-masking\" will be used\n```\n\n## Useful Links\nA short list of links that I've found useful while I was learning about attention mechanisms:\n- Tensorflow.js [date-conversion-attention](https://github.com/tensorflow/tfjs-examples/tree/master/date-conversion-attention) example. I've simply ported the dataset generation script and Luong's attention (slightly refactored). All the credit goes to the TF team and the people that built the model.\n- [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/pdf/1409.0473.pdf) - Bahdanau's attention\n- [Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/abs/1508.04025) - Luong's Attention\n- [Pointer Networks](https://arxiv.org/abs/1506.03134)\n- [Neural machine translation with attention](https://www.tensorflow.org/tutorials/text/nmt_with_attention)\n- [Attention Mechanism](https://blog.floydhub.com/attention-mechanism/)\n- [Attn: Illustrated Attention](https://towardsdatascience.com/attn-illustrated-attention-5ec4ad276ee3)\n\n\n## Setting the environment and installing the dependencies\nFollow Tensorflow's [installation guide](https://www.tensorflow.org/install/pip) to set up the environment and get things ready.\n\n\u003e I'm using Python v3.6 and Tensorflow v2.1\n\n## Pytorch Implementation\nFor a PyTorch implementation check [fmstam](https://github.com/fmstam)'s 
[repo](https://github.com/fmstam/seq2seq_with_deep_attention).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandremaz%2Fdnn-attention","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandremaz%2Fdnn-attention","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandremaz%2Fdnn-attention/lists"}