{"id":13810669,"url":"https://github.com/tlatkowski/multihead-siamese-nets","last_synced_at":"2025-04-13T08:25:23.919Z","repository":{"id":28323686,"uuid":"116178813","full_name":"tlatkowski/multihead-siamese-nets","owner":"tlatkowski","description":"Implementation of Siamese Neural Networks built upon multihead attention mechanism for text semantic similarity task. ","archived":false,"fork":false,"pushed_at":"2023-03-24T23:45:44.000Z","size":1502,"stargazers_count":182,"open_issues_count":12,"forks_count":43,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-21T12:21:39.299Z","etag":null,"topics":["attention","deep-architectures","deep-learning","deep-neural-networks","multihead-attention","multihead-attention-networks","natural-language-processing","nlp","paraphrase","paraphrase-identification","python3","quora-question-pairs","semantic-similarity","sentence-similarity","siamese-cnn","siamese-lstm","siamese-neural-network","snli","tensorflow","text-similarity"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tlatkowski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":null,"patreon":null,"open_collective":"multihead-siamese-nets"}},"created_at":"2018-01-03T20:25:46.000Z","updated_at":"2025-01-21T03:46:28.000Z","dependencies_parsed_at":"2023-01-14T08:37:46.058Z","dependency_job_id":null,"html_url":"https://github.com/tlatkowski/multihead-siamese-nets","commit_stats":{"total_commits":137,"total_committers":6,"mean_commits":"22.833333333333332","dds":0.5620437956204379,"last_synced_commit":"6bccdb8c3fe61c0d53c8fe8dd19f0801089dde79"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlatkowski%2Fmultihead-siamese-nets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlatkowski%2Fmultihead-siamese-nets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlatkowski%2Fmultihead-siamese-nets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlatkowski%2Fmultihead-siamese-nets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tlatkowski","download_url":"https://codeload.github.com/tlatkowski/multihead-siamese-nets/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245516874,"owners_count":20628239,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","deep-architectures","deep-learning","deep-neural-networks","multihead-attention","multihead-attention-networks","natural-language-processing","nlp","paraphrase","paraphrase-identification","python3","quora-question-pairs","semantic-similarity","sentence-similarity","siamese-cnn","siamese-lstm","siamese-neural-network","snli","tensorflow","text-similarity"],"created_at":"2024-08-04T03:00:22.401Z","updated_at":"2025-03-27T00:11:24.727Z","avatar_url":"https://github.com/tlatkowski.png","language":"Jupyter Notebook","readme":"[![Financial Contributors on Open Collective](https://opencollective.com/multihead-siamese-nets/all/badge.svg?label=financial+contributors)](https://opencollective.com/multihead-siamese-nets) ![](https://img.shields.io/badge/Python-3.6-blue.svg) ![](https://img.shields.io/badge/TensorFlow-1.15.2-blue.svg) ![](https://img.shields.io/badge/License-MIT-blue.svg)\n\n# Siamese Deep Neural Networks for semantic similarity.\nThis repository contains an implementation of Siamese Neural Networks in TensorFlow built on three major deep learning architectures:\n- Convolutional Neural Networks\n- Recurrent Neural Networks\n- Multihead Attention Networks\n\nThe main reason for creating this repository is to compare well-known implementations of Siamese Neural Networks available on GitHub, mainly built upon CNN and RNN architectures, with a Siamese Neural Network built on the multihead attention mechanism originally proposed in the Transformer model from the [Attention is all you need](https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf) paper.\n\n# Supported datasets\nThe current version of the pipeline supports the following **3** datasets:\n- [The Stanford Natural Language Inference (SNLI) Corpus](https://nlp.stanford.edu/projects/snli/)\n- [Quora Question Pairs](https://www.kaggle.com/c/quora-question-pairs)\n- :new: Adversarial Natural Language Inference (ANLI) benchmark: [GitHub](https://github.com/facebookresearch/anli/), [arXiv](https://arxiv.org/pdf/1910.14599.pdf)\n\n# Installation\n\n### Data preparation\n\nTo download the data, execute the following commands\n(this process can take a while, depending on your network throughput):\n```\ncd bin\nchmod a+x prepare_data.sh\n./prepare_data.sh\n```\nAs a result of executing the above script, a **corpora** directory\nwill be created with the **QQP**, **SNLI** and **ANLI** data.\n\n### Dependency installation\nThis project was developed and tested using **Python 3.6**. The package requirements are stored in the **requirements** folder.\n\nTo install the requirements, execute one of the following commands.\n\nFor **GPU** usage:\n```\npip install -r requirements/requirements-gpu.txt\n```\nand for **CPU** usage:\n```\npip install -r requirements/requirements-cpu.txt\n```\n\n# Training models\nTo train a model, run the following command:\n\n```\npython3 run.py train SELECTED_MODEL SELECTED_DATASET --experiment_name NAME --gpu GPU_NUMBER\n```\n\nwhere **SELECTED_MODEL** is one of the following models:\n- cnn\n- rnn\n- multihead\n\nand **SELECTED_DATASET** is one of:\n- SNLI\n- QQP\n- ANLI\n\n**--experiment_name** is an optional argument that sets the experiment name. The default value is **{SELECTED_MODEL}_{EMBEDDING_SIZE}**.
\n\n**--gpu** is an optional argument used to select a specific GPU on your machine (the default value is '0').\n\nExample (GPU usage):\nRun the following command to train a CNN-based Siamese Neural Network on the SNLI corpus using GPU 1:\n```\npython3 run.py train cnn SNLI --gpu 1\n```\n\nExample (CPU usage):\nRun the following command to train a CNN-based Siamese Neural Network on the SNLI corpus:\n```\npython3 run.py train cnn SNLI\n```\n## Training configuration\nThe main training configuration file is located at **'config/main.ini'**.\n\n```ini\n[TRAINING]\nnum_epochs = 10\nbatch_size = 512\neval_every = 20\nlearning_rate = 0.001\ncheckpoints_to_keep = 5\nsave_every = 100\nlog_device_placement = false\n\n[DATA]\nlogs_path = logs\nmodel_dir = model_dir\n\n[PARAMS]\nembedding_size = 64\nloss_function = mse\n```\n\n## Model configuration\nAdditionally, each model has its own configuration file in which its hyperparameters can be changed.\n\n### Multihead Attention Network configuration file\n```ini\n[PARAMS]\nnum_blocks = 2\nnum_heads = 8\nuse_residual = False\ndropout_rate = 0.0\n```\n### Convolutional Neural Network configuration file\n```ini\n[PARAMS]\nnum_filters = 50,50,50\nfilter_sizes = 2,3,4\ndropout_rate = 0.0\n```\n### Recurrent Neural Network configuration file\n```ini\n[PARAMS]\nhidden_size = 128\ncell_type = GRU\nbidirectional = True\n```\n\n## Training models with GPU support on Google Colaboratory\n\nIf you don't have a workstation with a GPU, you can use the example Google Colaboratory\nnotebook below to train your models (CNN, RNN or Multihead) on the SNLI or QQP datasets using the **NVIDIA Tesla T4 16GB GPU**\navailable in the Google Colaboratory backend: [Multihead Siamese Nets in Google Colab](https://colab.research.google.com/drive/1FUEBV1JkQpF2iwFSDW338nAUhzPVZWAa)\n\n# Testing models\nDownload the pretrained models from the following link: [pretrained Siamese Nets models](https://drive.google.com/file/d/1STgv1hIxdVpKLQ6-EZK7J3C4ZtfZgbkS/view?usp=sharing), unzip them and put them\nin the **./model_dir** directory. After that, you can test the models either with the predict mode of the pipeline:\n```bash\npython3 run.py predict cnn\n```\nor with the GUI demo:\n```bash\npython3 gui_demo.py\n```\n\nThe pictures below present the Multihead Siamese Nets GUI for:\n1. Positive example:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"530\" height=\"120\" src=\"https://github.com/tlatkowski/multihead-siamese-nets/blob/master/pics/positive_sample.png\"\u003e\n\u003c/p\u003e\n\n2. Negative example:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"530\" height=\"120\" src=\"https://github.com/tlatkowski/multihead-siamese-nets/blob/master/pics/negative_sample.png\"\u003e\n\u003c/p\u003e\n\n# Attention weights visualization\nTo visualize the multihead attention weights for the compared sentences, use the GUI demo and check the\n'Visualize attention weights' checkbox, which becomes visible after choosing a model based on the multihead attention mechanism.\n\nAn example of the attention weights visualization (4 attention heads) looks as follows:\n\n![](https://github.com/tlatkowski/multihead-siamese-nets/blob/master/pics/attention1.png) \n![](https://github.com/tlatkowski/multihead-siamese-nets/blob/master/pics/attention2.png) \n\n# Comparison of models\n\nExperiments were performed on an **Nvidia GeForce GTX 1080Ti** GPU.\n\n## \u003e SNLI dataset.\n\nExperiment parameters:\n```ini\nNumber of epochs : 10\nBatch size : 512\nLearning rate : 0.001\n\nNumber of training instances : 326959\nNumber of dev instances : 3674\nNumber of test instances : 36736\n\nEmbedding size : 64\nLoss function: mean squared error (MSE)\n```\n\nModel-specific hyperparameters:\n\nCNN | RNN | Multihead\n------------ | ------------- | -------------\nnum_filters = 50,50,50 | hidden_size = 128 | num_blocks = 2\nfilter_sizes = 2,3,4 | cell_type = GRU | num_heads = 8\n|  | bidirectional = True | use_residual = False\n|  |  | layers_normalization = False\n\nEvaluation results:\n\nModel | Mean-Dev-Acc* | Last-Dev-Acc** | Test-Acc | Epoch Time\n------------ | ------------ | ------------- | ------------- | -------------\nCNN | 76.51 | 75.08 | 75.40 | 15.97s \nbi-RNN | 79.36 | 79.52 | 79.56 | 1 min 22.95s \nMultihead | 78.52 | 79.61 | 78.29 | 1 min 00.24s  \n\n*Mean-Dev-Acc: the mean development set accuracy over all epochs.\n\n**Last-Dev-Acc: the development set accuracy for the last epoch.\n\nTraining curves (Accuracy \u0026 Loss): \n![SNLI][results_snli]\n\n[results_snli]: https://github.com/tlatkowski/multihead-siamese-nets/blob/master/pics/snli_train_curves.png \"Evaluation results\"\n\n## \u003e QQP dataset.\n\nExperiment parameters:\n```ini\nNumber of epochs : 10\nBatch size : 512\nLearning rate : 0.001\n\nNumber of training instances : 362646\nNumber of dev instances : 1213\nNumber of test instances : 40428\n\nEmbedding size : 64\nLoss function: mean squared error (MSE)\n```\n\nModel-specific hyperparameters:\n\nCNN | RNN | Multihead\n------------ | ------------- | -------------\nnum_filters = 50,50,50 | hidden_size = 128 | num_blocks = 2\nfilter_sizes = 2,3,4 | cell_type = GRU | num_heads = 8\n|  | bidirectional = True | use_residual = False\n|  |  | layers_normalization = False\n\nEvaluation results:\n\nModel | Mean-Dev-Acc* | Last-Dev-Acc** | Test-Acc | Epoch Time\n------------ | ------------ | ------------- | ------------- | -------------\nCNN | 79.74 | 80.83 | 80.90 | 49.84s \nbi-RNN | 82.68 | 83.66 | 83.30 | 4 min 26.91s \nMultihead | 80.75 | 81.74 | 80.99 | 4 min 58.58s  \n\n*Mean-Dev-Acc: the mean development set accuracy over all epochs.\n\n**Last-Dev-Acc: the development set accuracy for the last epoch.\n\nTraining curves (Accuracy \u0026 Loss): \n![QQP][qqp_results]\n\n[qqp_results]: https://github.com/tlatkowski/multihead-siamese-nets/blob/master/pics/qqp_train_curves.png \"Evaluation results\"\n\n## Contributors\n\n### Code 
Contributors\n\nThis project exists thanks to all the people who contribute. [[Contribute](CONTRIBUTING.md)].\n\u003ca href=\"https://github.com/tlatkowski/multihead-siamese-nets/graphs/contributors\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/contributors.svg?width=890\u0026button=false\" /\u003e\u003c/a\u003e\n\n### Financial Contributors\n\nBecome a financial contributor and help us sustain our community. [[Contribute](https://opencollective.com/multihead-siamese-nets/contribute)]\n\n#### Individuals\n\n\u003ca href=\"https://opencollective.com/multihead-siamese-nets\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/individuals.svg?width=890\"\u003e\u003c/a\u003e\n\n#### Organizations\n\nSupport this project with your organization. Your logo will show up here with a link to your website. [[Contribute](https://opencollective.com/multihead-siamese-nets/contribute)]\n\n\u003ca href=\"https://opencollective.com/multihead-siamese-nets/organization/0/website\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/organization/0/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/multihead-siamese-nets/organization/1/website\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/organization/1/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/multihead-siamese-nets/organization/2/website\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/organization/2/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/multihead-siamese-nets/organization/3/website\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/organization/3/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/multihead-siamese-nets/organization/4/website\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/organization/4/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca 
href=\"https://opencollective.com/multihead-siamese-nets/organization/5/website\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/organization/5/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/multihead-siamese-nets/organization/6/website\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/organization/6/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/multihead-siamese-nets/organization/7/website\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/organization/7/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/multihead-siamese-nets/organization/8/website\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/organization/8/avatar.svg\"\u003e\u003c/a\u003e\n\u003ca href=\"https://opencollective.com/multihead-siamese-nets/organization/9/website\"\u003e\u003cimg src=\"https://opencollective.com/multihead-siamese-nets/organization/9/avatar.svg\"\u003e\u003c/a\u003e\n","funding_links":["https://opencollective.com/multihead-siamese-nets","https://opencollective.com/multihead-siamese-nets/contribute","https://opencollective.com/multihead-siamese-nets/organization/0/website","https://opencollective.com/multihead-siamese-nets/organization/1/website","https://opencollective.com/multihead-siamese-nets/organization/2/website","https://opencollective.com/multihead-siamese-nets/organization/3/website","https://opencollective.com/multihead-siamese-nets/organization/4/website","https://opencollective.com/multihead-siamese-nets/organization/5/website","https://opencollective.com/multihead-siamese-nets/organization/6/website","https://opencollective.com/multihead-siamese-nets/organization/7/website","https://opencollective.com/multihead-siamese-nets/organization/8/website","https://opencollective.com/multihead-siamese-nets/organization/9/website"],"categories":["Similarity / Distance 
Measures"],"sub_categories":["Siamese Networks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftlatkowski%2Fmultihead-siamese-nets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftlatkowski%2Fmultihead-siamese-nets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftlatkowski%2Fmultihead-siamese-nets/lists"}