{"id":19267720,"url":"https://github.com/harvardnlp/seq2seq-attn","last_synced_at":"2025-04-08T16:07:48.121Z","repository":{"id":38904939,"uuid":"55192695","full_name":"harvardnlp/seq2seq-attn","owner":"harvardnlp","description":"Sequence-to-sequence model with LSTM encoder/decoders and attention","archived":false,"fork":false,"pushed_at":"2020-12-30T02:54:09.000Z","size":5291,"stargazers_count":1272,"open_issues_count":15,"forks_count":281,"subscribers_count":96,"default_branch":"master","last_synced_at":"2025-04-01T15:10:00.552Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://nlp.seas.harvard.edu/code","language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/harvardnlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-04-01T00:43:10.000Z","updated_at":"2025-03-29T04:59:44.000Z","dependencies_parsed_at":"2022-09-13T18:40:21.346Z","dependency_job_id":null,"html_url":"https://github.com/harvardnlp/seq2seq-attn","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harvardnlp%2Fseq2seq-attn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harvardnlp%2Fseq2seq-attn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harvardnlp%2Fseq2seq-attn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harvardnlp%2Fseq2seq-attn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/harvardnlp","download_url":"https://codeload.github.com/harvardnlp/seq2seq-attn/tar.gz/refs/heads/m
aster","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247878022,"owners_count":21011158,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T20:13:54.260Z","updated_at":"2025-04-08T16:07:48.101Z","avatar_url":"https://github.com/harvardnlp.png","language":"Lua","funding_links":[],"categories":["Uncategorized","Torch","Model Zoo"],"sub_categories":["Uncategorized","Recurrent Networks"],"readme":"## Sequence-to-Sequence Learning with Attentional Neural Networks\n\n**UPDATE: Check-out the beta release of \u003ca href=\"http://opennmt.net\"\u003eOpenNMT\u003c/a\u003e a fully supported feature-complete rewrite of seq2seq-attn. Seq2seq-attn will remain supported, but new features and optimizations will focus on the new codebase.**\n\n[Torch](http://torch.ch) implementation of a standard sequence-to-sequence model with (optional)\nattention where the encoder-decoder are LSTMs. Encoder can be a bidirectional LSTM.\nAdditionally has the option to use characters\n(instead of input word embeddings) by running a convolutional neural network followed by a\n[highway network](http://arxiv.org/abs/1505.00387) over character embeddings to use as inputs.\n\nThe attention model is from\n[Effective Approaches to Attention-based\nNeural Machine Translation](http://stanford.edu/~lmthang/data/papers/emnlp15_attn.pdf),\nLuong et al. EMNLP 2015. We use the *global-general-attention* model with the\n*input-feeding* approach from the paper. 
Input-feeding is optional and can be turned off.\n\nThe character model is from [Character-Aware Neural\nLanguage Models](http://arxiv.org/abs/1508.06615), Kim et al. AAAI 2016.\n\nThere are a lot of additional options on top of the baseline model, mainly thanks to the fantastic folks\nat [SYSTRAN](http://www.systransoft.com). Specifically, there are functionalities which implement:\n* [Effective Approaches to Attention-based Neural Machine Translation](http://stanford.edu/~lmthang/data/papers/emnlp15_attn.pdf). Luong et al., EMNLP 2015.\n* [Character-based Neural Machine Translation](https://aclweb.org/anthology/P/P16/P16-2058.pdf). Costa-Jussa and Fonollosa, ACL 2016.\n* [Compression of Neural Machine Translation Models via Pruning](https://arxiv.org/pdf/1606.09274.pdf). See et al., COLING 2016.\n* [Sequence-Level Knowledge Distillation](https://arxiv.org/pdf/1606.07947.pdf). Kim and Rush., EMNLP 2016.\n* [Deep Recurrent Models with Fast Forward Connections for Neural Machine Translation](https://arxiv.org/pdf/1606.04199).\nZhou et al, TACL 2016.\n* [Guided Alignment Training for Topic-Aware Neural Machine Translation](https://arxiv.org/pdf/1607.01628). Chen et al., arXiv:1607.01628.\n* [Linguistic Input Features Improve Neural Machine Translation](https://arxiv.org/pdf/1606.02892). 
Sennrich et al., arXiv:1606.02892\n\nSee below for more details on how to use them.\n\nThis project is maintained by [Yoon Kim](http://people.fas.harvard.edu/~yoonkim).\nFeel free to post any questions/issues on the issues page.\n\n### Dependencies\n\n#### Python\n* h5py\n* numpy\n\n#### Lua\nYou will need the following packages:\n* hdf5\n* nn\n* nngraph\n\nGPU usage will additionally require:\n* cutorch\n* cunn\n\nIf running the character model, you should also install:\n* cudnn\n* luautf8\n\n### Quickstart\n\nWe are going to be working with some example data in the `data/` folder.\nFirst run the data-processing code\n\n```\npython preprocess.py --srcfile data/src-train.txt --targetfile data/targ-train.txt \\\n--srcvalfile data/src-val.txt --targetvalfile data/targ-val.txt --outputfile data/demo\n```\n\nThis will take the source/target train/valid files (`src-train.txt, targ-train.txt,\nsrc-val.txt, targ-val.txt`) and make some hdf5 files to be consumed by Lua.\n\n`demo.src.dict`: Dictionary of source vocab to index mappings.\n`demo.targ.dict`: Dictionary of target vocab to index mappings.\n`demo-train.hdf5`: hdf5 file containing the train data.\n`demo-val.hdf5`: hdf5 file containing the validation data.\n\nThe `*.dict` files will be needed when predicting on new data.\n\nNow run the model\n\n```\nth train.lua -data_file data/demo-train.hdf5 -val_data_file data/demo-val.hdf5 -savefile demo-model\n```\nThis will run the default model, which consists of a 2-layer LSTM with 500 hidden units\non both the encoder/decoder.\nYou can also add `-gpuid 1` to use (say) GPU 1 in the cluster.\n\nNow you have a model which you can use to predict on new data. To do this we are\ngoing to be running beam search\n\n```\nth evaluate.lua -model demo-model_final.t7 -src_file data/src-val.txt -output_file pred.txt \\\n-src_dict data/demo.src.dict -targ_dict data/demo.targ.dict\n```\nThis will output predictions into `pred.txt`. 
The predictions are going to be quite terrible,\nas the demo dataset is small. Try running on some larger datasets! For example you can download\nmillions of parallel sentences for [translation](http://www.statmt.org/wmt15/translation-task.html)\nor [summarization](https://github.com/harvardnlp/sent-summary).\n\n### Details\n#### Preprocessing options (`preprocess.py`)\n\n* `srcvocabsize, targetvocabsize`: Size of source/target vocabularies. This is constructed\nby taking the top X most frequent words. Rest are replaced with special UNK tokens.\n* `srcfile, targetfile`: Path to source/target training data, where each line represents a single\nsource/target sequence.\n* `srcvalfile, targetvalfile`: Path to source/target validation data.\n* `batchsize`: Size of each mini-batch.\n* `seqlength`: Maximum sequence length (sequences longer than this are dropped).\n* `outputfile`: Prefix of the output file names.\n* `maxwordlength`: For the character models, words are truncated (if longer than maxwordlength)\nor zero-padded (if shorter) to `maxwordlength`.\n* `chars`: If 1, construct the character-level dataset as well.  This might take up a lot of space\ndepending on your data size, so you may want to break up the training data into different shards.\n* `srcvocabfile, targetvocabfile`: If working with a preset vocab, then including these paths\nwill ignore the `srcvocabsize,targetvocabsize`.\n* `unkfilter`: Ignore sentences with too many UNK tokens. 
Can be an absolute count limit (if \u003e 1)\nor a proportional limit (0 \u003c unkfilter \u003c 1).\n* `shuffle`: Shuffle sentences.\n* `alignfile`, `alignvalfile`: If provided with filenames that contain 'Pharaoh' format alignments\non the train and validation data, source-to-target alignments are stored in the dataset.\n\n#### Training options (`train.lua`)\n**Data options**\n\n* `data_file, val_data_file`: Path to the training/validation `*.hdf5` files created from running\n`preprocess.py`.\n* `savefile`: Savefile name (model will be saved as `savefile_epochX_PPL.t7` after every `save_every`\nepochs, where X is the epoch number and PPL is the validation perplexity at that epoch).\n* `num_shards`: If the training data has been broken up into different shards,\nthen this is the number of shards.\n* `train_from`: If training from a checkpoint then this is the path to the pre-trained model.\n\n**Model options**\n\n* `num_layers`: Number of layers in the LSTM encoder/decoder (i.e. number of stacks).\n* `rnn_size`: Size of LSTM hidden states.\n* `word_vec_size`: Word embedding size.\n* `attn`: If = 1, use attention over the source sequence during decoding. If = 0, then it\nuses the last hidden state of the encoder as the context at each time step.\n* `brnn`: If = 1, use a bidirectional LSTM on the encoder side. Input embeddings (or CharCNN\nif using characters) are shared between the forward/backward LSTM, and hidden states of the\ncorresponding forward/backward LSTMs are added to obtain the hidden representation for that\ntime step.\n* `use_chars_enc`: If = 1, use characters on the encoder side (as inputs).\n* `use_chars_dec`: If = 1, use characters on the decoder side (as inputs).\n* `reverse_src`: If = 1, reverse the source sequence. The original sequence-to-sequence paper\nfound that this was crucial to achieving good performance, but with attention models this\ndoes not seem necessary. 
We recommend leaving it at 0.\n* `init_dec`: Initialize the hidden/cell state of the decoder at time 0 to be the last\nhidden/cell state of the encoder. If 0, the initial states of the decoder are set to zero vectors.\n* `input_feed`: If = 1, feed the context vector at each time step as additional input (via\nconcatenation with the word embeddings) to the decoder.\n* `multi_attn`: If \u003e 0, then use a *multi-attention* on this layer of the decoder. For example, if\n`num_layers = 3` and `multi_attn = 2`, then the model will do an attention over the source sequence\non the second layer (and use that as input to the third layer) *and* the penultimate layer.\nWe've found that this did not really improve performance on translation, but may be helpful for\nother tasks where multiple attentional passes over the source sequence are required\n(e.g. for more complex reasoning tasks).\n* `res_net`: Use residual connections between LSTM stacks whereby the input to the l-th LSTM\nlayer is the hidden state of the (l-1)-th LSTM layer summed with the hidden state of the (l-2)-th LSTM layer.\nWe didn't find this to really help in our experiments.\n\nThe options below only apply if using the character model.\n\n* `char_vec_size`: If using characters, size of the character embeddings.\n* `kernel_width`: Size (i.e. width) of the convolutional filter.\n* `num_kernels`: Number of convolutional filters (feature maps). So the representation from characters will have this many dimensions.\n* `num_highway_layers`: Number of highway layers in the character composition model.\n\nTo build a model with guided alignment (implemented similarly to [Guided Alignment Training for Topic-Aware Neural Machine Translation](https://arxiv.org/abs/1607.01628) (Chen et al. 
2016)):\n* `guided_alignment`: If 1, use external alignments to guide the attention weights\n* `guided_alignment_weight`: weight for guided alignment criterion\n* `guided_alignment_decay`: decay rate per epoch for alignment weight\n\n**Optimization options**\n\n* `epochs`: Number of training epochs.\n* `start_epoch`: If loading from a checkpoint, the epoch from which to start.\n* `param_init`: Parameters of the model are initialized over a uniform distribution with support\n`(-param_init, param_init)`.\n* `optim`: Optimization method, possible choices are 'sgd', 'adagrad', 'adadelta', 'adam'.\nFor seq2seq I've found vanilla SGD to work well but feel free to experiment.\n* `learning_rate`: Starting learning rate. For 'adagrad', 'adadelta', and 'adam', this is the global\nlearning rate. Recommended settings vary based on `optim`: sgd (`learning_rate = 1`), adagrad\n(`learning_rate = 0.1`), adadelta (`learning_rate = 1`), adam (`learning_rate = 0.1`).\n* `layer_lrs`: Comma-separated learning rates for encoder, decoder, and generator when using 'adagrad', 'adadelta', or 'adam' for 'optim' option. Layer-specific learning rates cannot currently be used with sgd.\n* `max_grad_norm`: If the norm of the gradient vector exceeds this, renormalize to have its norm equal to `max_grad_norm`.\n* `dropout`: Dropout probability. Dropout is applied between vertical LSTM stacks.\n* `lr_decay`: Decay learning rate by this much if (i) perplexity does not decrease on the validation\nset or (ii) epoch has gone past the `start_decay_at` epoch limit.\n* `start_decay_at`: Start decay after this epoch.\n* `curriculum`: For this many epochs, order the minibatches based on source sequence length. 
(Sometimes setting this to 1 will increase convergence speed).\n* `feature_embeddings_dim_exponent`: If the additional feature takes `N` values, then the embedding dimension will be set to `N^exponent`.\n* `pre_word_vecs_enc`: If using pretrained word embeddings (on the encoder side), this is the\npath to the `*.hdf5` file with the embeddings. The hdf5 should have a single field `word_vecs`,\nwhich references an array with dimensions vocab size by embedding size. Each row should be a word\nembedding and follow the same indexing scheme as the `*.dict` files from running\n`preprocess.py`. In order to be consistent with `beam.lua`, the first 4 indices should\nalways be `\u003cblank\u003e`, `\u003cunk\u003e`, `\u003cs\u003e`, `\u003c/s\u003e` tokens.\n* `pre_word_vecs_dec`: Path to `*.hdf5` for pretrained word embeddings on the decoder side. See above\nfor formatting of the `*.hdf5` file.\n* `fix_word_vecs_enc`: If = 1, fix word embeddings on the encoder side.\n* `fix_word_vecs_dec`: If = 1, fix word embeddings on the decoder side.\n* `max_batch_l`: Batch size used to create the data in `preprocess.py`. If this is left blank\n(recommended), then the batch size will be inferred from the validation set.\n\n**Other options**\n\n* `start_symbol`: Use special start-of-sentence and end-of-sentence tokens on the source side.\nWe've found this to make minimal difference.\n* `gpuid`: Which GPU to use (-1 = use cpu).\n* `gpuid2`: If this is \u003e=0, then the model will use two GPUs whereby the encoder is on the first\nGPU and the decoder is on the second GPU. This will allow you to train bigger models.\n* `cudnn`: Whether to use cudnn or not for convolutions (for the character model). 
`cudnn`\nhas much faster convolutions so this is highly recommended if using the character model.\n* `save_every`: Save every this many epochs.\n* `print_every`: Print various stats after this many batches.\n* `seed`: Change the random seed for random numbers in torch - use that option to train alternate models for ensemble\n* `prealloc`: when set to 1 (default), enable memory preallocation and sharing between clones - this reduces by a lot the used memory - there should not be\nany situation where you don't need it. Also - since memory is preallocated, there is not (major)\nmemory increase during the training. When set to 0, it rolls back to original memory optimization.\n\n#### Decoding options (`beam.lua`)\n\n* `model`: Path to model .t7 file.\n* `src_file`: Source sequence to decode (one line per sequence).\n* `targ_file`: True target sequence (optional).\n* `output_file`: Path to output the predictions (each line will be the decoded sequence).\n* `src_dict`: Path to source vocabulary (`*.src.dict` file from `preprocess.py`).\n* `targ_dict`: Path to target vocabulary (`*.targ.dict` file from `preprocess.py`).\n* `feature_dict_prefix`: Prefix of the path to the features vocabularies (`*.feature_N.dict` files from `preprocess.py`).\n* `char_dict`: Path to character vocabulary (`*.char.dict` file from `preprocess.py`).\n* `beam`: Beam size (recommend keeping this at 5).\n* `max_sent_l`: Maximum sentence length. If any of the sequences in `srcfile` are longer than this\nit will error out.\n* `simple`: If = 1, output prediction is simply the first time the top of the beam\nends with an end-of-sentence token. If = 0, the model considers all hypotheses that have\nbeen generated so far that ends with end-of-sentence token and takes the highest scoring\nof all of them.\n* `replace_unk`: Replace the generated UNK tokens with the source token that had the highest\nattention weight. 
If `srctarg_dict` is provided, it will look up the identified source token\nand give the corresponding target token. If it is not provided (or the identified source token\ndoes not exist in the table) then it will copy the source token.\n* `srctarg_dict`: Path to source-target dictionary to replace UNK tokens. Each line should be a\nsource token and its corresponding target token, separated by `|||`. For example\n```\nhello|||hallo\nukraine|||ukrainische\n```\nThis dictionary can be obtained by, for example, running an alignment model as a preprocessing step.\nWe recommend [fast_align](https://github.com/clab/fast_align).\n* `score_gold`: If = 1, score the true target output as well.\n* `n_best`: If \u003e 1, then it will also output an n_best list of decoded sentences in the following\nformat.\n```\n1 ||| sentence_1 ||| sentence_1_score\n2 ||| sentence_2 ||| sentence_2_score\n```\n* `gpuid`: ID of the GPU to use (-1 = use CPU).\n* `gpuid2`: ID of the second GPU (if specified).\n* `cudnn`: If the model was trained with `cudnn`, then this should be set to 1 (otherwise the model\nwill fail to load).\n* `rescore`: When set to a scorer name, use that scorer to find the hypothesis with the highest score. Available scorers: 'bleu', 'gleu'.\n* `rescore_param`: Parameter passed to the rescorer (for bleu/gleu, the n-gram length).\n\n#### Using additional input features\n[Linguistic Input Features Improve Neural Machine Translation](https://arxiv.org/abs/1606.02892) (Sennrich et al. 2016) shows that translation performance can be increased by using additional input features.\n\nSimilarly to this work, you can annotate each word in the **source** text by using the `-|-` separator:\n\n```\nword1-|-feat1-|-feat2 word2-|-feat1-|-feat2\n```\n\nIt supports an arbitrary number of features with arbitrary labels. However, all input words must have the **same** number of annotations. 
See for example `data/src-train-case.txt`, which annotates each word with its case information.\n\nTo evaluate the model, the option `-feature_dict_prefix` is required on `evaluate.lua`. It points to the prefix of the feature dictionaries generated during preprocessing.\n\n#### Pruning a model\n\n[Compression of Neural Machine Translation Models via Pruning](http://arxiv.org/pdf/1606.09274v1.pdf) (See et al. 2016) shows that a model can be aggressively pruned while keeping the same performance.\n\nTo prune a model, you can use `prune.lua`, which implements the class-blind and class-uniform pruning techniques from the paper.\n\n* `model`: The model to prune.\n* `savefile`: Name of the pruned model.\n* `gpuid`: Which GPU to use (-1 = use CPU). Depends on whether the model is serialized for GPU or CPU.\n* `ratio`: Pruning rate.\n* `prune`: Pruning technique, `blind` or `uniform` (default `blind`).\n\nNote that pruning cuts the connections with the lowest weights in the linear models by using a boolean mask. The pruned model file is slightly larger since it stores both the full weight matrix and the binary mask.\n\nModels can be retrained: typically you can recover the full capacity of a model pruned at 60% or even 80% with a few epochs of additional training.\n\n#### Switching between GPU/CPU models\nBy default, the model will always save the final model as a CPU model, but it will save the\nintermediate models as a CPU/GPU model depending on how you specified `-gpuid`.\nIf you want to run beam search on the CPU with an intermediate model trained on the GPU,\nyou can use `convert_to_cpu.lua` to convert the model to CPU and run beam search.\n\n#### GPU memory requirements/Training speed\nTraining large sequence-to-sequence models can be memory-intensive. 
Memory requirements will\ndepend on batch size, maximum sequence length, vocabulary size, and (obviously) model size.\nHere are some benchmark numbers on a GeForce GTX Titan X\n(assuming batch size of 64, maximum sequence length of 50 on both the source/target sequence,\nvocabulary size of 50000, and word embedding size equal to rnn size):\n\n(`prealloc = 0`)\n* 1-layer, 100 hidden units: 0.7G, 21.5K tokens/sec\n* 1-layer, 250 hidden units: 1.4G, 14.1K tokens/sec\n* 1-layer, 500 hidden units: 2.6G, 9.4K tokens/sec\n* 2-layers, 500 hidden units: 3.2G, 7.4K tokens/sec\n* 4-layers, 1000 hidden units: 9.4G, 2.5K tokens/sec\n\nThanks to some fantastic work from folks at [SYSTRAN](http://www.systransoft.com), turning `prealloc` on\nwill lead to much more memory efficient training\n\n(`prealloc = 1`)\n* 1-layer, 100 hidden units: 0.5G, 22.4K tokens/sec\n* 1-layer, 250 hidden units: 1.1G, 14.5K tokens/sec\n* 1-layer, 500 hidden units: 2.1G, 10.0K tokens/sec\n* 2-layers, 500 hidden units: 2.3G, 8.2K tokens/sec\n* 4-layers, 1000 hidden units: 6.4G, 3.3K tokens/sec\n\nTokens/sec refers to total (i.e. source + target) tokens processed per second.\nIf using different batch sizes/sequence length, you should (linearly) scale\nthe above numbers accordingly. You can make use of memory on multiple GPUs by using\n`-gpuid2` option in `train.lua`. This will put the encoder on the GPU specified by\n`-gpuid`, and the decoder on the GPU specified by `-gpuid2`.\n\n#### Evaluation\nFor translation, evaluation via BLEU can be done by taking the output from `beam.lua` and using the\n`multi-bleu.perl` script from [Moses](https://github.com/moses-smt/mosesdecoder). For example\n\n```\nperl multi-bleu.perl gold.txt \u003c pred.txt\n```\n\n#### Evaluation of States and Attention\n`attention_extraction.lua` can be used to extract the attention and the LSTM states. 
It uses the following (required) options:\n\n* `model`: Path to model .t7 file.\n* `src_file`: Source sequence to decode (one line per sequence).\n* `targ_file`: True target sequence.\n* `src_dict`: Path to source vocabulary (`*.src.dict` file from `preprocess.py`).\n* `targ_dict`: Path to target vocabulary (`*.targ.dict` file from `preprocess.py`).\n\nThe script outputs two files, `encoder.hdf5` and `decoder.hdf5`. The encoder file contains the states for every layer of the encoder LSTM and the offsets for the start of each source sentence. The decoder file contains the states for the decoder LSTM layers and the offsets for the start of each gold sentence. It additionally contains the attention for each time step (if the model uses attention).\n\n#### Pre-trained models\nWe've uploaded English \u003c-\u003e German models trained on 4 million sentences from\n[Workshop on Machine Translation 2015](http://www.statmt.org/wmt15/translation-task.html).\nDownload link is below:\n\nhttps://drive.google.com/open?id=0BzhmYioWLRn_aEVnd0ZNcWd0Y2c\n\nThese models are 4-layer LSTMs with 1000 hidden units and essentially replicate the results from\n[Effective Approaches to Attention-based\nNeural Machine Translation](http://stanford.edu/~lmthang/data/papers/emnlp15_attn.pdf),\nLuong et al. EMNLP 2015.\n\n#### Acknowledgments\nOur implementation utilizes code from the following:\n* [Andrej Karpathy's char-rnn repo](https://github.com/karpathy/char-rnn)\n* [Wojciech Zaremba's lstm repo](https://github.com/wojzaremba/lstm)\n* [Element rnn library](https://github.com/Element-Research/rnn)\n\n#### Licence\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharvardnlp%2Fseq2seq-attn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharvardnlp%2Fseq2seq-attn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharvardnlp%2Fseq2seq-attn/lists"}