{"id":13711587,"url":"https://github.com/ymfa/seq2seq-summarizer","last_synced_at":"2025-05-06T21:31:39.879Z","repository":{"id":136050822,"uuid":"141190314","full_name":"ymfa/seq2seq-summarizer","owner":"ymfa","description":"Pointer-generator reinforced seq2seq summarization in PyTorch","archived":false,"fork":false,"pushed_at":"2019-04-16T17:36:23.000Z","size":75,"stargazers_count":358,"open_issues_count":13,"forks_count":72,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-11-13T22:34:57.000Z","etag":null,"topics":["abstractive-summarization","attention","attention-mechanism","coverage","coverage-mechanism","pointer-network","pytorch","reinforcement-learning","seq2seq","summarization","summarizer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ymfa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-16T20:22:30.000Z","updated_at":"2024-10-24T16:51:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"83b82da9-98f6-4b38-a173-ce3b36ffcc4b","html_url":"https://github.com/ymfa/seq2seq-summarizer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ymfa%2Fseq2seq-summarizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ymfa%2Fseq2seq-summarizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ymfa%2Fseq2seq-summarizer/releases","manifests_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/ymfa%2Fseq2seq-summarizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ymfa","download_url":"https://codeload.github.com/ymfa/seq2seq-summarizer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252772028,"owners_count":21801832,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abstractive-summarization","attention","attention-mechanism","coverage","coverage-mechanism","pointer-network","pytorch","reinforcement-learning","seq2seq","summarization","summarizer"],"created_at":"2024-08-02T23:01:09.613Z","updated_at":"2025-05-06T21:31:39.585Z","avatar_url":"https://github.com/ymfa.png","language":"Python","readme":"## Quick start\n\n#### Installation\nThe summarizer is written for *Python 3.6* and *PyTorch 0.4.0*.\nI believe you already have *numpy*.\nIt also requires *tqdm* for displaying progress bars, and *matplotlib* for plotting.\nOne of the data preprocessing scripts requires *nltk* to tokenize text.\n\nFor ROUGE evaluation, please put `ROUGE-1.5.5.pl` and its `data` directory under `data/` (i.e. 
there will be a `data/data/` directory); *pyrouge* is *not* required.\n\n#### Data\nThe expected data format is a text file (or a gzipped version of this, marked by the extension `.gz`) containing one example per line.\nIn each line, the source and the summary texts are separated by a tab, and are both *already* tokenized (you can add your own tokenizer in `utils.py`).\nParagraph breaks (newlines) are represented by the special token `\u003cP\u003e`.\n\nIn the `data/` directory, two scripts are provided to prepare the Google sentence compression data and the CNN/Daily Mail corpus for this summarizer.\n\n#### Training\nRunning `train.py` will start training using the parameters set in `params.py`.\nA description of the parameters is provided below.\n\nTo resume a stopped training process, run the script with the command-line option `--resume_from X.train.pt`, where `X.train.pt` is the filename of your saved training status.\nYou can also use command-line options to override any parameter set in `params.py`; for example `--cover_loss 1` sets `cover_loss` to 1.\nWhen resuming from a saved state, the original parameters will be used and `params.py` will be ignored, but you can still override some of the parameters using command-line options.\n\n#### Testing\nRunning `test.py` will evaluate the latest model trained using the parameters set in `params.py`.\nIt uses a beam search decoder, and will print out ROUGE scores.\nYou can also let it save the decoded summaries.\n\n## Model\n![Model architecture](https://user-images.githubusercontent.com/6981180/48382049-1b966b80-e6d7-11e8-9c5a-bc3329426221.png)\nThe model is defined in `model.py`, with the encoder, the decoder, and the combined model as three modules.\nAs background, [this tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html) outlines the general network architecture.\n\n| Parameter | Meaning |\n|-----|-----|\n| vocab_size | Vocabulary size of the encoder and the decoder |\n| 
hidden_size | Size of the hidden state of the encoder RNN |\n| dec_hidden_size | If set, a matrix will transform the final encoder state (concatenated if bidirectional) into this size for the decoder RNN |\n| embed_size | Size of the input word embeddings of the encoder and the decoder (they share the same word embeddings) |\n| enc_bidi | Whether the encoder RNN is bidirectional; if true, the default decoder hidden size is `hidden_size` * 2 |\n| enc_attn | Whether the decoder output depends on attention over encoder states |\n| dec_attn | Whether the decoder output depends on attention over past decoder states (to avoid repetition) |\n| pointer | Whether to use the pointer-generator network (requires `enc_attn`); a pointer network copies words (can be out-of-vocabulary) from the source |\n| out_embed_size | If set, a matrix will transform the decoder hidden state (its concatenation with encoder context and/or decoder context if `enc_attn` and/or `dec_attn` are true) into this size before applying softmax to generate an output word |\n| tie_embed | Whether the output word embeddings are tied to the input ones; if true, `out_embed_size` is automatically set to `embed_size` |\n\n#### Coverage\n| Parameter | Meaning |\n|-----|-----|\n| enc_attn_cover | Whether to provide the coverage vector as an input to the computation of attention over encoder states |\n| cover_func | What function (sum or max) should be used to aggregate previous attention distributions |\n| cover_loss | Coverage loss is multiplied by this value when added to total loss |\n| show_cover_loss | Whether to include coverage loss in total loss when displaying it in the progress bar |\n\nThe coverage mechanism is similar to that of See et al. (2017), whose `cover_func` is sum.\nIt has two components: one is in the model architecture, i.e. considering the coverage vector when computing attention, and the other in the loss, i.e. 
discouraging repeatedly attending to the same area of the input sequence.\n\nNote that because I use the simpler bilinear (Luong's \"general\") attention instead of their Bahdanau (Luong's \"concat\") attention, the coverage vector is also used in a simpler way.\nThat is, I subtract (with a learned weight) the coverage vector from the attention values prior to softmax.\n\n#### Regularization\nCurrently, only four dropout parameters are implemented.\n\n## Training\n| Parameter | Meaning |\n|-----|-----|\n| optimizer | Optimizer (adam or adagrad) |\n| lr | Learning rate |\n| adagrad_accumulator | The initial accumulator value of Adagrad |\n| batch_size | Batch size during training |\n| n_batches | Number of training batches per epoch |\n| val_batch_size | Batch size during validation |\n| n_val_batches | Number of validation batches per epoch |\n| n_epochs | Total number of epochs |\n| pack_seq | If true, the PyTorch functions `pack_padded_sequence`, `pad_packed_sequence` will be used to skip `\u003cPAD\u003e` inputs |\n| grad_norm | Gradient clipping: the maximum gradient norm that large gradients are scaled to |\n\n#### Teacher forcing\n| Parameter | Meaning |\n|-----|-----|\n| forcing_ratio | Initial percentage of using teacher forcing |\n| partial_forcing | If true, the random choice between teacher forcing and using the model's own output occurs every step, not every batch |\n| forcing_decay_type | If set (linear, exp, or sigmoid), teacher forcing ratio is decreased after every batch |\n| forcing_decay | See below for explanation |\n| sample | If true, when not teacher forcing, the next input word is sampled from the output word distribution, instead of always using the word of the highest probability |\n\nThree types of teacher forcing ratio decay (Bengio et al., 2015) are implemented:\n* Linear: Ratio is decreased by `forcing_decay` every batch.\n* Exponential: Ratio is multiplied by `forcing_decay` every batch.\n* Inverse sigmoid: Ratio is k/(k + exp(i/k)) 
where k is `forcing_decay` and i is the batch number.\n\n#### Reinforcement learning\nReinforcement learning (RL) using self-critical policy gradient is implemented following Paulus et al. (2018).\nRL loss is based on the difference in ROUGE score between a sampled output (words are sampled from the softmax distribution) and a greedy baseline (words that have the highest probabilities are chosen).\n\n| Parameter | Meaning |\n|-----|-----|\n| rl_ratio | The weight in [0,1) of RL loss in the loss function; RL will be disabled if set to 0 |\n| rl_ratio_power | A factor in (0,1]; `rl_ratio` is set to `rl_ratio ** rl_ratio_power` after every epoch to increase the weight of RL loss |\n| rl_start_epoch | The epoch from which RL loss is enabled; useful because you want a strong baseline when RL begins |\n\n#### Data\n| Parameter | Meaning |\n|-----|-----|\n| embed_file | Path to the word embedding file; if set, input word embeddings will be initialized by pretrained embeddings |\n| data_path | Path to training data |\n| val_data_path | Path to validation data (optional) |\n| max_src_len | The maximum allowed length of every source text |\n| max_tgt_len | The maximum allowed length of every target text (reference summary) |\n| truncate_src | Whether to truncate source texts to `max_src_len`; if false, examples with overlong source text are discarded |\n| truncate_tgt | Whether to truncate target texts to `max_tgt_len`; if false, examples with overlong target text are discarded |\n| model_path_prefix | The common prefix of models saved during training 
|\n| keep_every_epoch | If false, only models of the latest epoch and the best epoch will be kept |\n\nThree types of files will be saved after every epoch (if `model_path_prefix` is `X`):\n* The model after epoch n is saved as `X.n.pt`.\n* The training status (for resuming) is saved as `X.train.pt`.\n* The plot of loss (training in blue, validation in green) and ROUGE-L (in red) is saved as `X.png`.\n\n## Testing\n| Parameter | Meaning |\n|-----|-----|\n| beam_size | The beam size of the beam search decoder |\n| min_out_len | The minimum acceptable output length |\n| max_out_len | The maximum acceptable output length |\n| out_len_in_words | If true, the output length does not count non-words such as punctuation |\n| test_data_path | Path to testing data |\n| test_sample_ratio | If less than 1, only this portion (randomly sampled) of the test set will be used |\n| test_save_results | If true, the decoded outputs will be saved in a file `X.results.tgz` where `X` is `model_path_prefix` |\n\nROUGE-1, ROUGE-2, ROUGE-L and ROUGE-SU4 are reported.\nROUGE-L is also used in validation and RL.\nPlease feel free to try other ROUGE metrics.\n\n#### Visualization\nA function `show_attention_map` is provided to visualize the attention weights over encoder states and the copying probability of the pointer-generator.\n![Visualization](https://user-images.githubusercontent.com/6981180/42974503-765b21f6-8baf-11e8-8928-9b7a88b033a8.png)\n\n## References\n* Bengio, S., Vinyals, O., Jaitly, N., \u0026 Shazeer, N. (2015). [Scheduled sampling for sequence prediction with recurrent neural networks.](http://papers.nips.cc/paper/5956-scheduled-sampling-for-sequence-prediction-with-recurrent-neural-networks) In _Advances in NIPS_.\n* Paulus, R., Xiong, C., \u0026 Socher, R. (2018). [A deep reinforced model for abstractive summarization.](https://arxiv.org/abs/1705.04304) In _ICLR_.\n* See, A., Liu, P. J., \u0026 Manning, C. D. (2017). 
[Get to the point: Summarization with pointer-generator networks.](http://www.aclweb.org/anthology/P17-1099) In _ACL_.\n\n## Disclaimer\n\nDue to a change of circumstances, the original author of this repository can no longer maintain the repo. If you would like to maintain it, please contact @ymfa. Thank you!\n","funding_links":[],"categories":["Repositories"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fymfa%2Fseq2seq-summarizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fymfa%2Fseq2seq-summarizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fymfa%2Fseq2seq-summarizer/lists"}