https://github.com/LLluoling/PENS-Personalized-News-Headline-Generation
Code for PENS: A Dataset and Generic Framework for Personalized News Headline Generation
- Host: GitHub
- URL: https://github.com/LLluoling/PENS-Personalized-News-Headline-Generation
- Owner: LLluoling
- Created: 2021-07-26T16:39:18.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-11-10T09:09:57.000Z (almost 2 years ago)
- Last Synced: 2024-06-24T05:36:28.524Z (5 months ago)
- Language: Python
- Size: 129 KB
- Stars: 32
- Watchers: 3
- Forks: 10
- Open Issues: 5
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- StarryDivineSky - PENS-Personalized-News-Headline-Generation
README
# PENS - ACL2021
## PENS: A Dataset and Generic Framework for Personalized News Headline Generation
This is a PyTorch implementation of [PENS](https://www.microsoft.com/en-us/research/uploads/prod/2021/06/ACL2021_PENS_Camera_Ready_1862_Paper.pdf).

## I. Guidance
### 0. Environment
- Install PyTorch >= 1.4.0.
- Install the `pensmodule` package from the `PENS-Personalized-News-Headline-Generation` directory with ``` pip install -e . ```

### 1. Data Preparation
- Download the PENS dataset [here](https://msnews.github.io/pens.html) and put the dataset under data/.
- (Optional) Download glove.840B.300d.txt to data/ if you choose to use pretrained GloVe word embeddings.

### 2. Running the Code
- ``` cd pensmodule ```
- Follow the order Preprocess --> UserEncoder --> Generator: run the pipeline**.ipynb notebooks to preprocess the data, train the user encoder, and train the generator, respectively.

For more information, please refer to the [introduction of the PENS dataset](https://msnews.github.io/pens.html).
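If you opted for the pretrained GloVe embeddings in the data-preparation step, it can help to sanity-check that the file parses before launching the notebooks. The sketch below is not part of the repo; it assumes the standard GloVe text layout (one token followed by its floats per line) and the data/ path described above. Splitting from the right is deliberate: some tokens in glove.840B.300d.txt contain spaces.

```python
def load_glove(path, vocab=None, dim=300):
    """Parse a GloVe .txt file into {token: [float, ...]}.

    Only tokens in `vocab` are kept (pass None to keep everything),
    which keeps memory manageable for the multi-gigabyte 840B file.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # Split from the right: the last `dim` fields are the vector,
            # everything before them is the token (may contain spaces).
            token, values = " ".join(parts[:-dim]), parts[-dim:]
            if vocab is None or token in vocab:
                embeddings[token] = [float(v) for v in values]
    return embeddings

# Example (path assumes the layout above):
# vectors = load_glove("data/glove.840B.300d.txt", vocab=my_vocab)
```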
## II. Training Tips
Here we take NRMS as the user encoder; the following are some experimental details not covered in the paper.
### 0. TIPS
- In the paper, we used Monte Carlo search for RL training, which is very slow and sometimes hard to converge. Thus we provide actor-critic (AC) training in this code.
- If you pretrain the generator for a couple of epochs, you should set a very small learning rate during RL training.
- **Large improvements over the provided baselines are possible; the key always lies in the design of the reward function.**

### 1. Training Reward
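Since most of the gains come from reward design, here is a toy sketch of the general shape such a reward can take. This is not the reward used in the paper: `unigram_f1` is a simple content-overlap term, `user_score` stands in for a user-interest signal such as the user encoder's click probability, and `alpha` is an illustrative mixing weight.

```python
from collections import Counter

def unigram_f1(generated, reference):
    """Unigram F1 between two whitespace-tokenized strings."""
    g, r = Counter(generated.split()), Counter(reference.split())
    overlap = sum((g & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(g.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def reward(generated, reference, user_score, alpha=0.7):
    """Weighted mix of content overlap and a user-interest score in [0, 1].

    `user_score` is a placeholder for a personalization signal
    (e.g. the user encoder's predicted click probability).
    """
    return alpha * unigram_f1(generated, reference) + (1 - alpha) * user_score
```

In practice the reward would typically also include fluency and factual-consistency terms; the point is that each component is cheap to compute per sampled headline during RL training.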
![image info](./docs/reward.png)

### 2. Test performance at different training steps
![image info](./docs/rouge1.png)
![image info](./docs/rouge2.png)
![image info](./docs/rougel.png)

### 3. Cases
| Epoch | Generated headline |
| :---- | :----------------- |
| Case 1 | |
| 1000 | top stockton news arrests 2 impaired drivers |
| 5000 | top stockton news arrests 2 impaired drivers who had unrestrained children in their cars |
| Case 2 | |
| 1000 | trump says tens of thousands of people couldn t get in 2020 rally |
| 5000 | trump says tens of thousands of people outside his 2020 campaign rally at orlando |

**Note:**
- As training progresses, the generated sentences become more fluent and contain richer information.
- **ROUGE scores are not the best evaluation metric, but a compromise. The ideal evaluation is to check users' real clicks to see whether they are more interested. Thus a more fluent, human-like generated sentence can sometimes get a lower ROUGE score.**
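The last point is easy to demonstrate with a quick ROUGE-1 computation. The headlines below are hypothetical (not from the dataset): the terse candidate copies the reference's words and scores high, while the fluent paraphrase conveys the same event with different words and scores far lower.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: unigram overlap between candidate and reference."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

# Hypothetical example headlines:
reference = "trump holds 2020 campaign rally in orlando"
literal = "trump 2020 campaign rally orlando"                      # terse, copies words
fluent = "the president kicked off his reelection bid in florida"  # fluent paraphrase

# The paraphrase is arguably more readable, yet its ROUGE-1 is much lower.
```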