https://github.com/tencent-ailab/grndpodcastsum
(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"
- Host: GitHub
- URL: https://github.com/tencent-ailab/grndpodcastsum
- Owner: tencent-ailab
- License: apache-2.0
- Created: 2022-03-04T06:27:37.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-10T04:33:29.000Z (about 1 year ago)
- Last Synced: 2023-10-10T05:28:46.663Z (about 1 year ago)
- Language: Python
- Size: 4.14 MB
- Stars: 12
- Watchers: 3
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Towards Abstractive Grounded Summarization of Podcast Transcripts
We provide the source code for the paper ["Towards Abstractive Grounded Summarization of Podcast Transcripts"](https://arxiv.org/pdf/2203.11425.pdf), accepted at ACL'22. If you find the code useful, please cite the following paper.

@inproceedings{song-etal-2022-grounded,
title="Towards Abstractive Grounded Summarization of Podcast Transcripts",
author = "Song, Kaiqiang and
Li, Chen and
Wang, Xiaoyang and
Yu, Dong and
Liu, Fei",
booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
year={2022}
}

## Goal
We propose a grounded summarization system that links each summary sentence to a chunk of the original transcript and to the corresponding audio/video recording. This allows a human evaluator to quickly verify the summary content against the source clips.
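To make the idea concrete, the sketch below shows one way a grounded summary could be represented in Python. The class and field names (`Chunk`, `GroundedSentence`, `start_sec`, `end_sec`) and the example values are illustrative assumptions and are not taken from this repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """A span of the transcript and its position in the recording (hypothetical schema)."""
    chunk_id: int
    text: str
    start_sec: float  # assumed: where the chunk starts in the audio/video
    end_sec: float    # assumed: where the chunk ends in the audio/video


@dataclass
class GroundedSentence:
    """One summary sentence plus the transcript chunk that supports it."""
    sentence: str
    chunk: Chunk


def format_for_review(summary: List[GroundedSentence]) -> List[str]:
    """Render each sentence with a pointer to the clip an evaluator should check."""
    lines = []
    for item in summary:
        c = item.chunk
        lines.append(
            f"{item.sentence} [chunk {c.chunk_id}, {c.start_sec:.0f}s-{c.end_sec:.0f}s]"
        )
    return lines


# Made-up example data, for illustration only.
chunk = Chunk(chunk_id=3, text="today we talk about sleep and memory",
              start_sec=95.0, end_sec=140.0)
summary = [GroundedSentence("The hosts discuss how sleep affects memory.", chunk)]
review_lines = format_for_review(summary)
print(review_lines[0])
```

Keeping the chunk attached to each sentence means a reviewer can jump straight to the relevant clip instead of scanning the whole episode.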
![example](https://raw.githubusercontent.com/tencent-ailab/GrndPodcastSum/main/example.png)

## News
+ 03/22/2022 ArXiv Paper released.
+ 03/04/2022 Trained model and processed testing data released.
+ 03/03/2022 Code Released. Paper link, trained model and processed testing data will be released soon.
+ 02/23/2022 Paper accepted at ACL 2022.

## Experiments
Follow the four steps below to generate grounded podcast summaries, or directly download the generated summaries from this [link]().
## Step 1: Download Code, Model & Data
Download the code
```shell
git clone https://github.com/tencent-ailab/GrndPodcastSum.git
cd GrndPodcastSum
```

Download the [Trained Models](https://tencentoverseas-my.sharepoint.com/:u:/g/personal/riversong_global_tencent_com/Ebi9ht9AbwlBi6FCxXeKCuQBcyoSMTRk-hofFdpLInU01w?e=JurGgT) to the ``GrndPodcastSum`` directory and unzip:
```shell
unzip model.zip
```

Download the [Processed Test Set (1027)](https://tencentoverseas-my.sharepoint.com/:u:/g/personal/riversong_global_tencent_com/ERhiDdS4BetHmZTlmO-JzXMBo3NQdqwZS9nikcem59_sDw?e=22uXt4) to the ``GrndPodcastSum`` directory and unzip:
```shell
unzip data.zip
```

## Step 2: Set Up Environment
Create the conda environment from the ``env.yml`` file.
```shell
conda env create -f env.yml
conda activate GrndPodcastSum
```

## Step 3: Offline Computing for Chunk Embeddings
Compute the chunk embeddings offline.
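As a rough illustration of what offline chunk embedding involves, the sketch below splits a transcript into fixed-size, non-overlapping chunks and precomputes one vector per chunk. It is not the code run by `offline.sh`: the function names (`split_into_chunks`, `toy_embed`, `precompute_chunk_embeddings`) and the chunk size are hypothetical, and the hashed bag-of-words vector is only a stand-in for whatever encoder the released model uses.

```python
import hashlib
import math
from typing import Dict, List


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split tokens into consecutive, non-overlapping chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_embed(tokens: List[str], dim: int = 16) -> List[float]:
    """Stand-in encoder (assumption): hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for tok in tokens:
        # md5 is stable across Python runs, unlike the built-in hash().
        bucket = int(hashlib.md5(tok.lower().encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def precompute_chunk_embeddings(transcript: str, chunk_size: int = 5) -> Dict[int, List[float]]:
    """Compute every chunk embedding once so it can be reused at summarization time."""
    chunks = split_into_chunks(transcript.split(), chunk_size)
    return {idx: toy_embed(chunk) for idx, chunk in enumerate(chunks)}


# Made-up transcript, for illustration only.
transcript = "welcome to the show today we talk about sleep and how it shapes memory"
embeddings = precompute_chunk_embeddings(transcript, chunk_size=5)
print(len(embeddings), len(embeddings[0]))
```

Computing the embeddings ahead of time means the cost is paid once per transcript rather than on every summarization run. The repository's script for this step is: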
```shell
sh offline.sh
```

## Step 4: Generating Grounded Summary
Use the Grnd-token-nonoverlap model to generate the summaries.
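A grounded summary pairs each summary sentence with a transcript chunk. The sketch below illustrates only that output shape: it uses a simple word-overlap (Jaccard) heuristic to pick a chunk for each sentence and does not reproduce the released model. All function names and the example text are hypothetical.

```python
from typing import List, Set, Tuple

PUNCT = ".,!?;:"


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return {w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_sentences_to_chunks(sentences: List[str], chunks: List[str]) -> List[Tuple[str, int]]:
    """Pair each summary sentence with the index of its most similar chunk."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    chunk_tokens = [tokenize(c) for c in chunks]
    links = []
    for sent in sentences:
        sent_tokens = tokenize(sent)
        scores = [jaccard(sent_tokens, ct) for ct in chunk_tokens]
        best = max(range(len(chunks)), key=scores.__getitem__)
        links.append((sent, best))
    return links


# Made-up chunks and summary sentences, for illustration only.
chunks = [
    "welcome to the show and thanks for listening",
    "today we talk about sleep and how it shapes memory",
    "next week our guest is a marathon runner",
]
sentences = [
    "Sleep shapes memory.",
    "A marathon runner is the guest next week.",
]
links = link_sentences_to_chunks(sentences, chunks)
for sent, idx in links:
    print(f"[chunk {idx}] {sent}")
```

The repository's script for this step is: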
```shell
sh test.sh
```

## License
Copyright 2022 Tencent

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
## Disclaimer
This repository is for research purposes only. It is not an officially supported Tencent product.