https://github.com/bojone/SPACES
An end-to-end long-text summarization model (CAIL 2020 judicial summarization track)
- Host: GitHub
- URL: https://github.com/bojone/SPACES
- Owner: bojone
- Created: 2020-12-10T03:55:00.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-05-31T06:48:10.000Z (5 months ago)
- Last Synced: 2024-06-14T01:46:53.953Z (5 months ago)
- Language: Python
- Size: 149 KB
- Stars: 380
- Watchers: 8
- Forks: 94
- Open Issues: 30
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- StarryDivineSky - bojone/SPACES
README
# SPACES
An end-to-end long-text summarization model (CAIL 2020 judicial summarization track). Blog post: https://kexue.fm/archives/8046
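## Meaning
We call our model SPACES, which happens to be one of the domain names of Scientific Spaces (科学空间) ([https://spaces.ac.cn](https://spaces.ac.cn)). The letters stand for:
- **S**: Sparse Softmax;
- **P**: Pretrained Language Model;
- **A**: Abstractive;
- **C**: Copy Mechanism;
- **E**: Extractive;
- **S**: Special Words.

As the name suggests, this is a word-level "extract-then-generate" summarization model that uses pretraining and a copy mechanism, and it incorporates some of our latest research on text generation.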
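
This README only names Sparse Softmax without defining it. As a rough illustration, the sketch below assumes the common top-k form: normalize over the `k` largest logits and assign zero probability to everything else. The function name `sparse_softmax` and the pure-Python implementation are hypothetical and are not taken from this repository's code.

```python
import math


def sparse_softmax(logits, k):
    """Top-k sparse softmax (assumed form): softmax over the k largest
    logits, with probability 0 for all other positions."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    # Indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]


probs = sparse_softmax([2.0, 1.0, 0.1, -1.0], k=2)
print(probs)
```

With `k` equal to the vocabulary size this reduces to the ordinary softmax; a smaller `k` forces the distribution to be sparse.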
## Running
Environment: tensorflow 1.14 + keras 2.3.1 + bert4keras 0.9.7
(On Windows, use bert4keras>=0.9.8)
First edit the path settings in `snippets.py`, then run the code below.
Training code:
```bash
#! /bin/bash

python extract_convert.py
python extract_vectorize.py

for ((i=0; i<15; i++));
do
    python extract_model.py $i
done

python seq2seq_convert.py
python seq2seq_model.py
```

Prediction code:
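```python
from final import *  # provides predict()

# `text` must already hold the source document to summarize
summary = predict(text, topk=3)
print(summary)
```

## Contact
QQ group: 808623966. To join the WeChat group, add the bot's WeChat account spaces_ac_cn.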
## Links
- Blog: https://kexue.fm
- Zhuiyi Technology: https://zhuiyi.ai/
- Pretrained models: https://github.com/ZhuiyiTechnology/pretrained-models
- WoBERT: https://github.com/ZhuiyiTechnology/WoBERT