Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/attilanagy234/neural-punctuator
Complementary code for our paper Automatic punctuation restoration with BERT models
bert punctuation-restoration transformer
- Host: GitHub
- URL: https://github.com/attilanagy234/neural-punctuator
- Owner: attilanagy234
- License: mit
- Created: 2020-10-13T14:26:34.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-11-06T08:11:56.000Z (8 months ago)
- Last Synced: 2024-01-26T19:35:19.328Z (5 months ago)
- Topics: bert, punctuation-restoration, transformer
- Language: Jupyter Notebook
- Homepage:
- Size: 7.63 MB
- Stars: 46
- Watchers: 3
- Forks: 7
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-hungarian-nlp - neural-punctuator
README
# neural-punctuator
Complementary code for our paper [**_Automatic punctuation restoration with BERT models_**](https://arxiv.org/abs/2101.07343), submitted to the XVII. Conference on Hungarian Computational Linguistics.

## Abstract
We present an approach for automatic punctuation restoration with BERT models for English and Hungarian. For English, we conduct our experiments on TED Talks, a commonly used benchmark for punctuation restoration, while for Hungarian we evaluate our models on the Szeged Treebank dataset. Our best models achieve a macro-averaged F1-score of 79.8 in English and 82.2 in Hungarian.

## Repository Structure
```
.
├── docs
│   └── paper                # The submitted paper
├── notebooks                # Notebooks for data preparation/preprocessing
└── src
    └── neural_punctuator
        ├── base             # Base classes for training Torch models
        ├── configs          # YAML files defining the parameters of each model
        ├── models           # Torch model definitions
        ├── preprocessors    # Preprocessor class
        ├── trainers         # Training logic
        ├── utils            # Utility scripts (logging, metrics, TensorBoard, etc.)
        └── wrappers         # Wrapper classes for the models, containing all the components needed for training/prediction
```
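The paper frames punctuation restoration as per-token classification: a BERT-style encoder assigns each word a punctuation label, and the labels are then merged back into text. A minimal sketch of that decoding step, assuming a simple label set (the label names and the `restore` helper are illustrative, not taken from this repository):

```python
# Sketch of turning per-word punctuation labels back into text.
# A BERT encoder (not shown) would produce one label per word;
# here the labels are given directly. Label names are assumptions.

PUNCT = {"EMPTY": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}

def restore(words, labels):
    """Rebuild punctuated, capitalized text from words and predicted labels."""
    out = []
    capitalize = True  # the first word starts a sentence
    for word, label in zip(words, labels):
        token = word.capitalize() if capitalize else word
        out.append(token + PUNCT[label])
        # a sentence-final mark means the next word starts a new sentence
        capitalize = label in ("PERIOD", "QUESTION")
    return " ".join(out)

print(restore(["hello", "how", "are", "you"],
              ["COMMA", "EMPTY", "EMPTY", "QUESTION"]))
```

In the actual models, subword tokenization means predictions must first be aggregated back to word level (e.g. taking the label of each word's first subword) before a step like this.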
## Dataset
TED Talks dataset (English) - http://hltc.cs.ust.hk/iwslt/index.php/evaluation-campaign/ted-task.html

Szeged Treebank (Hungarian) - https://rgai.inf.u-szeged.hu/node/113
## Citation
If you use our work, please cite the following paper:
```
@article{nagy2021automatic,
  title={Automatic punctuation restoration with {BERT} models},
author={Nagy, Attila and Bial, Bence and {\'A}cs, Judit},
journal={arXiv preprint arXiv:2101.07343},
year={2021}
}
```

## Authors
Attila Nagy, Bence Bial, Judit Ács

Budapest University of Technology and Economics - Department of Automation and Applied Informatics