https://github.com/ayaka14732/bart-base-cantonese

The pre-trained Cantonese BART model




Host: GitHub
URL: https://github.com/ayaka14732/bart-base-cantonese
Owner: ayaka14732
Created: 2022-11-06T15:32:47.000Z (over 3 years ago)
Default Branch: 2nd-stage-pre-train
Last Pushed: 2022-11-06T15:37:42.000Z (over 3 years ago)
Last Synced: 2025-03-25T15:50:13.996Z (about 1 year ago)
Topics: bart, cantonese, huggingface, nlp, pre-trained-model, transformer
Language: Python
Homepage: https://huggingface.co/Ayaka/bart-base-cantonese
Size: 262 KB
Stars: 8
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# BART Base Cantonese

This is a Cantonese version of the BART base model. It was obtained by second-stage pre-training of the [fnlp/bart-base-chinese](https://huggingface.co/fnlp/bart-base-chinese) model on the [LIHKG dataset](https://github.com/ayaka14732/lihkg-scraper).

This project is supported by Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).

## Usage

```python
from transformers import BertTokenizer, BartForConditionalGeneration, Text2TextGenerationPipeline

# Load the tokenizer and the model from the Hugging Face Hub
tokenizer = BertTokenizer.from_pretrained('Ayaka/bart-base-cantonese')
model = BartForConditionalGeneration.from_pretrained('Ayaka/bart-base-cantonese')
text2text_generator = Text2TextGenerationPipeline(model, tokenizer)

# Fill in the [MASK] token, with sampling disabled so the result is deterministic
output = text2text_generator('聽日就要返香港，我激動到[MASK]唔着', max_length=50, do_sample=False)

# Remove the spaces that separate the tokens in the generated text
print(output[0]['generated_text'].replace(' ', ''))
# output: 聽日就要返香港，我激動到瞓唔着
```
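
The generate-then-strip-spaces step above can be wrapped in a small helper when it is needed in several places. The following is a minimal sketch, not part of this repository or of `transformers`: the name `fill_mask` and its parameters are hypothetical, and it assumes the generator returns a list of dicts with a `generated_text` key, as the pipeline in the example above does.

```python
from typing import Callable, Dict, List


def fill_mask(
    generator: Callable[..., List[Dict[str, str]]],
    text: str,
    max_length: int = 50,
) -> str:
    """Hypothetical helper: run a text2text generator and clean up its output.

    `generator` is assumed to behave like the Text2TextGenerationPipeline
    instance from the usage example: called with a string, it returns a list
    of dicts, each carrying a 'generated_text' entry.
    """
    output = generator(text, max_length=max_length, do_sample=False)
    # The decoded text has a space between tokens, so remove the spaces.
    return output[0]['generated_text'].replace(' ', '')
```

With the pipeline from the usage example, this would be called as `fill_mask(text2text_generator, '聽日就要返香港，我激動到[MASK]唔着')`. Note that removing every space also removes spaces that legitimately belong to the text, for example between English words in mixed-language input, so this sketch is only suitable for text without meaningful spaces.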
