https://github.com/bangoc123/bleu
Implementation for paper BLEU: a Method for Automatic Evaluation of Machine Translation
- Host: GitHub
- URL: https://github.com/bangoc123/bleu
- Owner: bangoc123
- Created: 2021-09-16T12:20:21.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-09-16T14:15:59.000Z (about 4 years ago)
- Last Synced: 2025-03-31T16:34:55.386Z (6 months ago)
- Topics: bleu, bleu-score, machine-translation
- Language: Python
- Homepage:
- Size: 7.81 KB
- Stars: 9
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.MD
# BLEU Score
Implementation for paper:
[BLEU](https://aclanthology.org/P02-1040.pdf): a Method for Automatic Evaluation of Machine Translation
Author: Ba Ngoc from [ProtonX](https://protonx.ai/)
BLEU is a popular metric for evaluating machine translation. Check out the recent [Transformer](https://github.com/bangoc123/transformer) project we published.
## I. Usage
```python
from bleu_score import cal_corpus_bleu_score

candidates = ['eating chicken chicken is a eating a eating chicken',
              'eating chicken chicken is not good']
references_list = [['a chicken is eating chicken', 'there is a chicken eating chicken'],
                   ['a chicken is eating chicken', 'there is a chicken eating chicken']]

bleu_score = cal_corpus_bleu_score(candidates, references_list,
                                   weights=(0.25, 0.25, 0.25, 0.25), N=4)
print('Bleu Score: {}'.format(bleu_score))
```
## II. BLEU Score Formula
### 1. Precision
For each n-gram in the candidates, we count how many times it occurs in the candidates and how many times it occurs in the references. The precision is the ratio of the clipped matching count to the total number of n-grams in the candidates.
**Important to note:** Count clipping means that the count of an n-gram cannot exceed the maximum number of times that n-gram occurs in **any single** reference.
For example, if the gram `('a', 'a')` occurs **3 times** in a candidate but at most **2** times in any single reference, we use the value 2 in the calculation.
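The clipping rule can be sketched in a few lines of Python. This is a minimal illustration and not the code from this repository: the helper names `ngram_counts` and `modified_precision` are hypothetical, and tokenization is assumed to be a plain whitespace split.

```python
from collections import Counter


def ngram_counts(sentence, n):
    """Count the n-grams (as tuples of tokens) in a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate against its references."""
    cand_counts = ngram_counts(candidate, n)
    # For every n-gram, keep the largest count seen in any single reference.
    max_ref_counts = Counter()
    for reference in references:
        for gram, count in ngram_counts(reference, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Clip each candidate count at that maximum, then sum the clipped counts.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


candidate = 'eating chicken chicken is a eating a eating chicken'
references = ['a chicken is eating chicken', 'there is a chicken eating chicken']
# Unigrams: 5 clipped matches out of 9 candidate unigrams.
print(modified_precision(candidate, references, 1))
```

For corpus-level BLEU, the paper sums the clipped counts and the total counts over all candidates before dividing, rather than averaging per-sentence precisions.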
Never heard of n-grams? An n-gram is a contiguous sequence of `n` words in a string, and we count how many times each such sequence occurs.
Candidate 1: `'eating chicken chicken is a eating a eating chicken'`
Unigrams:

| Unigram | Count |
|---|---|
| eating | 3 |
| chicken | 3 |
| is | 1 |
| a | 2 |

Bigrams:

| Bigram | Count |
|---|---|
| eating chicken | 2 |
| chicken chicken | 1 |
| chicken is | 1 |
| is a | 1 |
| a eating | 2 |
| eating a | 1 |

We can do the same thing with trigrams and 4-grams.
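The counts above, and their trigram and 4-gram counterparts, can be reproduced with a short script. This is a sketch under the assumption that words are separated by whitespace; `count_ngrams` is a hypothetical helper, not a function from this repository.

```python
from collections import Counter


def count_ngrams(sentence, n):
    """Return a Counter mapping each n-gram (joined by spaces) to its count."""
    tokens = sentence.split()
    return Counter(' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


candidate = 'eating chicken chicken is a eating a eating chicken'
for n in (1, 2, 3, 4):
    # A sentence with 9 words has 9 - n + 1 n-grams in total.
    print(n, dict(count_ngrams(candidate, n)))
```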
### 2. Sentence brevity penalty
For each candidate, we pick the reference whose length is closest to the candidate's length. That length is the effective reference length.
Check out the function `get_eff_ref_length` in `utils.py`.
`c`: the total length of all candidates
`r`: the sum of the effective reference lengths over all candidates
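In the BLEU paper, the brevity penalty is `BP = 1` if `c > r`, and `BP = exp(1 - r / c)` otherwise. The sketch below computes `c`, `r`, and the penalty for the candidates from the usage example. The helpers `closest_ref_length` and `brevity_penalty` are hypothetical names and may differ from what `get_eff_ref_length` does; in particular, breaking ties toward the shorter reference is an assumption.

```python
import math


def closest_ref_length(candidate, references):
    """Length of the reference closest in length to the candidate.

    Assumption: ties are broken in favor of the shorter reference.
    """
    cand_len = len(candidate.split())
    ref_lens = [len(reference.split()) for reference in references]
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - cand_len), ref_len))


def brevity_penalty(c, r):
    """BP = 1 if c > r, otherwise exp(1 - r / c)."""
    if c == 0:
        return 0.0  # an empty candidate corpus gets the maximum penalty
    return 1.0 if c > r else math.exp(1 - r / c)


candidates = ['eating chicken chicken is a eating a eating chicken',
              'eating chicken chicken is not good']
references_list = [['a chicken is eating chicken', 'there is a chicken eating chicken'],
                   ['a chicken is eating chicken', 'there is a chicken eating chicken']]

c = sum(len(candidate.split()) for candidate in candidates)
r = sum(closest_ref_length(candidate, references)
        for candidate, references in zip(candidates, references_list))
# Here c = 15 and r = 12, so the candidates are not penalized.
print(c, r, brevity_penalty(c, r))
```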
### 3. BLEU Formula
`N`: the maximum n-gram order (for example, `N=4` uses unigrams through 4-grams)
`w`: the list of preset weights, one for each n-gram order
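In the paper, the final score combines the brevity penalty with a weighted geometric mean of the modified precisions `p_n`: `BLEU = BP * exp(sum of w_n * log(p_n) for n = 1..N)`. A minimal sketch of that last step follows. The function name `combine_bleu` is hypothetical, and returning 0 when any precision is 0 (no smoothing) is an assumption; this repository may handle that case differently.

```python
import math


def combine_bleu(precisions, weights, bp):
    """Combine n-gram precisions p_1..p_N into a BLEU score.

    BLEU = bp * exp(sum(w_n * log(p_n))).
    """
    # Assumption: no smoothing, so a zero precision makes the whole score zero
    # (log(0) is undefined).
    if any(p == 0 for p in precisions):
        return 0.0
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return bp * math.exp(log_sum)


# With equal weights, the score is the geometric mean of the precisions times bp.
print(combine_bleu([0.5, 0.5], weights=(0.5, 0.5), bp=1.0))
```

With `weights=(0.25, 0.25, 0.25, 0.25)` and `N=4`, as in the usage example, every n-gram order from 1 to 4 contributes equally.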