https://github.com/rjtpp/rouge-evaluation-tool
MikeLab 2024 subproject. Developed for the computation of ROUGE scores.
- Host: GitHub
- URL: https://github.com/rjtpp/rouge-evaluation-tool
- Owner: RJTPP
- License: mit
- Created: 2024-12-20T18:16:14.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-26T06:11:07.000Z (10 months ago)
- Last Synced: 2025-07-17T02:37:12.048Z (3 months ago)
- Topics: python, rouge-metric
- Language: Python
- Homepage:
- Size: 24.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ROUGE Evaluation Tool
[Python](https://www.python.org/) | [MIT License](https://opensource.org/licenses/MIT)

This project provides a Python implementation to calculate **ROUGE** (Recall-Oriented Understudy for Gisting Evaluation) scores. It includes methods for computing ROUGE-1, ROUGE-2, and ROUGE-L metrics, which are commonly used to evaluate text similarity and summarization performance.
## Table of Contents
- [Features](#features)
- [Requirements and Dependencies](#requirements-and-dependencies)
- [Quick Start](#quick-start)
- [Dataset Structure](#dataset-structure)
- [Output](#output)
- [Score Explanation](#score-explanation)
- [License](#license)
- [Contributors](#contributors)
## Features
- Reads input datasets from JSON files and outputs results as JSON.
- Outputs:
- `ROUGE-1` calculation: Measures unigram overlap.
- `ROUGE-2` calculation: Measures bigram overlap.
- `ROUGE-L` calculation: Measures the longest common subsequence (LCS).
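These three metrics rest on two building blocks: n-gram overlap (unigrams for ROUGE-1, bigrams for ROUGE-2) and the longest common subsequence for ROUGE-L. A minimal sketch of those building blocks, for illustration only (not the repository's implementation):

```python
# Illustrative sketch of the building blocks only; not the repository's code.
def ngrams(text, n):
    """Return the list of n-grams (as tuples) from whitespace-tokenized text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lcs_length(a_tokens, b_tokens):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b_tokens) + 1) for _ in range(len(a_tokens) + 1)]
    for i, a in enumerate(a_tokens, 1):
        for j, b in enumerate(b_tokens, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if a == b else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

print(ngrams("the cat sat on the mat", 2))  # bigrams, as used by ROUGE-2
print(lcs_length("the cat sat".split(), "the cat is sitting".split()))  # LCS length, as used by ROUGE-L
```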
## Requirements and Dependencies
This project was developed with Python 3.6, is tested for compatibility with Python 3.6 through 3.12, and should work with newer versions. No additional dependencies are required.
## Quick Start
1. Clone the repository:
```bash
git clone https://github.com/RJTPP/ROUGE-Evaluation-Tool.git &&
cd ROUGE-Evaluation-Tool
```
2. Prepare the Dataset:
- Create a JSON file named `dataset.json` inside the `/data` folder with the following structure. See the [Dataset Structure](#dataset-structure) section for more details. A helper sketch for generating a toy dataset is shown after these steps.
```json
[
{
"reference": "the reference text",
"candidate": [
"candidate text 1",
"candidate text 2"
]
}
]
```
3. Run the script:
```bash
python main.py
```
4. Check the Output:
- The ROUGE scores will be saved in `/data/scores.json`.
- For more details, see the [Output](#output) and [Score Explanation](#score-explanation) sections.
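To try the Quick Start end to end with a toy input, the following sketch (not part of the repository; the `data/dataset.json` path and field names are taken from the steps above) writes a minimal dataset file before you run `python main.py`:

```python
# Convenience sketch: create a minimal data/dataset.json so `python main.py`
# can be run immediately. Paths and field names follow the Quick Start steps.
import json
from pathlib import Path

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

dataset = [
    {
        "reference": "the cat sat on the mat",
        "candidate": [
            "the cat sat on a mat",
            "a dog sat on the mat",
        ],
    }
]

(data_dir / "dataset.json").write_text(json.dumps(dataset, indent=2), encoding="utf-8")
print("Wrote", data_dir / "dataset.json")
```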
## Dataset Structure
The input dataset must be placed in the `/data` folder as a JSON file named `dataset.json`. It should contain a list of evaluation instances, where each instance consists of:
1. `reference` (String): The reference or ground truth text that serves as the standard for the comparison.
2. `candidate` (Array of Strings): A list of candidate texts generated by a model or system to be evaluated against the reference text.
### Example
```json
[
{
"reference": "the ground truth or expected text",
"candidate": [
"candidate text 1",
"candidate text 2",
"... additional candidate texts ..."
]
},
"... additional evaluation instances ..."
]
```
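A small validation sketch (not part of the repository) that checks `data/dataset.json` against the structure described above before running `main.py`:

```python
# Validation sketch only: checks data/dataset.json against the structure above.
import json

with open("data/dataset.json", encoding="utf-8") as f:
    dataset = json.load(f)

assert isinstance(dataset, list), "dataset.json must contain a list of instances"
for i, instance in enumerate(dataset):
    assert isinstance(instance, dict), f"instance {i} must be an object"
    assert isinstance(instance.get("reference"), str), f"instance {i}: 'reference' must be a string"
    candidates = instance.get("candidate")
    assert isinstance(candidates, list) and all(isinstance(c, str) for c in candidates), \
        f"instance {i}: 'candidate' must be a list of strings"

print(f"dataset.json looks well-formed ({len(dataset)} instance(s))")
```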
## Output
The script generates a `scores.json` file in the `/data` folder with the following structure:
```json
[
{
"candidate": "candidate text 1",
"reference": "the reference text",
"ROUGE-1": {
"precision": 0.75,
"recall": 0.6,
"f-measure": 0.67
},
"ROUGE-2": {
"precision": 0.5,
"recall": 0.4,
"f-measure": 0.44
},
"ROUGE-L": {
"precision": 0.7,
"recall": 0.55,
"f-measure": 0.62
}
}
]
```
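A short post-processing sketch (not part of the repository) that loads `data/scores.json` in the format shown above and summarizes the results:

```python
# Post-processing sketch: summarize data/scores.json produced by the tool.
import json

with open("data/scores.json", encoding="utf-8") as f:
    scores = json.load(f)

# Pick the candidate with the highest ROUGE-L f-measure.
best = max(scores, key=lambda entry: entry["ROUGE-L"]["f-measure"])
print("Best candidate by ROUGE-L:", best["candidate"])

for entry in scores:
    print(
        f"{entry['candidate'][:40]!r}: "
        f"R1={entry['ROUGE-1']['f-measure']:.2f} "
        f"R2={entry['ROUGE-2']['f-measure']:.2f} "
        f"RL={entry['ROUGE-L']['f-measure']:.2f}"
    )
```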
## Score Explanation
Each ROUGE score is represented by three key metrics:
1. **Precision**:
- Measures the fraction of overlapping elements in the candidate text that are also in the reference text.
```math
\text{Precision} = \frac{\text{Overlap Count}}{\text{Candidate Text Word Count}}
```
2. **Recall**:
- Measures the fraction of overlapping elements in the reference text that are also in the candidate text.
```math
\text{Recall} = \frac{\text{Overlap Count}}{\text{Reference Text Word Count}}
```
3. **F-Measure**:
- The harmonic mean of Precision and Recall, giving a balanced score between the two.
```math
\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
```
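For example, using the illustrative values from the sample output above (precision 0.75, recall 0.6):
```math
\text{F-Measure} = \frac{2 \times 0.75 \times 0.6}{0.75 + 0.6} = \frac{0.9}{1.35} \approx 0.67
```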
The ROUGE scores are calculated as:
- ROUGE-1: Based on unigram overlap (individual words).
- ROUGE-2: Based on bigram overlap (pairs of consecutive words).
- ROUGE-L: Based on the longest common subsequence (LCS) between the candidate and reference texts.
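As a rough illustration of these formulas (a minimal sketch, not the repository's implementation; "overlap count" is read here as the multiset intersection of unigrams, so each word counts at most as often as it appears in both texts), ROUGE-1 can be computed like this:

```python
# Illustrative sketch of the ROUGE-1 formulas above; not the repository's code.
from collections import Counter

def rouge_1(reference: str, candidate: str) -> dict:
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Multiset intersection: each unigram counted at most as often as in both texts.
    overlap = sum((ref_counts & cand_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {"precision": precision, "recall": recall, "f-measure": f_measure}

print(rouge_1("the cat sat on the mat", "the cat sat on a mat"))
```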
## License
This project is released under the [MIT License](LICENSE).
You are free to use, modify, and distribute this software under the terms of the MIT License. See the LICENSE file for detailed terms and conditions.
## Contributors
Rajata Thamcharoensatit ([@RJTPP](https://github.com/RJTPP))