Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/bheinzerling/pyrouge

A Python wrapper for the ROUGE summarization evaluation package
https://github.com/bheinzerling/pyrouge

evaluation-metrics nlp rouge summarization

Last synced: 6 days ago
JSON representation

A Python wrapper for the ROUGE summarization evaluation package

Host: GitHub
URL: https://github.com/bheinzerling/pyrouge
Owner: bheinzerling
License: mit
Created: 2014-01-14T14:08:25.000Z (about 11 years ago)
Default Branch: master
Last Pushed: 2021-02-10T00:58:46.000Z (almost 4 years ago)
Last Synced: 2025-01-09T04:14:09.185Z (13 days ago)
Topics: evaluation-metrics, nlp, rouge, summarization
Language: Python
Homepage:
Size: 120 KB
Stars: 251
Watchers: 4
Forks: 71
Open Issues: 21
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.txt
- License: LICENSE.txt

Awesome Lists containing this project

README

        pyrouge

=======

pyrouge is a Python wrapper for the ROUGE summarization evaluation

package. Getting ROUGE to work can require quite a bit of time. pyrouge

is designed to make getting ROUGE scores easier by automatically

converting your summaries into a format ROUGE understands, and

automatically generating the ROUGE configuration file.

Usage Examples

--------------

I have summaries in plain text format and want to get the ROUGE scores

===========================================================================

You can evaluate your plain text summaries like this:

::

    from pyrouge import Rouge155

    r = Rouge155()

    r.system_dir = 'path/to/system_summaries'

    r.model_dir = 'path/to/model_summaries'

    r.system_filename_pattern = 'some_name.(\d+).txt'

    r.model_filename_pattern = 'some_name.[A-Z].#ID#.txt'

    output = r.convert_and_evaluate()

    print(output)

    output_dict = r.output_to_dict(output)

In order to evaluate summaries, ROUGE needs to know where your summaries

and the gold standard summaries are, and how to match them. In ROUGE

parlance, your summaries are 'system' summaries and the gold standard

summaries are 'model' summaries. The summaries should be in separate

folders, whose paths are set with the ``system_dir`` and ``model_dir``

variables. All summaries should contain one sentence per line.

To automatically match a system summary with the corresponding model

summaries, pyrouge uses regular expressions. For example, let's assume

your system summaries are named with a combination of a fixed name and a

variable numeric ID like this:

| some\_name.001.txt

| some\_name.002.txt

| some\_name.003.txt

| ...

and the model summaries like this, with uppercase letters identifying

multiple model summaries for a given document:

| some\_name.A.001.txt

| some\_name.B.001.txt

| some\_name.C.001.txt

| some\_name.A.002.txt

| some\_name.B.002.txt

| ...

The group in the ``system_filename_pattern`` tells pyrouge which part of

the filename is the ID -- in this case ``(\d+)``. You have to use round

brackets to indicate a group, or else pyrouge won't be able to tell

apart the ID from the rest of the filename. pyrouge then uses that ID to

find all matching model summaries. The special placeholder ``#ID#``

tells pyrouge where it should expect the ID in the

``model_filename_pattern``. The ``[A-Z]`` part matches multiple model

summaries for that ID.

With the configuration done, invoking ``convert_and_evaluate()`` gets

you the ROUGE scores as a string. If you want to further process the

scores, you can parse the output into a dict with

``output_to_dict(output)``.

I only want to preprocess my summaries and then run ROUGE on my own

===================================================================

To convert plain text summaries into a format ROUGE understands, do:

::

    from pyrouge import Rouge155

    Rouge155.convert_summaries_to_rouge_format(system_input_dir, system_output_dir)

    Rouge155.convert_summaries_to_rouge_format(model_input_dir, model_output_dir)

This will convert all summaries in ``system_input_dir`` and

``model_input_dir``, and save them to their respective output

directories.

To generate the configuration file that ROUGE uses to match system and

model summaries, do:

::

    from pyrouge import Rouge155

    Rouge155.write_config_static(

        system_dir, system_filename_pattern,

        model_dir, model_filename_pattern,

        config_file_path)

The first four arguments are explained above. ``config_file_path``

specifies where to save the configuration file.

Using pyrouge from the command line

===================================

If you prefer the command line to Python and the pyrouge module, you can

use the following scripts, which are automatically installed and should

be runnable from anywhere on your system:

-  **pyrouge\_evaluate\_plain\_text\_files** gets you ROUGE scores

   for your plain text summaries. Example:

::

    pyrouge_evaluate_plain_text_files -s systems_plain/ -sfp "some_name.(\d+).txt" -m models_plain/ -mfp some_name.[A-Z].#ID#.txt

-  **pyrouge\_evaluate\_rouge\_format\_files** gets you ROUGE scores

   for summaries already converted to ROUGE format. Example usage for

   the ``sample-test/SL2003`` data that comes with ROUGE:

::

    pyrouge_evaluate_rouge_format_files -s systems -sfp "SL.P.10.R.11.SL062003-(\d+).html" -m models -mfp SL.P.10.R.[A-Z].SL062003-#ID#.html

Note that the system filename pattern is enclosed in quotation marks

because it contains special characters.

-  **pyrouge\_convert\_plain\_text\_to\_rouge\_format** converts

   plain text files into a format ROUGE understands. If your plain text

   files do not contain one sentence per line, this script can also

   split sentences, provided you have nltk and its Punkt sentence

   splitter installed. Example:

::

    pyrouge_convert_plain_text_to_rouge_format -i models_plain/ -o models_rouge

-  **pyrouge\_write\_config\_file** creates a configuration file you

   can use to run ROUGE on your own. Example:

::

    pyrouge_write_config_file -s systems -sfp "SL.P.10.R.11.SL062003-(\d+).html" -m models -mfp SL.P.10.R.[A-Z].SL062003-#ID#.html -c sl2003_config.xml

Running any of these with the ``-h`` option will display a usage message

explaining the various command line options.

Installation

------------

Instruction on installing ROUGE can be found

`here `__.

Depending on your system, you might have to run the following commands

as root.

To install pyrouge, run:

::

    pip install pyrouge

    

If you have trouble installing pyrouge on Windows, please check `this guide by Franck Dernoncourt `_.

Assuming a working ROUGE-1.5.5. installation, tell pyrouge the ROUGE

path with this command:

::

    pyrouge_set_rouge_path /absolute/path/to/ROUGE-1.5.5/directory

If saving the rouge path using this script doesn't work on your system, you can also supply the rouge path at runtime:

::

    r = Rouge155('/absolute/path/to/ROUGE-1.5.5/directory') 

To test if everything is installed correctly, run:

::

    python -m pyrouge.test

If everything works, you should see something like:

::

    Ran 10 tests in 18.055s

    OK

If you want to uninstall pyrouge:

::

    pip uninstall pyrouge