https://github.com/nicolay-r/frame-based-attitude-extraction-workflow
Workflow source code for ISPRAS-2021 journal paper "Language Models Application in Sentiment Attitude Extraction Task" (in Russian)
- Host: GitHub
- URL: https://github.com/nicolay-r/frame-based-attitude-extraction-workflow
- Owner: nicolay-r
- License: mit
- Created: 2020-09-25T09:18:06.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-02-08T08:05:12.000Z (over 3 years ago)
- Last Synced: 2025-02-12T00:39:07.618Z (8 months ago)
- Topics: distant-supervision, relation-extraction, sentiment-analysis
- Language: Python
- Homepage:
- Size: 1.02 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Frame-Based Attitude Extraction Workflow
Source code for the core of a news processing workflow.
It provides scripts for sentiment attitude extraction using a frame-based method.
## Dependencies
* python == 3.6
* sqlite3
* arekit == [0.19.5](https://github.com/nicolay-r/AREkit/tree/0.19.5-bdr-elsevier-2020-py3)
  * Used as the core library for text parsing, frames reading, stemming, etc.
* ner == 0.0.2
* Optional, for `deep-ner` NER model
* deep-pavlov == 1.11.0
* Optional, for `bert-mult-ontonotes` NER model
# Installation

* Step 1: Install dependencies.
``` bash
# Install AREkit dependency
git clone --single-branch --branch 0.19.5-bdr-elsevier-2020-py3 git@github.com:nicolay-r/AREkit.git core
# Download python dependencies
pip install -r requirements.txt
```
# Usage

## Prepare data
1. Place the news collection at `data/source/`;
2. Download [RuWordNet](https://ruwordnet.ru/en/) and place at `data/thesaurus/`;
   - Contact the authors to obtain download access;
3. Download [RuSentiFrames-2.0](https://github.com/nicolay-r/RuSentiFrames) collection;
```bash
cd data && ./download.sh
```
4. Provide news reader:
- default news reader [[code]](texts/readers/simple.py)/[[sample]](data/source/sample.txt);
   - implement a custom reader based on the `BaseNewsReader` API.

## Apply processing
**Problem:** the BERT-based-ontonotes-mult NER model (`deep-pavlov-1.11.0`) consumes a significant amount of time per document, which slows down the whole text-processing pipeline.

**Solution:** cache the NER results; we use `sqlite` as the storage for this data.
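The caching idea can be sketched as below (a minimal illustration with hypothetical names; the actual table schema lives in the step-1 caching scripts of this repository):

```python
import json
import sqlite3


class NerCache:
    """Caches NER results per document id, so that the slow
    BERT-based NER model runs at most once for each document."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS ner_cache ("
            "doc_id TEXT PRIMARY KEY, entities TEXT)")

    def get_or_compute(self, doc_id, compute_fn):
        row = self.conn.execute(
            "SELECT entities FROM ner_cache WHERE doc_id = ?",
            (doc_id,)).fetchone()
        if row is not None:
            return json.loads(row[0])   # cache hit: skip the NER model
        entities = compute_fn()         # cache miss: run the model once
        self.conn.execute(
            "INSERT INTO ner_cache (doc_id, entities) VALUES (?, ?)",
            (doc_id, json.dumps(entities)))
        self.conn.commit()
        return entities
```

With an on-disk database path instead of `:memory:`, the cache survives restarts, so re-running the pipeline skips documents that were already annotated.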
### Sentiment Attitude Annotation
Run the scripts, organized in the related [folder](scripts), in the following order:
1. Cache the data extracted from documents into sqlite tables:
* NER data [[script]](step1_ner_cache.sh);
* Frames data [[script]](step1_frames_cache.sh);
2. Gather the synonyms collection [[script]](step2_cache_synonyms.sh):
    1. Extract object values;
    2. Group them into a single synonyms collection.
3. Apply the `re` script with `--task ext_by_frames` [[script]](step3_exatract_pairs.sh)
    * Stage 1 of the workflow (pair-list gathering);
4. Filter the most relevant pairs from the pair list [[script]](step4_filter_pairs.sh)
5. Apply the `re` script with `--task ext_diff` [[script]](step5_extract_attitudes.sh)
    * Stage 2 of the workflow.
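The synonym grouping performed in step 2 can be sketched as follows. This is a simplification: the real script resolves synonyms through the thesaurus, while here a plain normalization key (lowercasing) stands in for that lookup:

```python
from collections import defaultdict


def group_synonyms(object_values, normalize=str.lower):
    """Groups extracted object values into synonym groups.

    `normalize` is a stand-in for the lemmatization/thesaurus
    lookup used by the actual step-2 script."""
    groups = defaultdict(set)
    for value in object_values:
        groups[normalize(value)].add(value)
    # Each group collects every observed surface form of one object.
    return [sorted(group) for group in groups.values()]
```

For example, `group_synonyms(["Moscow", "MOSCOW", "Paris"])` yields two groups: the two spellings of "Moscow" merged into one, and "Paris" on its own.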
### Expand with Neutral Attitude Annotation
6. Prepare an archived (`*.zip`) collection from step #5, which includes:
* `synonym.txt` -- list of synonyms.
* `collection.txt` -- RuAttitudes collection.
7. Run the [[script]](step6_neutral_attitudes.sh)
    * Use `--src-zip-filepath` to pass the archived collection path from step #6.

# References
```
@inproceedings{rusnachenko2021language,
title={Language Models Application in Sentiment Attitude Extraction Task},
author={Rusnachenko, Nicolay},
booktitle={Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS), vol.33},
year={2021},
number={3},
pages={199--222},
authorvak={true},
authorconf={false},
language={russian}
}
```