https://github.com/nicolay-r/frame-based-attitude-extraction-workflow
Workflow source code for ISPRAS-2021 journal paper "Language Models Application in Sentiment Attitude Extraction Task" (in Russian)
- Host: GitHub
- URL: https://github.com/nicolay-r/frame-based-attitude-extraction-workflow
- Owner: nicolay-r
- License: mit
- Created: 2020-09-25T09:18:06.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-02-08T08:05:12.000Z (over 3 years ago)
- Last Synced: 2025-02-12T00:39:07.618Z (8 months ago)
- Topics: distant-supervision, relation-extraction, sentiment-analysis
- Language: Python
- Homepage:
- Size: 1.02 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Frame-Based Attitude Extraction Workflow
Source code for the core of a news processing workflow.
It provides scripts for sentiment attitude extraction using a frame-based method.
## Dependencies
* python == 3.6
* sqlite3
* arekit == [0.19.5](https://github.com/nicolay-r/AREkit/tree/0.19.5-bdr-elsevier-2020-py3)
  * Used as the core library for text parsing, frames reading, stemming, etc.
* ner == 0.0.2
* Optional, for `deep-ner` NER model
* deep-pavlov == 1.11.0
* Optional, for `bert-mult-ontonotes` NER model
# Installation

* Step 1: Install dependencies.
``` bash
# Install AREkit dependency
git clone --single-branch --branch 0.19.5-bdr-elsevier-2020-py3 git@github.com:nicolay-r/AREkit.git core
# Download python dependencies
pip install -r requirements.txt
```
# Usage

## Prepare data
1. Place the news collection at `data/source/`;
2. Download [RuWordNet](https://ruwordnet.ru/en/) and place at `data/thesaurus/`;
   - Contact the authors to obtain download access;
3. Download [RuSentiFrames-2.0](https://github.com/nicolay-r/RuSentiFrames) collection;
```bash
cd data && ./download.sh
```
4. Provide news reader:
- default news reader [[code]](texts/readers/simple.py)/[[sample]](data/source/sample.txt);
   - implement a custom reader based on the `BaseNewsReader` API.

## Apply processing
**Problem:** the BERT-based-ontonotes-mult NER model (`deep-pavlov-1.11.0`) consumes a significant amount of time per document, which slows down the whole text-processing pipeline.

**Solution:** cache the NER results; we use `sqlite` as the storage for this data.
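The caching idea can be sketched as below (a minimal illustration with hypothetical names; the actual table schema lives in the step-1 caching scripts of this repository):

```python
import json
import sqlite3


class NerCache:
    """Caches NER results per document id, so that the slow
    BERT-based NER model runs at most once for each document."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS ner_cache ("
            "doc_id TEXT PRIMARY KEY, entities TEXT)")

    def get_or_compute(self, doc_id, compute_fn):
        row = self.conn.execute(
            "SELECT entities FROM ner_cache WHERE doc_id = ?",
            (doc_id,)).fetchone()
        if row is not None:
            return json.loads(row[0])   # cache hit: skip the NER model
        entities = compute_fn()         # cache miss: run the model once
        self.conn.execute(
            "INSERT INTO ner_cache (doc_id, entities) VALUES (?, ?)",
            (doc_id, json.dumps(entities)))
        self.conn.commit()
        return entities
```

With an on-disk database path instead of `:memory:`, the cache survives restarts, so re-running the pipeline skips documents that were already annotated.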
### Sentiment Attitude Annotation
Run the scripts, organized in the related [folder](scripts), in the following order:
1. Cache the data extracted from documents into sqlite tables:
* NER data [[script]](step1_ner_cache.sh);
* Frames data [[script]](step1_frames_cache.sh);
2. Gather the synonyms collection [[script]](step2_cache_synonyms.sh):
    1. Extract object values;
    2. Group them into a single synonyms collection.
3. Apply the `re` script with `--task ext_by_frames` [[script]](step3_exatract_pairs.sh)
    * Stage 1 of the workflow (pair-list gathering);
4. Filter the most relevant pairs from the pair list [[script]](step4_filter_pairs.sh)
5. Apply the `re` script with `--task ext_diff` [[script]](step5_extract_attitudes.sh)
    * Stage 2 of the workflow.
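The synonym grouping performed in step 2 can be sketched as follows. This is a simplification: the real script resolves synonyms through the thesaurus, while here a plain normalization key (lowercasing) stands in for that lookup:

```python
from collections import defaultdict


def group_synonyms(object_values, normalize=str.lower):
    """Groups extracted object values into synonym groups.

    `normalize` is a stand-in for the lemmatization/thesaurus
    lookup used by the actual step-2 script."""
    groups = defaultdict(set)
    for value in object_values:
        groups[normalize(value)].add(value)
    # Each group collects every observed surface form of one object.
    return [sorted(group) for group in groups.values()]
```

For example, `group_synonyms(["Moscow", "MOSCOW", "Paris"])` yields two groups: the two spellings of "Moscow" merged into one, and "Paris" on its own.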
### Expand with Neutral Attitude Annotation
6. Prepare an archived (`*.zip`) collection from step #5, which includes:
* `synonym.txt` -- list of synonyms.
* `collection.txt` -- RuAttitudes collection.
7. Run the [[script]](step6_neutral_attitudes.sh)
    * Use `--src-zip-filepath` to pass the archived collection path from step #6.

# References
```
@inproceedings{rusnachenko2021language,
title={Language Models Application in Sentiment Attitude Extraction Task},
author={Rusnachenko, Nicolay},
booktitle={Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS), vol.33},
year={2021},
number={3},
pages={199--222},
authorvak={true},
authorconf={false},
language={russian}
}
```