https://github.com/nicolay-r/AREkit
Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML
batching bulk-operation frames language-model nlp pipelines pipelines-library relation-extraction relationship-extraction sentiment-analysis
Last synced: 17 days ago
- Host: GitHub
- URL: https://github.com/nicolay-r/AREkit
- Owner: nicolay-r
- License: mit
- Created: 2019-12-03T20:20:46.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2025-01-18T11:40:41.000Z (3 months ago)
- Last Synced: 2025-03-30T06:04:43.561Z (18 days ago)
- Topics: batching, bulk-operation, frames, language-model, nlp, pipelines, pipelines-library, relation-extraction, relationship-extraction, sentiment-analysis
- Language: Python
- Homepage: https://nicolay-r.github.io/arekit-page/
- Size: 22.4 MB
- Stars: 63
- Watchers: 5
- Forks: 3
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-sentiment-attitude-extraction - [github - applicable-paper](https://arxiv.org/pdf/2006.13730.pdf) (Frameworks)
README
# AREkit 0.25.1

**AREkit** (Attitude and Relation Extraction Toolkit) is a Python toolkit for document-level attitude and relation extraction between text objects in mass-media news.

## Description
This toolkit aims at memory-efficient data processing in [Relation Extraction (RE)](https://nlpprogress.com/english/relationship_extraction.html) related tasks.
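
To illustrate what memory-efficient processing can look like, here is a minimal sketch of a lazy, generator-based pipeline in plain Python. It does not use AREkit's API: the names `read_documents`, `split_sentences`, and `run_pipeline` are hypothetical, and the sentence splitting on `.` is a deliberate simplification. The point is only that each stage pulls one document at a time, so the whole collection never has to sit in memory.

```python
from typing import Callable, Iterable, Iterator, List


def read_documents(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty documents one at a time (hypothetical stage)."""
    for line in lines:
        text = line.strip()
        if text:
            yield text


def split_sentences(docs: Iterable[str]) -> Iterator[List[str]]:
    """Naively split each document into sentences on '.' (simplification)."""
    for doc in docs:
        yield [s.strip() for s in doc.split(".") if s.strip()]


def run_pipeline(source: Iterable, stages: List[Callable]) -> Iterable:
    """Chain stages lazily: nothing is computed until the result is iterated."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream


if __name__ == "__main__":
    corpus = ["A met B. B left.", "", "C praised D."]
    for sentences in run_pipeline(corpus, [read_documents, split_sentences]):
        print(sentences)
```

Because every stage is a generator, `corpus` could just as well be an open file handle iterated line by line.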
> Figure: AREkit pipelines design. More in the
> **[ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction](https://link.springer.com/chapter/10.1007/978-3-031-56069-9_23)** paper.

In particular, this framework provides the following features:
* ➿ [pipelines](https://github.com/nicolay-r/AREkit/wiki/Pipelines:-Text-Opinion-Annotation) and iterators for serializing large-scale collections without out-of-memory issues,
* 🔗 EL (entity-linking) API support for objects,
* ➰ avoidance of cyclic connections,
* 📏 distance consideration between relation participants (in `terms` or `sentences`),
* 📑 relation annotation and filtering rules,
* *️⃣ entity formatting or masking, and more.

The core functionality includes:
* API for document presentation with EL (Entity Linking, i.e. Object Synonymy) support
for preparing sentence-level relations (dubbed contexts);
* API for context extraction;
* Transfer of relations from the sentence level to the document level, and more.

## Installation
```bash
pip install git+https://github.com/nicolay-r/[email protected]
```

## Usage
Please follow the **[tutorial section on project Wiki](https://github.com/nicolay-r/AREkit/wiki/Tutorials)** for more details.
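
For readers who want an intuition before opening the tutorials, the sketch below mimics three of the ideas listed in the description: pairing entities only within a term distance, masking the relation participants, and transferring sentence-level labels to the document level. It is plain Python and not AREkit's API. `Entity`, `iter_pairs`, `mask_pair`, and `to_document_label` are hypothetical names, the `[SUBJ]`/`[OBJ]` masks are illustrative, and majority voting is only an assumed aggregation strategy.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Entity:
    value: str     # surface form of the entity
    position: int  # index of the entity's term within the sentence


def iter_pairs(entities: List[Entity],
               max_term_distance: int) -> Iterator[Tuple[Entity, Entity]]:
    """Yield ordered (source, target) pairs that are close enough in terms.

    An entity is never paired with itself, so no cyclic pair is produced.
    """
    for a, b in combinations(entities, 2):
        if abs(a.position - b.position) <= max_term_distance:
            yield a, b
            yield b, a


def mask_pair(terms: List[str], source: Entity, target: Entity) -> List[str]:
    """Return a copy of the sentence with both participants masked."""
    masked = list(terms)
    masked[source.position] = "[SUBJ]"
    masked[target.position] = "[OBJ]"
    return masked


def to_document_label(context_labels: List[str]) -> str:
    """Aggregate sentence-level labels by majority vote (assumed strategy)."""
    return Counter(context_labels).most_common(1)[0][0]


if __name__ == "__main__":
    terms = "Alice criticized Bob while Carol watched".split()
    entities = [Entity("Alice", 0), Entity("Bob", 2), Entity("Carol", 4)]
    for source, target in iter_pairs(entities, max_term_distance=2):
        print(" ".join(mask_pair(terms, source, target)))
```

Each masked sentence here plays the role of one context. A classifier would label every context, and `to_document_label` would then reduce those labels to a single document-level relation.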
## How to cite
Great research is accompanied by faithful references.
If you use or extend our work, please cite as follows:

```bibtex
@inproceedings{rusnachenko2024arelight,
title={ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction},
author={Rusnachenko, Nicolay and Liang, Huizhi and Kolomeets, Maxim and Shi, Lei},
booktitle={European Conference on Information Retrieval},
year={2024},
organization={Springer}
}
```