https://github.com/hitz-zentroa/multilingual-abstrct

Host: GitHub
URL: https://github.com/hitz-zentroa/multilingual-abstrct
Owner: hitz-zentroa
Created: 2024-04-09T16:18:44.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-05-06T13:36:08.000Z (over 1 year ago)
Last Synced: 2025-01-15T05:39:20.803Z (10 months ago)
Size: 11.2 MB
Stars: 0
Watchers: 4
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# multilingual-abstrct

This repository contains the [AbstRCT corpus](https://gitlab.com/tomaye/abstrct), a dataset of English clinical abstracts annotated for argument mining, together with Spanish, French and Italian versions of the dataset generated by translation and label projection using word alignment tools such as Awesome align.
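To illustrate the idea of label projection through word alignments, here is a minimal sketch. It is not the code used to build this corpus: the function names, the example sentence and its labels are hypothetical, and it assumes BIO-tagged source tokens and alignments given as `i-j` (source-target) index pairs, the Pharaoh-style format that word aligners commonly output.

```python
from typing import List, Tuple


def parse_alignments(line: str) -> List[Tuple[int, int]]:
    """Parse pairs such as "0-1 1-2" into (source, target) index tuples."""
    return [tuple(map(int, pair.split("-"))) for pair in line.split()]


def bio_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) for every BIO span in `labels`."""
    spans, start, kind = [], None, None
    for i, label in enumerate(list(labels) + ["O"]):  # sentinel closes the last span
        inside = label.startswith("I-") and start is not None and label[2:] == kind
        if start is not None and not inside:
            spans.append((start, i, kind))
            start, kind = None, None
        if label.startswith("B-") or (label.startswith("I-") and start is None):
            start, kind = i, label[2:]
    return spans


def project_labels(src_labels: List[str],
                   alignments: List[Tuple[int, int]],
                   tgt_len: int) -> List[str]:
    """Project each source span onto the target tokens it is aligned to."""
    projected = ["O"] * tgt_len
    for start, end, kind in bio_spans(src_labels):
        targets = [t for s, t in alignments if start <= s < end and t < tgt_len]
        if not targets:
            continue  # no aligned target token: the span is dropped
        lo, hi = min(targets), max(targets)
        # Label the whole range so unaligned tokens inside the span are covered.
        projected[lo] = f"B-{kind}"
        for j in range(lo + 1, hi + 1):
            projected[j] = f"I-{kind}"
    return projected


# Hypothetical example: "Treatment reduced pain ." -> "El tratamiento redujo el dolor ."
src_labels = ["B-Claim", "I-Claim", "I-Claim", "O"]
alignments = parse_alignments("0-1 1-2 2-4 3-5")
tgt_labels = project_labels(src_labels, alignments, tgt_len=6)
print(tgt_labels)
```

The sketch labels the full range between the first and last aligned target token, so an unaligned word inside a span (here the second "el") stays inside it. Noisy alignments can still produce wrong or overlapping spans, which is the kind of error that post-processing and manual revision are meant to catch.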

## Data

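The dataset consists of abstracts covering 5 disease types (neoplasm, glaucoma, diabetes, hypertension and hepatitis), annotated for argument component detection and argument relation classification. It is split into the following sets: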

- `neoplasm`: 350 train, 100 dev and 50 test abstracts
- `glaucoma_test`: 100 abstracts
- `mixed_test`: 100 abstracts (20 on glaucoma, 20 on neoplasm, 20 on diabetes, 20 on hypertension, 20 on hepatitis)

The repository is structured as follows.

Inside the `EN` folder:

- `argument_components/`
- `argument_relations/`

Inside each of the `ES`, `FR` and `IT` folders:

- `argument_components/$MT_$projection/`
  - `automatic_projections` - post-processed projections
  - `manual_revision` - manually corrected projections
  - `postprocessed` - merged English and Spanish train and dev sets
- `argument_relations/` (only English and Spanish (deepl))
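As a rough sketch of how the layout above could be navigated from code, the helper below builds the path to an argument component folder. The helper name, the nesting of the three subfolders under `$MT_$projection`, and the `awesome` projection value are assumptions made for illustration; check the actual folder names in the repository before relying on them.

```python
from pathlib import Path
from typing import Optional

# Subfolder names taken from the list above; their nesting is an assumption.
STAGES = {"automatic_projections", "manual_revision", "postprocessed"}


def components_dir(root: str, lang: str, mt: Optional[str] = None,
                   projection: Optional[str] = None,
                   stage: Optional[str] = None) -> Path:
    """Return the argument component folder for a language (hypothetical helper)."""
    base = Path(root) / lang / "argument_components"
    if lang == "EN":
        return base  # the English data has no translation or projection variants
    if not mt or not projection:
        raise ValueError("mt and projection are required for translated languages")
    if stage not in STAGES:
        raise ValueError(f"stage must be one of {sorted(STAGES)}")
    return base / f"{mt}_{projection}" / stage


# "awesome" is a placeholder projection name, not a confirmed folder name.
es_dir = components_dir("multilingual-abstrct", "ES", "deepl", "awesome", "manual_revision")
print(es_dir.as_posix())
```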

## Citation

````bibtex
@misc{yeginbergen2024crosslingual,
  title={Cross-lingual Argument Mining in the Medical Domain},
  author={Anar Yeginbergen and Rodrigo Agerri},
  year={2024},
  eprint={2301.10527},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
````
