https://github.com/bryanoliveira/xml2conll
A simple script to convert XML Named Entity Recognition annotations to the CONLL format.
- Host: GitHub
- URL: https://github.com/bryanoliveira/xml2conll
- Owner: bryanoliveira
- License: mit
- Created: 2019-11-03T15:45:45.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-04-12T00:53:36.000Z (over 2 years ago)
- Last Synced: 2025-04-08T15:52:37.428Z (6 months ago)
- Language: Python
- Size: 15.6 KB
- Stars: 2
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# xml2conll
A simple script to convert XML Named Entity Recognition annotations to the CONLL format.
This script was written to convert XML files from the [HAREM](https://www.linguateca.pt/HAREM/) project, a well-known Portuguese NER dataset.
An example input file can be found in `example.xml` or [here](https://www.linguateca.pt/aval_conjunta/HAREM/CDPrimeiroHAREMMiniHAREM.xml).
## Install
### Dependencies
You will need the following dependencies:
- Python 3
### Requirements
The script also requires the following Python packages, which can be installed manually:
- nltk
Alternatively, install them all at once by running:
`pip3 install -r requirements.txt`
## Usage
Run `python3 xml2conll.py --input [XML FILE PATH]` to convert the XML file to CONLL format.
If needed, you can specify the output file name with `--output [CONLL FILE PATH]`.
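To give a rough idea of what such a conversion involves, here is a minimal sketch. It is not the code in `xml2conll.py`. It assumes that entities are marked with `<EM>` elements carrying a `CATEG` attribute (as in HAREM-style files), that the output uses one `token tag` pair per line with BIO prefixes, and that a simple regex tokenizer is good enough. The function names `tokenize` and `element_to_conll` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def tokenize(text):
    # Assumption: a plain regex tokenizer (words or single punctuation marks)
    # standing in for whatever tokenizer the real script uses.
    return re.findall(r"\w+|[^\w\s]", text or "")


def element_to_conll(elem, entity_tag="EM", label_attr="CATEG"):
    """Walk an XML element and return a list of (token, BIO tag) pairs.

    entity_tag and label_attr are assumptions about the annotation scheme.
    """
    rows = []

    def emit(text, label):
        for i, tok in enumerate(tokenize(text)):
            if label is None:
                rows.append((tok, "O"))
            else:
                prefix = "B-" if i == 0 else "I-"
                rows.append((tok, prefix + label))

    emit(elem.text, None)  # text before the first child
    for child in elem:
        if child.tag == entity_tag:
            # All text inside the entity element gets the entity label.
            emit("".join(child.itertext()), child.get(label_attr))
        else:
            rows.extend(element_to_conll(child, entity_tag, label_attr))
        emit(child.tail, None)  # text between this child and the next
    return rows


# Hypothetical sample document, not taken from the HAREM corpus.
sample = (
    '<DOC><P>A <EM CATEG="PESSOA">Maria Silva</EM> mora em '
    '<EM CATEG="LOCAL">Lisboa</EM>.</P></DOC>'
)
rows = element_to_conll(ET.fromstring(sample))
conll_text = "\n".join(f"{tok} {tag}" for tok, tag in rows)
print(conll_text)
```

The repository lists `nltk` as a requirement, so the real script may well tokenize differently; swapping `tokenize` for something like `nltk.word_tokenize` would be one way to get closer to that.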