https://github.com/codait/identifying-incorrect-labels-in-conll-2003
Research into identifying and correcting incorrect labels in the CoNLL-2003 corpus.
- Host: GitHub
- URL: https://github.com/codait/identifying-incorrect-labels-in-conll-2003
- Owner: CODAIT
- License: apache-2.0
- Created: 2020-09-28T21:53:30.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-05-11T17:25:00.000Z (over 3 years ago)
- Last Synced: 2023-12-14T19:01:45.146Z (11 months ago)
- Language: Jupyter Notebook
- Size: 11.2 MB
- Stars: 12
- Watchers: 10
- Forks: 2
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# Identifying-Incorrect-Labels-In-CoNLL-2003
Research into identifying and correcting incorrect labels in the CoNLL-2003 corpus.

To download the CoNLL-2003 corpus and apply label corrections to produce a corrected version of
the corpus, run the commands below. The CoNLL-2003 corpus is licensed for research use only. Be
sure to adhere to the terms of the license when using this data set!

```bash
pip3 install -r requirements.txt
python3 scripts/download_and_correct_corpus.py
```

This will download the CoNLL-2003 corpus to `original_corpus/`, apply corrections and save the
corrected corpus in `corrected_corpus/`.

NOTE: [Text Extensions for Pandas](https://github.com/CODAIT/text-extensions-for-pandas) must be
installed to run the script. It provides utilities to download and work with the CoNLL-2003
corpus and assist with NLP analysis on Pandas.
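
Once both directories exist, one quick sanity check is to compare how often each entity tag appears before and after correction. The sketch below is not part of this repository. It assumes the common whitespace-separated CoNLL-2003 layout, with the entity tag in the last column, blank lines between sentences, and `-DOCSTART-` marker lines. The function names `count_entity_tags` and `tag_differences` are hypothetical, and the sample lines are made up rather than taken from the corpus.

```python
from collections import Counter
from typing import Dict, Iterable


def count_entity_tags(lines: Iterable[str]) -> Counter:
    """Count the entity tag found in the last column of each token line."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank sentence separators and document marker lines.
        if not fields or fields[0] == "-DOCSTART-":
            continue
        counts[fields[-1]] += 1
    return counts


def tag_differences(original: Iterable[str], corrected: Iterable[str]) -> Dict[str, int]:
    """Return {tag: corrected_count - original_count} for tags whose counts differ."""
    before = count_entity_tags(original)
    after = count_entity_tags(corrected)
    return {
        tag: after[tag] - before[tag]
        for tag in sorted(set(before) | set(after))
        if after[tag] != before[tag]
    }


if __name__ == "__main__":
    # Made-up sample lines (assumed layout: token, POS tag, chunk tag, entity tag).
    original_lines = [
        "-DOCSTART- -X- -X- O",
        "",
        "Alice NNP I-NP I-PER",
        "visited VBD I-VP O",
        "Paris NNP I-NP I-ORG",
    ]
    corrected_lines = original_lines[:-1] + ["Paris NNP I-NP I-LOC"]
    print(tag_differences(original_lines, corrected_lines))
```

To run this on real data, pass open file handles for a matching pair of files from `original_corpus/` and `corrected_corpus/` in place of the sample lists. The file names inside those directories are not stated here, so check what the script actually writes.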