Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jamnicki/bachelor_thesis_project
System for Training-based Expansion of Tools for Proper Name Mentions Recognition Based on Active Learning
active-learning active-learning-in-nlp annotation-tool argilla kpwr named-entity-recognition nlp optimization sampling-methods sequence-labeling sequential-data spacy
Last synced: 24 days ago
- Host: GitHub
- URL: https://github.com/jamnicki/bachelor_thesis_project
- Owner: jamnicki
- Created: 2022-11-14T13:26:10.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2023-10-26T19:16:13.000Z (about 1 year ago)
- Last Synced: 2024-11-10T06:28:01.189Z (2 months ago)
- Topics: active-learning, active-learning-in-nlp, annotation-tool, argilla, kpwr, named-entity-recognition, nlp, optimization, sampling-methods, sequence-labeling, sequential-data, spacy
- Language: Jupyter Notebook
- Homepage:
- Size: 57.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Citation: CITATION.cff
README
# System for Training-based Expansion of Tools for Proper Name Mentions Recognition Based on Active Learning
## Abstract
The quality of tools for recognizing proper names in text depends on the domain and coverage of the training data. Obtaining a model with satisfactory performance requires a corpus with a large number of samples, which translates into time spent on data annotation by skilled users such as machine learning engineers, data analysts, annotators, and linguists. The goal of this work is to build a system that supports the creation of proper name recognition tools by annotating a progressively larger set of training data with the active learning method, thereby improving the tools' quality. The project also aims to accelerate the creation of datasets and the building of natural language machine learning models. Beyond the promise of active learning, a method that is still developing, the Author is motivated by reducing the working time of the people involved in creating models. A further motivation is the small number of examples in the literature of active learning applied to recognizing occurrences of proper names, as opposed to classification tasks. The work covers the theoretical basis, a review of available solutions and tools, a description of the implementation, static and dynamic analysis, a summary of the results, and the future of the developed system. The developed system met the stated objectives and requirements.
## Key illustrations
TODO: translate to English