https://github.com/kstrassheim/active-learning-with-deep-learning-for-nlp
We present a new type of Active Learning for deep-learning-based NLP text classification and experimentally evaluate its performance against Random Sampling, as well as its runtime performance, on the security threat dataset from CySecAlert. These new Active Learning algorithms are based on Sentence-BERT and BERTopic clustering algorithms, which allow us to generate fixed-length tokens for whole sentences so that they can be compared with each other. The tokens are then clustered using K-Means or HDBSCAN to obtain diverse clusters from which the samples are picked.
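The cluster-then-pick idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the 2-D `embeddings` are synthetic stand-ins for fixed-length Sentence-BERT vectors, the small pure-Python `kmeans` replaces a library K-Means so the sketch stays dependency-free, and the rule of picking the member closest to each centroid is an assumed selection strategy. The names `kmeans`, `assign_labels`, and `select_diverse_samples` are hypothetical.

```python
import math
import random


def assign_labels(vectors, centroids):
    """Return, for each vector, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
        for v in vectors
    ]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's K-Means; returns (centroids, label per vector)."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen data points.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        labels = assign_labels(vectors, centroids)
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so labels match the returned centroids.
    return centroids, assign_labels(vectors, centroids)


def select_diverse_samples(vectors, n_samples, seed=0):
    """Pick one index per non-empty cluster: the member nearest its centroid.

    Assumed selection rule for illustration only.
    """
    centroids, labels = kmeans(vectors, n_samples, seed=seed)
    picked = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picked.append(
                min(members, key=lambda i: math.dist(vectors[i], centroid))
            )
    return picked


# Synthetic stand-in for sentence embeddings: three separated 2-D blobs.
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0)]
embeddings = [
    [cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5)]
    for cx, cy in blob_centers
    for _ in range(20)
]

# Indices of the unlabeled samples that would be sent for annotation next.
picked = select_diverse_samples(embeddings, n_samples=3)
print(picked)
```

In an Active Learning loop, the returned indices would be labeled, added to the training set, and removed from the unlabeled pool before the next round. Because each index comes from a different cluster, the selected batch covers distinct regions of the embedding space instead of concentrating on near-duplicate sentences.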
- Host: GitHub
- URL: https://github.com/kstrassheim/active-learning-with-deep-learning-for-nlp
- Owner: kstrassheim
- License: gpl-3.0
- Created: 2021-12-14T17:09:20.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-06-05T10:58:30.000Z (almost 3 years ago)
- Last Synced: 2024-02-01T17:09:37.835Z (over 1 year ago)
- Topics: active-learning, bertopic, deep-learning, hdbscan, k-means-clustering, matplotlib, natural-language-processing, pandas, python3, pytorch, sentence-bert
- Language: Jupyter Notebook
- Homepage:
- Size: 7.51 MB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0