https://github.com/childmindresearch/text-knnassifier
A tool for compressed KNN text classification
machine-learning natural-language-processing python-package
- Host: GitHub
- URL: https://github.com/childmindresearch/text-knnassifier
- Owner: childmindresearch
- License: lgpl-2.1
- Created: 2023-07-13T15:52:54.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-01-17T15:01:13.000Z (over 2 years ago)
- Last Synced: 2025-05-11T16:03:19.889Z (about 1 year ago)
- Topics: machine-learning, natural-language-processing, python-package
- Language: Python
- Homepage: https://cmi-dair.github.io/text-knnassifier/
- Size: 115 KB
- Stars: 9
- Watchers: 5
- Forks: 1
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# TextKNNClassifier
`TextKNNClassifier` is a k-nearest neighbors classifier for text data. It uses a compression algorithm to compute the distance between texts and predicts the label of a test entry based on the labels of the k-nearest neighbors in the training data.
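As a rough illustration of how a compression-based distance can work, the sketch below computes a normalized compression distance with gzip, the measure described in the paper listed under References. The function names `compressed_size` and `ncd`, the choice of gzip, and the space used to join the two texts are assumptions made for this example; the package's internals may differ.

```python
import gzip


def compressed_size(text: str) -> int:
    """Return the size in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two texts (illustrative only).

    If the two texts share content, compressing them together costs little
    more than compressing one alone, so the distance is small.
    """
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    # Assumption: the texts are joined with a single space before compressing.
    size_ab = compressed_size(a + " " + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)
```

Texts that repeat each other's words and phrases tend to score closer to 0, while unrelated texts score higher. On very short strings the fixed gzip overhead dominates, so the differences between distances can be small.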
## Installation
You can install `textknnassifier` using pip:
```bash
pip install textknnassifier
```
## Usage
Here's an example of how to use `TextKNNClassifier`:
```python
from textknnassifier import classifier

# Training texts and their labels, in matching order.
training_data = [
    "This is a test",
    "Another test",
    "General Tarkin",
    "General Grievous",
]
training_labels = ["test", "test", "star_wars", "star_wars"]

# Unlabeled texts to classify.
testing_data = [
    "This is a test",
    "Testing here too!",
    "General Kenobi",
    "General Skywalker",
]

# Consider the 2 nearest training texts for each prediction.
knn = classifier.TextKNNClassifier(n_neighbors=2)
knn.fit(training_data, training_labels)
predicted_labels = knn.predict(testing_data)

print(predicted_labels)
# Output: ['test', 'test', 'star_wars', 'star_wars']
```
In this example, we create a `TextKNNClassifier` instance and use it to predict the labels of the test entries. The `n_neighbors=2` argument sets the number of training data points considered when predicting each test label. The `fit` method takes two arguments, the training data and the training labels, and simply stores them for later use. The `predict` method takes the testing data and returns the predicted labels.
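To make the store-then-vote flow concrete, here is a minimal sketch of a classifier with the same `fit`/`predict` shape. `SketchKNN`, `gzip_distance`, the `distance` parameter, and the tie-breaking rule are all hypothetical names and choices for this illustration; this is not the package's actual implementation.

```python
import gzip
from collections import Counter
from typing import Callable, List, Sequence


def _size(text: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def gzip_distance(a: str, b: str) -> float:
    """Illustrative compression-based distance (an assumption, not the package's)."""
    size_a, size_b = _size(a), _size(b)
    return (_size(a + " " + b) - min(size_a, size_b)) / max(size_a, size_b)


class SketchKNN:
    """Hypothetical stand-in showing the fit/predict flow."""

    def __init__(
        self,
        n_neighbors: int = 3,
        distance: Callable[[str, str], float] = gzip_distance,
    ) -> None:
        self.n_neighbors = n_neighbors
        self.distance = distance
        self.texts: List[str] = []
        self.labels: List[str] = []

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> None:
        # Nothing is learned here: the training data is only stored.
        self.texts = list(texts)
        self.labels = list(labels)

    def predict(self, texts: Sequence[str]) -> List[str]:
        predictions = []
        for text in texts:
            # Rank training indices by their distance to the test entry.
            ranked = sorted(
                range(len(self.texts)),
                key=lambda i: self.distance(text, self.texts[i]),
            )
            nearest = [self.labels[i] for i in ranked[: self.n_neighbors]]
            # Majority vote. On a tie, Counter returns the label seen first,
            # which here is the label of the closest tied neighbor.
            predictions.append(Counter(nearest).most_common(1)[0][0])
        return predictions
```

Because every prediction compares the test entry against all stored training texts, the cost of `predict` in this sketch grows with the size of the training set.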
## References
- Jiang, Z., Yang, M., Tsirlin, M., Tang, R., Dai, Y., & Lin, J. (2023, July). “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 6810-6828).