https://github.com/childmindresearch/text-knnassifier

A tool for compressed KNN text classification
machine-learning natural-language-processing python-package

Host: GitHub
URL: https://github.com/childmindresearch/text-knnassifier
Owner: childmindresearch
License: lgpl-2.1
Created: 2023-07-13T15:52:54.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2024-01-17T15:01:13.000Z (over 2 years ago)
Last Synced: 2025-05-11T16:03:19.889Z (about 1 year ago)
Topics: machine-learning, natural-language-processing, python-package
Language: Python
Homepage: https://cmi-dair.github.io/text-knnassifier/
Size: 115 KB
Stars: 9
Watchers: 5
Forks: 1
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

# TextKNNClassifier

[![Build](https://github.com/cmi-dair/text-knnassifier/actions/workflows/test.yaml/badge.svg?branch=main)](https://github.com/cmi-dair/text-knnassifier/actions/workflows/test.yaml?query=branch%3Amain)

[![codecov](https://codecov.io/gh/cmi-dair/text-knnassifier/branch/main/graph/badge.svg?token=22HWWFWPW5)](https://codecov.io/gh/cmi-dair/text-knnassifier)

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[![L-GPL License](https://img.shields.io/badge/license-L--GPL-blue.svg)](LICENSE)

[![pages](https://img.shields.io/badge/api-docs-blue)](https://cmi-dair.github.io/text-knnassifier)

`TextKNNClassifier` is a k-nearest neighbors classifier for text data. It uses a compression algorithm to compute the distance between texts and predicts the label of a test entry based on the labels of the k-nearest neighbors in the training data.
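To make "compression-based distance" concrete, here is a minimal sketch of the normalized compression distance (NCD) described in the paper listed under References. It assumes gzip as the compressor and a single space as the joiner. The function names are hypothetical, and the package's internal distance may differ in its details.

```python
import gzip


def compressed_length(text: str) -> int:
    """Return the size in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def normalized_compression_distance(a: str, b: str) -> float:
    """Illustrative NCD; an assumption, not necessarily what the package computes.

    Texts that share content compress better when concatenated, so the
    numerator shrinks and the distance gets smaller.
    """
    len_a = compressed_length(a)
    len_b = compressed_length(b)
    len_ab = compressed_length(a + " " + b)
    return (len_ab - min(len_a, len_b)) / max(len_a, len_b)
```

On very short strings the fixed gzip header dominates the compressed size, so distances between short texts tend to sit close together. Longer texts give more separable values.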

## Installation

You can install `TextKNNassifier` using pip:

```bash
pip install textknnassifier
```

## Usage

Here's an example of how to use `TextKNNClassifier`:

```python
from textknnassifier import classifier

training_data = [
    "This is a test",
    "Another test",
    "General Tarkin",
    "General Grievous",
]
training_labels = ["test", "test", "star_wars", "star_wars"]

testing_data = [
    "This is a test",
    "Testing here too!",
    "General Kenobi",
    "General Skywalker",
]

KNN = classifier.TextKNNClassifier(n_neighbors=2)
KNN.fit(training_data, training_labels)
predicted_labels = KNN.predict(testing_data)

print(predicted_labels)
# Output: ['test', 'test', 'star_wars', 'star_wars']
```

In this example, we create a `TextKNNClassifier` instance and use it to predict the labels of the test entries. The `n_neighbors=2` argument sets the number of training data points considered when predicting each test label. The `fit` method takes two arguments, the training data and the training labels, and simply stores them for later use. The `predict` method takes the testing data and returns the predicted labels.
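The store-then-vote behavior described above can be sketched in a few lines. This is a hypothetical stand-in, not the package's implementation: `SketchKNN` and `word_overlap_distance` are invented names, and the toy word-overlap distance replaces the compression-based distance only to keep the example short and deterministic.

```python
from collections import Counter
from typing import Callable, Sequence


def word_overlap_distance(a: str, b: str) -> float:
    """Toy Jaccard distance on lowercase words (not compression-based)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return 1 - len(words_a & words_b) / len(union)


class SketchKNN:
    """Hypothetical k-nearest-neighbors classifier with a pluggable distance."""

    def __init__(self, distance: Callable[[str, str], float], n_neighbors: int = 1) -> None:
        self.distance = distance
        self.n_neighbors = n_neighbors
        self.training_data: list[str] = []
        self.training_labels: list[str] = []

    def fit(self, data: Sequence[str], labels: Sequence[str]) -> None:
        # "Fitting" only stores the data; all the work happens in predict.
        if len(data) != len(labels):
            raise ValueError("data and labels must have the same length")
        self.training_data = list(data)
        self.training_labels = list(labels)

    def predict(self, data: Sequence[str]) -> list[str]:
        predictions = []
        for entry in data:
            # Rank training indices from nearest to farthest.
            ranked = sorted(
                range(len(self.training_data)),
                key=lambda i: self.distance(entry, self.training_data[i]),
            )
            nearest = [self.training_labels[i] for i in ranked[: self.n_neighbors]]
            # Majority vote; on a tie, the label seen first (nearest) wins.
            predictions.append(Counter(nearest).most_common(1)[0][0])
        return predictions


knn = SketchKNN(word_overlap_distance, n_neighbors=2)
knn.fit(
    ["This is a test", "Another test", "General Tarkin", "General Grievous"],
    ["test", "test", "star_wars", "star_wars"],
)
print(knn.predict(["This is a test", "General Kenobi"]))
# Output: ['test', 'star_wars']
```

Because `fit` does no training, prediction cost grows with the size of the training set: each test entry is compared against every stored training text.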

## References

- Jiang, Z., Yang, M., Tsirlin, M., Tang, R., Dai, Y., & Lin, J. (2023, July). “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 6810-6828).
