https://github.com/holygrease/algorithm-k-nn

k-NN methods classify a new example according to the classes of the k most similar (nearest) training examples.

classification classification-algorithm

Last synced: 10 months ago

Host: GitHub
URL: https://github.com/holygrease/algorithm-k-nn
Owner: HolyGrease
Created: 2021-05-22T15:03:32.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2021-05-31T19:13:51.000Z (about 5 years ago)
Last Synced: 2025-04-13T19:18:20.362Z (about 1 year ago)
Topics: classification, classification-algorithm
Language: Python
Homepage:
Size: 127 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: Readme.md


README

# Example of creating a Dataset object

First, import Dataset:

	from dataset import Dataset

Next, prepare some data:

	data = [
		[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
		[5.0, 3.2, 1.2, 0.2, "Iris-setosa"],
		[6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
		[6.7, 3.1, 4.4, 1.4, "Iris-versicolor"],
		[6.7, 3.0, 5.2, 2.3, "Iris-virginica"]]

Then set the column (attribute) names:

	column_names = [
		"Sepal length", "Sepal width",
		"Petal length", "Petal width",
		"Class"]

Now we can create a Dataset object. The arguments are:

- data - a list of lists, one inner list per instance
- target index - index of the target attribute, i.e. the attribute that contains the class values
- column (attribute) names - list of attribute names
- name - dataset name



	iris = Dataset(data, 4, column_names, "Iris")
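To make the role of the target index concrete, here is a minimal plain-Python sketch. The helper `split_features_and_target` is hypothetical and is not part of the Dataset class; it only shows how a target index of 4 separates each row into its four attribute values and its class value.

```python
from typing import Any, List, Tuple


def split_features_and_target(rows: List[list], target_index: int) -> Tuple[List[list], List[Any]]:
    """Split each row into its attribute values and its class value (hypothetical helper)."""
    features: List[list] = []
    targets: List[Any] = []
    for row in rows:
        # Keep every value except the one at target_index; the input rows are not modified
        features.append([value for i, value in enumerate(row) if i != target_index])
        targets.append(row[target_index])
    return features, targets


data = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
]
X, y = split_features_and_target(data, 4)
print(X)  # [[5.1, 3.5, 1.4, 0.2], [6.4, 3.2, 4.5, 1.5]]
print(y)  # ['Iris-setosa', 'Iris-versicolor']
```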

Alternatively, you can load the Iris dataset by calling:

	get_iris()

You can pass the path to the dataset file as an argument, for example:

	get_iris("data\\iris.data")

The default path is:

> resources\\data\\iris\\iris.data

To load the dataset from the default location:

	iris = get_iris()
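The sample rows above match the layout of the UCI `iris.data` file: four comma-separated measurements followed by the class label. Assuming the file at the default path uses that format, a loader could look like the sketch below. `parse_iris_lines`, `load_iris_rows` and `DEFAULT_IRIS_PATH` are hypothetical names, not the actual implementation of get_iris().

```python
import os
from typing import Iterable, List

# Hypothetical default, mirroring the README's default path in a portable way
DEFAULT_IRIS_PATH = os.path.join("resources", "data", "iris", "iris.data")


def parse_iris_lines(lines: Iterable[str]) -> List[list]:
    """Parse lines of the form '5.1,3.5,1.4,0.2,Iris-setosa' into rows."""
    rows: List[list] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. at the end of the file
        *values, label = line.split(",")
        rows.append([float(v) for v in values] + [label])
    return rows


def load_iris_rows(path: str = DEFAULT_IRIS_PATH) -> List[list]:
    """Read and parse an iris.data-style file (hypothetical loader)."""
    with open(path, encoding="utf-8") as f:
        return parse_iris_lines(f)


rows = parse_iris_lines(["5.1,3.5,1.4,0.2,Iris-setosa", "", "6.7,3.0,5.2,2.3,Iris-virginica"])
print(rows)  # [[5.1, 3.5, 1.4, 0.2, 'Iris-setosa'], [6.7, 3.0, 5.2, 2.3, 'Iris-virginica']]
```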

# Preprocessing the dataset

Shuffle the dataset:

	iris = iris.shuffle()
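Reassigning `iris = iris.shuffle()` suggests that shuffle() returns a shuffled dataset, but the README does not say whether it also shuffles in place. As a hedged illustration that assumes nothing about the Dataset internals, here is a non-mutating, optionally reproducible shuffle on plain lists of rows; `shuffled` is a hypothetical helper.

```python
import random
from typing import List, Optional


def shuffled(rows: List[list], seed: Optional[int] = None) -> List[list]:
    """Return a shuffled copy of rows; the input list itself is not reordered."""
    rng = random.Random(seed)  # a fixed seed gives a reproducible order
    result = list(rows)  # shallow copy: the row lists themselves are shared
    rng.shuffle(result)
    return result


rows = [[1], [2], [3], [4], [5]]
first = shuffled(rows, seed=42)
second = shuffled(rows, seed=42)
print(first == second)  # True: same seed, same order
print(rows)  # [[1], [2], [3], [4], [5]]: original order unchanged
```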

Split the dataset into "train" and "test" parts, passing the split ratio as the argument. Here the train dataset gets 80% of the original dataset and the test dataset gets the remaining 20%:

	train, test = iris.split_by_ration(0.8)
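A ratio-based split can be sketched on plain lists as follows. `split_by_ratio` here is a hypothetical stand-in, not the Dataset method: it keeps the first `ratio` fraction of rows for training, rounding down, and puts the rest in the test part.

```python
from typing import List, Tuple


def split_by_ratio(rows: List[list], ratio: float) -> Tuple[List[list], List[list]]:
    """Return (train, test): the first `ratio` fraction of rows, then the rest."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be strictly between 0 and 1")
    cut = int(len(rows) * ratio)  # rounds down
    return rows[:cut], rows[cut:]


rows = [[i] for i in range(150)]
train, test = split_by_ratio(rows, 0.8)
print(len(train), len(test))  # 120 30
```

Because a split like this keeps row order, shuffling first matters: the UCI Iris file is sorted by class, so an unshuffled 80/20 split would leave only Iris-virginica rows in the test part.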

# Classification

Don't forget the imports. Instead of euclidean, you can import any other metric implemented in k_NN.py:

	from k_NN import k_NN

	from k_NN import euclidean

### Basic classification

For basic classification, call the k_NN() function with just four arguments, in the order used in the examples below:

- dataset - the training dataset used for classification
- row - the instance to classify; it must contain all attributes except the target
- k - integer, the number of nearest neighbours used to predict the class
- metric - the function used to calculate distance. The k_NN.py file contains several metrics; use one of them or implement your own.
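If you implement your own metric, it has to compare two attribute vectors. The README does not show the signature expected by k_NN.py, so the sketch below assumes a metric is any function that takes two equal-length sequences of numbers and returns a float. These are illustrative versions, not the repository's euclidean.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute per-attribute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


print(euclidean([0, 0], [3, 4]))  # 5.0
print(manhattan([0, 0], [3, 4]))  # 7
```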



The k_NN function returns the class predicted from the training dataset.

Remember to remove the target attribute from the instance you classify:

	# Remove the class value from the first test instance and keep it for comparison
	assert_class = test.data[0].pop(test.target)
	instance_to_classify = test.data[0]
	predicted_class = k_NN(train, instance_to_classify, 3, euclidean)
	print(f"{assert_class} ?= {predicted_class}")

In the terminal you should see something like this:

> Iris-setosa ?= Iris-setosa
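To illustrate what a basic k-NN classifier does, here is a minimal plain-Python sketch: compute the distance from the instance to every training row, take the k nearest, and return the majority class. `knn_predict` is hypothetical and works on plain lists of rows rather than Dataset objects; it is not the repository's k_NN function.

```python
import math
from collections import Counter
from typing import Any, Callable, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_rows: List[list], target_index: int, row: Sequence[float], k: int,
                metric: Callable[[Sequence[float], Sequence[float]], float]) -> Any:
    """Predict the class of row by majority vote among its k nearest training rows."""
    scored = []
    for train_row in train_rows:
        features = [v for i, v in enumerate(train_row) if i != target_index]
        scored.append((metric(features, row), train_row[target_index]))
    scored.sort(key=lambda pair: pair[0])  # nearest first
    votes = Counter(label for _, label in scored[:k])
    # On equal vote counts, most_common favours the label seen first, i.e. the nearer one
    return votes.most_common(1)[0][0]


toy_train = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [5.0, 3.2, 1.2, 0.2, "Iris-setosa"],
    [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
    [6.7, 3.1, 4.4, 1.4, "Iris-versicolor"],
    [6.7, 3.0, 5.2, 2.3, "Iris-virginica"],
]
print(knn_predict(toy_train, 4, [4.8, 3.1, 1.6, 0.2], 3, euclidean))  # Iris-setosa
```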

### Attribute weight classification

Attribute weights are applied when calculating distances. They determine how strongly each attribute influences the overall distance.

In this case you need to define the weights, for example:

	attributes_weights = [0.1, 0.5, 0.6, 0.3]

The number of weights must equal the number of attributes minus one (the target attribute is excluded).
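One common way to apply attribute weights is to scale each squared difference in the Euclidean distance by that attribute's weight. The README does not say exactly how k_NN combines the weights with the metric, so treat `weighted_euclidean` below as a hypothetical, assumption-based sketch.

```python
import math
from typing import Sequence


def weighted_euclidean(a: Sequence[float], b: Sequence[float], weights: Sequence[float]) -> float:
    """Euclidean distance with each squared attribute difference scaled by its weight."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("need exactly one weight per non-target attribute")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


print(weighted_euclidean([0, 0], [3, 4], [1, 1]))  # 5.0: equal weights, plain Euclidean
print(weighted_euclidean([0, 0], [3, 4], [1, 0]))  # 3.0: second attribute ignored
```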

You can also use the built-in Dataset.gain method, which computes [information gain](https://machinelearningmastery.com/information-gain-and-mutual-information/#:~:text=Information%20gain%20is%20the%20reduction,before%20and%20after%20a%20transformation.), to derive weights from the data. Example:

	attributes_weights = [
		Dataset.gain(train.get_column(i), train.get_target_column())
		for i in range(4)]
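For reference, information gain is the entropy of the class labels minus their expected entropy after splitting on an attribute's values. The sketch below handles discrete attributes only and treats every distinct value as its own category. Iris measurements are continuous, and the README does not say how Dataset.gain deals with that (for example, by discretizing first), so `entropy` and `information_gain` here are hypothetical helpers, not the repository's code.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(column: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Entropy of target minus its weighted entropy after splitting on column's values."""
    total = len(target)
    remainder = 0.0
    for value in set(column):
        subset = [t for c, t in zip(column, target) if c == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(target) - remainder


target = ["x", "x", "y", "y"]
print(information_gain(["a", "a", "b", "b"], target))  # 1.0: column fully determines the class
print(information_gain(["a", "b", "a", "b"], target))  # 0.0: column says nothing about the class
```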

To use these weights in classification, pass them as an additional keyword argument:

	predicted_class = k_NN(train, row, 3, euclidean, attributes_weights=attributes_weights)

### Distance weight classification

In this case you need to define distance weights, for example:

	distances_weights = [0.1, 0.5, 0.6, 0.3]

To use these weights in classification, pass them as one more argument:

	predicted_class = k_NN(train, row, 3, euclidean, distances_weights)
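The README does not explain how distances_weights is interpreted. One plausible reading, assumed here and not confirmed by the source, is one weight per neighbour rank: the nearest neighbour votes for its class with the first weight, the second nearest with the second, and so on. `knn_rank_weighted` is a hypothetical sketch of that idea using plain Euclidean distance.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_rank_weighted(train_rows: List[list], target_index: int, row: Sequence[float],
                      k: int, rank_weights: Sequence[float]) -> Any:
    """Each of the k nearest rows votes for its class with the weight for its rank."""
    if len(rank_weights) < k:
        raise ValueError("need at least k weights, one per neighbour rank")
    scored = []
    for train_row in train_rows:
        features = [v for i, v in enumerate(train_row) if i != target_index]
        scored.append((euclidean(features, row), train_row[target_index]))
    scored.sort(key=lambda pair: pair[0])  # nearest first
    scores: Dict[Any, float] = defaultdict(float)
    for rank, (_, label) in enumerate(scored[:k]):
        scores[label] += rank_weights[rank]
    # max() returns the first label reached with the highest total weight
    return max(scores, key=scores.get)


toy_train = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
    [6.7, 3.1, 4.4, 1.4, "Iris-versicolor"],
    [6.7, 3.0, 5.2, 2.3, "Iris-virginica"],
]
row = [6.5, 3.1, 4.5, 1.5]
print(knn_rank_weighted(toy_train, 4, row, 3, [1.0, 1.0, 1.0]))  # Iris-versicolor
print(knn_rank_weighted(toy_train, 4, row, 3, [0.1, 0.1, 1.0]))  # Iris-virginica
```

The second call shows how the weights alone can change the prediction: the two versicolor neighbours are nearer, but their small weights are outvoted by the third neighbour.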

### Combined weight classification

You can also combine both attribute and distance weights in this algorithm:

	predicted_class = k_NN(train, row, 3, euclidean, distances_weights, attributes_weights)
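Assuming that attribute weights scale the squared per-attribute differences and that distance weights are per-rank vote weights (neither interpretation is confirmed by the README), combining the two could look like the hypothetical `knn_combined` sketch below. The toy data shows how attribute weights alone can change which neighbour counts as nearest.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def weighted_euclidean(a: Sequence[float], b: Sequence[float], weights: Sequence[float]) -> float:
    """Euclidean distance with each squared attribute difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_combined(train_rows: List[list], target_index: int, row: Sequence[float], k: int,
                 rank_weights: Sequence[float], attribute_weights: Sequence[float]) -> Any:
    """Rank rows by attribute-weighted distance, then take rank-weighted votes."""
    scored = []
    for train_row in train_rows:
        features = [v for i, v in enumerate(train_row) if i != target_index]
        distance = weighted_euclidean(features, row, attribute_weights)
        scored.append((distance, train_row[target_index]))
    scored.sort(key=lambda pair: pair[0])  # nearest first
    scores: Dict[Any, float] = defaultdict(float)
    for rank, (_, label) in enumerate(scored[:k]):
        scores[label] += rank_weights[rank]
    return max(scores, key=scores.get)


toy_train = [[0.0, 0.0, "A"], [1.0, 5.0, "B"], [5.0, 1.0, "C"]]
row = [1.0, 1.0]
print(knn_combined(toy_train, 2, row, 1, [1.0], [1.0, 1.0]))  # A
print(knn_combined(toy_train, 2, row, 1, [1.0], [1.0, 0.0]))  # B: second attribute ignored
print(knn_combined(toy_train, 2, row, 1, [1.0], [0.0, 1.0]))  # C: first attribute ignored
```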
