https://github.com/youssef-saaed/knn_assignment_c_version
This is a machine learning assignment that implements the K-Nearest Neighbors (KNN) algorithm on the Iris dataset from scratch using pure C
c from-scratch iris-classification iris-dataset knn pure-c
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/youssef-saaed/knn_assignment_c_version
- Owner: youssef-saaed
- License: mit
- Created: 2024-02-14T14:45:35.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-14T14:49:37.000Z (over 2 years ago)
- Last Synced: 2025-01-27T05:55:02.995Z (over 1 year ago)
- Topics: c, from-scratch, iris-classification, iris-dataset, knn, pure-c
- Language: C
- Homepage:
- Size: 39.1 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# KNN Assignment C version
This is a machine learning assignment that implements the K-Nearest Neighbors (KNN) algorithm on the Iris dataset. The Iris dataset contains 150 samples of three different species of iris flowers, with four features each: sepal length, sepal width, petal length, and petal width. The goal is to classify each sample into one of the three species based on the features.
## Dataframe.h
This is a header file that defines the data frame structure and functions. A data frame is a two-dimensional array of cells, where each cell can store different types of data, such as integers, doubles, or strings. The data frame also has an array of column names and a shape attribute that stores the number of rows and columns. The data frame functions include:
- `new_dataframe`: Creates a new data frame with the given number of rows and columns and returns a pointer to it.
- `read_csv`: Reads a CSV file into a new data frame and returns a pointer to it.
- `copy_dataframe`: Copies an existing data frame and returns a pointer to the new data frame.
- `copy_cell`: Copies an existing cell and returns a pointer to the new cell.
- `print_cell`: Prints the data of a cell to the standard output.
- `print_dataframe`: Prints the data of a data frame to the standard output.
- `delete_dataframe`: Deletes a data frame and frees the memory allocated for it.
- `mean`: Calculates the mean of a column in a data frame and returns it as a double.
- `stdev`: Calculates the standard deviation of a column in a data frame and returns it as a double.
- `standardize`: Standardizes the given columns of a data frame by subtracting each column's mean and dividing by its standard deviation, and returns a pointer to the standardized data frame.
## KNN_Helpers.h
This is a header file that defines the K-Nearest Neighbors helper functions. The KNN algorithm is a supervised learning method that classifies a sample based on the majority vote of its k nearest neighbors in the feature space. The helper functions include:
- `split_samples`: Splits a data frame into training and testing data frames according to a given ratio and label column, and stores the resulting pointers in the given parameters.
- `ecludian_dist`: Calculates the Euclidean distance between two points with the same number of dimensions and returns the result as a double.
- `KNN_training_res`: Performs the KNN algorithm on the training and testing data frames and prints the results.
## main.c
This is the main file that executes the program. It performs the following steps:
- Reads the Iris.csv file and stores the pointer to the data frame in `df`.
- Prints the data frame to the standard output.
- Declares and initializes an array of column names to be standardized: `stdcols`.
- Standardizes those columns (subtracting the mean and dividing by the standard deviation) and stores the pointer to the standardized data frame in `standardized_df`.
- Declares pointers to the training and testing data frames and initializes them to NULL: `training_df` and `testing_df`.
- Splits the standardized data frame into training and testing data frames using a ratio of 0.2 and the label column (5), and stores the pointers in `training_df` and `testing_df`.
- Performs the KNN algorithm on the training and testing data frames with k = 3, 5, and 7 and prints the results.
- Deletes the data frames and frees the memory allocated for them.
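
The split step above can be sketched as a shuffled index split. This is only an illustration and makes several assumptions the README does not confirm: that 0.2 is the fraction of rows held out for testing, that rows are shuffled before splitting, and that the label column is not used for stratification. The function name `shuffle_split` and its parameters are hypothetical and do not match `split_samples`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fills idx[0..n) with a shuffled permutation of the row indices and
 * returns the number of test rows. In this sketch the first n_test
 * entries of idx are the test rows and the remainder are training rows.
 * test_ratio is clamped to [0, 1]. */
size_t shuffle_split(size_t *idx, size_t n, double test_ratio, unsigned seed)
{
    for (size_t i = 0; i < n; i++) idx[i] = i;

    /* Fisher-Yates shuffle. rand() % i has a slight modulo bias,
     * which is acceptable for an illustration. */
    srand(seed);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }

    if (test_ratio < 0.0) test_ratio = 0.0;
    if (test_ratio > 1.0) test_ratio = 1.0;
    /* Round to the nearest whole row count. */
    return (size_t)((double)n * test_ratio + 0.5);
}
```

Under these assumptions, the 150 Iris rows and a ratio of 0.2 give 30 test rows and 120 training rows. Using a fixed seed makes the split repeatable on a given platform, which helps when comparing results for k = 3, 5, and 7.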