https://github.com/teddyoweh/dimensionality-reduction-pca
Dimensionality reduction is the process of reducing the number of features (also called attributes, variables or, in this case, dimensions) in a dataset while retaining as much of the variation in the data as possible, by keeping only a set of relevant features, in order to increase the efficiency of a model.
- Host: GitHub
- URL: https://github.com/teddyoweh/dimensionality-reduction-pca
- Owner: teddyoweh
- Created: 2022-04-04T09:42:30.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-04-28T10:02:03.000Z (over 3 years ago)
- Last Synced: 2025-03-24T00:27:16.621Z (7 months ago)
- Topics: data-science, dataset, dimensional-analysis, dimensionality-reduction, feature-extraction, feature-selection, machine-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 6.84 KB
- Stars: 11
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.MD
README
# Dimensionality Reduction PCA
Dimensionality refers to the attributes or features of a dataset.

#### Dimensionality reduction is the process of reducing the number of features (also called attributes, variables or, in this case, dimensions) in a dataset while retaining as much of the variation in the data as possible, by keeping only a set of relevant features, in order to increase the efficiency of a model.

# Importance of Dimensionality Reduction
When training machine learning models, not all the dimensions of a dataset are relevant. To make a model train in less time and run more efficiently, dimensionality reduction should be carried out on the dataset to remove the irrelevant features. It also helps avoid overfitting: when there are many attributes or dimensions, the model tends to become complex and overfit the training data. It is also useful when visualizing data.
#### There are two approaches: keeping the most important features and removing the rest, or combining features to reduce the number of dimensions.
# Feature Extraction and Feature Selection?
Feature selection removes irrelevant features from a dataset.
Feature extraction creates new features from existing ones.

### Both are dimensionality reduction processes.
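
The difference between the two can be sketched in a few lines. The example below is a minimal illustration, not the notebook's own code: it assumes NumPy, a small synthetic dataset, and a hypothetical helper `pca` that implements principal component analysis through the singular value decomposition. Feature selection keeps two of the original columns unchanged, while feature extraction builds two new columns as combinations of all three.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)  # PCA works on zero-mean features
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, components, ratio

# Synthetic data (an assumption made for illustration only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
# The third feature is almost a combination of the first two,
# so it adds very little new variation to the dataset.
x3 = x1 + x2 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Feature selection: keep a subset of the original columns as they are.
X_selected = X[:, [0, 1]]

# Feature extraction: build 2 new features from all 3 original columns.
X_extracted, components, ratio = pca(X, n_components=2)

print(X_selected.shape, X_extracted.shape)
print("share of variance kept by 2 components:", ratio.sum())
```

Both results have two dimensions instead of three. Because the third column is nearly redundant here, the two extracted components should retain almost all of the variation in the data.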
# Why is feature selection important?
Feature selection is important because it is an efficient method of dimensionality reduction: it removes irrelevant features. Selecting only relevant features also helps build a more accurate model with better predictive power and less overfitting.

# Wrapper-based feature selection
Wrapper methods are built around a specific machine learning algorithm.
They evaluate different combinations of the features of a dataset by training the model on them, and keep the combination that gives the best results. Some methods are:
### Forward Selection
This process starts with no features, then repeatedly adds the feature that improves the model the most, until adding a new feature no longer improves the model.

### Backward Elimination
This process starts with all the features and repeatedly removes the least significant one, as long as doing so improves the performance of the model.

# Filter-based feature selection
In this process the features are selected based on their results in statistical tests.

# Embedded-based feature selection
This process combines qualities of both the wrapper-based and filter-based selection methods. Feature selection is built into the training of the model itself, so the model keeps the features that best improve its performance.
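
The three families of feature selection described in this README (wrapper, filter and embedded) can be compared side by side in a rough sketch. Everything concrete here is an assumption made for illustration: NumPy only, a synthetic dataset in which just features 0 and 3 affect the target, ordinary least squares as the wrapped model, absolute Pearson correlation as the statistical test, and L1-regularised regression as one common form of embedded selection. The names `min_gain`, `k` and `alpha` are hypothetical parameters.

```python
import numpy as np

# Synthetic data: only features 0 and 3 influence the target.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 6
X = rng.normal(size=(n_samples, n_features))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=n_samples)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise the features
y = y - y.mean()
X_train, X_val, y_train, y_val = X[:140], X[140:], y[:140], y[140:]

def validation_mse(features):
    """Fit least squares on the training split, score on the validation split."""
    if not features:
        return float(np.mean(y_val ** 2))  # baseline: always predict 0
    w, *_ = np.linalg.lstsq(X_train[:, features], y_train, rcond=None)
    return float(np.mean((X_val[:, features] @ w - y_val) ** 2))

def forward_selection(min_gain=0.01):
    """Wrapper: greedily add the feature that lowers validation error the most."""
    selected, best = [], validation_mse([])
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = {j: validation_mse(selected + [j]) for j in candidates}
        j_best = min(scores, key=scores.get)
        if best - scores[j_best] < min_gain:  # no real improvement: stop
            break
        selected.append(j_best)
        best = scores[j_best]
    return sorted(selected)

def filter_selection(k=2):
    """Filter: rank features by absolute Pearson correlation with the target."""
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    return sorted(int(j) for j in np.argsort(corr)[-k:])

def embedded_selection(alpha=0.1, n_iter=500):
    """Embedded: L1-regularised regression zeroes out weights while it trains."""
    w = np.zeros(n_features)
    step = n_samples / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y)) / n_samples  # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)  # soft threshold
    return sorted(int(j) for j in np.flatnonzero(w))

print("wrapper :", forward_selection())
print("filter  :", filter_selection())
print("embedded:", embedded_selection())
```

With this synthetic target, each approach is expected to keep features 0 and 3. The wrapper loop retrains the model for every candidate feature, which is why it tends to be the most expensive of the three. The filter never trains a model at all, and the embedded variant gets its selection as a by-product of a single training run. Backward elimination would follow the same pattern as `forward_selection`, starting from all features and removing one at a time.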