https://github.com/arvinsingh/biases-in-data
Experiments on biases in data & models
- Host: GitHub
- URL: https://github.com/arvinsingh/biases-in-data
- Owner: arvinsingh
- Created: 2024-02-20T23:02:21.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-04-29T06:40:29.000Z (7 months ago)
- Last Synced: 2024-04-29T11:28:14.621Z (7 months ago)
- Topics: bias-correction, convolutional-neural-networks, keras-tensorflow, machine-learning, natural-language-processing
- Language: Jupyter Notebook
- Homepage:
- Size: 8.89 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.MD
# Biases in Data and Models
This repository explores biases and abuses in data and studies their effects through a series of experiments. The experiments are conducted in Jupyter Notebooks to analyze the impact of biases in data and to find ways to minimize them.
## Tech Stack
1. Keras with TensorFlow
2. Numerical Python Stack
3. Word2Vec
4. Scikit-Learn
5. Jupyter

## Datasets
1. [Cat Vs Dog](https://www.kaggle.com/datasets/karakaggle/kaggle-cat-vs-dog-dataset)
2. [Titanic Dataset](https://www.kaggle.com/datasets/yasserh/titanic-dataset)
3. [Statlog - German Credit Data](https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data)

## Introduction
In today's data-driven world, it is crucial to be aware of the biases and abuses that can exist within datasets. Biases can arise from various sources, such as data collection methods, sampling techniques, or even human judgment. These biases can lead to skewed results and unfair outcomes, impacting decision-making processes and perpetuating inequalities.
The purpose of this project is to shed light on the presence of biases and abuses in data and trained models, and to explore ways to mitigate their effects.
## Topics to explore
1. Bias in Natural Language Processing models.
2. Convolutional Neural Network Manifold Learning.
3. Global Black-box Explanation.
4. Local Black-box Explanation.
5. FairML

## Biases in Data
Biases in data can occur in different forms, including:
- **Selection Bias**: When certain groups or characteristics are overrepresented or underrepresented in the dataset due to biased sampling methods.
- **Confirmation Bias**: When data is selectively collected or interpreted to support preconceived notions or beliefs.
- **Measurement Bias**: When measurement instruments or techniques introduce systematic errors or inaccuracies.
- **Cultural Bias**: When data reflects the biases and perspectives of a particular culture or group.

## Experimental Setup
The experiments will be conducted using Jupyter Notebook, a popular tool for data analysis and visualization. The datasets used in the experiments will be carefully selected to highlight different types of biases and potential abuses. The code and analysis will be documented in the Jupyter Notebook files provided in this repository.
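As a rough illustration of how a dataset can end up exhibiting selection bias, the sketch below draws a sample from a synthetic, evenly split population using group-dependent inclusion probabilities. Everything in it is an assumption made for illustration (the group labels `A` and `B`, the `keep_prob` values, and the helper names); it is not taken from the notebooks in this repository.

```python
import random


def group_share(records, group):
    """Return the fraction of records that belong to `group`."""
    return sum(1 for r in records if r["group"] == group) / len(records)


def biased_sample(population, keep_prob, rng):
    """Keep each record with a probability that depends on its group."""
    return [r for r in population if rng.random() < keep_prob[r["group"]]]


# Hypothetical population: two groups, exactly 50% each.
population = [{"group": "A" if i % 2 == 0 else "B"} for i in range(10_000)]

# Assumed collection process that reaches group B far less often than group A.
keep_prob = {"A": 0.8, "B": 0.2}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = biased_sample(population, keep_prob, rng)

print(f"Share of B in population: {group_share(population, 'B'):.2f}")
print(f"Share of B in sample:     {group_share(sample, 'B'):.2f}")
```

Comparing group shares between a reference population and the collected sample is one simple check for under-representation before any model is trained.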
## Results and Analysis
The results obtained from the experiments will be analyzed to identify the presence and impact of biases in the data. Various statistical techniques and machine learning algorithms will be used to quantify and understand the biases. Additionally, strategies and methodologies to minimize biases and improve the fairness of the data will be explored.
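One common way to quantify bias in model outputs is to compare positive-prediction rates across groups. The sketch below computes a demographic parity difference and a disparate impact ratio on toy data. The predictions, the group labels `m` and `f`, and the function names are hypothetical; the notebooks may use different metrics or libraries.

```python
def selection_rate(predictions, groups, group):
    """Fraction of positive (1) predictions among members of `group`."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(predictions, groups, group_a, group_b):
    """Difference in selection rates between two groups (0 means parity)."""
    return (selection_rate(predictions, groups, group_a)
            - selection_rate(predictions, groups, group_b))


def disparate_impact_ratio(predictions, groups, protected, reference):
    """Selection rate of the protected group divided by the reference group's."""
    return (selection_rate(predictions, groups, protected)
            / selection_rate(predictions, groups, reference))


# Toy binary predictions (1 = favourable outcome) for ten individuals.
predictions = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
groups = ["m"] * 5 + ["f"] * 5

dpd = demographic_parity_difference(predictions, groups, "m", "f")
dir_ = disparate_impact_ratio(predictions, groups, "f", "m")
print(f"Demographic parity difference: {dpd:.2f}")
print(f"Disparate impact ratio:        {dir_:.2f}")
```

A difference near 0 and a ratio near 1 suggest similar treatment of the two groups on this metric alone. Neither number establishes fairness by itself.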
## Conclusion
By studying biases and abuses in data, I aim to raise awareness about their existence and impact on decision-making processes. Through rigorous experimentation and analysis, I strive to develop best practices and guidelines to minimize biases and promote fairness in data-driven applications.
Please refer to the Jupyter Notebook files in this repository for detailed experiments, code, and analysis.
## Insights
Insights are presented as critical questions and discussions at the end of each notebook.