https://github.com/madhurimarawat/data-wrangling

This repository contains experiments on data wrangling techniques, focusing on methods for handling missing values, filtering, aggregation, and more.

Host: GitHub
URL: https://github.com/madhurimarawat/data-wrangling
Owner: madhurimarawat
License: mit
Created: 2024-09-08T15:43:14.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-01-30T14:21:19.000Z (8 months ago)
Last Synced: 2025-03-04T17:43:38.138Z (7 months ago)
Topics: codes, data-aggregation, data-concatenation, data-conversion, data-filtering, data-merging, data-preprocessing, data-reshaping, data-sampling, data-visualization, data-wrangling, data-wrangling-workflow, date-time-processing, detailed-documentation, handling-missing-values, jupyter-notebook, markdown, output, python, text-data-processing
Language: Jupyter Notebook
Homepage:
Size: 621 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Data-Wrangling
This repository contains experiments on data wrangling techniques, focusing on methods for handling missing values, filtering, aggregation, and more.

## Python

Python is a high-level, interpreted programming language widely used in data science for data manipulation, analysis, and visualization. Libraries such as Pandas and NumPy provide powerful tools for data wrangling, including handling missing values, filtering, and reshaping datasets.

## Directory Structure 📂

```
Data-Wrangling/
│
├── Experiment 1 - Handling Missing Values/
│ ├── Handling_Missing_Values.ipynb
│
├── Experiment 2 - Data Filtering/
│ ├── Data_Filtering.ipynb
│ ├── Experiment 2 Document.docx
│
├── Experiment 3 - Data Aggregation/
│ ├── Data_Aggregation.ipynb
│ ├── Experiment 3 Document.docx
│
├── Experiment 4 - Data Concatenation/
│ ├── Data_Concatenation.ipynb
│
├── Experiment 5 - Data Reshaping/
│ ├── Data_Reshaping.ipynb
│
├── Experiment 6 - Data Sampling/
│ ├── Data_Sampling.ipynb
│
├── Experiment 7 - Data Conversion/
│ ├── Data_Conversion.ipynb
│
└── README.md
```

## Table Of Contents 📔 🔖 📑

### 1. [Handling Missing Values](https://github.com/madhurimarawat/Data-Wrangling/tree/main/Experiment%201)

Identify missing values in a dataset and fill them using methods such as mean imputation or forward/backward filling, so that gaps do not break or skew later analysis.
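
The methods named above can be sketched with Pandas as follows. This is a minimal illustration on hypothetical toy data (the `temperature` and `city` columns are assumptions, not taken from the notebook).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from the repository's notebooks.
df = pd.DataFrame({
    "temperature": [20.0, np.nan, 24.0, np.nan, 28.0],
    "city": ["Pune", None, "Delhi", "Delhi", None],
})

# Identify: count missing values per column.
missing_counts = df.isna().sum()

# Mean imputation: replace gaps in a numeric column with the column mean.
mean_filled = df["temperature"].fillna(df["temperature"].mean())

# Forward fill copies the last seen value downward;
# backward fill copies the next seen value upward.
ffilled = df["city"].ffill()
bfilled = df["city"].bfill()

print(missing_counts)
print(mean_filled.tolist())
```

Note that backward filling leaves the last row of `city` empty here, because no later value exists to copy from.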

### 2. [Data Filtering](https://github.com/madhurimarawat/Data-Wrangling/tree/main/Experiment%202)

Filter rows or columns based on specified criteria, such as removing outliers or selecting data within a certain range to refine datasets for analysis.
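
As a rough sketch of the ideas above, the snippet below filters rows by range, drops outliers, and selects columns. The data is hypothetical, and the 1.5 × IQR rule is only one common outlier criterion, chosen here as an assumption.

```python
import pandas as pd

# Hypothetical toy data; not taken from the repository's notebooks.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "score": [52, 48, 50, 51, 500],
})

# Row filter: keep scores within an inclusive range using a boolean mask.
in_range = df[df["score"].between(49, 52)]

# Outlier removal with the 1.5 * IQR rule (an assumed criterion).
q1 = df["score"].quantile(0.25)
q3 = df["score"].quantile(0.75)
iqr = q3 - q1
mask = df["score"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
no_outliers = df[mask]

# Column filter: keep only the columns of interest.
names_only = df[["name"]]

print(in_range)
print(no_outliers)
```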

### 3. [Data Aggregation](https://github.com/madhurimarawat/Data-Wrangling/tree/main/Experiment%203)

Aggregate data by grouping rows based on specific attributes and computing summary statistics, such as mean, median, count, or sum. This helps to summarize large datasets for easier analysis.
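
A minimal sketch of grouping and summarizing with `groupby` and `agg`, assuming a hypothetical `sales` table with `region` and `amount` columns:

```python
import pandas as pd

# Hypothetical toy data; not taken from the repository's notebooks.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "amount": [100, 300, 50, 150, 250],
})

# Group rows by region, then compute several summary statistics at once.
summary = sales.groupby("region")["amount"].agg(["mean", "median", "count", "sum"])

print(summary)
```

The result has one row per group and one column per statistic, which is usually much smaller than the input.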

### 4. [Data Concatenation](https://github.com/madhurimarawat/Data-Wrangling/tree/main/Experiment%204)

Concatenate multiple datasets along rows or columns to create a unified dataset. This is useful when combining datasets from different sources or appending new data to an existing dataset.
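
Both directions can be sketched with `pd.concat`. The frames below (`jan`, `feb`, `extra`) are hypothetical examples.

```python
import pandas as pd

# Hypothetical toy data; not taken from the repository's notebooks.
jan = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
feb = pd.DataFrame({"id": [3, 4], "value": [30, 40]})
extra = pd.DataFrame({"flag": [True, False]})

# Along rows: append feb below jan and rebuild a clean 0..n-1 index.
rows = pd.concat([jan, feb], ignore_index=True)

# Along columns: place extra beside jan, aligned on the index.
cols = pd.concat([jan, extra], axis=1)

print(rows)
print(cols)
```

Column-wise concatenation aligns on the index rather than on row position, so frames with mismatched indices produce missing values.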

### 5. [Data Reshaping](https://github.com/madhurimarawat/Data-Wrangling/tree/main/Experiment%205)

Reshape data by pivoting, stacking, or unstacking to convert between wide and long formats. This technique allows for better organization and analysis of data with multiple variables.
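
The following sketch moves a small hypothetical table between long and wide formats with `pivot`, `stack`, `unstack`, and `melt`. Column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (student, subject) pair.
long_df = pd.DataFrame({
    "student": ["asha", "asha", "ravi", "ravi"],
    "subject": ["math", "science", "math", "science"],
    "marks": [90, 85, 70, 75],
})

# Long -> wide: one row per student, one column per subject.
wide = long_df.pivot(index="student", columns="subject", values="marks")

# Stack folds the subject columns into an inner index level (a Series).
stacked = wide.stack()

# Unstack reverses the stack and restores the wide layout.
unstacked = stacked.unstack()

# Melt is another route from wide back to long.
back_to_long = wide.reset_index().melt(
    id_vars="student", var_name="subject", value_name="marks"
)

print(wide)
print(back_to_long)
```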

### 6. [Data Sampling](https://github.com/madhurimarawat/Data-Wrangling/tree/main/Experiment%206)

Randomly sample rows or columns from a dataset to create a smaller subset for analysis. Sampling is useful for exploratory data analysis, testing models, or handling large datasets efficiently.
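
A minimal sketch of random sampling with `DataFrame.sample`, on a hypothetical 100-row table. Fixing `random_state` (the value 42 is arbitrary) makes the draw reproducible.

```python
import pandas as pd

# Hypothetical toy data; not taken from the repository's notebooks.
df = pd.DataFrame({"id": range(100), "group": ["a", "b"] * 50})

# Sample a fixed number of rows without replacement.
n_sample = df.sample(n=10, random_state=42)

# Sample a fraction of the rows instead.
frac_sample = df.sample(frac=0.2, random_state=42)

# Sample columns by setting axis=1.
col_sample = df.sample(n=1, axis=1, random_state=42)

# Per-group sampling: the same number of rows from each group.
per_group = df.groupby("group").sample(n=5, random_state=42)

print(len(n_sample), len(frac_sample), col_sample.shape)
```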

### 7. [Data Conversion](https://github.com/madhurimarawat/Data-Wrangling/tree/main/Experiment%207)

Convert data types of columns, such as changing categorical variables to numerical representations or converting numerical values into categories, enabling better processing and analysis of the data.
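
One way to sketch these conversions in Pandas is shown below. The columns, the category order, and the age bins are all hypothetical choices made for illustration.

```python
import pandas as pd

# Hypothetical toy data; not taken from the repository's notebooks.
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "age": [5, 17, 34, 70],
    "price": ["10.5", "20", "n/a", "7"],
})

# Categorical -> numerical: integer codes that follow an explicit order.
size_order = ["small", "medium", "large"]
df["size_code"] = pd.Categorical(
    df["size"], categories=size_order, ordered=True
).codes

# Numerical -> categories: bin ages into labelled, right-closed intervals.
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 12, 19, 64, 120],
    labels=["child", "teen", "adult", "senior"],
)

# String -> float: values that cannot be parsed become NaN.
df["price_num"] = pd.to_numeric(df["price"], errors="coerce")

print(df)
```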

### 8. [Text Data Processing](https://github.com/madhurimarawat/Data-Wrangling/tree/main/Experiment%208)

Clean and preprocess text data by removing punctuation and stopwords and by tokenizing the text. This standardizes the text and prepares it for further analysis, such as natural language processing (NLP) or text mining. Tokenization splits text into words or phrases, which can then be analyzed or converted into numerical representations for machine learning models.
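
The cleaning steps can be sketched with the Python standard library alone. The stopword set below is a tiny hypothetical list, and whitespace splitting is a deliberately simple stand-in for a real tokenizer; the notebook may well use a dedicated NLP library instead.

```python
import string
from collections import Counter

# Tiny illustrative stopword set (an assumption, not a standard list).
STOPWORDS = {"the", "is", "a", "and", "of", "for"}


def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [token for token in tokens if token not in STOPWORDS]


tokens = preprocess("The quick, brown fox is a friend of the dog!")

# A simple numerical representation: word counts (bag of words).
counts = Counter(tokens)

print(tokens)
```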

### 9. [Date-Time Processing](https://github.com/madhurimarawat/Data-Wrangling/tree/main/Experiment%209)

Extract date or time components from datetime columns and perform operations such as calculating time differences or aggregating data by time intervals. This allows for efficient analysis of time series data and helps in understanding trends over different time periods. Techniques include extracting year, month, day, and calculating durations between timestamps.
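
These operations can be sketched with the Pandas `.dt` accessor. The `events` table and its columns are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; not taken from the repository's notebooks.
events = pd.DataFrame({
    "start": pd.to_datetime(
        ["2024-01-15 08:00", "2024-01-20 09:30", "2024-02-03 14:00"]
    ),
    "end": pd.to_datetime(
        ["2024-01-15 10:30", "2024-01-21 09:30", "2024-02-03 14:45"]
    ),
    "amount": [10, 20, 30],
})

# Extract components from a datetime column.
events["year"] = events["start"].dt.year
events["month"] = events["start"].dt.month
events["day"] = events["start"].dt.day

# Duration between two timestamps, expressed in hours.
events["duration_hours"] = (
    (events["end"] - events["start"]).dt.total_seconds() / 3600
)

# Aggregate by time interval: total amount per calendar month.
monthly = events.groupby(events["start"].dt.to_period("M"))["amount"].sum()

print(events)
print(monthly)
```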

### 10. [Data Merging](https://github.com/madhurimarawat/Data-Wrangling/tree/main/Experiment%2010)

Merge two or more datasets based on common keys or indices to combine information from different sources. This process is essential for creating comprehensive datasets that capture all relevant data points across different tables. Techniques include inner joins, outer joins, left joins, and right joins to ensure that data relationships are properly maintained during the merging process.
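
The four join types can be compared side by side on two small hypothetical tables that share a `customer_id` key:

```python
import pandas as pd

# Hypothetical toy data; not taken from the repository's notebooks.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Asha", "Ravi", "Meera"],
})
orders = pd.DataFrame({
    "customer_id": [2, 3, 4],
    "total": [250, 100, 75],
})

# Inner join: only keys present in both tables.
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: every customer, with missing totals where no order exists.
left = customers.merge(orders, on="customer_id", how="left")

# Right join: every order, with missing names where no customer exists.
right = customers.merge(orders, on="customer_id", how="right")

# Outer join: all keys from both sides; indicator adds a _merge column
# recording where each row came from.
outer = customers.merge(orders, on="customer_id", how="outer", indicator=True)

print(outer)
```

The `_merge` column is a convenient check that a join matched the rows you expected before you rely on its result.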

---

## Thanks for Visiting 😄

- Drop a 🌟 if you find this repository useful.

- If you have any questions or suggestions, feel free to reach out to me.

📫 How to reach me: [![Linkedin Badge](https://img.shields.io/badge/-madhurima-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/madhurima-rawat/)

- **Contribute and Discuss:** Feel free to open issues 🐛, submit pull requests 🛠️, or start discussions 💬 to help improve this repository!
