https://github.com/nurulashraf/pyjanitor-datacleaning-exploration
A brief exploration of data cleaning using PyJanitor, an extension of pandas that simplifies common preprocessing tasks. This repo showcases practical examples to streamline data workflows with readable, chainable syntax for efficient data wrangling.
data-cleaning data-preprocessing data-wrangling google-colab method-chaining notebook pandas pyjanitor
- Host: GitHub
- URL: https://github.com/nurulashraf/pyjanitor-datacleaning-exploration
- Owner: nurulashraf
- License: mit
- Created: 2025-06-02T13:08:57.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-05T09:49:00.000Z (about 1 year ago)
- Last Synced: 2025-08-08T03:41:18.345Z (10 months ago)
- Topics: data-cleaning, data-preprocessing, data-wrangling, google-colab, method-chaining, notebook, pandas, pyjanitor
- Language: Jupyter Notebook
- Homepage:
- Size: 49.8 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Pyjanitor Data Cleaning Exploration
This repository contains a Google Colab notebook that demonstrates how to use **Pyjanitor**, a Python library built on top of `pandas`, to simplify and streamline data cleaning and preprocessing tasks.
## Overview
The notebook demonstrates several `pyjanitor` functions, including:
* `clean_names()` – Standardises column names to snake\_case.
* `rename_column()` – Renames a single column.
* `fill_empty()` – Fills missing values in the specified columns with a given value.
* `select_columns()` – Selects only the columns you need.
* Method chaining – Combines multiple operations into a single, readable expression.
These tools help make your data wrangling workflow more readable, efficient, and pythonic.
## Contents
* `pyjanitor_datacleaning_exploration.ipynb` – The main notebook illustrating how to use Pyjanitor for real-world data cleaning.
* Example use cases for handling:
  * Missing data
  * Inconsistent column names
  * Filtering rows and selecting columns
  * Renaming for better readability
## Requirements
To run the notebook, you’ll need:
```text
pandas
pyjanitor
```
Install using pip:
```bash
pip install pandas pyjanitor
```
## Learn More
Explore the full capabilities of Pyjanitor in its official documentation:
🔗 [https://pyjanitor-devs.github.io/pyjanitor](https://pyjanitor-devs.github.io/pyjanitor)
## License
This project is open-source and available under the [MIT License](LICENSE).