An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-cleansing

A curated list of projects in awesome lists tagged with data-cleansing.

https://github.com/desbordante/desbordante-core

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also lets users run data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.

anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data

Last synced: 22 Nov 2025

https://github.com/Desbordante/desbordante-core

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also lets users run data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.

anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data

Last synced: 03 Apr 2025

https://github.com/probcomp/pclean

A domain-specific probabilistic programming language for scalable Bayesian data cleaning

bayesian-inference data-cleaning data-cleansing probabilistic-graphical-models probabilistic-programming

Last synced: 08 May 2025

https://github.com/probcomp/PClean

A domain-specific probabilistic programming language for scalable Bayesian data cleaning

bayesian-inference data-cleaning data-cleansing probabilistic-graphical-models probabilistic-programming

Last synced: 04 May 2025

https://github.com/data-forge/data-forge-fs

This library contains the file system extensions to Data-Forge that allow it to directly read and write CSV and JSON files in Node.js.

csv data data-analysis data-cleaning data-cleansing data-forge data-management data-manipulation data-munging data-visualization data-wrangling javascript json linq nodejs pandas visualization

Last synced: 04 Sep 2025

https://github.com/datapreprocessing/datacleaning

Data Cleaning is a Python package for data preprocessing. It cleans a CSV file and returns the cleaned data frame, handling imputation, duplicate removal, special-character replacement, and more.

data data-cleaning data-cleansing data-preprocessing data-wrangling imputation python threshold

Last synced: 14 Dec 2025

https://github.com/softwaresalt/csv-managed

csv-managed is a Rust command-line utility for high‑performance exploration and transformation of CSV data at scale, emphasizing streaming, typed operations, and reproducible workflows via schema and index files.

big-data cli-app data-cleansing data-engineering data-standardization data-transformation data-wrangling high-performance ml-engineering

Last synced: 12 Dec 2025

https://github.com/jcp/datafilter

Quickly find flags (words, phrases, etc.) within your data.

csv data-clean data-cleansing hate-speech-detection parser python swear-filter text textfile

Last synced: 14 Jan 2026

https://github.com/agungbudiwirawan/data_science_in_telco-data_cleansing

Data cleansing using Python: handling missing values, outliers, and value standardization.

data-analysis-python data-cleansing data-science pandas python

Last synced: 08 May 2026

https://github.com/samhollings/nhs_data_cleansing

A repo of reusable functions for cleansing data

cleansing data data-cleaning data-cleansing preprocessing pyspark python python3

Last synced: 05 Oct 2025

https://github.com/itrauco/vtt-to-csv-python-script

Python3 script to convert transcribed video VTT to CSV for import into Google Sheets

captions closed-captions data-cleansing data-wrangling python script transcri vtt vtt-to-csv

Last synced: 19 May 2026

https://github.com/saya304/data-cleaning-and-exploratory-data-analysis

Data Cleaning and Exploratory Data Analysis in Snowflake

data-cleansing exploratory-data-analysis snowflake sql

Last synced: 16 Mar 2026

https://github.com/miozilla/dataprep-alteryx

dataprep-alteryx: Political & Election #DataPrep #Alteryx #Trifacta #Wrangle #Recipe

alteryx-designer data-analytics data-cleansing data-wrangling dataprep recipe trifacta

Last synced: 29 Aug 2025

https://github.com/vbhvsingh0/cdc_immunization

This project explores the relationships between different vaccines and sex, age, and other basic features in the data.

data-cleansing data-manipulation-with-pandas data-science numpy pandas-python python3

Last synced: 05 May 2026