An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-wrangling

A curated list of projects in awesome lists tagged with data-wrangling.

https://github.com/openrefine/openrefine

OpenRefine is a free, open source power tool for working with messy data and improving it

data-analysis data-science data-wrangling datacleaning datacleansing datajournalism datamining java journalism opendata reconciliation wikidata

Last synced: 13 May 2025

https://github.com/OpenRefine/OpenRefine

OpenRefine is a free, open source power tool for working with messy data and improving it

data-analysis data-science data-wrangling datacleaning datacleansing datajournalism datamining java journalism opendata reconciliation wikidata

Last synced: 15 Mar 2025

https://github.com/tomwright/dasel

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

cli config configuration data-processing data-structures data-wrangling devops-tools go golang json json-processing parser query selector toml update xml yaml yaml-processor

Last synced: 26 Dec 2025

https://github.com/TomWright/dasel

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

cli config configuration data-processing data-structures data-wrangling devops-tools go golang json json-processing parser query selector toml update xml yaml yaml-processor

Last synced: 12 Mar 2025

https://github.com/iterative/datachain

ETL, Analytics, Versioning for Unstructured Data

ai cv data-analytics data-wrangling embeddings llm llm-eval machine-learning mlops multimodal

Last synced: 18 Jun 2025

https://github.com/contextlab/hypertools

A Python toolbox for gaining geometric insights into high-dimensional data

data-visualization data-wrangling high-dimensional-data python text-vectorization time-series topic-modeling visualization

Last synced: 13 Apr 2025

https://github.com/ContextLab/hypertools

A Python toolbox for gaining geometric insights into high-dimensional data

data-visualization data-wrangling high-dimensional-data python text-vectorization time-series topic-modeling visualization

Last synced: 07 Apr 2025

https://github.com/brimdata/zui

Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.

csv data data-analytics data-viz data-wrangling electron-app json-inspector keyword-search super-structured-data table-view type-system zed zng zq zui

Last synced: 12 Jun 2025

https://github.com/brimsec/brim

Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.

csv data data-analytics data-viz data-wrangling electron-app json-inspector keyword-search super-structured-data table-view type-system zed zng zq zui

Last synced: 25 Feb 2025

https://github.com/microsoft/prose

Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.

csharp data-transformation data-wrangling dotnet examples microsoft program-synthesis prose sdk synthesis

Last synced: 13 May 2025

https://github.com/desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data

Last synced: 22 Nov 2025

https://github.com/Desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data

Last synced: 03 Apr 2025

https://github.com/dbohdan/sqawk

Like awk but with SQL and table joins

awk cli converter csv data-transformation data-wrangling delimited-files json sql tsv

Last synced: 06 Apr 2025

https://github.com/shawnbrown/datatest

Tools for test-driven data wrangling and data validation.

data-wrangling pytest-plugin python quality-assurance testing unittest

Last synced: 18 Jul 2025

https://github.com/kjam/data-cleaning-101

Data Cleaning Libraries with Python

data-validation data-wrangling python teaching

Last synced: 10 May 2025

https://github.com/strengejacke/sjmisc

Data transformation and utility functions for R

data-transformation data-wrangling labelled-data r recoding

Last synced: 04 Apr 2025

https://github.com/dlab-berkeley/R-Fundamentals-Legacy

D-Lab's 12 hour introduction to R Fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.

automation data-science data-visualization data-wrangling r

Last synced: 26 Apr 2025

https://github.com/trainingbypackt/data-wrangling-with-python

Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices

analytics beautifulsoup data-analytics data-munging data-science data-wrangling database numpy pandas python regular-expression web-scraping

Last synced: 06 Apr 2025

https://github.com/lucacappelletti94/csv_trimming

Python package to remove common ugliness from a CSV-like file

csv data-wrangling sanitizer

Last synced: 02 Jul 2025

https://github.com/asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow

Last synced: 11 Apr 2025

https://github.com/uc-r/uc-r.github.io

Main repository for R programming courses @ University of Cincinnati, courses and tutorials that focus on data wrangling, exploration, visualization, and analysis with R.

classroom data-science data-wrangling machine-learning r tutorial tutorial-code visualization

Last synced: 26 Mar 2025

https://github.com/r-rudra/tidycells

Automatic transformation of untidy spreadsheet-like data into tidy form

cran data-wrangling heuristic heuristic-algorithm r r-package r-stats spreadsheets tabular-data tidy

Last synced: 30 Jul 2025

https://github.com/TomFevrier/kiwis

A Pandas-inspired data wrangling toolkit in JavaScript

data data-manipulation data-wrangling pandas

Last synced: 15 Mar 2025

https://github.com/ammsa/dtcleaner

DTCleaner: data cleaning using multi-target decision trees.

data-cleaning data-mining data-preprocessing data-quality data-science data-wrangling

Last synced: 21 Mar 2025

https://github.com/pwwang/pipda

A framework for data piping in python

data-wrangling dplyr pandas piping python

Last synced: 15 Apr 2025

https://github.com/inphyt/covid19-italy-integrated-surveillance-data

COVID-19 integrated surveillance data provided by the Italian Institute of Health and processed via UnrollingAverages.jl to deconvolve the weekly moving averages.

covid-19 covid19-data data data-analysis data-structures data-visualization data-wrangling database dataset epidemiological-data epidemiology italy italy-data italy-dataset open-data surveillance surveillance-data time-series time-series-analysis

Last synced: 26 Jul 2025

https://github.com/jezcope/pyrefine

Execute OpenRefine JSON scripts without OpenRefine (or Java)

data-science data-wrangling openrefine python

Last synced: 07 Apr 2025

https://github.com/naqvis/crysda

Crystal library for Data Analysis, Wrangling, Munging

crystal crystal-lang crystal-language crystal-shard data-a data-science data-wrangling

Last synced: 22 Jun 2025

https://github.com/gagolews/teaching-data

Dr Marek's Data for Teaching/Training

data data-science data-wrangling datasets machine-learning

Last synced: 03 Jan 2026

https://github.com/vianneymi/monggregate

Library to make MongoDB aggregation framework and pipelines easy to use in python.

aggregation-framework aggregation-pipeline data-science data-wrangling database mongodb nosql pandas pydantic pymongo query-builder query-engine

Last synced: 30 Jul 2025

https://github.com/buabaj/xplore

A Python package for data scientists, analysts, and AI/ML engineers to explore the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.

artificial-intelligence data-preprocessing data-science data-wrangling machine-learning

Last synced: 12 Apr 2025

https://github.com/chris-prener/qualmap

R package for working with semi-structured qualitative GIS data

data-management data-wrangling gis mapping package qualitative qualitative-analysis qualitative-gis r rstats

Last synced: 02 Jul 2025

https://github.com/slu-openGIS/qualmap

R package for working with semi-structured qualitative GIS data

data-management data-wrangling gis mapping package qualitative qualitative-analysis qualitative-gis r rstats

Last synced: 14 Mar 2025

https://github.com/r-hyperspec/hyperSpec

hyperSpec: Tools for Spectroscopy (R package)

data-wrangling hyperspectral imaging infrared nmr r-package raman spectroscopy uv-vis xrf

Last synced: 12 Apr 2025

https://github.com/CleverInsight/cognito

🚀🤖 Cognito - Simplifies AutoML Data Preprocessing.

automl data-munging data-preperation data-preprocessing data-wrangling

Last synced: 20 Nov 2025

https://github.com/jananiravi/workshop-tidyverse

Workshop: Using R/tidyverse to analyze & visualize gapminder/processed transcriptomics data!

data-tidying data-visualization data-wrangling gapminder genomics r rstudio tidyverse video-tutorial workshop

Last synced: 20 Aug 2025

https://github.com/contextlab/data-wrangler

Wrangle messy numerical, image, and text data into consistent well-organized formats

data data-analysis data-science data-wrangling hugging-face image-data machine-learning nlp numpy pandas python scikit-learn

Last synced: 10 Apr 2025

https://github.com/data-forge/data-forge-fs

This library contains the file system extensions to Data-Forge that allow it to directly read and write CSV and JSON files in Node.js

csv data data-analysis data-cleaning data-cleansing data-forge data-management data-manipulation data-munging data-visualization data-wrangling javascript json linq nodejs pandas visualization

Last synced: 04 Sep 2025

https://github.com/bradleyboehmke/dw-r

Code and text for the "Data Wrangling with R" book.

book data-science data-wrangling r

Last synced: 13 Apr 2025

https://github.com/btskinner/duawranglr

R Package to Securely Wrangle Dataset According to Data Usage Agreement

data-security data-usage-agreement data-wrangling package r

Last synced: 05 Aug 2025

https://github.com/pmgraham/datagrunt

Datagrunt is a Python library designed to simplify the way you work with CSV files. It provides a streamlined approach to reading, processing, and transforming your data into various formats, making data manipulation efficient and intuitive.

csv csv-parser data-analysis data-engineering data-science data-wrangling dataframe duckdb open-source polars python python3

Last synced: 26 Aug 2025

https://github.com/sondosaabed/nics-firearm-background-checks-investigation

🔫 The data comes from the FBI's National Instant Criminal Background Check System. The NICS is used to determine whether a prospective buyer is eligible to buy firearms or explosives. 🔫

census-data criminal-background data-analyst-nanodegree data-science data-wrangling data-wrangling-data-vis data-wrangling-data-visualisation fbi matplotlib nanodegree numpy pandas python storytelling-with-data usa

Last synced: 01 Jul 2025

https://github.com/christianbors/OpenRefineQualityMetrics

MetricDoc is an interactive visual exploration environment for assessing data quality

data-profiling data-quality data-quality-checks data-wrangling interactive-visualizations quality-metrics visual-analytics

Last synced: 06 Apr 2025

https://github.com/hrbrmstr/fish-stocking-pdf-data-wrangling

🐠 A fishy example of how to do PDF data wrangling in R

data-wrangling pdf pdf-extractor r rs

Last synced: 29 Oct 2025

https://github.com/datapreprocessing/datacleaning

Data Cleaning is a python package for data preprocessing. This cleans the CSV file and returns the cleaned data frame. It does the work of imputation, removing duplicates, replacing special characters, and many more.

data data-cleaning data-cleansing data-preprocessing data-wrangling imputation python threshold

Last synced: 14 Dec 2025

https://github.com/sondosaabed/advanced-data-wrangling

An advanced course covering the three phases of data wrangling: gathering, assessing, and cleaning data.

data-analysis data-analyst-nanodegree data-wrangling numpy pandas python

Last synced: 09 Apr 2025

https://github.com/r-js/mangos

🥭's is a monorepo collecting data wrangling and data validation utilities

counterculture data data-wrangling fold functional isomorphism javascript json lens optics schema traversal validation

Last synced: 22 Jun 2025

https://github.com/dathere/qsvpro.dathere.com

🌐 Promo website for qsv pro, a spreadsheet data wrangling desktop app. Includes download links for Windows, macOS, & Linux. Website built with Astro as a static site.

astro ckan csv data data-wrangling framer-motion javascript product qsv react saas tailwindcss website

Last synced: 14 Apr 2025

https://github.com/antononcube/raku-data-reshapers

Raku package with data reshaping functions for different data structures (full arrays, Red tables, Text::CSV tables).

data data-transformation data-wrangling rakulang

Last synced: 14 Aug 2025

https://github.com/audiomuze/tagminder

Import, maintain and export tag metadata to/from audio files and a dynamically created SQLite table. Automates incremental tag cleanup, enrichment and standardisation for your digital audio library at scale using pre-scripted SQL queries, achieving quality and consistency throughout your music collection in a manner not possible with a tagger.

audio-metadata data-enrichment data-wrangling flac metadata-editing metadata-extraction music-library music-metadata music-tagging musicbrainz rym-capitalisation sqlite3

Last synced: 14 Jul 2025

https://github.com/dcs-training/datavisualisationwithr

Data Visualisation with R Workshop (delivered by the Centre in December 2020). This workshop focuses on visualising your data. Go to the readme file

data-analysis data-visualisation data-wrangling r

Last synced: 25 Apr 2025

https://github.com/bradleyboehmke/uc-bana-7025

Additional resources for the UC BANA 7025 Data Wrangling course

data-science data-visualization data-wrangling r

Last synced: 13 Apr 2025

https://github.com/data-forge-notebook/data-forge-cheat-sheet

A cheat sheet for Data-Forge that accompanies my book Data Wrangling with JavaScript

cheatsheet data data-wrangling javascript nodejs

Last synced: 20 Jun 2025

https://github.com/sondosaabed/introduction-to-data-analysis-with-pandas-and-numpy

Learning the data analysis process of questioning, wrangling, exploring, analyzing, and communicating data. Working with data in Python using libraries like NumPy and pandas.

data-analysis data-analyst-nanodegree data-wrangling numpy pandas python

Last synced: 09 Apr 2025

https://github.com/antononcube/Raku-Data-Reshapers

Raku package with data reshaping functions for different data structures (full arrays, Red tables, Text::CSV tables).

data data-transformation data-wrangling rakulang

Last synced: 11 Apr 2025