https://github.com/adk7712/lumi

# LUMI

LUMI is a professional data cleaning and validation platform designed to transform raw datasets into high-quality, analysis-ready assets. By combining automated intelligence with intuitive visual tools, LUMI streamlines the journey from messy data to reliable insights.

## Automated Data Quality Scouting
Eliminate the manual effort of hunting for errors. LUMI’s proactive scouting engine automatically scans your dataset to identify inconsistencies, outliers, and structural issues, offering intelligent recommendations to fix them instantly.
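
The README does not describe how the scouting engine works internally. As a rough illustration only, a scan of this kind could look like the sketch below, which assumes columns arrive as lists of raw strings and uses the common 1.5 × IQR rule for outliers. The function name `scout_column` and the issue labels are hypothetical, not part of LUMI.

```python
from statistics import quantiles


def scout_column(name, values):
    """Scan one column of raw string values and return suggested fixes.

    Hypothetical helper for illustration; not LUMI's actual API.
    """
    findings = []

    def is_missing(v):
        return v is None or v.strip() == ""

    def report(issue, suggestion):
        findings.append({"column": name, "issue": issue, "suggestion": suggestion})

    present = [v for v in values if not is_missing(v)]
    if len(present) < len(values):
        report("missing", "impute or drop rows")

    # Leading or trailing spaces usually indicate untidy input.
    if any(v != v.strip() for v in present):
        report("whitespace", "strip whitespace")

    # Collect the values that parse as numbers.
    numbers = []
    for v in present:
        try:
            numbers.append(float(v))
        except ValueError:
            pass
    if numbers and len(numbers) < len(present):
        report("mixed_types", "review type casting")

    # Flag values outside 1.5 * IQR of the quartiles (assumed rule).
    if len(numbers) >= 4:
        q1, _, q3 = quantiles(numbers, n=4, method="inclusive")
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        if any(x < low or x > high for x in numbers):
            report("outliers", "cap or remove extreme values")

    return findings
```

Each finding pairs a detected issue with a suggested action, which mirrors the "identify, then recommend" flow described above.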

## Interactive Data Diagnostics
Understand your data at a glance. Navigate through rich, interactive diagnostic cards that visualize feature distributions, null value density, and unique value counts, providing a comprehensive map of your data’s health and composition.
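
To make the three metrics on a diagnostic card concrete, here is a minimal sketch of how they might be computed for a single column. It assumes missing values are represented as `None` or empty strings; `profile_column` is a hypothetical name, and LUMI's real cards may compute these differently.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: null density, unique count, most common values."""
    total = len(values)
    present = [v for v in values if v not in (None, "")]
    return {
        # Share of entries that are missing, between 0.0 and 1.0.
        "null_density": (total - len(present)) / total if total else 0.0,
        # Number of distinct non-missing values.
        "unique_count": len(set(present)),
        # A crude view of the distribution: the three most frequent values.
        "top_values": Counter(present).most_common(3),
    }
```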

## Visual Insights Dashboard
Gain deep visual understanding of your dataset. The visual insights panel features:
* **Correlation range filtering:** A dual-handle range slider lets you isolate feature pairs whose correlation falls within a chosen positive or negative range.
* **Missingness patterns:** A binary matrix visualizer maps where missing values occur across rows and columns, helping you identify systemic data collection gaps.
* **Standardized outlier box plots:** All variables are scaled to Z-scores so their box plots can be stacked and outliers compared directly across features.
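
The computations behind these three views can be sketched in a few lines. This is an illustration under stated assumptions, not LUMI's implementation: rows are dictionaries, missing values are `None` or empty strings, correlations are already computed and keyed by feature pair, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def filter_correlations(correlations, low, high):
    """Keep feature pairs whose correlation lies within [low, high]."""
    return {pair: r for pair, r in correlations.items() if low <= r <= high}


def missingness_matrix(rows, columns):
    """Build a binary matrix: 1 where a cell is missing, 0 where it is present."""
    return [[1 if row.get(c) in (None, "") else 0 for c in columns] for row in rows]


def z_scores(values):
    """Scale values to Z-scores using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column has no spread, so every score is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Because every column ends up on the same Z-score scale, box plots for features with very different units can share one axis.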

## Comprehensive Cleaning Toolkit
Take full control of your data transformations. LUMI provides a versatile suite of tools for precise data manipulation, including column renaming, interactive column reordering, whitespace stripping, smart type casting, find-and-replace operations, and sophisticated outlier management.
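
A few of these operations are simple enough to sketch directly. The example below assumes rows are dictionaries of raw values. The helper names are hypothetical, and the "smart" casting shown here (try `int`, then `float`, otherwise keep the string) is only one plausible reading of that feature.

```python
def strip_whitespace(rows):
    """Trim leading and trailing whitespace from every string value."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def rename_column(rows, old, new):
    """Rename one column, leaving column order and values unchanged."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def smart_cast(value):
    """Cast a string to int or float when it parses cleanly; otherwise keep it."""
    if not isinstance(value, str):
        return value
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value
```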

## Custom Rulebook & Validation
Enforce rigorous quality standards with a tailored rulebook. Define and apply custom validation constraints to ensure your data adheres to specific business logic and integrity requirements, preventing bad data from reaching your downstream systems.
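
One simple way to model a rulebook is as a mapping from rule names to predicate functions. The sketch below takes that approach as an assumption; the README does not specify how LUMI represents rules, and `validate` and the example rules are hypothetical.

```python
def validate(rows, rulebook):
    """Check every row against every rule; return (row_index, rule_name) violations."""
    violations = []
    for index, row in enumerate(rows):
        for rule_name, check in rulebook.items():
            if not check(row):
                violations.append((index, rule_name))
    return violations


# Example rulebook: each rule is a named predicate over a single row.
example_rulebook = {
    "age_is_non_negative": lambda row: row["age"] >= 0,
    "email_contains_at": lambda row: "@" in row["email"],
}
```

An empty result means the dataset satisfies the rulebook, so a pipeline could refuse to export when any violations remain.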

## Data Lineage & Audit Log
Maintain a transparent record of every modification. Every transformation is captured in a detailed audit log, providing complete data lineage so you can track, review, and verify the evolution of your dataset from its raw state to the final export.
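
As a minimal sketch of the idea, an audit log can wrap each transformation and record what it did. The `AuditLog` class and its entry fields below are assumptions made for illustration; LUMI's actual log format is not documented in this README.

```python
from datetime import datetime, timezone


class AuditLog:
    """Apply transformations while recording one log entry per step."""

    def __init__(self):
        self.entries = []

    def apply(self, rows, description, transform):
        """Run transform(rows), record a lineage entry, and return the result."""
        result = transform(rows)
        self.entries.append({
            "step": len(self.entries) + 1,
            "description": description,
            "rows_before": len(rows),
            "rows_after": len(result),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return result
```

Reading `entries` in order reconstructs how the dataset evolved from its raw state.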

## One-Click Pipeline Export
Move from discovery to production seamlessly. Once you have perfected your cleaning workflow, export the entire logic as a reusable Python script with a single click, ready to run your validated data pipeline in any production environment.
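
The format of the exported script is not shown in this README. To illustrate the general technique, the sketch below turns a list of recorded steps into the source code of a standalone `clean(rows)` function. The step names, the template table, and `export_pipeline` are all hypothetical.

```python
# One source-code template per supported step (hypothetical step names).
# Doubled braces are literal braces in the generated code.
STEP_TEMPLATES = {
    "strip_whitespace": (
        "rows = [{{k: v.strip() if isinstance(v, str) else v "
        "for k, v in r.items()}} for r in rows]"
    ),
    "rename": (
        "rows = [{{({new!r} if k == {old!r} else k): v "
        "for k, v in r.items()}} for r in rows]"
    ),
    "drop_missing": "rows = [r for r in rows if r.get({column!r}) not in (None, '')]",
}


def export_pipeline(steps):
    """Render recorded steps as Python source defining clean(rows)."""
    lines = ["def clean(rows):"]
    for step in steps:
        params = {k: v for k, v in step.items() if k != "op"}
        lines.append(f"    # step: {step['op']}")
        lines.append("    " + STEP_TEMPLATES[step["op"]].format(**params))
    lines.append("    return rows")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a `.py` file gives a script with no dependency on the tool that produced it, which is the property that makes an exported pipeline reusable.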