
[![Deploy to GitHub Pages](https://github.com/Willie-Conway/DataVista/actions/workflows/deploy.yml/badge.svg)](https://github.com/Willie-Conway/DataVista/actions/workflows/deploy.yml)

# 📊 DataVista: Real-Time Data Analysis & Machine Learning Platform

![alt text](Screenshots/DataVista.png)

![DataVista](https://img.shields.io/badge/DataVista-Real_Time_Data_Analysis-3b82f6?style=for-the-badge&logo=python&logoColor=white)
![React](https://img.shields.io/badge/React-18-61DAFB?style=for-the-badge&logo=react&logoColor=white)
![Recharts](https://img.shields.io/badge/Recharts-Visualizations-10b981?style=for-the-badge&logo=chartdotjs&logoColor=white)
![API Integration](https://img.shields.io/badge/API-3_Live_Sources-f59e0b?style=for-the-badge&logo=api&logoColor=white)






---

### **About** 📊
**DataVista** is a comprehensive, production-grade data analysis and machine learning platform that combines real-time data ingestion from live APIs, interactive visualizations, statistical analysis, hypothesis testing, and machine learning model training in one unified interface. Built with React and Recharts, it features user authentication, persistent storage, CSV/JSON import/export, data cleaning pipelines, and a full ML workflow from feature selection to model evaluation. It is designed for data scientists, analysts, and machine learning engineers. 📈

---

## ✨ Key Features

### 🎓 **11 Core Modules**

| Module | Focus | Key Capabilities |
|--------|-------|------------------|
| **01 - Dashboard** | Workspace Overview | Dataset stats, pipeline status, recent activity |
| **02 - Data Sources** | Live API Integration | World Bank, Open-Meteo Weather, CoinGecko Crypto |
| **03 - Data Explorer** | Dataset Inspection | Browse, sort, filter, export CSV/JSON |
| **04 - Cleaning** | Data Quality | Missing values, duplicates, outlier removal |
| **05 - Preprocessing** | ML Preparation | Scaling, encoding, feature selection, train/test split |
| **06 - Statistics** | Descriptive Stats | Mean, median, std dev, quartiles, full summary |
| **07 - Visualization** | Interactive Charts | Line, Bar, Area, Scatter, and Pie charts |
| **08 - ML Models** | Machine Learning | Train regression models; R², RMSE, MAE metrics |
| **09 - Hypothesis** | Statistical Tests | T-test, Chi-Square, ANOVA with p-value interpretation |
| **10 - Reports** | Export & Documentation | Download reports from each analysis stage |
| **11 - Profile** | User Management | Persistent profile, avatar upload, account settings |

---

## 📊 **Module 01: Dashboard - Workspace Overview**

### **Key Performance Indicators** 📈
- **Datasets**: Total in workspace
- **Total Rows**: Across all datasets
- **Active Dataset**: Current selection with row/column count
- **ML Model Status**: Trained vs untrained

### **Workspace Datasets** 📋
- **List all imported/live datasets** with metadata
- **One-click actions**: Open, Export CSV, Export JSON, Delete
- **Source badges**: World Bank (🔵), Weather (🟢), Crypto (🟡), File (🟣)

### **Pipeline Status** 🔄
- **6-step pipeline tracker**:
  1. Data Loaded ✓
  2. Dataset Selected ✓
  3. Stats Run
  4. ML Trained
  5. Hypothesis Tested
  6. Visualization Built
- **Visual progress indicators** with checkmarks

![alt text](Screenshots/User_Dashboard.png)

---

## 📡 **Module 02: Data Sources - Live API Integration**

### **3 Live API Sources**

| Source | API | Data | Features |
|--------|-----|------|----------|
| **World Bank** | `api.worldbank.org/v2` | GDP, Population, Inflation, Unemployment, Energy Use, Internet Users, Literacy, Life Expectancy | 200+ countries, 8 indicators |
| **Open-Meteo Weather** | `api.open-meteo.com/v1` | Temperature, Humidity, Wind Speed | 7-day hourly forecast, 6 global cities |
| **CoinGecko Crypto** | `api.coingecko.com/api/v3` | Price, Market Cap, Volume, 24h Change | Top 30 cryptocurrencies |
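
The table above names three public API base URLs. As a rough sketch of how request URLs for them could be assembled, the helpers below build one URL per source. The function names, default page sizes, and the chosen hourly variables are illustrative assumptions and are not taken from the DataVista source.

```javascript
// Sketch only: helper names and defaults are assumptions for illustration.

// World Bank v2: indicator codes look like "NY.GDP.MKTP.CD" (GDP, current US$).
// format=json is needed because the API returns XML by default.
function worldBankUrl(indicator, perPage = 300) {
  return `https://api.worldbank.org/v2/country/all/indicator/${indicator}?format=json&per_page=${perPage}`;
}

// Open-Meteo forecast endpoint: hourly variables for a 7-day window.
function openMeteoUrl(latitude, longitude) {
  const params = new URLSearchParams({
    latitude: String(latitude),
    longitude: String(longitude),
    hourly: "temperature_2m,relative_humidity_2m,wind_speed_10m",
    forecast_days: "7",
  });
  return `https://api.open-meteo.com/v1/forecast?${params}`;
}

// CoinGecko markets endpoint: top coins by market cap, priced in USD.
function coinGeckoUrl(limit = 30) {
  const params = new URLSearchParams({
    vs_currency: "usd",
    order: "market_cap_desc",
    per_page: String(limit),
    page: "1",
  });
  return `https://api.coingecko.com/api/v3/coins/markets?${params}`;
}
```

Each URL can then be passed to `fetch` from the browser; error handling and rate limiting are left out of this sketch.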

### **File Import** 📁
- **CSV/JSON upload** with drag & drop interface
- **Progress bar** during import
- **Automatic parsing** of headers and data types
- **Limit**: 5,000 rows per file
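
To make the import rules above concrete, here is a minimal sketch of a CSV parser that reads a header row, infers numeric values, and stops at the 5,000-row limit. It reuses the utility name `parseCSV`, but the body is an assumption rather than the project's actual code, and it does not handle newlines inside quoted fields.

```javascript
// Sketch only: not DataVista's implementation.
const MAX_ROWS = 5000; // row limit stated in the README

// Split one CSV line, honouring double-quoted fields and "" escapes.
function splitLine(line) {
  const cells = [];
  let cur = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { cur += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else cur += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { cells.push(cur); cur = ""; }
    else cur += ch;
  }
  cells.push(cur);
  return cells;
}

// Empty cells become null, numeric-looking cells become numbers.
function coerce(value) {
  const trimmed = value.trim();
  if (trimmed === "") return null;
  const n = Number(trimmed);
  return Number.isFinite(n) ? n : trimmed;
}

function parseCSV(text, maxRows = MAX_ROWS) {
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== "");
  if (lines.length === 0) return { columns: [], rows: [] };
  const columns = splitLine(lines[0]).map((c) => c.trim());
  const rows = lines.slice(1, maxRows + 1).map((line) => {
    const cells = splitLine(line);
    return Object.fromEntries(columns.map((c, i) => [c, coerce(cells[i] ?? "")]));
  });
  return { columns, rows };
}
```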

### **Data Preview** 👁️
- **Live table preview** after API fetch
- **One-click** "Add to Workspace"
- **Visual preview** for weather data (area chart)


---

## ๐Ÿ” **Module 03: Data Explorer โ€” Dataset Inspection**

### **Dataset Selection** ๐ŸŽฏ
- **Dropdown** with all available datasets
- **Metadata chips**: Row count, column count, source badge
- **Export buttons**: CSV, JSON

### **Interactive Table** 📋
- **Sticky headers** for easy navigation
- **Horizontal scrolling** for wide datasets
- **Vertical scrolling** with max height 520px
- **Limited preview** (100 rows) with "Showing X of Y" indicator
- **Monospaced font** for numeric values
- **Hover effects** on rows


---

## 🧹 **Module 04: Data Cleaning**

### **Missing Value Strategies** ❓
| Strategy | Description |
|----------|-------------|
| **Drop** | Remove rows with any missing value |
| **Mean** | Fill numeric columns with column mean |
| **Median** | Fill numeric columns with column median |
| **Keep** | Leave missing values as-is |
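
The four strategies in the table can be sketched as a single function over an array of row objects. The function name `handleMissing`, the strategy strings, and the definition of "missing" (null, undefined, empty string, NaN) are assumptions made for illustration.

```javascript
// Sketch only: names and the notion of "missing" are illustrative assumptions.
function isMissing(v) {
  return v === null || v === undefined || v === "" || Number.isNaN(v);
}

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function handleMissing(rows, strategy) {
  if (strategy === "keep") return rows;
  if (strategy === "drop") {
    // Remove any row that has at least one missing value.
    return rows.filter((r) => !Object.values(r).some(isMissing));
  }
  if (strategy !== "mean" && strategy !== "median") {
    throw new Error(`unknown strategy: ${strategy}`);
  }
  const columns = rows.length ? Object.keys(rows[0]) : [];
  const fill = {};
  for (const col of columns) {
    const present = rows.map((r) => r[col]).filter((v) => !isMissing(v));
    // Only numeric columns are imputed; text columns are left untouched.
    if (present.length === 0 || !present.every((v) => typeof v === "number")) continue;
    fill[col] = strategy === "mean"
      ? present.reduce((a, b) => a + b, 0) / present.length
      : median(present);
  }
  return rows.map((r) => {
    const out = { ...r };
    for (const col of Object.keys(fill)) {
      if (isMissing(out[col])) out[col] = fill[col];
    }
    return out;
  });
}
```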

### **Outlier Removal** 📊
| Method | Description |
|--------|-------------|
| **Z-score** | Remove values with \|z\| > 3 |
| **IQR** | Remove values outside the range [Q1 - 1.5×IQR, Q3 + 1.5×IQR] |
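
Both outlier rules reduce to computing a lower and upper bound for a column and keeping rows inside it. The sketch below assumes the sample standard deviation for the z-score and linear-interpolation quartiles for the IQR; the function names are hypothetical and the project may use different conventions.

```javascript
// Sketch only: conventions (sample std dev, interpolated quartiles) are assumptions.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function outlierBounds(values, method) {
  if (method === "zscore") {
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    const variance = values.reduce((a, b) => a + (b - mean) ** 2, 0) / (values.length - 1);
    const sd = Math.sqrt(variance);
    return [mean - 3 * sd, mean + 3 * sd]; // |z| <= 3
  }
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  return [q1 - 1.5 * iqr, q3 + 1.5 * iqr];
}

// Keeps rows whose value in `column` is inside the bounds; non-numeric
// cells (for example null) are left for the missing-value step to handle.
function removeOutliers(rows, column, method = "iqr") {
  const values = rows
    .map((r) => r[column])
    .filter((v) => typeof v === "number" && Number.isFinite(v));
  if (values.length < 2) return rows;
  const [lo, hi] = outlierBounds(values, method);
  return rows.filter(
    (r) => typeof r[column] !== "number" || (r[column] >= lo && r[column] <= hi)
  );
}
```

Note that with very small samples the z-score rule can never fire, because a single point cannot sit more than (n - 1)/√n sample standard deviations from the mean.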

### **Duplicate Removal** 🔄
- **Toggle checkbox** to remove duplicate rows
- **JSON-based comparison** for exact matches
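
A JSON-based comparison can be sketched by using each row's serialized form as a set key. The function name is hypothetical. One consequence of this approach is that two rows with the same values but different key order are treated as different.

```javascript
// Sketch only: removes exact duplicates, keeping the first occurrence.
function removeDuplicates(rows) {
  const seen = new Set();
  return rows.filter((row) => {
    const key = JSON.stringify(row); // sensitive to property order
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```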

### **Data Quality Report** 📋
- **Per-column analysis**:
  - Column name
  - Type (numeric/text)
  - Missing count and percentage
  - Unique value count
- **Color coding**: 🟢 numeric, 🟣 text


---

## โš™๏ธ **Module 05: Preprocessing for Machine Learning**

### **4 Key Preprocessing Steps** ๐Ÿ”ง

| Step | Options | Description |
|------|---------|-------------|
| **Feature Scaling** | None, Min-Max, Z-score, Robust | Scale numeric features to common range |
| **Categorical Encoding** | None, One-Hot, Label, Ordinal | Convert text categories to numbers |
| **Dimensionality Reduction** | None, PCA, Variance Filter, Correlation Filter | Reduce number of features |
| **Train/Test Split** | 80/20, 70/30, 60/40, 90/10 | Ratio for model training vs evaluation |
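
Two of the steps in the table, feature scaling and the train/test split, are sketched below. All names are illustrative; the z-score here uses the population standard deviation, and the seeded shuffle is an assumption added so the split is reproducible.

```javascript
// Sketch only: not taken from the DataVista source.

// Min-Max: map values linearly onto [0, 1].
function minMaxScale(values) {
  const min = Math.min(...values);
  const range = Math.max(...values) - min;
  return values.map((v) => (range === 0 ? 0 : (v - min) / range));
}

// Z-score: subtract the mean, divide by the population standard deviation.
function zScoreScale(values) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const sd = Math.sqrt(values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length);
  return values.map((v) => (sd === 0 ? 0 : (v - mean) / sd));
}

// Small seeded PRNG (mulberry32) so the shuffle can be repeated.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy, then cut at the requested ratio.
function trainTestSplit(rows, trainRatio = 0.8, seed = 42) {
  const rand = mulberry32(seed);
  const shuffled = [...rows];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.round(shuffled.length * trainRatio);
  return { train: shuffled.slice(0, cut), test: shuffled.slice(cut) };
}
```

In a real pipeline the scaling parameters would normally be computed on the training portion only and then applied to the test portion, to avoid leaking test information into training.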

### **Column Summary** 📊
- **Visual cards** for each column:
  - Column name (monospaced)
  - Type badge (numeric/categorical)
  - Unique value count
- **Color coding**: 🟢 numeric, 🟣 categorical

![alt text](Screenshots/Preprocessing.png)

---

## ∑ **Module 06: Descriptive Statistics**

### **Statistical Summary** 📈
- **Full statistics** for all numeric columns:
  - Count
  - Mean
  - Median
  - Standard Deviation
  - Min
  - Q1 (25th percentile)
  - Q3 (75th percentile)
  - Max
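
The eight statistics listed above can be computed in one pass over a sorted copy of the column. This sketch borrows the utility name `descStats`, but the body is an assumption: it uses the sample standard deviation (n - 1) and linear-interpolation quartiles, which may differ from the project's conventions.

```javascript
// Sketch only: conventions and return shape are illustrative assumptions.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function descStats(values) {
  // Ignore missing and non-numeric cells.
  const xs = values
    .filter((v) => typeof v === "number" && Number.isFinite(v))
    .sort((a, b) => a - b);
  const n = xs.length;
  if (n === 0) return null;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  const sumSq = xs.reduce((a, b) => a + (b - mean) ** 2, 0);
  return {
    count: n,
    mean,
    median: quantile(xs, 0.5),
    std: n > 1 ? Math.sqrt(sumSq / (n - 1)) : 0,
    min: xs[0],
    q1: quantile(xs, 0.25),
    q3: quantile(xs, 0.75),
    max: xs[n - 1],
  };
}
```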

### **One-Click Calculation** ⚡
- **"Calculate Statistics"** button
- **"Download Report"** button for text export

![alt text](Screenshots/Statistics.png)

---

## 📈 **Module 07: Interactive Visualization**

### **Chart Types** 📊
| Chart | Library | Use Case |
|-------|---------|----------|
| **Line Chart** | Recharts | Time series trends |
| **Bar Chart** | Recharts | Categorical comparisons |
| **Area Chart** | Recharts | Cumulative trends |
| **Scatter Plot** | Recharts | Correlation analysis |
| **Pie Chart** | Recharts | Part-to-whole relationships |

### **Chart Configuration** ⚙️
- **Dataset selector**
- **Chart type dropdown**
- **X-axis column selector**
- **Y-axis column selector** (numeric only)

### **Visual Features** ✨
- **Dark theme** charts matching platform
- **Tooltips** with formatted values
- **Responsive sizing** for all screen sizes
- **Pie chart labels** with truncated names
- **Scatter plot** with X/Y axes

![alt text](Screenshots/Visualization_1.png)

---

## 🤖 **Module 08: Machine Learning Models**

### **Model Configuration** 🛠️
- **Dataset selection**
- **Target column** (numeric only)
- **Algorithm options**:
  - Linear Regression
  - Random Forest
  - Support Vector Machine
  - K-Nearest Neighbors
  - Gradient Boosting

### **Model Performance Metrics** 📊
- **R² Score** (coefficient of determination)
- **RMSE** (Root Mean Square Error)
- **MAE** (Mean Absolute Error)
- **Training samples count**
- **Features used** count
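
R², RMSE, and MAE have standard definitions, so a minimal sketch of computing them from real predictions is shown here, together with a one-feature least-squares fit to produce those predictions. The function names are hypothetical and this is not presented as how DataVista produces its scores.

```javascript
// Sketch only: standard metric definitions, illustrative function names.

// Ordinary least squares for a single feature: y ≈ slope * x + intercept.
function fitLine(x, y) {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

function regressionMetrics(yTrue, yPred) {
  const n = yTrue.length;
  const meanY = yTrue.reduce((a, b) => a + b, 0) / n;
  let ssRes = 0; // sum of squared residuals
  let ssTot = 0; // total sum of squares around the mean
  let absErr = 0;
  for (let i = 0; i < n; i++) {
    const err = yTrue[i] - yPred[i];
    ssRes += err ** 2;
    ssTot += (yTrue[i] - meanY) ** 2;
    absErr += Math.abs(err);
  }
  return {
    r2: 1 - ssRes / ssTot,
    rmse: Math.sqrt(ssRes / n),
    mae: absErr / n,
  };
}
```

For a meaningful evaluation the fit would be made on the training rows and the metrics computed on the held-out test rows.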

### **Simulated Training** 🧠
- **Simulated metrics**: realistic-looking scores generated with random variation rather than computed from a fitted model
- **Progress bar** for R² score
- **Feature selection** from numeric columns


---

## ⊛ **Module 09: Hypothesis Testing**

### **3 Statistical Tests** 📊
| Test | Use Case | Inputs |
|------|----------|--------|
| **Independent T-test** | Compare two groups | Variable 1, Variable 2 |
| **Chi-Square Test** | Categorical independence | Single variable |
| **One-Way ANOVA** | Compare multiple groups | Single variable (simulated) |

### **Test Results** 📋
- **Test statistic** (t, χ², F)
- **P-value** with color coding: 🟢 p<0.05, 🔴 p≥0.05
- **Conclusion**: Reject H₀ / Fail to Reject H₀
- **Interpretation** in plain English
- **Significance level**: α = 0.05 (two-tailed)
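
As an illustration of the test statistic behind a two-group comparison, the sketch below computes Welch's t statistic and its approximate degrees of freedom. Welch's unequal-variance form is chosen here as an assumption; the README does not say which variant of the independent t-test is used. Turning `t` and `df` into a p-value needs the Student t distribution CDF, which is omitted.

```javascript
// Sketch only: Welch's variant is an assumption, names are illustrative.
function meanAndVariance(xs) {
  const n = xs.length;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  const variance = xs.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  return { n, mean, variance };
}

function welchT(groupA, groupB) {
  if (groupA.length < 2 || groupB.length < 2) {
    throw new Error("each group needs at least two values");
  }
  const a = meanAndVariance(groupA);
  const b = meanAndVariance(groupB);
  const va = a.variance / a.n; // squared standard error of mean A
  const vb = b.variance / b.n; // squared standard error of mean B
  const t = (a.mean - b.mean) / Math.sqrt(va + vb);
  // Welch-Satterthwaite approximation for the degrees of freedom.
  const df = (va + vb) ** 2 / (va ** 2 / (a.n - 1) + vb ** 2 / (b.n - 1));
  return { t, df };
}
```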

![alt text](Screenshots/Hypothesis.png)

---

## ◎ **Module 10: Report Generator**

### **6 Report Types** 📄

| Report | Content | Format |
|--------|---------|--------|
| **Data Quality** | Missing values, duplicates, type analysis | TXT |
| **Statistics** | Descriptive stats for all numeric columns | TXT |
| **ML Performance** | Model metrics, features, evaluation scores | TXT |
| **Hypothesis Test** | Test statistic, p-value, conclusion | TXT |
| **Full Pipeline** | Combined report of all completed steps | Multiple |
| **Dataset Export** | CSV/JSON of active dataset | CSV/JSON |

### **Report Preview** 👁️
- **Last generated report** displayed in a terminal-style `<pre>` block
- **Monospaced font** for readability
- **Scrollable** for long reports

![alt text](Screenshots/Reports.png)

---

## ◯ **Module 11: Profile Management**

### **User Profile** 👤
- **Avatar upload** with preview
- **Display name**
- **Email address**
- **Role / Title**
- **Bio** (textarea)

### **Persistent Storage** 💾
- **Local storage** for all user data
- **Avatar saved** as base64
- **Settings** persist across sessions

![alt text](Screenshots/Profile.png)

### **Dark Mode** 🌑
- **Manual toggle** via button
- **Toggle** in settings panel
- **System preference** detection
- **Persistent** across sessions


![alt text](Screenshots/Dashboard_Dark_mode.png)

---

## 🎨 **Design & Aesthetics**

### **Modern Data Science Platform** 🖥️
- **Dark theme** (`#080c14` background): easy on the eyes for extended analysis
- **Blue accent** (`#3b82f6`) for primary actions
- **Green** (`#10b981`) for success and weather data
- **Amber** (`#f59e0b`) for crypto and warnings
- **Purple** (`#8b5cf6`) for ML models
- **Glass-morphism cards** with subtle borders
- **Custom scrollbars** for data tables

### **Typography**
- **Outfit**: UI text, body copy, buttons
- **JetBrains Mono**: Data tables, statistics, code blocks

### **Visual Elements** 🖼️
- **Animated loading spinners**
- **Progress bars** for ML metrics
- **Color-coded badges** for data sources
- **Hover effects** on cards and buttons
- **Fade-up animations** on page transitions
- **Toast notifications** for user feedback

### **Color Coding** 🎨
| Element | Color | Hex | Usage |
|---------|-------|-----|-------|
| **World Bank** | Blue | `#3b82f6` | GDP, economic indicators |
| **Weather** | Green | `#10b981` | Temperature, humidity, wind |
| **Crypto** | Amber | `#f59e0b` | Price, market cap, volume |
| **Files** | Purple | `#8b5cf6` | Imported CSV/JSON |
| **Primary** | Blue | `#3b82f6` | Buttons, active tabs |
| **Success** | Green | `#10b981` | Completed steps |
| **Warning** | Amber | `#f59e0b` | Missing data, warnings |
| **Error** | Red | `#ef4444` | Errors, high p-values |

---

## ๐Ÿ› ๏ธ **Technical Implementation**

### **Tech Stack** ๐Ÿฅž
- **React 18** โ€” Functional components with hooks
- **Recharts** โ€” Data visualization library
- **Pure CSS** โ€” Custom design system, no frameworks
- **LocalStorage API** โ€” Persistent user data
- **FileReader API** โ€” CSV/JSON import

### **React Hooks Used** 🎣
| Hook | Purpose |
|------|---------|
| `useState` | 25+ state variables for UI and data |
| `useEffect` | Initial data loading, theme sync |
| `useRef` | File input, avatar upload |
| `useCallback` | Memoized save function |

### **Custom Utilities** 🔧
- `parseCSV`: Convert CSV text to structured data
- `toCSV`: Convert data to CSV format
- `download`: Trigger file download
- `numericCols`: Identify numeric columns
- `descStats`: Calculate descriptive statistics
- `store`: Persistent storage wrapper
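
As an example of what one of these utilities might look like, here is a minimal sketch of a `toCSV` function that quotes fields containing commas, quotes, or line breaks. The signature and the handling of missing values (written as empty cells) are assumptions, not the project's actual code.

```javascript
// Sketch only: signature and null handling are illustrative assumptions.
function toCSV(rows, columns = rows.length ? Object.keys(rows[0]) : []) {
  const escapeCell = (value) => {
    if (value === null || value === undefined) return "";
    const text = String(value);
    // Quote the cell and double any embedded quotes when needed.
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const header = columns.map(escapeCell).join(",");
  const body = rows.map((row) => columns.map((c) => escapeCell(row[c])).join(","));
  return [header, ...body].join("\n");
}
```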

### **Storage Architecture** 💾
```javascript
const store = {
  async get(key) { /* retrieve from localStorage */ },
  async set(key, val) { /* save to localStorage */ }
};
```
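
One way the placeholder bodies above could be filled in is sketched below. It assumes values are serialized as JSON, and it takes the storage backend as a parameter so the same code runs against `window.localStorage` in the browser or an in-memory stand-in elsewhere. `createStore` and `memoryBackend` are hypothetical names.

```javascript
// Sketch only: JSON serialization and the injectable backend are assumptions.
function createStore(backend) {
  return {
    async get(key) {
      const raw = backend.getItem(key);
      return raw === null ? null : JSON.parse(raw);
    },
    async set(key, val) {
      backend.setItem(key, JSON.stringify(val));
    },
  };
}

// In-memory stand-in exposing the getItem/setItem subset of the Storage API.
function memoryBackend() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}
```

In the browser this would be used as `const store = createStore(window.localStorage);`, after which `await store.set("profile", { name: "Ada" })` and `await store.get("profile")` round-trip the object.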

### **Data Flow** 🔄
```
User Authentication → Workspace → Dataset Selection
                                         ↓
                              ┌──────────┴──────────┐
                              ↓                     ↓
                        Data Cleaning        Machine Learning
                              ↓                     ↓
                          Statistics          Model Training
                              ↓                     ↓
                        Visualization      Hypothesis Testing
                              ↓                     ↓
                           Reports                Export
```

---

## 🎥 **Video Demo Script** (60-75 seconds)

| Time | Module | Scene | Action |
|------|--------|-------|--------|
| 0:00 | Auth | Login | Enter credentials → Dashboard loads |
| 0:05 | Dashboard | KPIs | Show 4 datasets, 2,847 rows, active dataset "GDP Data" |
| 0:10 | Data Sources | World Bank | Load GDP data → Preview table shows 40 countries |
| 0:15 | Data Sources | Weather | Fetch NYC forecast → Area chart updates |
| 0:20 | Data Sources | Crypto | Load top 30 coins → Table shows BTC, ETH, SOL |
| 0:25 | Data Explorer | Inspect | Browse GDP dataset with 100 rows preview |
| 0:30 | Cleaning | Quality | Show missing values: 2 columns with 12% missing |
| 0:35 | Cleaning | Apply | Run cleaning → New cleaned dataset appears |
| 0:40 | Statistics | Run | Calculate stats → Table shows mean, median, std dev |
| 0:45 | Visualization | Build | Line chart of GDP by country → Interactive hover |
| 0:50 | ML | Train | Linear regression on GDP → R² = 0.84, RMSE = 1.23M |
| 0:55 | Hypothesis | T-test | Run test on GDP vs Population → p=0.03 (significant) |
| 1:00 | Reports | Download | Generate Data Quality report → Preview shows 15 lines |
| 1:05 | Profile | Edit | Update avatar, name, role → Save persists |

---

## 🚦 **Performance**

- **Load Time**: < 2 seconds
- **Memory Usage**: < 60 MB
- **API Calls**: 3 simultaneous on demand
- **Dataset Limit**: 5,000 rows per file

### **Optimizations** ⚡
- **Data sampling** for visualizations (80-200 points)
- **Efficient re-renders** via React hooks
- **CSS animations** instead of JavaScript
- **Debounced** API calls
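
The data-sampling optimization can be illustrated with a simple evenly spaced downsampler that caps a series at a fixed number of points while keeping the first and last rows. The function name and the even-stride strategy are assumptions; the project may sample differently.

```javascript
// Sketch only: even-stride sampling is an illustrative assumption.
function downsample(rows, maxPoints = 200) {
  if (maxPoints < 2) throw new Error("maxPoints must be at least 2");
  if (rows.length <= maxPoints) return rows;
  // Spread maxPoints indices evenly from the first row to the last row.
  const step = (rows.length - 1) / (maxPoints - 1);
  return Array.from({ length: maxPoints }, (_, i) => rows[Math.round(i * step)]);
}
```

The result can be passed straight to a chart's `data` prop, so the chart renders at most `maxPoints` points regardless of dataset size.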

---

## ๐Ÿ›ก๏ธ **Security Notes**

DataVista is a **client-side only** application:
- โœ… No backend server
- โœ… No data transmission
- โœ… Local storage only for persistence
- โœ… No tracking or analytics
- โœ… All API calls made directly from browser

---

## ๐Ÿ“ **License**

MIT License โ€” see LICENSE file for details.

---

## ๐Ÿ™ **Acknowledgments**

- **World Bank** โ€” Open data API
- **Open-Meteo** โ€” Free weather API
- **CoinGecko** โ€” Cryptocurrency data API
- **Recharts** โ€” React charting library
- **JetBrains** โ€” Mono font for code

---

## 📧 **Contact**

- **GitHub Issues**: [Create an issue](https://github.com/Willie-Conway/DataVista/issues)
- **Website**: https://willie-conway.github.io/DataVista/

---

## ๐Ÿ **Future Enhancements**

- [ ] Add time series forecasting (ARIMA)
- [ ] Include more ML algorithms (neural networks)
- [ ] Add SQL query interface
- [ ] Implement data versioning
- [ ] Add collaborative workspaces
- [ ] Include data profiling with visualizations
- [ ] Add automated EDA reports
- [ ] Implement feature importance ranking
- [ ] Add confusion matrix for classification
- [ ] Include ROC curves and AUC

---



*Last updated: March 2025*