{"id":19711424,"url":"https://github.com/willie-conway/datavista","last_synced_at":"2026-05-30T00:30:30.205Z","repository":{"id":295127815,"uuid":"989253931","full_name":"Willie-Conway/DataVista","owner":"Willie-Conway","description":"DataVista is a comprehensive, production-grade data analysis and machine learning platform that combines real-time data ingestion from live APIs, interactive visualizations, statistical analysis, hypothesis testing, and machine learning model training — all in a unified, professional-grade interface. Built with React and Recharts.","archived":false,"fork":false,"pushed_at":"2026-03-08T04:43:14.000Z","size":11150,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-01T04:34:09.299Z","etag":null,"topics":["analytics-platform","api-integration","classification","coingecko-api","csv-import","data-analysis","data-cleaning-and-preprocessing","data-pipeline","data-science","data-visualizations","etl","hypothesis-testing","json-export","machine-learning-models","open-meteo","react","recharts","regression","statistics","world-bank"],"latest_commit_sha":null,"homepage":"https://willie-conway.github.io/DataVista/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Willie-Conway.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-23T18:47:19.000Z","updated_at":"2026-03-25T11:43:05.000Z","dependencies_parsed_at":"2
025-05-23T19:32:21.775Z","dependency_job_id":"2f051784-e4dc-4878-9909-ec460f87a275","html_url":"https://github.com/Willie-Conway/DataVista","commit_stats":null,"previous_names":["willie-conway/datavista-app","willie-conway/datavista"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Willie-Conway/DataVista","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Willie-Conway%2FDataVista","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Willie-Conway%2FDataVista/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Willie-Conway%2FDataVista/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Willie-Conway%2FDataVista/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Willie-Conway","download_url":"https://codeload.github.com/Willie-Conway/DataVista/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Willie-Conway%2FDataVista/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33676190,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics-platform","api-integration","classification","coingecko-api","csv-import","data-analysis","
data-cleaning-and-preprocessing","data-pipeline","data-science","data-visualizations","etl","hypothesis-testing","json-export","machine-learning-models","open-meteo","react","recharts","regression","statistics","world-bank"],"created_at":"2024-11-11T22:11:30.030Z","updated_at":"2026-05-30T00:30:30.198Z","avatar_url":"https://github.com/Willie-Conway.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Deploy to GitHub Pages](https://github.com/Willie-Conway/DataVista/actions/workflows/deploy.yml/badge.svg)](https://github.com/Willie-Conway/DataVista/actions/workflows/deploy.yml)\n\n# 📊 DataVista — Real-Time Data Analysis \u0026 Machine Learning Platform\n\n![alt text](Screenshots/DataVista.png)\n\n![DataVista](https://img.shields.io/badge/DataVista-Real_Time_Data_Analysis-3b82f6?style=for-the-badge\u0026logo=python\u0026logoColor=white)\n![React](https://img.shields.io/badge/React-18-61DAFB?style=for-the-badge\u0026logo=react\u0026logoColor=white)\n![Recharts](https://img.shields.io/badge/Recharts-Visualizations-10b981?style=for-the-badge\u0026logo=chartdotjs\u0026logoColor=white)\n![API Integration](https://img.shields.io/badge/API-3_Live_Sources-f59e0b?style=for-the-badge\u0026logo=api\u0026logoColor=white)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/STATUS-LIVE-3b82f6?style=for-the-badge\u0026labelColor=080c14\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/API_SOURCES-3-f59e0b?style=for-the-badge\u0026labelColor=080c14\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/CHART_TYPES-6-10b981?style=for-the-badge\u0026labelColor=080c14\" /\u003e\n  \u003cimg src=\"https://img.shields.io/badge/MODULES-11-8b5cf6?style=for-the-badge\u0026labelColor=080c14\" /\u003e\n\u003c/p\u003e\n\n---\n\n\n\n### **About** 📊\n**DataVista** is a comprehensive, production-grade data analysis and machine learning platform that combines real-time data ingestion from live APIs, interactive 
visualizations, statistical analysis, hypothesis testing, and machine learning model training — all in a unified, professional-grade interface. Built with React and Recharts, DataVista features user authentication, persistent storage, CSV/JSON import/export, data cleaning pipelines, and a full ML workflow from feature selection to model evaluation. Designed for data scientists, analysts, and machine learning engineers. 📈\n\n\n---\n\n## ✨ Key Features\n\n### 🎓 **11 Core Modules**\n\n| Module | Focus | Key Capabilities |\n|--------|-------|------------------|\n| **01 — Dashboard** | Workspace Overview | Dataset stats, pipeline status, recent activity |\n| **02 — Data Sources** | Live API Integration | World Bank, Open-Meteo Weather, CoinGecko Crypto |\n| **03 — Data Explorer** | Dataset Inspection | Browse, sort, filter, export CSV/JSON |\n| **04 — Cleaning** | Data Quality | Missing values, duplicates, outlier removal |\n| **05 — Preprocessing** | ML Preparation | Scaling, encoding, feature selection, train/test split |\n| **06 — Statistics** | Descriptive Stats | Mean, median, std dev, quartiles, full summary |\n| **07 — Visualization** | Interactive Charts | Line, Bar, Area, Scatter, Pie (6 chart types) |\n| **08 — ML Models** | Machine Learning | Train regression models, R², RMSE, MAE metrics |\n| **09 — Hypothesis** | Statistical Tests | T-test, Chi-Square, ANOVA with p-value interpretation |\n| **10 — Reports** | Export \u0026 Documentation | Download reports from each analysis stage |\n| **11 — Profile** | User Management | Persistent profile, avatar upload, account settings |\n\n---\n\n## 📊 **Module 01: Dashboard — Workspace Overview**\n\n### **Key Performance Indicators** 📈\n- **Datasets** — Total in workspace\n- **Total Rows** — Across all datasets\n- **Active Dataset** — Current selection with row/column count\n- **ML Model Status** — Trained vs untrained\n\n### **Workspace Datasets** 📋\n- **List all imported/live datasets** with metadata\n- **One-click 
actions**: Open, Export CSV, Export JSON, Delete\n- **Source badges**: World Bank (🔵), Weather (🟢), Crypto (🟡), File (🟣)\n\n### **Pipeline Status** 🔄\n- **6-step pipeline tracker**:\n  1. Data Loaded ✓\n  2. Dataset Selected ✓\n  3. Stats Run\n  4. ML Trained\n  5. Hypothesis Tested\n  6. Visualization Built\n- **Visual progress indicators** with checkmarks\n\n![alt text](Screenshots/User_Dashboard.png)\n\n---\n\n## 📡 **Module 02: Data Sources — Live API Integration**\n\n### **3 Live API Sources** 🌍\n\n| Source | API | Data | Features |\n|--------|-----|------|----------|\n| **World Bank** | `api.worldbank.org/v2` | GDP, Population, Inflation, Unemployment, Energy Use, Internet Users, Literacy, Life Expectancy | 200+ countries, 8 indicators |\n| **Open-Meteo Weather** | `api.open-meteo.com/v1` | Temperature, Humidity, Wind Speed | 7-day hourly forecast, 6 global cities |\n| **CoinGecko Crypto** | `api.coingecko.com/api/v3` | Price, Market Cap, Volume, 24h Change | Top 30 cryptocurrencies |\n\n### **File Import** 📁\n- **CSV/JSON upload** with drag \u0026 drop interface\n- **Progress bar** during import\n- **Automatic parsing** of headers and data types\n- **Limit**: 5,000 rows per file\n\n### **Data Preview** 👁️\n- **Live table preview** after API fetch\n- **One-click** \"Add to Workspace\"\n- **Visual preview** for weather data (area chart)\n\n![alt text](\u003cScreenshots/Data Sources.png\u003e)\n\n---\n\n## 🔍 **Module 03: Data Explorer — Dataset Inspection**\n\n### **Dataset Selection** 🎯\n- **Dropdown** with all available datasets\n- **Metadata chips**: Row count, column count, source badge\n- **Export buttons**: CSV, JSON\n\n### **Interactive Table** 📋\n- **Sticky headers** for easy navigation\n- **Horizontal scrolling** for wide datasets\n- **Vertical scrolling** with max height 520px\n- **Limited preview** (100 rows) with \"Showing X of Y\" indicator\n- **Monospaced font** for numeric values\n- **Hover effects** on rows\n\n![alt text](\u003cScreenshots/Data 
Explorer.png\u003e)\n\n---\n\n## 🧹 **Module 04: Data Cleaning**\n\n### **Missing Value Strategies** ❓\n| Strategy | Description |\n|----------|-------------|\n| **Drop** | Remove rows with any missing value |\n| **Mean** | Fill numeric columns with column mean |\n| **Median** | Fill numeric columns with column median |\n| **Keep** | Leave missing values as-is |\n\n### **Outlier Removal** 📊\n| Method | Description |\n|--------|-------------|\n| **Z-score** | Remove values with \\|z\\| \u003e 3 |\n| **IQR** | Remove values below Q1 − 1.5×IQR or above Q3 + 1.5×IQR |\n\n### **Duplicate Removal** 🔄\n- **Toggle checkbox** to remove duplicate rows\n- **JSON-based comparison**: rows are compared by their JSON serialization, so only exact duplicates are removed\n\n### **Data Quality Report** 📋\n- **Per-column analysis**:\n  - Column name\n  - Type (numeric/text)\n  - Missing count and percentage\n  - Unique value count\n- **Color coding**: 🟢 numeric, 🟣 text\n\n![alt text](\u003cScreenshots/Data Cleaning.png\u003e)\n\n---\n\n## ⚙️ **Module 05: Preprocessing for Machine Learning**\n\n### **4 Key Preprocessing Steps** 🔧\n\n| Step | Options | Description |\n|------|---------|-------------|\n| **Feature Scaling** | None, Min-Max, Z-score, Robust | Scale numeric features to a common range |\n| **Categorical Encoding** | None, One-Hot, Label, Ordinal | Convert text categories to numbers |\n| **Dimensionality Reduction** | None, PCA, Variance Filter, Correlation Filter | Reduce the number of features |\n| **Train/Test Split** | 80/20, 70/30, 60/40, 90/10 | Ratio of training data to evaluation data |\n\n### **Column Summary** 📊\n- **Visual cards** for each column:\n  - Column name (monospaced)\n  - Type badge (numeric/categorical)\n  - Unique value count\n- **Color coding**: 🟢 numeric, 🟣 categorical\n\n![alt text](Screenshots/Preprocessing.png)\n\n---\n\n## ∑ **Module 06: Descriptive Statistics**\n\n### **Statistical Summary** 📈\n- **Full statistics** for all numeric columns:\n  - Count\n  - Mean\n  - Median\n  - Standard Deviation\n  - Min\n  - Q1 (25th percentile)\n  - Q3 (75th 
percentile)\n  - Max\n\n### **One-Click Calculation** ⚡\n- **\"Calculate Statistics\"** button\n- **\"Download Report\"** button for text export\n\n![alt text](Screenshots/Statistics.png)\n\n---\n\n## 📈 **Module 07: Interactive Visualization**\n\n### **6 Chart Types** 📊\n| Chart | Library | Use Case |\n|-------|---------|----------|\n| **Line Chart** | Recharts | Time series trends |\n| **Bar Chart** | Recharts | Categorical comparisons |\n| **Area Chart** | Recharts | Cumulative trends |\n| **Scatter Plot** | Recharts | Correlation analysis |\n| **Pie Chart** | Recharts | Part-to-whole relationships |\n\n### **Chart Configuration** ⚙️\n- **Dataset selector**\n- **Chart type dropdown**\n- **X-axis column selector**\n- **Y-axis column selector** (numeric only)\n\n### **Visual Features** ✨\n- **Dark theme** charts matching platform\n- **Tooltips** with formatted values\n- **Responsive sizing** for all screen sizes\n- **Pie chart labels** with truncated names\n- **Scatter plot** with X/Y axes\n\n![alt text](Screenshots/Visualization_1.png)\n\n---\n\n## 🤖 **Module 08: Machine Learning Models**\n\n### **Model Configuration** 🛠️\n- **Dataset selection**\n- **Target column** (numeric only)\n- **Algorithm options**:\n  - Linear Regression\n  - Random Forest\n  - Support Vector Machine\n  - K-Nearest Neighbors\n  - Gradient Boosting\n\n### **Model Performance Metrics** 📊\n- **R² Score** (coefficient of determination)\n- **RMSE** (Root Mean Square Error)\n- **MAE** (Mean Absolute Error)\n- **Training samples count**\n- **Features used** count\n\n### **Simulated Training** 🧠\n- **Realistic metrics** with random variation\n- **Progress bar** for R² score\n- **Feature selection** from numeric columns\n\n![alt text](\u003cScreenshots/ML models.png\u003e)\n\n---\n\n## ⊛ **Module 09: Hypothesis Testing**\n\n### **3 Statistical Tests** 📊\n| Test | Use Case | Inputs |\n|------|----------|--------|\n| **Independent T-test** | Compare two groups | Variable 1, Variable 2 |\n| 
**Chi-Square Test** | Categorical independence | Single variable |\n| **One-Way ANOVA** | Compare multiple groups | Single variable (simulated) |\n\n### **Test Results** 📋\n- **Test statistic** (t, χ², F)\n- **P-value** with color coding: 🟢 p\u003c0.05, 🔴 p≥0.05\n- **Conclusion**: Reject H₀ / Fail to Reject H₀\n- **Interpretation** in plain English\n- **Significance level**: α = 0.05 (two-tailed for the t-test)\n\n![alt text](Screenshots/Hypothesis.png)\n\n---\n\n## ◎ **Module 10: Report Generator**\n\n### **6 Report Types** 📄\n\n| Report | Content | Format |\n|--------|---------|--------|\n| **Data Quality** | Missing values, duplicates, type analysis | TXT |\n| **Statistics** | Descriptive stats for all numeric columns | TXT |\n| **ML Performance** | Model metrics, features, evaluation scores | TXT |\n| **Hypothesis Test** | Test statistic, p-value, conclusion | TXT |\n| **Full Pipeline** | Combined report of all completed steps | Multiple |\n| **Dataset Export** | CSV/JSON of active dataset | CSV/JSON |\n\n### **Report Preview** 👁️\n- **Last generated report** displayed in a terminal-style `\u003cpre\u003e` block\n- **Monospaced font** for readability\n- **Scrollable** for long reports\n\n\n![alt text](Screenshots/Reports.png)\n\n---\n\n## ◯ **Module 11: Profile Management**\n\n### **User Profile** 👤\n- **Avatar upload** with preview\n- **Display name**\n- **Email address**\n- **Role / Title**\n- **Bio** (textarea)\n\n### **Persistent Storage** 💾\n- **Local storage** for all user data\n- **Avatar saved** as base64\n- **Settings** persist across sessions\n\n\n![alt text](Screenshots/Profile.png)\n\n### **Dark mode** 🌑\n- **Manual toggle** via button\n- **Toggle** in settings panel\n- **System preference** detection\n- **Persistent** across sessions\n\n![alt text](\u003cScreenshots/Dark Mode.png\u003e)\n\n![alt text](Screenshots/Dashboard_Dark_mode.png)\n\n---\n\n## 🎨 **Design \u0026 Aesthetics**\n\n### **Modern Data Science Platform** 🖥️\n- **Dark theme** (`#080c14` background) — easy on the eyes 
for extended analysis\n- **Blue accent** (`#3b82f6`) for primary actions\n- **Green** (`#10b981`) for success and weather data\n- **Amber** (`#f59e0b`) for crypto and warnings\n- **Purple** (`#8b5cf6`) for ML models\n- **Glass-morphism cards** with subtle borders\n- **Custom scrollbars** for data tables\n\n### **Typography** ✍️\n- **Outfit** — UI text, body copy, buttons\n- **JetBrains Mono** — Data tables, statistics, code blocks\n\n### **Visual Elements** 🖼️\n- **Animated loading spinners**\n- **Progress bars** for ML metrics\n- **Color-coded badges** for data sources\n- **Hover effects** on cards and buttons\n- **Fade-up animations** on page transitions\n- **Toast notifications** for user feedback\n\n### **Color Coding** 🎨\n| Element | Color | Hex | Usage |\n|---------|-------|-----|-------|\n| **World Bank** | Blue | `#3b82f6` | GDP, economic indicators |\n| **Weather** | Green | `#10b981` | Temperature, humidity, wind |\n| **Crypto** | Amber | `#f59e0b` | Price, market cap, volume |\n| **Files** | Purple | `#8b5cf6` | Imported CSV/JSON |\n| **Primary** | Blue | `#3b82f6` | Buttons, active tabs |\n| **Success** | Green | `#10b981` | Completed steps |\n| **Warning** | Amber | `#f59e0b` | Missing data, warnings |\n| **Error** | Red | `#ef4444` | Errors, high p-values |\n\n---\n\n## 🛠️ **Technical Implementation**\n\n### **Tech Stack** 🥞\n- **React 18** — Functional components with hooks\n- **Recharts** — Data visualization library\n- **Pure CSS** — Custom design system, no frameworks\n- **LocalStorage API** — Persistent user data\n- **FileReader API** — CSV/JSON import\n\n### **React Hooks Used** 🎣\n| Hook | Purpose |\n|------|---------|\n| `useState` | 25+ state variables for UI and data |\n| `useEffect` | Initial data loading, theme sync |\n| `useRef` | File input, avatar upload |\n| `useCallback` | Memoized save function |\n\n### **Custom Utilities** 🔧\n- `parseCSV` — Convert CSV text to structured data\n- `toCSV` — Convert data to CSV format\n- `download` — 
Trigger file download\n- `numericCols` — Identify numeric columns\n- `descStats` — Calculate descriptive statistics\n- `store` — Persistent storage wrapper\n\n### **Storage Architecture** 💾\n```javascript\n// Sketch: async wrapper over localStorage; assumes values are stored as JSON strings.\nconst store = {\n  async get(key) { const raw = localStorage.getItem(key); return raw === null ? null : JSON.parse(raw); },\n  async set(key, val) { localStorage.setItem(key, JSON.stringify(val)); }\n};\n```\n\n### **Data Flow** 🔄\n```\nUser Authentication → Workspace → Dataset Selection\n                         ↓\n              ┌──────────┴──────────┐\n              ↓                     ↓\n        Data Cleaning         Machine Learning\n              ↓                     ↓\n        Statistics             Model Training\n              ↓                     ↓\n      Visualization          Hypothesis Testing\n              ↓                     ↓\n          Reports                 Export\n```\n\n---\n\n## 🎥 **Video Demo Script** (60-75 seconds)\n\n| Time | Module | Scene | Action |\n|------|--------|-------|--------|\n| 0:00 | Auth | Login | Enter credentials → Dashboard loads |\n| 0:05 | Dashboard | KPIs | Show 4 datasets, 2,847 rows, active dataset \"GDP Data\" |\n| 0:10 | Data Sources | World Bank | Load GDP data → Preview table shows 40 countries |\n| 0:15 | Data Sources | Weather | Fetch NYC forecast → Area chart updates |\n| 0:20 | Data Sources | Crypto | Load top 30 coins → Table shows BTC, ETH, SOL |\n| 0:25 | Data Explorer | Inspect | Browse GDP dataset with 100-row preview |\n| 0:30 | Cleaning | Quality | Show missing values: 2 columns with 12% missing |\n| 0:35 | Cleaning | Apply | Run cleaning → New cleaned dataset appears |\n| 0:40 | Statistics | Run | Calculate stats → Table shows mean, median, std dev |\n| 0:45 | Visualization | Build | Line chart of GDP by country → Interactive hover |\n| 0:50 | ML | Train | Linear regression on GDP → R² = 0.84, RMSE = 1.23M |\n| 0:55 | Hypothesis | T-test | Run test on GDP vs Population → p=0.03 (significant) |\n| 1:00 | Reports | Download | Generate Data Quality report → 
Preview shows 15 lines |\n| 1:05 | Profile | Edit | Update avatar, name, role → Save persists |\n\n---\n\n## 🚦 **Performance**\n\n- **Load Time**: \u003c 2 seconds\n- **Memory Usage**: \u003c 60 MB\n- **API Calls**: up to 3 concurrent requests, made on demand\n- **Dataset Limit**: 5,000 rows per file\n\n### **Optimizations** ⚡\n- **Data sampling** for visualizations (80-200 points)\n- **Efficient re-renders** via React hooks\n- **CSS animations** instead of JavaScript\n- **Debounced** API calls\n\n---\n\n## 🛡️ **Security Notes**\n\nDataVista is a **client-side only** application:\n- ✅ No backend server\n- ✅ No user data uploaded to a server; datasets stay in the browser\n- ✅ Local storage only for persistence\n- ✅ No tracking or analytics\n- ✅ Data-source API requests (World Bank, Open-Meteo, CoinGecko) are made directly from the browser\n\n---\n\n## 📝 **License**\n\nMIT License — see LICENSE file for details.\n\n---\n\n## 🙏 **Acknowledgments**\n\n- **World Bank** — Open data API\n- **Open-Meteo** — Free weather API\n- **CoinGecko** — Cryptocurrency data API\n- **Recharts** — React charting library\n- **JetBrains** — JetBrains Mono typeface\n\n---\n\n## 📧 **Contact**\n\n- **GitHub Issues**: [Create an issue](https://github.com/Willie-Conway/DataVista/issues)\n- **Website**: https://willie-conway.github.io/DataVista/\n\n---\n\n## 🏁 **Future Enhancements**\n\n- [ ] Add time series forecasting (ARIMA)\n- [ ] Include more ML algorithms (neural networks)\n- [ ] Add SQL query interface\n- [ ] Implement data versioning\n- [ ] Add collaborative workspaces\n- [ ] Include data profiling with visualizations\n- [ ] Add automated EDA reports\n- [ ] Implement feature importance ranking\n- [ ] Add confusion matrix for classification\n- [ ] Include ROC curves and AUC\n\n---\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003e📊 DataVista — Real-Time Data Analysis \u0026 Machine Learning Platform 📊\u003c/strong\u003e\n\u003c/p\u003e\n\n\n\n---\n\n*Last updated: March 
2025*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillie-conway%2Fdatavista","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwillie-conway%2Fdatavista","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillie-conway%2Fdatavista/lists"}