# 📊 B-vista

> **Visual, Scalable, and Real-Time Exploratory Data Analysis, built for modern notebooks and the browser.**

---

![Untitled design (6)](https://github.com/user-attachments/assets/240b0325-92aa-40ef-822d-af3b0c765699)

## What is it?
**B-vista** is a full-stack Exploratory Data Analysis (EDA) interface for `pandas` DataFrames. It connects a **Flask + WebSocket backend** to a **dynamic React frontend**, offering everything from descriptive stats to missing data diagnostics, all in real time.

---

| **Testing** | ![Build](https://img.shields.io/badge/build-passing-brightgreen) ![Tests](https://img.shields.io/badge/tests-passing-brightgreen) ![Coverage](https://img.shields.io/badge/coverage-85%25-yellowgreen) |
|-------------|----------------------------------------------------------------------------------------------------------------------------------|
| **Package** | [![PyPI Version](https://img.shields.io/pypi/v/bvista)](https://pypi.org/project/bvista/) [![PyPI Downloads](https://static.pepy.tech/badge/bvista)](https://pepy.tech/projects/bvista) ![Python](https://img.shields.io/badge/python-3.7%2B-blue) |
| **Meta** | ![Docs](https://img.shields.io/badge/docs-available-brightgreen) [![License](https://img.shields.io/badge/license-BSD%203--Clause-blue)](https://opensource.org/licenses/BSD-3-Clause) ![Status](https://img.shields.io/badge/status-active-success) |

---

> 🎯 **Designed for**
> Data Scientists · Analysts · Educators
> Teams collaborating over datasets

---

## 📚 Contents

- [✨ Main Features](#-main-features)
- [📦 Installation](#-installation)
- [🐳 Docker Quickstart](#-docker-quickstart)
- [🚀 Quickstart](#-quickstart)
- [⚙️ Advanced Usage](#️-advanced-usage)
- [🔁 Reconnect to a Previous Session](#-reconnect-to-a-previous-session)
- [🐳 Environment & Compatibility](#-️environment--compatibility)
- [📘 Documentation](#-documentation)
- [🖥️ UI](#-ui)
- [🔢 Interactive Data Grid](#-interactive-data-grid)
- [📂 Session Management](#-session-management)
- [📂 No-Code Cleaning & Transformation](#-no-code-cleaning--transformation)
- [📊 Performance & Usability](#-performance--usability)
- [💡 In the News & Inspiration](#-in-the-news--inspiration)
- [🔗 Related Tools & Inspiration](#-related-tools--inspiration)
- [📂 Project Structure](#-project-structure)
- [📂 Dataset](#-dataset)
- [🔖 Versioning](#-versioning)
- [🧑‍💻 Developer Setup & Contributing](#-developer-setup--contributing)
- [🧑‍💻 Security](#-security)
- [📄 License](#-license)

---

## ✨ Main Features

B-Vista transforms how you explore and clean pandas DataFrames. With just a few clicks or lines of code, you get a comprehensive, interactive EDA experience tailored for efficient workflows.

- **📊 Descriptive Statistics**
  Summarize distributions with enhanced stats including skewness, kurtosis, Shapiro-Wilk normality, and z-scores, going beyond the standard `.describe()`.

- **🔗 Correlation Matrix Explorer**
  Instantly visualize relationships using Pearson, Spearman, Kendall, Mutual Info, Partial, Robust, and Distance correlations.

- **📈 Distribution Analysis**
  Generate histograms, KDE plots, box plots (with auto log-scaling), and QQ plots for deep insight into variable spread and outliers.

- **🧼 Missing Data Diagnostics**
  Visualize missingness (matrix, heatmap, dendrogram), identify patterns, and classify gaps using MCAR/MAR/NMAR inference methods.

- **🛠️ Smart Data Cleaning**
  Drop or impute missing values with Mean, Median, Mode, Forward/Backward Fill, Interpolation, KNN, Iterative, Regression, or Autoencoder.

- **🔁 Data Transformation Engine**
  Cast column types, format as time or currency, normalize/standardize, rename or reorder columns, all with audit-safe tracking.

- **🧬 Duplicate Detection & Resolution**
  Automatically detect, isolate, or remove duplicate rows with real-time filtering.

- **🔄 Inline Cell Editing & Updates**
  Update any cell in-place and sync live across sessions via WebSocket-powered pipelines.

- **📂 Seamless Dataset Upload**
  Drag-and-drop or API-based DataFrame ingestion using secure, session-isolated pickle transport.

> 🔍 [See full feature breakdown →](docs/features.md)
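
As a rough illustration of what going beyond `.describe()` can look like, the sketch below adds skewness, excess kurtosis, and z-scores using plain pandas. It is not B-Vista's implementation: the helper names `extended_describe` and `z_scores` and the sample data are made up for this example.

```python
import pandas as pd


def extended_describe(df: pd.DataFrame) -> pd.DataFrame:
    """Append skewness and excess kurtosis to the standard describe() table.

    Hypothetical helper for illustration only; not part of the bvista API.
    """
    numeric = df.select_dtypes(include="number")
    summary = numeric.describe().T          # one row per numeric column
    summary["skew"] = numeric.skew()        # sample skewness
    summary["kurtosis"] = numeric.kurt()    # excess kurtosis (normal = 0)
    return summary


def z_scores(series: pd.Series) -> pd.Series:
    """Standardize a column: (value - mean) / sample standard deviation."""
    return (series - series.mean()) / series.std()


# Small made-up dataset with one obvious outlier in "price"
df = pd.DataFrame({"price": [10.0, 12.0, 11.0, 13.0, 40.0], "label": list("abcde")})
stats = extended_describe(df)
print(stats[["mean", "std", "skew", "kurtosis"]])
print(z_scores(df["price"]).round(2))
```

The outlier pulls the skewness above zero and gets the largest z-score, which is the kind of signal the plain `.describe()` table does not surface directly.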

---
### Where to get it
The source code is currently hosted on GitHub → [Source code](https://github.com/Baci-Ak/b-vista).
> Binary installers for the latest released version are available at the → [Python Package Index (PyPI)](https://pypi.org/project/bvista/)

---
## 📦 Installation

```bash
# PyPI
pip install bvista
```
```bash
# Conda
conda install -c conda-forge bvista
```

## 🐳 Docker Quickstart

B-Vista is available as a ready-to-run Docker image on → [Docker Hub](https://hub.docker.com/r/baciak/bvista):

```bash
docker pull baciak/bvista:latest
```

> ✅ Works on Linux, Windows, and macOS
> ✅ On Apple Silicon (M1/M2/M3), use: `--platform linux/amd64`

### ▶️ Run the App

To launch the B-Vista web app locally:

```bash
docker run --platform linux/amd64 -p 8501:5050 baciak/bvista:latest
```

Then open your browser and go to:

```
http://localhost:8501
```

> [Docker Doc](https://hub.docker.com/r/baciak/bvista)

---

## 🚀 Quickstart

The fastest way to get started (in a notebook):

```python
import bvista
import pandas as pd

df = pd.read_csv("dataset.csv")
bvista.show(df)
```
![demo_fast](https://github.com/user-attachments/assets/ab9c225a-49ed-4c64-a6ed-e9601ed2fc9f)

### Command line (terminal)
![new](https://github.com/user-attachments/assets/9b586970-a8cf-4d58-8ee2-e4521662b894)

---

## ⚙️ Advanced Usage

For full control over how and where B-Vista runs, use the `show()` function with advanced arguments:

```python
import bvista
import pandas as pd

df = pd.read_csv("dataset.csv")

# 👇 Customize how B-Vista starts and displays
bvista.show(
    df,                  # Required: your pandas DataFrame
    name="my_dataset",   # Optional: session name
    open_browser=True,   # Optional: open in browser outside notebooks
    silent=False         # Optional: print connection messages
)
```

---

### 🔁 Reconnect to a Previous Session

```python
bvista.show(session_id="your_previous_session_id")
```

Use this to revisit an earlier session or re-use a shared session.

---

## 🐳 Environment & Compatibility

| Tool | Version |
|-----------|-----------------|
| Python | ≥ 3.7 (tested on 3.10) |
| Node.js | ^18.x |
| npm | ^9.x |

---

## 📘 Documentation

Looking for full usage details and architecture?

👉 See [**DOCUMENTATION.md**](./DOCUMENTATION.md) for complete docs.

---

## 🖥️ UI

B-Vista features a modern, interactive, and highly customizable interface built with React and AG Grid Enterprise. It’s designed to handle large datasets with performance and clarity, right from your notebook and browser.

---

### 🔢 Interactive Data Grid

At the heart of B-Vista is the **Data Table view**: a real-time, Excel-like experience for your DataFrame.

#### Key Features:

- **🧭 Column-wise Data Types**
  Each column displays its **data type** (`int`, `float`, `bool`, `datetime`, etc.) alongside its name. These types are detected on upload and can be changed from the UI using the convert data type feature in the **Formatting** dropdown.

- **🔁 Live Editing + Sync**
  Click any cell to edit it directly. Changes are **WebSocket-synced** across tabs and sessions, and only the changed cell is transmitted.

- **🔍 Smart Filters & Search**
  Use quick column filters or open the **adjustable right-hand panel** to:
  - Build complex filters
  - Filter by range, category, substring, null presence, etc.

- **🧱 Column Grouping & Aggregation**
  - Drag columns to group by their values
  - Aggregate via **Sum**, **Avg**, **Min/Max**, **Count**, or **Custom**
  - View live totals per group or globally

- **🪟 Adjustable Layout Panel**
  Expand/collapse the sidebar for:
  - Column manager (reorder, hide, freeze)
  - Pivot setup
  - Filter manager
  - Aggregation panel

- **📐 Dataset Shape + Schema Summary**
  Always visible at the top:
  - Dataset shape: `rows × columns`

- **📦 Column Tools Menu**
  - Each column has a dropdown for filtering, sorting, etc.
  - Type conversion (e.g., to `currency`, `bool`, `date`, etc.) via Formatting dropdown
  - Format adjustment (round decimals, datetime formats) via Formatting dropdown
  - Replace values in-place via Formatting dropdown
  - Detect/remove duplicates via Formatting dropdown
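
To make "only the changed cell is transmitted" concrete, here is a minimal sketch of a cell-level edit message and how a receiver might apply it to its own copy of the DataFrame. The `CellEdit` fields and the `apply_edit` function are assumptions made for illustration; the actual B-Vista WebSocket payload may look different.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any

import pandas as pd


@dataclass
class CellEdit:
    """Hypothetical edit message: one row, one column, one new value."""
    session_id: str
    row: int
    column: str
    value: Any


def apply_edit(df: pd.DataFrame, edit: CellEdit) -> None:
    """Apply a single-cell edit in place using label-based scalar access."""
    if edit.column not in df.columns:
        raise KeyError(f"unknown column: {edit.column}")
    df.at[edit.row, edit.column] = edit.value


df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

# Sender: serialize just the change, not the whole table.
message = json.dumps(asdict(CellEdit("my_dataset", row=1, column="age", value=46)))

# Receiver: decode the message and patch the local copy.
apply_edit(df, CellEdit(**json.loads(message)))
print(df)
```

Sending a small delta like this keeps the message size independent of the size of the dataset, which is the main reason to avoid re-sending the full table on every edit.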

---

### 📂 Session Management

B-Vista supports **session-based dataset isolation**, letting you work across multiple datasets seamlessly.

#### Features:

- **🧾 Session Selector**
At the top-left, select your active dataset (e.g. `df`, `sales_data`, `test_set`). You can switch sessions without re-uploading.

- **🕒 Session Expiry**
  - Sessions expire **after 60 minutes of inactivity**
  - Expiration is automatic to prevent memory buildup

- **📜 Session History**
  - See all available sessions
  - Session IDs are generated automatically but customizable on upload
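
The inactivity-based expiry described above can be pictured with a small in-memory store. This is a sketch under stated assumptions, not B-Vista's session code: the `SessionStore` class, its methods, and the explicit `now` parameter (used here only to make the example deterministic) are all hypothetical.

```python
import time
from typing import Any, Dict, Optional, Tuple

SESSION_TTL_SECONDS = 60 * 60  # 60 minutes of inactivity, as described above


class SessionStore:
    """Toy in-memory store that forgets sessions after a period of inactivity.

    Illustrative only; class and method names are not taken from B-Vista.
    """

    def __init__(self, ttl: float = SESSION_TTL_SECONDS) -> None:
        self.ttl = ttl
        self._sessions: Dict[str, Tuple[Any, float]] = {}  # id -> (data, last access)

    def put(self, session_id: str, data: Any, now: Optional[float] = None) -> None:
        self._sessions[session_id] = (data, time.time() if now is None else now)

    def get(self, session_id: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        data, last_access = entry
        if now - last_access > self.ttl:
            del self._sessions[session_id]  # expired: free the memory
            return None
        self._sessions[session_id] = (data, now)  # each access resets the timer
        return data


store = SessionStore()
store.put("sales_data", {"rows": 3}, now=0)
print(store.get("sales_data", now=30 * 60))  # still alive after 30 minutes
print(store.get("sales_data", now=95 * 60))  # 65 idle minutes later: expired
```

Checking expiry lazily on access, as done here, avoids a background cleanup thread; a real server would likely also sweep idle sessions periodically so that untouched data does not linger.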

---

### 📂 No-Code Cleaning & Transformation

All transformations can be performed from the UI with no code:

- Impute missing values (mean, median, mode, etc.)
- Remove duplicates (first, last, all)
- Cast column data types
- Normalize or standardize
- Rename columns or reorder
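
For readers who want to see what these UI actions correspond to in code, the sketch below performs roughly equivalent steps with plain pandas. It only illustrates the operations themselves; the sample data and column names are invented, and nothing here is B-Vista's internal implementation.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Lagos", "Lagos", "Accra", "Nairobi"],
        "temp": ["31", "31", None, "24"],   # numbers stored as text, one gap
        "sales": [100.0, 100.0, 250.0, 400.0],
    }
)

# Remove duplicates (keep the first occurrence) and rename a column
clean = df.drop_duplicates(keep="first").rename(columns={"temp": "temp_c"})

# Cast the text column to a numeric type, then impute the gap with the mean
clean["temp_c"] = pd.to_numeric(clean["temp_c"])
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].mean())

# Min-max normalization to [0, 1] and z-score standardization
span = clean["sales"].max() - clean["sales"].min()
clean["sales_norm"] = (clean["sales"] - clean["sales"].min()) / span
clean["sales_std"] = (clean["sales"] - clean["sales"].mean()) / clean["sales"].std()

# Reorder columns
clean = clean[["city", "sales", "sales_norm", "sales_std", "temp_c"]]
print(clean)
```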

---

### 📊 Performance & Usability

- **⚡ Fast rendering** with virtualized rows/columns for large datasets
- **📋 Copy/paste** supported for multiple cells (just like Excel)
- **🧾 Export to CSV/Excel/image(charts)** with formatting preserved
- **📱 Responsive** UI: works across notebooks and modern desktop browsers

---
![new1](https://github.com/user-attachments/assets/47ca953a-a84c-4d27-b1ff-cf72f5cdefd3)

---

## 💡 In the News & Inspiration

> “**B-Vista** solves the frustration of static DataFrames, making EDA easy and accessible with no code: **interactive**, **shareable**, and **explorable**.”
> – *Beta User & Data Science Educator*

---

We built B-Vista to bridge the gap between:
- 💻 **The Command Line**
- 💻 **The Notebook**
- 🌐 **The Browser**
- 🔄 **Real-time collaboration and computation**

---

It’s designed to serve:

- **Data scientists** who want speed, clarity, data preparation for modeling, etc
- **Analysts** who need to clean and shape data efficiently
- **Teams** who need to explore shared datasets interactively

---

## 🔗 Related Tools & Inspiration

B-Vista builds upon and complements other amazing open-source projects:

| Tool | Purpose |
|-------------------|----------------------------------------------|
| [pandas](https://pandas.pydata.org/) | Core DataFrame engine |
| [Lux](https://github.com/lux-org/lux) | EDA assistant for pandas |
| [pandas-profiling](https://github.com/ydataai/pandas-profiling) | Automated summary reports |
| [Plotly](https://plotly.com/python/) | Rich interactive visualizations |
| [Flask-SocketIO](https://flask-socketio.readthedocs.io/) | WebSocket backbone for real-time sync |
| [Vite](https://vitejs.dev/) | Lightning-fast frontend dev server |

---

## 📂 Project Structure

The B-Vista project is organized as a **modular full-stack application**. Below is an overview of the core directories and files.

```
b-vista/
├── bvista/                        ← Main Python package
│   ├── __init__.py                ← Auto-start backend in notebooks
│   ├── notebook_integration.py    ← Jupyter + Colab + terminal helper
│   ├── server_manager.py          ← Launch logic for backend server
│   ├── frontend/                  ← React-based UI (AG Grid, Vite, Plotly)
│   ├── backend/                   ← Flask + WebSocket backend API
│   │   ├── app.py                 ← Backend entry point
│   │   ├── config.py              ← Server config & constants
│   │   ├── models/                ← Data processing logic (stats, EDA)
│   │   ├── routes/                ← Flask API routes (upload, clean, stats)
│   │   ├── websocket/             ← Real-time updates via Socket.IO
│   │   ├── static/                ← Temp storage, file handling utils
│   │   └── utils/                 ← Logging, helpers
│   └── datasets/                  ← Example datasets
│
├── tests/                         ← Pytest-based backend test suite
├── docs/                          ← Extended documentation & wiki stubs
├── requirements.txt               ← Production dependencies
├── pyproject.toml                 ← Packaging metadata (PEP 621)
├── Dockerfile                     ← Builds self-contained container
├── DOCUMENTATION.md               ← Full technical documentation
├── CONTRIBUTING.md                ← Developer guide & contribution rules
├── CODE_OF_CONDUCT.md             ← Community standards
└── README.md                      ← You’re reading this
```

---

### 🧭 Key Architecture Highlights

- **Modular Backend:** Each core task (e.g. correlation, distribution, missing data) has its own logic module under `backend/models`.

- **Stateless API Routes:** `backend/routes/data_routes.py` handles all DataFrame operations through REST endpoints.

- **WebSocket Sync:** Bi-directional session sync, live cell edits, and notifications are handled by `websocket/socket_manager.py`.

- **Frontend SPA (Single Page App):** The UI lives in `frontend/` and is powered by React + Vite for fast loading and a responsive user experience.

- **Notebook-Aware:** `notebook_integration.py` detects Jupyter/Colab environments and renders inline IFrames automatically.
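
Notebook detection is usually done with a few best-effort checks. The sketch below shows one common pattern; it is an assumption about how such detection can work, not a copy of `notebook_integration.py`, and the function name `detect_environment` is hypothetical.

```python
import sys


def detect_environment() -> str:
    """Best-effort guess at where the code is running.

    Returns "colab", "jupyter", "ipython", or "terminal".
    """
    if "google.colab" in sys.modules:
        return "colab"                 # Colab preloads this module in its kernels
    try:
        from IPython import get_ipython  # only importable when IPython is installed
    except ImportError:
        return "terminal"
    shell = get_ipython()
    if shell is None:
        return "terminal"              # plain `python script.py`
    if type(shell).__name__ == "ZMQInteractiveShell":
        return "jupyter"               # kernel-backed notebook or JupyterLab
    return "ipython"                   # interactive IPython in a terminal


print(detect_environment())
```

A caller could use the result to decide between rendering an inline IFrame (notebook environments) and opening a browser tab (terminal).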

---

## 📂 Dataset

B-Vista ships with a growing collection of **built-in datasets** and **live data connectors**, making it easy to start exploring.

### 🎒 Built-in Datasets

These datasets are included with the package and require no setup or internet connection:

| Dataset | Description |
|----------------|--------------------------------------------------|
| `ames_housing` | 🏠 Real estate dataset with 80+ features on home sales in Ames, Iowa. |
| `titanic` | 🚢 Titanic survival dataset, a classic classification use case. |
| `testing_data` | 🧪 Lightweight sample DataFrame used for test automation. |

Usage:

```python
from bvista.datasets import ames_housing, titanic

df = ames_housing.load()
df2 = titanic.load()
```
![Untitled design (7)](https://github.com/user-attachments/assets/ea753a23-f7dc-4680-b19f-63c5983bf010)

---

### 🔌 Live Data Connectors

B-Vista also includes **plug-and-play connectors** for real-world, real-time data APIs. These are great for dynamic dashboards, teaching demos, or financial/data journalism.

#### 🦠 `covid19_live`: COVID-19 Tracker
- Powered by: [API Ninjas](https://api-ninjas.com/api/covid19)
- Fetch confirmed + new cases per region and day
- Requires an **API key** via env variable or argument

```python
from bvista.datasets import covid19_live

df = covid19_live.load(country="Canada", API_KEY="your_key")
```

📄 Full doc: [covid19_live.md](./docs/Datasets/covid19_live.md)

---

#### 📈 `stock_prices`: Live Stock Market Data
- Powered by: [Alpha Vantage](https://www.alphavantage.co/)
- Supports daily, weekly, or monthly prices
- Filter by year or date range
- Single or multiple tickers supported

```python
from bvista.datasets import stock_prices

df = stock_prices.load(
    symbol=["AAPL", "TSLA"],
    interval="daily",
    date="2023",
    API_KEY="your_key"
)
```

📄 Full doc: [stock_prices.md](./docs/Datasets/stock_prices.md)

---

### 🔑 API Key Configuration

Some datasets require an API key. You can provide it in two ways:

✅ **Inline** (for quick testing):

```python
df = covid19_live.load(country="Nigeria", API_KEY="your_key")
```

✅ **Environment variable** (recommended for reuse):

```bash
export API_NINJAS_API_KEY="your_key"
export ALPHAVANTAGE_API_KEY="your_key"
```
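
The two options suggest a simple precedence: an explicit argument first, then the environment. The sketch below shows that lookup order in isolation. `resolve_api_key` is a hypothetical helper written for this example; the connectors' real lookup logic may differ.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str], env_var: str) -> str:
    """Return the explicit key if given, else read it from the environment.

    Hypothetical helper; not part of the bvista API.
    """
    key = explicit or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key: pass API_KEY=... or set {env_var}")
    return key


# Stand-in for `export ALPHAVANTAGE_API_KEY="your_key"` in the shell
os.environ["ALPHAVANTAGE_API_KEY"] = "demo_key"

print(resolve_api_key(None, "ALPHAVANTAGE_API_KEY"))          # falls back to the env var
print(resolve_api_key("inline_key", "ALPHAVANTAGE_API_KEY"))  # explicit value wins
```

Keeping keys in environment variables also keeps them out of notebooks and version control, which is why that route is the one recommended for reuse.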

---

### 🧪 Testing Dataset for Devs

```python
from bvista.datasets import testing_data

df = testing_data.load()
```

Use this for:
- UI stress testing
- Column type detection
- Testing WebSocket edits & missing data tools

---

## 🔖 Versioning

B-Vista follows [Semantic Versioning](https://semver.org).

```
Current: v0.1.0 (pre-release)
```

Expect fast iteration and breaking changes until 1.0.0.

---

## 🧑‍💻 Developer Setup & Contributing

Whether you're fixing a bug, improving the UI, or adding new data science modules, you're welcome to contribute to B-Vista!

---

### 🧰 1. Clone the Repository

```bash
git clone https://github.com/Baci-Ak/b-vista.git
cd b-vista
```

---

### 🧪 2. Local Development (Recommended)

Set up a virtual environment and install dependencies:

```bash
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows

# Upgrade pip first, then install runtime and dev dependencies
pip install --upgrade pip
pip install -r requirements.txt
pip install -e ".[dev]"

# Start the backend server
python bvista/backend/app.py
```

---

### 🐳 3. Docker Dev Environment

Prefer isolation? Use Docker to build and run the entire app:

```bash
# Build the image
docker buildx build --platform linux/amd64 -t baciak/bvista:test .

# Run the container
docker run --platform linux/amd64 -p 8501:5050 baciak/bvista:test
```

Your app will be available at:

```
http://localhost:8501
```

---

### 🔧 4. Live Dev with Volume Mounting

For live updates as you edit:

```bash
# -it keeps the bash shell interactive so you can run commands inside the container
docker run -it --platform linux/amd64 \
  -p 8501:5050 \
  -v "$(pwd)":/app \
  -w /app \
  --entrypoint bash \
  baciak/bvista:test
```

Inside the container, launch the backend manually:

```bash
python bvista/backend/app.py
```

---

### 🧼 5. Frontend Setup (Optional)

The frontend lives in `bvista/frontend`. To run it independently:

```bash
cd bvista/frontend
npm install
npm start
```
Runs the app in development mode.\
Open [http://localhost:3000](http://localhost:3000) to view it in your browser.

```bash
npm run dev
# or
npm run build
```
Builds the app for production to the `dev` or `build` folder.\
Refer to [Frontend Setup](./bvista/frontend/README.md) for more details.

---

### 🤝 6. Want to Contribute?

All contributions are welcome, from UI polish and bug reports to backend features.

Check out [CONTRIBUTING.md](./CONTRIBUTING.md) to learn how to:

- Open a pull request (PR)
- Follow code style and linting
- Suggest new ideas
- Join our community discussions

---

🔒 By contributing, you agree to follow our [Code of Conduct](./CODE_OF_CONDUCT.md).

## 🧑‍💻 Security

B-Vista is designed with session safety, memory isolation, and zero-disk write defaults.

👉 For full details, see our [**SECURITY.md**](./SECURITY.md)

## 📄 License

B-Vista is released under the **[BSD 3-Clause License](./LICENSE)**

---