https://github.com/baci-ak/b-vista
Interactive EDA tool to explore pandas DataFrames via Python, notebooks & Docker
analytics data-analysis data-science data-visualization dataframe docker eda flask flexible ipython jupyter notebook pandas python react visualization
- Host: GitHub
- URL: https://github.com/baci-ak/b-vista
- Owner: Baci-Ak
- License: bsd-3-clause
- Created: 2025-02-16T12:09:12.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-04-14T08:41:00.000Z (2 months ago)
- Last Synced: 2025-04-14T09:44:18.001Z (2 months ago)
- Topics: analytics, data-analysis, data-science, data-visualization, dataframe, docker, eda, flask, flexible, ipython, jupyter, notebook, pandas, python, react, visualization
- Language: Jupyter Notebook
- Homepage:
- Size: 263 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
---
# B-vista
> **Visual, Scalable, and Real-Time Exploratory Data Analysis, built for modern notebooks and the browser.**
---

## What is it?
**B-vista** is a full-stack Exploratory Data Analysis (EDA) interface for `pandas` DataFrames. It connects a **Flask + WebSocket backend** to a **dynamic React frontend**, offering everything from descriptive stats to missing data diagnostics in real time.

---

| **Testing** |    |
|-------------|----------------------------------------------------------------------------------------------------------------------------------|
| **Package** | [PyPI](https://pypi.org/project/bvista/) · [Downloads](https://pepy.tech/projects/bvista) |
| **Meta** | [License: BSD 3-Clause](https://opensource.org/licenses/BSD-3-Clause) |

---

> **Designed for**
> Data Scientists · Analysts · Educators ·
> Teams collaborating over datasets

---
## Contents
- [Main Features](#main-features)
- [Installation](#installation)
- [Docker Quickstart](#docker-quickstart)
- [Quickstart](#quickstart)
- [Advanced Usage](#advanced-usage)
- [Reconnect to a Previous Session](#reconnect-to-a-previous-session)
- [Environment & Compatibility](#environment--compatibility)
- [Documentation](#documentation)
- [UI](#ui)
- [Interactive Data Grid](#interactive-data-grid)
- [Session Management](#session-management)
- [No-Code Cleaning & Transformation](#no-code-cleaning--transformation)
- [Performance & Usability](#performance--usability)
- [In the News & Inspiration](#in-the-news--inspiration)
- [Related Tools & Inspiration](#related-tools--inspiration)
- [Project Structure](#project-structure)
- [Dataset](#dataset)
- [Versioning](#versioning)
- [Developer Setup & Contributing](#developer-setup--contributing)
- [Security](#security)
- [License](#license)

---
## Main Features
B-vista transforms how you explore and clean pandas DataFrames. With just a few clicks or lines of code, you get a comprehensive, interactive EDA experience tailored for efficient workflows.

- **Descriptive Statistics**
  Summarize distributions with enhanced stats including skewness, kurtosis, Shapiro-Wilk normality, and z-scores, going beyond the standard `.describe()`.
- **Correlation Matrix Explorer**
  Instantly visualize relationships using Pearson, Spearman, Kendall, Mutual Info, Partial, Robust, and Distance correlations.
- **Distribution Analysis**
  Generate histograms, KDE plots, box plots (with auto log-scaling), and QQ plots for deep insight into variable spread and outliers.
- **Missing Data Diagnostics**
  Visualize missingness (matrix, heatmap, dendrogram), identify patterns, and classify gaps using MCAR/MAR/NMAR inference methods.
- **Smart Data Cleaning**
  Drop or impute missing values with Mean, Median, Mode, Forward/Backward Fill, Interpolation, KNN, Iterative, Regression, or Autoencoder.
- **Data Transformation Engine**
  Cast column types, format as time or currency, normalize/standardize, rename or reorder columns, all with audit-safe tracking.
- **Duplicate Detection & Resolution**
  Automatically detect, isolate, or remove duplicate rows with real-time filtering.
- **Inline Cell Editing & Updates**
  Update any cell in-place and sync live across sessions via WebSocket-powered pipelines.
- **Seamless Dataset Upload**
  Drag-and-drop or API-based DataFrame ingestion using secure, session-isolated pickle transport.

> [See full feature breakdown](docs/features.md)
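
To make the "beyond `.describe()`" claim in the Descriptive Statistics item more concrete, here is a minimal sketch of three of the listed measures (skewness, excess kurtosis, z-scores). It uses population formulas and only the Python standard library so that it runs anywhere. It is an illustration rather than B-vista's implementation, and the function name `describe_extra` is made up for this example.

```python
from statistics import fmean, pstdev


def describe_extra(values):
    """Return skewness, excess kurtosis and z-scores for a numeric sequence."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("z-scores are undefined for constant data")

    # z-score: how many standard deviations each value lies from the mean
    z_scores = [(v - mean) / std for v in values]
    n = len(values)

    # Third and fourth standardized moments (population versions)
    skewness = sum(z ** 3 for z in z_scores) / n
    excess_kurtosis = sum(z ** 4 for z in z_scores) / n - 3.0

    return {
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
        "z_scores": z_scores,
    }


stats = describe_extra([2, 4, 4, 4, 5, 5, 7, 9])
print(stats["skewness"], stats["excess_kurtosis"])
```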
---
### Where to get it
The source code is hosted on GitHub: [Source code](https://github.com/Baci-Ak/b-vista).
> Binary installers for the latest released version are available at the [Python Package Index (PyPI)](https://pypi.org/project/bvista/).

---
## Installation

```bash
# PyPI
pip install bvista
```

```bash
# Conda
conda install -c conda-forge bvista
```

## Docker Quickstart
B-Vista is available as a ready-to-run Docker image on [Docker Hub](https://hub.docker.com/r/baciak/bvista):

```bash
docker pull baciak/bvista:latest
```

> - Works on Linux, Windows, and macOS
> - On Apple Silicon (M1/M2/M3), use: `--platform linux/amd64`

### Run the App
To launch the B-Vista web app locally:

```bash
docker run --platform linux/amd64 -p 8501:5050 baciak/bvista:latest
```

Then open your browser and go to:

```
http://localhost:8501
```

> [Docker Doc](https://hub.docker.com/r/baciak/bvista)
---

## Quickstart
The fastest way to get started (in a notebook):

```python
import bvista
import pandas as pd

df = pd.read_csv("dataset.csv")  # load any CSV into a DataFrame
bvista.show(df)                  # open the B-Vista UI for this DataFrame
```
### Command line (terminal)
---
## Advanced Usage
For full control over how and where B-Vista runs, use the `show()` function with advanced arguments:

```python
import bvista
import pandas as pd

df = pd.read_csv("dataset.csv")

# Customize how B-Vista starts and displays
bvista.show(
    df,                 # Required: your pandas DataFrame
    name="my_dataset",  # Optional: session name
    open_browser=True,  # Optional: open in browser outside notebooks
    silent=False        # Optional: print connection messages
)
```

---
### Reconnect to a Previous Session

```python
bvista.show(session_id="your_previous_session_id")
```

Use this to revisit an earlier session or reuse a shared session.

---
## Environment & Compatibility
| Tool | Version |
|-----------|-----------------|
| Python | ≥ 3.7 (tested on 3.10) |
| Node.js | ^18.x |
| npm | ^9.x |

---
## Documentation
Looking for full usage details and architecture?
See [**DOCUMENTATION.md**](./DOCUMENTATION.md) for complete docs.

---
## UI
B-Vista features a modern, interactive, and highly customizable interface built with React and AG Grid Enterprise. It's designed to handle large datasets with performance and clarity, right from your notebook and browser.

---
### Interactive Data Grid
At the heart of B-Vista is the **Data Table view**, a real-time, Excel-like experience for your DataFrame.
#### Key Features:
- **Column-wise Data Types**
  Each column displays its **data type** (`int`, `float`, `bool`, `datetime`, etc.) next to its name. These types are detected on upload and can be changed from the UI using the convert data type feature in the **Formatting** dropdown.
- **Live Editing + Sync**
  Click any cell to edit it directly. Changes are **WebSocket-synced** across tabs and sessions; only the changed cell is transmitted.
- **Smart Filters & Search**
  Use quick column filters or open the **adjustable right-hand panel** to:
  - Build complex filters
  - Filter by range, category, substring, null presence, etc.
- **Column Grouping & Aggregation**
  - Drag columns to group by their values
  - Aggregate via **Sum**, **Avg**, **Min/Max**, **Count**, or **Custom**
  - View live totals per group or globally
- **Adjustable Layout Panel**
  Expand/collapse the sidebar for:
  - Column manager (reorder, hide, freeze)
  - Pivot setup
  - Filter manager
  - Aggregation panel
- **Dataset Shape + Schema Summary**
  Always visible at the top:
  - Dataset shape: `rows × columns`
- **Column Tools Menu**
  - Each column has a dropdown for filtering, sorting, etc.
  - Type conversion (e.g., to `currency`, `bool`, `date`, etc.) via the Formatting dropdown
  - Format adjustment (round decimals, datetime formats) via the Formatting dropdown
  - Replace values in-place via the Formatting dropdown
  - Detect/remove duplicates via the Formatting dropdown
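
The Live Editing + Sync item above says that only the changed cell is transmitted. As a rough sketch of that idea, the snippet below serializes a single-cell edit and applies it on the receiving side. The field names (`session_id`, `row`, `column`, `value`) and both function names are assumptions made for illustration, and a plain list of dicts stands in for the DataFrame; this README does not document B-Vista's actual WebSocket payload.

```python
import json


def make_cell_update(session_id, row, column, value):
    """Serialize one cell edit instead of resending the whole table."""
    return json.dumps(
        {"session_id": session_id, "row": row, "column": column, "value": value}
    )


def apply_cell_update(table, message):
    """Apply a serialized cell edit to an in-memory table (list of row dicts)."""
    update = json.loads(message)
    table[update["row"]][update["column"]] = update["value"]
    return table


# Hypothetical two-row table shared by two browser tabs
table = [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]

# One tab edits a cell; the other tab receives and applies the small message
message = make_cell_update("my_dataset", row=1, column="age", value=42)
apply_cell_update(table, message)
```

---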
### Session Management
B-Vista supports **session-based dataset isolation**, letting you work across multiple datasets seamlessly.
#### Features:
- **Session Selector**
  At the top-left, select your active dataset (e.g. `df`, `sales_data`, `test_set`). You can switch sessions without re-uploading.
- **Session Expiry**
  - Sessions expire **after 60 minutes of inactivity**
  - Expiration is automatic to prevent memory buildup
- **Session History**
  - See all available sessions
  - Session IDs are generated automatically but customizable on upload
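
The expiry rule above (sessions dropped after 60 minutes of inactivity) can be pictured as a small in-memory store with a time-to-live. The sketch below is an assumption about how such a rule could be implemented, not B-Vista's code; `SessionStore` and its methods are hypothetical names.

```python
import time

SESSION_TTL_SECONDS = 60 * 60  # 60 minutes of inactivity, as stated above


class SessionStore:
    """Toy in-memory store that forgets sessions idle for longer than the TTL."""

    def __init__(self, ttl=SESSION_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._sessions = {}  # session_id -> (data, last_access_time)

    def put(self, session_id, data):
        """Store data under a session id and start its inactivity timer."""
        self._sessions[session_id] = (data, self._clock())

    def get(self, session_id):
        """Return the session data and refresh its timer, or None if expired."""
        self.evict_expired()
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        data, _ = entry
        self._sessions[session_id] = (data, self._clock())
        return data

    def evict_expired(self):
        """Delete every session whose last access is older than the TTL."""
        now = self._clock()
        expired = [
            sid for sid, (_, seen) in self._sessions.items() if now - seen > self._ttl
        ]
        for sid in expired:
            del self._sessions[sid]
```

Passing the clock in as a parameter keeps the expiry logic easy to test with a fake clock instead of waiting an hour.

---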
### No-Code Cleaning & Transformation
All transformations can be performed from the UI with no code:
- Impute missing values (mean, median, mode, etc.)
- Remove duplicates (first, last, all)
- Cast column data types
- Normalize or standardize
- Rename or reorder columns
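
The three duplicate-removal modes listed above (first, last, all) presumably correspond to the `keep` argument of pandas' `DataFrame.drop_duplicates`, which accepts `"first"`, `"last"`, or `False`. The sketch below reproduces those semantics on plain tuples so that it runs without pandas. It only illustrates what each mode means; how B-Vista wires the modes into its UI is not shown in this README.

```python
from collections import Counter


def drop_duplicates(rows, keep="first"):
    """Remove duplicate rows from a list of hashable rows.

    keep="first": keep the first occurrence of each row
    keep="last":  keep the last occurrence of each row
    keep=False:   drop every row that appears more than once
    """
    if keep == "first":
        seen = set()
        result = []
        for row in rows:
            if row not in seen:
                seen.add(row)
                result.append(row)
        return result
    if keep == "last":
        # Deduplicate from the end, then restore the original order
        return drop_duplicates(rows[::-1], keep="first")[::-1]
    if keep is False:
        counts = Counter(rows)
        return [row for row in rows if counts[row] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3)]
kept_first = drop_duplicates(rows, keep="first")
kept_last = drop_duplicates(rows, keep="last")
kept_none = drop_duplicates(rows, keep=False)
```

---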
### Performance & Usability
- **Fast rendering** with virtualized rows/columns for large datasets
- **Copy/paste** supported for multiple cells (just like Excel)
- **Export to CSV/Excel/image (charts)** with formatting preserved
- **Responsive** UI that works across notebooks and modern desktop browsers

---
## In the News & Inspiration
> "**B-Vista** solves the frustration of static DataFrames, making EDA easy and accessible with no codes: **interactive**, **shareable**, and **explorable**."
>
> *Beta User & Data Science Educator*

---

We built B-Vista to bridge the gap between:
- **The command line**
- **The Notebook**
- **The Browser**
- **Real-time collaboration and computation**

---

It's designed to serve:
- **Data scientists** who want speed, clarity, and data preparation for modeling
- **Analysts** who need to clean and shape data efficiently
- **Teams** who need to explore shared datasets interactively

---
## Related Tools & Inspiration
B-Vista builds upon and complements other amazing open-source projects:
| Tool | Purpose |
|-------------------|----------------------------------------------|
| [pandas](https://pandas.pydata.org/) | Core DataFrame engine |
| [Lux](https://github.com/lux-org/lux) | EDA assistant for pandas |
| [pandas-profiling](https://github.com/ydataai/pandas-profiling) | Automated summary reports |
| [Plotly](https://plotly.com/python/) | Rich interactive visualizations |
| [Flask-SocketIO](https://flask-socketio.readthedocs.io/) | WebSocket backbone for real-time sync |
| [Vite](https://vitejs.dev/) | Lightning-fast frontend dev server |

---
## Project Structure
The B-Vista project is organized as a **modular full-stack application**. Below is an overview of the core directories and files.

```
b-vista/
├── bvista/                      # Main Python package
│   ├── __init__.py              # Auto-start backend in notebooks
│   ├── notebook_integration.py  # Jupyter + Colab + terminal helper
│   ├── server_manager.py        # Launch logic for backend server
│   ├── frontend/                # React-based UI (AG Grid, Vite, Plotly)
│   ├── backend/                 # Flask + WebSocket backend API
│   │   ├── app.py               # Backend entry point
│   │   ├── config.py            # Server config & constants
│   │   ├── models/              # Data processing logic (stats, EDA)
│   │   ├── routes/              # Flask API routes (upload, clean, stats)
│   │   ├── websocket/           # Real-time updates via Socket.IO
│   │   ├── static/              # Temp storage, file handling utils
│   │   └── utils/               # Logging, helpers
│   └── datasets/                # Example datasets
│
├── tests/                       # Pytest-based backend test suite
├── docs/                        # Extended documentation & wiki stubs
├── requirements.txt             # Production dependencies
├── pyproject.toml               # Packaging metadata (PEP 621)
├── Dockerfile                   # Builds self-contained container
├── DOCUMENTATION.md             # Full technical documentation
├── CONTRIBUTING.md              # Developer guide & contribution rules
├── CODE_OF_CONDUCT.md           # Community standards
└── README.md                    # You're reading this
```

---
### Key Architecture Highlights
- **Modular Backend:** Each core task (e.g. correlation, distribution, missing data) has its own logic module under `backend/models`.
- **Stateless API Routes:** `backend/routes/data_routes.py` handles all DataFrame operations through REST endpoints.
- **WebSocket Sync:** Bi-directional session sync, live cell edits, and notifications are handled by `websocket/socket_manager.py`.
- **Frontend SPA (Single Page App):** The UI lives in `frontend/` and is powered by React + Vite for fast loading and a responsive user experience.
- **Notebook-Aware:** `notebook_integration.py` detects Jupyter/Colab environments and renders inline IFrames automatically.
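
For the Notebook-Aware point, one common heuristic for telling Colab, Jupyter, and a plain terminal apart is to check which modules are already loaded in the interpreter. The sketch below is illustrative only: `detect_environment` is a hypothetical name, and this README does not say how `notebook_integration.py` actually performs the check.

```python
import sys


def detect_environment(loaded_modules=None):
    """Guess the runtime from loaded modules: 'colab', 'jupyter' or 'terminal'."""
    modules = sys.modules if loaded_modules is None else loaded_modules
    if "google.colab" in modules:
        # Colab also runs an IPython kernel, so it has to be checked first
        return "colab"
    if "ipykernel" in modules:
        return "jupyter"
    return "terminal"


print(detect_environment())
```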
---
## Dataset
B-Vista ships with a growing collection of **built-in datasets** and **live data connectors**, making it easy to start exploring.
### Built-in Datasets
These datasets are included with the package and require no setup or internet connection:
| Dataset | Description |
|----------------|--------------------------------------------------|
| `ames_housing` | Real estate dataset with 80+ features on home sales in Ames, Iowa. |
| `titanic` | Titanic survival dataset, a classic classification use case. |
| `testing_data` | Lightweight sample DataFrame used for test automation. |

Usage:

```python
from bvista.datasets import ames_housing, titanic

df = ames_housing.load()
df2 = titanic.load()
```
---
### Live Data Connectors
B-Vista also includes **plug-and-play connectors** for real-world, real-time data APIs. These are great for dynamic dashboards, teaching demos, or financial/data journalism.
#### `covid19_live`: COVID-19 Tracker
- Powered by: [API Ninjas](https://api-ninjas.com/api/covid19)
- Fetch confirmed + new cases per region and day
- Requires an **API key** via env variable or argument

```python
from bvista.datasets import covid19_live

df = covid19_live.load(country="Canada", API_KEY="your_key")
```

Full doc: [covid19_live.md](./docs/Datasets/covid19_live.md)

---
#### `stock_prices`: Live Stock Market Data
- Powered by: [Alpha Vantage](https://www.alphavantage.co/)
- Supports daily, weekly, or monthly prices
- Filter by year or date range
- Single or multiple tickers supported

```python
from bvista.datasets import stock_prices

df = stock_prices.load(
    symbol=["AAPL", "TSLA"],
    interval="daily",
    date="2023",
    API_KEY="your_key"
)
```

Full doc: [stock_prices.md](./docs/Datasets/stock_prices.md)

---
### API Key Configuration
Some datasets require an API key. You can provide it in two ways:

**Inline** (for quick testing):

```python
df = covid19_live.load(country="Nigeria", API_KEY="your_key")
```

**Environment variable** (recommended for reuse):

```bash
export API_NINJAS_API_KEY="your_key"
export ALPHAVANTAGE_API_KEY="your_key"
```
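
The two options above imply a lookup order between the inline argument and the environment variable. A minimal sketch of one such order follows, assuming that an inline key takes precedence over the environment; that precedence and the helper name `resolve_api_key` are assumptions, since the README does not describe how the connectors resolve keys internally.

```python
import os


def resolve_api_key(explicit_key=None, env_var="API_NINJAS_API_KEY"):
    """Prefer an explicitly passed key, otherwise fall back to the environment."""
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key given and {env_var} is not set")
    return key


# Example: an inline key wins even if the environment variable is set
os.environ["API_NINJAS_API_KEY"] = "key_from_env"
print(resolve_api_key("inline_key"))
print(resolve_api_key())
```

---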
### Testing Dataset for Devs

```python
from bvista.datasets import testing_data

df = testing_data.load()
```

Use this for:
- UI stress testing
- Column type detection
- Testing WebSocket edits & missing data tools

---
## Versioning
Follows [Semantic Versioning](https://semver.org)

```
Current: v0.1.0 (pre-release)
```

Expect fast iteration and breaking changes until 1.0.0.
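
In practice, the note about breaking changes means that any upgrade within the 0.y.z range should be treated as potentially incompatible. The sketch below encodes that Semantic Versioning rule for plain `MAJOR.MINOR.PATCH` strings. The helper names are hypothetical, and pre-release or build suffixes are deliberately not handled.

```python
def parse_version(version):
    """Split a 'vMAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def may_break(current, candidate):
    """Return True if upgrading from current to candidate may break callers.

    Under SemVer, 0.y.z is initial development and anything may change;
    from 1.0.0 on, only a major version bump may introduce breaking changes.
    """
    cur, new = parse_version(current), parse_version(candidate)
    if cur[0] == 0:
        return new != cur
    return new[0] > cur[0]


print(may_break("v0.1.0", "v0.1.1"))
```

---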
## Developer Setup & Contributing
Whether you're fixing a bug, improving the UI, or adding new data science modules, you're welcome to contribute to B-Vista!

---
### 1. Clone the Repository

```bash
git clone https://github.com/Baci-Ak/b-vista.git
cd b-vista
```

---
### 2. Local Development (Recommended)
Set up a virtual environment and install dependencies:

```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt

pip install --upgrade pip
pip install -e ".[dev]"
python bvista/backend/app.py
```

---
### 3. Docker Dev Environment
Prefer isolation? Use Docker to build and run the entire app:

```bash
# Build the image
docker buildx build --platform linux/amd64 -t baciak/bvista:test .

# Run the container
docker run --platform linux/amd64 -p 8501:5050 baciak/bvista:test
```

Your app will be available at:

```
http://localhost:8501
```

---
### 4. Live Dev with Volume Mounting
For live updates as you edit:

```bash
docker run --platform linux/amd64 \
  -p 8501:5050 \
  -v $(pwd):/app \
  -w /app \
  --entrypoint bash \
  baciak/bvista:test
```

Inside the container, launch the backend manually:

```bash
python bvista/backend/app.py
```

---
### 5. Frontend Setup (Optional)
The frontend lives in `bvista/frontend`. To run it independently:

```bash
cd bvista/frontend
npm install
npm start
```

Runs the app in development mode.
Open [http://localhost:3000](http://localhost:3000) to view it in your browser.

```bash
npm run dev
# or
npm run build
```

Builds the app for production to the `dev` or `build` folder.
Refer to [Frontend Setup](./bvista/frontend/README.md) for more details.

---
### 6. Want to Contribute?
All contributions are welcome, from UI polish and bug reports to backend features.
Check out [CONTRIBUTING.md](./CONTRIBUTING.md) to learn how to:
- Open a pull request (PR)
- Follow code style and linting
- Suggest new ideas
- Join our community discussions

---

By contributing, you agree to follow our [Code of Conduct](./CODE_OF_CONDUCT.md).
## Security
B-Vista is designed with session safety, memory isolation, and zero-disk write defaults.
For full details, see our [**SECURITY.md**](./SECURITY.md).

## License
B-Vista is released under the **[BSD 3-Clause License](./LICENSE)**.

---