{"id":50798085,"url":"https://github.com/iannil/one-data-studio","last_synced_at":"2026-06-12T16:04:11.910Z","repository":{"id":334418908,"uuid":"1141113835","full_name":"iannil/one-data-studio","owner":"iannil","description":"one-data-studio integrates a data governance and development platform, a cloud-native MLOps platform, and a large model application development platform. It connects the entire value chain from raw data governance to model training and deployment, and further to the construction of generative AI applications.","archived":false,"fork":false,"pushed_at":"2026-02-04T14:16:26.000Z","size":7926,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-02-05T01:35:52.426Z","etag":null,"topics":["data","llm","model","platform"],"latest_commit_sha":null,"homepage":"https://zhurongshuo.com/products/one-data-studio/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iannil.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-24T09:17:55.000Z","updated_at":"2026-02-04T14:20:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"a5c9f169-f577-4672-857f-b7ac014d936f","html_url":"https://github.com/iannil/one-data-studio","commit_stats":null,"previous_names":["iannil/one-data-studio"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/iannil/one-data-studio","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iannil%2Fone-data-studio","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iannil%2Fone-data-studio/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iannil%2Fone-data-studio/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iannil%2Fone-data-studio/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iannil","download_url":"https://codeload.github.com/iannil/one-data-studio/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iannil%2Fone-data-studio/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34251783,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","llm","model","platform"],"created_at":"2026-06-12T16:04:10.497Z","updated_at":"2026-06-12T16:04:11.890Z","avatar_url":"https://github.com/iannil.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ONE-DATA-STUDIO\n\n\u003cdiv align=\"center\"\u003e\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)\n[![Python](https://img.shields.io/badge/Python-3.10%2B-green.svg)](https://www.python.org/)\n[![React](https://img.shields.io/badge/React-18.3-blue.svg)](https://reactjs.org/)\n[![TypeScript](https://img.shields.io/badge/TypeScript-5.4-blue.svg)](https://www.typescriptlang.org/)\n[![Kubernetes](https://img.shields.io/badge/Kubernetes-1.27%2B-326ce5.svg)](https://kubernetes.io/)\n[![Docker](https://img.shields.io/badge/Docker-20.10%2B-2496ED.svg)](https://www.docker.com/)\n\nEnterprise-Grade DataOps + MLOps + LLMOps Converged Platform\n\n*From Raw Data to Intelligent Applications — All in One Platform*\n\n[Features](#-features) | [Quick Start](#-quick-start) | [Architecture](#-architecture) | [Use Cases](#-use-cases) | [Comparison](#-comparison-with-alternatives) | [Documentation](#-documentation) | [简体中文](README_ZH.md)\n\n\u003c/div\u003e\n\n---\n\n## What is ONE-DATA-STUDIO?\n\nONE-DATA-STUDIO is an open-source enterprise platform that uniquely converges three critical AI infrastructure layers into a unified system:\n\n| Layer | Name | Description |\n| ------- | ------ | ------------- |\n| Data | DataOps Platform | Data integration, ETL, governance, feature store, and vector storage |\n| Model | MLOps Platform | Jupyter notebooks, distributed training, model registry, and serving |\n| Agent | LLMOps Platform | RAG pipelines, agent orchestration, workflow builder, and prompt management |\n\nUnlike traditional platforms that treat these as separate silos, ONE-DATA-STUDIO creates seamless integration points between layers, enabling enterprises to build end-to-end AI solutions from raw data to production applications.\n\n### Key Value Propositions\n\n1. Complete Value Chain: Raw data → Governed datasets → Trained models → Deployed applications\n2. Unified Governance: Single pane of glass for data lineage, model lineage, and application logs\n3. Private \u0026 Secure: Deploy entirely on-premises with your own data, compute, and models\n4. Production-Ready: Battle-tested with enterprise-grade security, monitoring, and scalability\n\n---\n\n## Features\n\n### Data Layer (DataOps)\n\n| Feature | Description | Implementation |\n| --------- | ------------- | ---------------- |\n| Data Integration | Connect to 50+ data sources (databases, APIs, files) | Flask-based connectors with async I/O |\n| ETL Pipelines | Visual pipeline builder with Flink/Spark execution | Declarative DAG definitions |\n| Metadata Management | Automatic schema discovery and cataloging | OpenMetadata integration |\n| Data Quality | Rule-based validation and anomaly detection | Custom quality engine |\n| Data Lineage | Track data flow from source to consumption | Column-level lineage tracking |\n| Feature Store | Unified feature management for ML models | MinIO + versioned datasets |\n| Vector Storage | High-performance vector database for RAG | Milvus 2.3 integration |\n\n### Model Layer (MLOps)\n\n| Feature | Description | Implementation |\n| --------- | ------------- | ---------------- |\n| Notebook Environment | JupyterHub with GPU support | K8s-native deployment |\n| Distributed Training | Multi-GPU, multi-node training | Ray integration |\n| Model Registry | Version control for models | MLflow-compatible API |\n| Model Serving | High-throughput inference | vLLM with OpenAI-compatible API |\n| Experiment Tracking | Log metrics, parameters, artifacts | Built-in tracking system |\n| A/B Deployment | Gradual rollout with traffic splitting | Istio service mesh |\n\n### Agent Layer (LLMOps)\n\n| Feature | Description | Implementation |\n| --------- | ------------- | ---------------- |\n| RAG Pipeline | End-to-end retrieval-augmented generation | LangChain + Milvus |\n| Agent Orchestration | Multi-agent systems with tool use | Custom agent framework |\n| Visual Workflow | Drag-and-drop workflow builder | ReactFlow canvas |\n| Prompt Management | Template library with versioning | A/B testing support |\n| Knowledge Base | Document ingestion and chunking | PDF, DOCX, Markdown support |\n| Text-to-SQL | Natural language database queries | Metadata-enhanced prompts |\n| Token Tracking | Usage monitoring and cost control | Per-request token counting |\n\n### Platform Administration\n\n| Feature | Description | Implementation |\n| --------- | ------------- | ---------------- |\n| Identity Management | SSO with OIDC/SAML support | Keycloak 23.0 |\n| Access Control | Fine-grained RBAC | Role-based permissions |\n| Multi-tenancy | Isolated workspaces | Namespace-level isolation |\n| Audit Logging | Comprehensive activity tracking | Searchable audit trail |\n| Observability | Metrics, traces, logs | Prometheus + Grafana + Jaeger |\n\n---\n\n## Architecture\n\n### Four-Layer Architecture\n\n```\n┌───────────────────────────────────────────────────────────────────────────┐\n│                     L4 APPLICATION LAYER (Agent)                          │\n│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │\n│  │ RAG Pipeline│  │Agent System │  │ Workflow    │  │ Text-to-SQL │       │\n│  │ • Embedding │  │ • Planning  │  │ • ReactFlow │  │ • Schema    │       │\n│  │ • Retrieval │  │ • Tool Use  │  │ • Nodes     │  │ • Query Gen │       │\n│  │ • Generation│  │ • Memory    │  │ • Execution │  │ • Results   │       │\n│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘       │\n└───────────────────────────────────────────────────────────────────────────┘\n                        ↕ OpenAI-Compatible API / Metadata Injection\n┌───────────────────────────────────────────────────────────────────────────┐\n│                    L3 ALGORITHM ENGINE LAYER (Model)                      │\n│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │\n│  │ Notebook    │  │ Distributed │  │ Model       │  │ Inference   │       │\n│  │ • Jupyter   │  │ Training    │  │ Registry    │  │ • vLLM      │       │\n│  │ • GPU       │  │ • Ray       │  │ • Versions  │  │ • Batching  │       │\n│  │ • Kernels   │  │ • Multi-GPU │  │ • Artifacts │  │ • Scaling   │       │\n│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘       │\n└───────────────────────────────────────────────────────────────────────────┘\n                        ↕ Dataset Mounting / Feature Retrieval\n┌───────────────────────────────────────────────────────────────────────────┐\n│                    L2 DATA FOUNDATION LAYER (Data)                        │\n│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │\n│  │ Integration │  │ ETL Engine  │  │ Governance  │  │ Storage     │       │\n│  │ • Connectors│  │ • Flink     │  │ • Metadata  │  │ • MinIO     │       │\n│  │ • CDC       │  │ • Spark     │  │ • Quality   │  │ • Milvus    │       │\n│  │ • Streaming │  │ • Transform │  │ • Lineage   │  │ • Redis     │       │\n│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘       │\n└───────────────────────────────────────────────────────────────────────────┘\n                        ↕ Storage Protocol / Resource Scheduling\n┌───────────────────────────────────────────────────────────────────────────┐\n│                    L1 INFRASTRUCTURE LAYER (Kubernetes)                   │\n│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │\n│  │ Compute     │  │ Storage     │  │ Network     │  │ Observability│      │\n│  │ • CPU Pool  │  │ • PVC       │  │ • Istio     │  │ • Prometheus│       │\n│  │ • GPU Pool  │  │ • MinIO     │  │ • Ingress   │  │ • Grafana   │       │\n│  │ • Auto-scale│  │ • HDFS      │  │ • DNS       │  │ • Jaeger    │       │\n│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘       │\n└───────────────────────────────────────────────────────────────────────────┘\n```\n\n### Core Services\n\n| Service | Port | Framework | Description |\n| --------- | ------ | ----------- | ------------- |\n| web | 3000 | React + Vite | Main application frontend |\n| agent-api | 8000 | Flask | LLMOps orchestration service |\n| data-api | 8001 | Flask | Data governance service |\n| model-api | 8002 | FastAPI | MLOps management service |\n| openai-proxy | 8003 | FastAPI | OpenAI-compatible proxy |\n| admin-api | 8004 | Flask | Platform administration |\n| ocr-service | 8005 | FastAPI | Document recognition |\n| behavior-service | 8006 | Flask | User analytics |\n\n### Integration Architecture\n\n```\n┌─────────────────────────────────────────────────────────────────────────┐\n│                         Integration Points                              │\n├─────────────────────────────────────────────────────────────────────────┤\n│                                                                         │\n│  ┌──────────┐     Data → Model (90%)      ┌──────────┐                  │\n│  │   Data   │ ─────────────────────────▶  │  Model   │                  │\n│  │  Layer   │   • Unified storage (MinIO) │  Layer   │                  │\n│  │          │   • Dataset versioning      │          │                  │\n│  │          │   • Auto dataset registry   │          │                  │\n│  └──────────┘                             └──────────┘                  │\n│       │                                        │                        │\n│       │                                        │                        │\n│       │  Data → Agent (75%)    Model → Agent (85%)                      │\n│       │  • Metadata injection  • OpenAI API                             │\n│       │  • Text-to-SQL         • vLLM serving                           │\n│       │  • Schema context      • Model routing                          │\n│       ▼                                        ▼                        │\n│                        ┌──────────┐                                     │\n│                        │  Agent   │                                     │\n│                        │  Layer   │                                     │\n│                        └──────────┘                                     │\n│                                                                         │\n└─────────────────────────────────────────────────────────────────────────┘\n```\n\n---\n\n## Quick Start\n\n### Prerequisites\n\n| Requirement | Version | Notes |\n| ------------ | --------- | ------- |\n| Docker | 20.10+ | Required for all deployment options |\n| Docker Compose | 2.0+ | For local development |\n| Node.js | 18+ | For frontend development |\n| Python | 3.10+ | For backend development |\n| kubectl | 1.25+ | For Kubernetes deployment |\n| Helm | 3.x | For Helm deployment |\n\n### Option 1: Docker Compose (Development)\n\n```bash\n# Clone the repository\ngit clone https://github.com/iannil/one-data-studio.git\ncd one-data-studio\n\n# Configure environment\ncp .env.example .env\n# Edit .env to set passwords: MYSQL_PASSWORD, REDIS_PASSWORD, MINIO_SECRET_KEY, etc.\n\n# Start all services\ndocker-compose -f deploy/local/docker-compose.yml up -d\n\n# Check status\ndocker-compose -f deploy/local/docker-compose.yml ps\n\n# View logs\ndocker-compose -f deploy/local/docker-compose.yml logs -f\n```\n\nUsing Makefile:\n\n```bash\nmake dev          # Start development environment\nmake dev-status   # Check service status\nmake dev-logs     # View service logs\nmake dev-stop     # Stop all services\nmake dev-clean    # Clean up volumes\n```\n\n### Option 2: Kubernetes (Production)\n\n```bash\n# Create a local Kind cluster (for testing)\nmake kind-cluster\n\n# Install with Kustomize\nkubectl apply -k deploy/kubernetes/overlays/production\n\n# Or install with Helm\nhelm install one-data deploy/helm/charts/one-data \\\n  --namespace one-data \\\n  --create-namespace \\\n  --values deploy/helm/charts/one-data/values-production.yaml\n\n# Check status\nkubectl get pods -n one-data\n\n# Forward ports for local access\nmake forward\n```\n\n### Access the Platform\n\n| Service | URL | Credentials |\n| --------- | ----- | ------------- |\n| Web UI | \u003chttp://localhost:3000\u003e | - |\n| Agent API | \u003chttp://localhost:8000/docs\u003e | - |\n| Data API | \u003chttp://localhost:8001/docs\u003e | - |\n| Model API | \u003chttp://localhost:8002/docs\u003e | - |\n| OpenAI Proxy | \u003chttp://localhost:8003/docs\u003e | API Key |\n| Keycloak | \u003chttp://localhost:8080\u003e | admin/admin |\n| MinIO | \u003chttp://localhost:9001\u003e | minioadmin/minioadmin |\n| Grafana | \u003chttp://localhost:3001\u003e | admin/admin |\n| Prometheus | \u003chttp://localhost:9090\u003e | - |\n\n---\n\n## Use Cases\n\n### 1. Enterprise Knowledge Center\n\nScenario: Enterprises have scattered documents across departments — policies, procedures, technical docs, FAQs. Employees struggle to find information quickly.\n\nSolution with ONE-DATA-STUDIO:\n\n```\n┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐\n│  Document   │    │   Data      │    │   Agent     │    │   Chat      │\n│  Sources    │───▶│   Layer     │───▶│   Layer     │───▶│   Interface │\n│             │    │             │    │             │    │             │\n│ • PDF       │    │ • Chunking  │    │ • RAG       │    │ • Q\u0026A       │\n│ • DOCX      │    │ • Embedding │    │ • Reranking │    │ • Citations │\n│ • Markdown  │    │ • Milvus    │    │ • Generation│    │ • History   │\n└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘\n```\n\nBenefits:\n\n- 70% reduction in time-to-answer for employee queries\n- Automatic document updates with versioning\n- Source citations for every answer\n\n### 2. ChatBI (Business Intelligence)\n\nScenario: Business users need data insights but can't write SQL. They depend on data analysts for every query, creating bottlenecks.\n\nSolution with ONE-DATA-STUDIO:\n\n```\n┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐\n│  Natural    │    │   Data      │    │   Agent     │    │   Visual    │\n│  Language   │───▶│   Layer     │───▶│   Layer     │───▶│   Results   │\n│   Query     │    │             │    │             │    │             │\n│             │    │ • Metadata  │    │ • Text2SQL  │    │ • Charts    │\n│ \"Show Q4    │    │ • Schema    │    │ • Query     │    │ • Tables    │\n│  sales by   │    │ • Relations │    │ • Validate  │    │ • Export    │\n│  region\"    │    │ • Context   │    │ • Execute   │    │             │\n└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘\n```\n\nBenefits:\n\n- Self-service analytics without SQL knowledge\n- 80% reduction in data analyst workload for ad-hoc queries\n- Metadata-enhanced accuracy for complex queries\n\n### 3. Private LLM Deployment\n\nScenario: Enterprises want to use LLMs but have strict data privacy requirements. Cloud APIs are not an option.\n\nSolution with ONE-DATA-STUDIO:\n\n```\n┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐\n│  Private    │    │   Model     │    │   OpenAI    │    │   Agent     │\n│   Data      │───▶│   Layer     │───▶│   Proxy     │───▶│   Apps      │\n│             │    │             │    │             │    │             │\n│ • Training  │    │ • Fine-tune │    │ • Compat API│    │ • Chat      │\n│   Data      │    │ • vLLM      │    │ • Routing   │    │ • RAG       │\n│ • Documents │    │ • Multi-GPU │    │ • Rate Limit│    │ • Workflow  │\n└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘\n```\n\nBenefits:\n\n- 100% on-premises deployment\n- OpenAI-compatible API for easy integration\n- Cost control with private GPU clusters\n\n### 4. Industrial Quality Inspection\n\nScenario: Manufacturing lines generate sensor data. Detecting anomalies early prevents costly defects and downtime.\n\nSolution with ONE-DATA-STUDIO:\n\n```\n┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐\n│  IoT        │    │   Data      │    │   Model     │    │   Alerting  │\n│  Sensors    │───▶│   Layer     │───▶│   Layer     │───▶│   System    │\n│             │    │             │    │             │    │             │\n│ • Temp      │    │ • Streaming │    │ • Anomaly   │    │ • Threshold │\n│ • Pressure  │    │ • Feature   │    │ • Detection │    │ • Dashboard │\n│ • Vibration │    │ • Store     │    │ • Real-time │    │ • Actions   │\n└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘\n```\n\nBenefits:\n\n- Real-time anomaly detection at sub-second latency\n- Unified feature store for training and inference\n- Traceability from prediction to source data\n\n### 5. Custom AI Workflow Automation\n\nScenario: Complex business processes require multiple AI capabilities — document extraction, decision making, and action execution.\n\nSolution with ONE-DATA-STUDIO:\n\n```\n┌─────────────────────────────────────────────────────────────────────┐\n│                    Visual Workflow Builder                          │\n├─────────────────────────────────────────────────────────────────────┤\n│                                                                      │\n│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐          │\n│  │ Trigger │───▶│  OCR    │───▶│  LLM    │───▶│ Action  │          │\n│  │ (Email) │    │ Extract │    │ Decide  │    │ Execute │          │\n│  └─────────┘    └─────────┘    └─────────┘    └─────────┘          │\n│                      │              │              │                 │\n│                      ▼              ▼              ▼                 │\n│               ┌──────────────────────────────────────┐              │\n│               │          Execution Engine            │              │\n│               │  • Parallel execution                │              │\n│               │  • Error handling                    │              │\n│               │  • State management                  │              │\n│               └──────────────────────────────────────┘              │\n│                                                                      │\n└─────────────────────────────────────────────────────────────────────┘\n```\n\nBenefits:\n\n- No-code workflow creation with visual builder\n- Combine any AI capability in a single flow\n- Built-in scheduling and monitoring\n\n---\n\n## ⚖️ Comparison with Alternatives\n\n### vs. Standalone Platforms\n\n| Aspect | ONE-DATA-STUDIO | Separate Tools |\n| -------- | ----------------- | ---------------- |\n| Data + ML + LLM | Single integrated platform | 3+ separate tools (Airflow + MLflow + LangChain) |\n| Data-to-Model Pipeline | Native integration | Manual data export/import |\n| Model-to-App Pipeline | OpenAI-compatible API | Custom integration code |\n| Unified Governance | Single audit trail | Scattered logs |\n| Learning Curve | One platform to learn | Multiple tools to master |\n| Deployment | Single Helm chart | Multiple deployments |\n| Cost | Single infrastructure | Multiple infrastructures |\n\n### vs. Cloud Platforms\n\n| Aspect | ONE-DATA-STUDIO | Cloud Platforms (Databricks, SageMaker, Vertex AI) |\n| -------- | ----------------- | --------------------------------------------------- |\n| Deployment | On-premises, any cloud, hybrid | Vendor-locked cloud |\n| Data Privacy | Data stays on-premises | Data in vendor's cloud |\n| Pricing | Open source (free) | Usage-based (expensive at scale) |\n| Customization | Full source code access | Limited customization |\n| LLM Integration | Built-in LLMOps layer | Separate LLM tools needed |\n| Vendor Lock-in | None | High |\n\n### vs. Other Open Source Platforms\n\n| Feature | ONE-DATA-STUDIO | LangChain | MLflow | Apache Airflow |\n| --------- | ----------------- | ----------- | -------- | ---------------- |\n| Data Integration | ✅ Full | ❌ No | ❌ No | ✅ Basic |\n| ETL Pipelines | ✅ Visual | ❌ No | ❌ No | ✅ Code-based |\n| Feature Store | ✅ Built-in | ❌ No | ❌ No | ❌ No |\n| Vector Storage | ✅ Milvus | ✅ Integration | ❌ No | ❌ No |\n| Model Training | ✅ Distributed | ❌ No | ✅ Tracking only | ❌ No |\n| Model Serving | ✅ vLLM | ❌ No | ✅ Basic | ❌ No |\n| RAG Pipeline | ✅ Full | ✅ Full | ❌ No | ❌ No |\n| Agent Framework | ✅ Built-in | ✅ Primary | ❌ No | ❌ No |\n| Visual Workflow | ✅ ReactFlow | ❌ No | ❌ No | ❌ Code-based |\n| Web UI | ✅ Full | ❌ No | ✅ Tracking UI | ✅ DAG UI |\n| Multi-tenancy | ✅ Full | ❌ No | ❌ No | ❌ Limited |\n| Enterprise Auth | ✅ Keycloak | ❌ No | ❌ No | ❌ Limited |\n\n### When to Choose ONE-DATA-STUDIO\n\n✅ Best fit for:\n\n- Enterprises needing complete data-to-application pipeline\n- Organizations requiring on-premises deployment\n- Teams wanting unified platform instead of tool sprawl\n- Companies with both structured data and document knowledge needs\n- Projects requiring full audit trail and governance\n\n❌ Consider alternatives if:\n\n- You only need a single capability (e.g., just MLflow for experiment tracking)\n- You're a cloud-first organization comfortable with vendor lock-in\n- You need minimal infrastructure and prefer SaaS solutions\n- Your team size is very small (\u003c 5 people) with simple needs\n\n---\n\n## Technical Specifications\n\n### Code Statistics\n\n| Component | Files | Lines of Code |\n| ----------- | ------- | --------------- |\n| Python Backend | 289 | ~142,000 |\n| TypeScript Frontend | 232 | ~120,000 |\n| Test Code | 135+ | ~32,000 |\n| Deployment Config | 155+ | ~15,000 |\n| Total | 630+ | ~300,000 |\n\n### Technology Stack\n\nFrontend:\n\n- React 18.3 with TypeScript 5.4\n- Ant Design 5.14 for UI components\n- ReactFlow 11.10 for workflow canvas\n- Zustand 4.5 for state management\n- React Query 5.24 for server state\n- Vite 5.1 for build tooling\n\nBackend:\n\n- Python 3.10+ runtime\n- Flask 3.0 for Data/Agent/Admin APIs\n- FastAPI for Model/Proxy APIs\n- SQLAlchemy 2.0 with Alembic migrations\n- Celery for background tasks\n\nStorage:\n\n- MySQL 8.0 for relational data\n- Redis 7.0 for caching and sessions\n- MinIO for S3-compatible object storage\n- Milvus 2.3 for vector embeddings\n- Elasticsearch 8.10 for search\n\nInfrastructure:\n\n- Kubernetes 1.27+ for orchestration\n- Helm 3.x for package management\n- Istio for service mesh\n- Keycloak 23.0 for identity management\n- Prometheus + Grafana for monitoring\n- Jaeger for distributed tracing\n\n### Security Features\n\n| Category | Features |\n| ---------- | ---------- |\n| Authentication | JWT tokens, Keycloak SSO, OIDC/SAML support |\n| Authorization | RBAC, fine-grained permissions, multi-tenant isolation |\n| Network | TLS/HTTPS, HSTS headers, CORS configuration |\n| Data | SQL injection protection, input sanitization, encryption at rest |\n| Audit | Comprehensive logging, searchable audit trail, compliance support |\n\n### Performance Characteristics\n\n| Metric | Value | Notes |\n| -------- | ------- | ------- |\n| API Response Time | \u003c 100ms (p95) | For metadata operations |\n| RAG Query Latency | \u003c 2s (p95) | Including retrieval and generation |\n| Vector Search | \u003c 50ms | For 10M+ vectors |\n| Concurrent Users | 1000+ | With proper resource allocation |\n| Model Inference | Depends on model | vLLM provides high throughput |\n\n---\n\n## Project Structure\n\n```\none-data-studio/\n├── services/                     # Backend microservices\n│   ├── data-api/                 # Data governance API (Flask)\n│   │   ├── app/\n│   │   │   ├── routes/           # API endpoints\n│   │   │   ├── services/         # Business logic\n│   │   │   ├── models/           # Database models\n│   │   │   └── schemas/          # Request/response schemas\n│   │   └── requirements.txt\n│   ├── agent-api/                # LLMOps orchestration API (Flask)\n│   │   ├── app/\n│   │   │   ├── routes/           # Workflow, RAG, Agent endpoints\n│   │   │   ├── services/         # Execution engine, RAG service\n│   │   │   ├── core/             # LLM clients, embeddings\n│   │   │   └── tools/            # Agent tools\n│   │   └── requirements.txt\n│   ├── model-api/                # MLOps management API (FastAPI)\n│   │   ├── app/\n│   │   │   ├── routers/          # Model, training, serving endpoints\n│   │   │   ├── services/         # K8s integration, job management\n│   │   │   └── schemas/          # Pydantic schemas\n│   │   └── requirements.txt\n│   ├── openai-proxy/             # OpenAI-compatible proxy (FastAPI)\n│   │   ├── app/\n│   │   │   ├── routers/          # Chat, completions, embeddings\n│   │   │   ├── services/         # Model routing, rate limiting\n│   │   │   └── middleware/       # Token counting, cost tracking\n│   │   └── requirements.txt\n│   ├── admin-api/                # Platform administration (Flask)\n│   ├── ocr-service/              # Document recognition (FastAPI)\n│   ├── behavior-service/         # User analytics (Flask)\n│   └── shared/                   # Shared modules\n│       ├── auth/                 # JWT, permissions\n│       ├── storage/              # MinIO, file handling\n│       ├── cache/                # Redis utilities\n│       └── utils/                # Common utilities\n├── web/                          # Frontend application\n│   ├── src/\n│   │   ├── components/           # Reusable UI components\n│   │   │   ├── common/           # Buttons, inputs, modals\n│   │   │   ├── workflow/         # ReactFlow nodes and edges\n│   │   │   └── charts/           # Data visualization\n│   │   ├── pages/                # Page components\n│   │   │   ├── data/             # Data platform pages\n│   │   │   ├── model/            # Model platform pages\n│   │   │   ├── agent/            # Agent platform pages\n│   │   │   └── admin/            # Admin pages\n│   │   ├── services/             # API clients\n│   │   ├── stores/               # Zustand state stores\n│   │   ├── hooks/                # Custom React hooks\n│   │   ├── utils/                # Utility functions\n│   │   └── locales/              # i18n translations (en, zh)\n│   ├── public/                   # Static assets\n│   └── package.json\n├── deploy/                       # Deployment configurations\n│   ├── local/                    # Docker Compose\n│   │   ├── docker-compose.yml    # Main compose file\n│   │   └── docker-compose.*.yml  # Service overlays\n│   ├── kubernetes/               # Kubernetes manifests\n│   │   ├── base/                 # Kustomize base\n│   │   └── overlays/             # dev, staging, production\n│   ├── helm/                     # Helm charts\n│   │   └── charts/one-data/      # Main chart\n│   ├── dockerfiles/              # Dockerfile for each service\n│   ├── argocd/                   # ArgoCD applications\n│   └── monitoring/               # Prometheus, Grafana configs\n├── tests/                        # Test suites\n│   ├── unit/                     # Unit tests by service\n│   ├── integration/              # API integration tests\n│   ├── e2e/                      # Playwright end-to-end tests\n│   └── performance/              # Load testing scripts\n├── docs/                         # Documentation\n│   ├── 01-architecture/          # Architecture docs\n│   ├── 02-integration/           # Integration guides\n│   ├── 06-development/           # Development guides\n│   ├── 07-operations/            # Operations guides\n│   └── 08-user-guide/            # User documentation\n└── examples/                     # Usage examples\n    ├── langchain/                # LangChain integration\n    ├── python/                   # Python SDK examples\n    └── workflows/                # Workflow definitions\n```\n\n---\n\n## Documentation\n\n| Document | Description |\n| ---------- | ------------- |\n| [Platform Overview](docs/01-architecture/platform-overview.md) | High-level architecture and concepts |\n| [Four-Layer Stack](docs/01-architecture/four-layer-stack.md) | Detailed layer descriptions |\n| [Integration Guide](docs/02-integration/integration-overview.md) | How layers connect |\n| [API Specifications](docs/02-integration/api-specifications.md) | REST API documentation |\n| [Development Guide](docs/06-development/poc-playbook.md) | Local development setup |\n| [Operations Guide](docs/07-operations/operations-guide.md) | Production deployment |\n| [User Guide](docs/08-user-guide/getting-started.md) | End-user documentation |\n\n---\n\n## Contributing\n\nWe welcome contributions from the community!\n\n### Development Setup\n\n```bash\n# Backend development\ncd services/agent-api\npython -m venv venv\nsource venv/bin/activate  # Windows: venv\\Scripts\\activate\npip install -r requirements.txt\npip install -r requirements-dev.txt\npython app.py\n\n# Frontend development\ncd web\nnpm install\nnpm run dev\n```\n\n### Code Standards\n\n| Language | Standards |\n| ---------- | ----------- |\n| Python | PEP 8, use `logging` (not `print`), type hints |\n| TypeScript | ESLint + Prettier, avoid `console.log` |\n| Git | Conventional commits, small atomic changes |\n\n### Testing\n\n```bash\n# Run Python tests\npytest tests/ -v\n\n# Run with coverage\npytest tests/ --cov=services/ --cov-report=html\n\n# Run frontend tests\ncd web \u0026\u0026 npm test\n\n# Run E2E tests\ncd tests/e2e \u0026\u0026 npx playwright test\n```\n\n### Pull Request Process\n\n1. Fork the repository\n2. Create a feature branch: `git checkout -b feature/amazing-feature`\n3. Make changes and add tests\n4. Ensure tests pass: `pytest tests/ \u0026\u0026 cd web \u0026\u0026 npm test`\n5. Commit with clear message: `git commit -m 'feat: add amazing feature'`\n6. Push and create Pull Request\n\n---\n\n## License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n```\nCopyright 2024-2026 ONE-DATA-STUDIO Contributors\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n```\n\n---\n\n## Acknowledgments\n\nBuilt with and inspired by:\n\n- [OpenMetadata](https://open-metadata.org/) - Open source metadata platform\n- [Ray](https://github.com/ray-project/ray) - Distributed computing framework\n- [vLLM](https://github.com/vllm-project/vllm) - High-throughput LLM serving\n- [LangChain](https://github.com/langchain-ai/langchain) - LLM application framework\n- [Milvus](https://github.com/milvus-io/milvus) - Vector database\n- [ReactFlow](https://reactflow.dev/) - Node-based graph editor\n\n---\n\n## Community\n\n- Issues: [GitHub Issues](https://github.com/iannil/one-data-studio/issues)\n- Discussions: [GitHub Discussions](https://github.com/iannil/one-data-studio/discussions)\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\nBuilt with ❤️ by the ONE-DATA-STUDIO Community\n\nIf you find this project useful, please consider giving it a ⭐!\n\n[Back to Top](#one-data-studio)\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiannil%2Fone-data-studio","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiannil%2Fone-data-studio","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiannil%2Fone-data-studio/lists"}