https://github.com/opendata-lab/opendataworks

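opendataworks is a unified data portal for big data platforms. Built on open-source projects such as DolphinScheduler and Doris, it aims to give enterprises a one-stop solution for data asset management, task scheduling and orchestration, and lineage tracking.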

data-engineering datax dolphinscheduler doris

# OpenDataWorks




**A unified data portal for workflow orchestration, intelligent query, and data lineage visualization.**

English | [简体中文](README_zh-CN.md)

[Website](https://opendataworks.vercel.app/) · [Quick Start](https://opendataworks.vercel.app/guide/quick-start.html) · [Features](https://opendataworks.vercel.app/guide/features.html) · [Architecture](https://opendataworks.vercel.app/architecture/overview.html) · [Configuration](https://opendataworks.vercel.app/guide/configuration.html) · [Contributing](https://opendataworks.vercel.app/guide/contribution.html) · [Slack](https://opendataworkshq.slack.com/)

---

## Overview

OpenDataWorks is an open-source data platform portal for teams that need one place to manage metadata, orchestrate data workflows, analyze lineage, and ask data questions with natural language.

It brings the core pieces of a modern data platform into a deployable full-stack application: a Java backend, a Vue frontend, a Python DataAgent service for intelligent query, and Docker Compose assets for local and production environments.

## Why OpenDataWorks

- **Unified data asset management**: organize table metadata, data domains, business domains, and layered data models.
- **Workflow orchestration**: configure batch and streaming jobs visually, with deep DolphinScheduler integration.
- **Lineage analysis**: parse SQL lineage automatically and explore upstream/downstream relationships in an interactive graph.
- **Intelligent query**: use natural language to generate SQL, execute analysis, and review results from the main portal.
- **Ready to deploy**: run the frontend, backend, DataAgent backend, Redis, MySQL, and Portal MCP from the provided Docker Compose setup.
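To make the lineage bullet concrete, here is a minimal sketch of extracting table-level lineage from one SQL statement. It is not OpenDataWorks's parser: `extract_lineage`, the regular expressions, and the sample table names are all hypothetical, and the regex approach only covers simple `INSERT ... SELECT` statements without CTEs or subqueries.

```python
import re

# Hypothetical helper, not part of OpenDataWorks. Regex-based, so it only
# handles simple INSERT ... SELECT statements (no CTEs, no subqueries).
TARGET_RE = re.compile(
    r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql):
    """Return the target table and the sorted source tables of one statement."""
    target_match = TARGET_RE.search(sql)
    target = target_match.group(1) if target_match else None
    # De-duplicate sources and drop a self-reference to the target table.
    sources = sorted({name for name in SOURCE_RE.findall(sql) if name != target})
    return {"target": target, "sources": sources}


# Illustrative statement with made-up table names.
sql = """
INSERT INTO dwd.dwd_order_detail
SELECT o.id, u.name
FROM ods.ods_order o
JOIN ods.ods_user u ON o.user_id = u.id
"""
lineage = extract_lineage(sql)
print(lineage)
```

A production lineage parser would normally build on a real SQL grammar rather than regular expressions; the sketch only shows the shape of the result, one target table plus its source tables.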

## Feature Highlights

- Metadata management for ODS, DWD, DIM, DWS, and ADS layers
- Workflow authoring, publishing, scheduling, and execution monitoring
- SQL and Shell task support
- Data lineage visualization with ECharts
- Data Studio with catalog browsing, SQL editing, and table-level metadata context
- Built-in NL2SQL intelligent-query entrypoint
- Runtime logs, execution history, and operational statistics
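As an illustration of the layered model in the first bullet, the sketch below groups table names by layer. It assumes tables carry a `<layer>_` name prefix, which this README does not state; `layer_of` and `group_by_layer` are hypothetical helpers, and the layer descriptions follow common data-warehouse usage rather than anything defined by OpenDataWorks.

```python
from collections import defaultdict

# Layer names come from the README. The descriptions reflect common
# data-warehouse usage and are an assumption, not project documentation.
LAYERS = {
    "ods": "operational data store (raw ingested data)",
    "dwd": "data warehouse detail (cleaned detail records)",
    "dim": "dimension tables",
    "dws": "data warehouse summary (aggregates)",
    "ads": "application data service (report-ready data)",
}


def layer_of(table_name):
    """Guess the layer from a '<layer>_' prefix; return None if unknown."""
    bare_name = table_name.split(".")[-1]  # drop an optional database prefix
    prefix = bare_name.split("_", 1)[0].lower()
    return prefix if prefix in LAYERS else None


def group_by_layer(tables):
    """Group table names by guessed layer, keeping input order per group."""
    groups = defaultdict(list)
    for table in tables:
        groups[layer_of(table) or "unclassified"].append(table)
    return dict(groups)


# Made-up table names for illustration.
tables = [
    "ods_order",
    "dwd_order_detail",
    "dim_user",
    "dws_order_daily",
    "ads_sales_report",
    "tmp_scratch",
]
grouped = group_by_layer(tables)
for layer, names in grouped.items():
    print(layer, names)
```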

## Demo

[https://opendataworks-demo.vercel.app](https://opendataworks-demo.vercel.app)

## Screenshots

### Workflow Orchestration

![OpenDataWorks workflow orchestration screen](website/public/readme-workflows.png)

Manage workflow lists, publishing status, and common workflow actions.

### Data Lineage

![OpenDataWorks data lineage screen](website/public/readme-lineage.png)

Explore upstream and downstream table relationships around a selected table.
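Conceptually, exploring upstream and downstream tables is a graph traversal over lineage edges. The following is a minimal sketch using breadth-first search, not the portal's implementation: the edge list, `build_adjacency`, and `reachable` are hypothetical names chosen for the example.

```python
from collections import deque

# Hypothetical lineage edges: (source table, target table).
EDGES = [
    ("ods_order", "dwd_order_detail"),
    ("ods_user", "dwd_order_detail"),
    ("dwd_order_detail", "dws_order_daily"),
    ("dws_order_daily", "ads_sales_report"),
]


def build_adjacency(edges, reverse=False):
    """Map each table to its direct successors (or predecessors if reverse)."""
    adjacency = {}
    for source, target in edges:
        start, end = (target, source) if reverse else (source, target)
        adjacency.setdefault(start, []).append(end)
    return adjacency


def reachable(start, adjacency):
    """Breadth-first search: every table reachable from start, nearest first."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


selected = "dwd_order_detail"
downstream = reachable(selected, build_adjacency(EDGES))
upstream = reachable(selected, build_adjacency(EDGES, reverse=True))
print("upstream:", upstream)
print("downstream:", downstream)
```

Reversing the edges reuses the same traversal for both directions, and the `seen` set keeps the search from looping if the lineage data ever contains a cycle.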

### Data Studio

![OpenDataWorks Data Studio screen](website/public/readme-datastudio.png)

Browse catalogs, write SQL, and inspect table metadata in one workspace.

## Docker Deployment

### Start the Development Environment

Use the development Docker Compose file, `deploy/docker-compose.dev.yml`, to start the frontend, backend, DataAgent backend, Redis, MySQL, and Portal MCP together:

```bash
# 1. Prepare configuration
cp deploy/.env.example deploy/.env

# 2. Pull the latest images
docker compose -f deploy/docker-compose.dev.yml pull

# 3. Start services
docker compose -f deploy/docker-compose.dev.yml up -d

# Access points
# Frontend: http://localhost:8081
# Backend: http://localhost:8080/api
# DataAgent Backend: http://localhost:8900
# Portal MCP: http://localhost:8801/mcp
```
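After the services start, it can be handy to confirm that each access point responds. The script below is a minimal sketch using only the Python standard library. The URLs are copied from the comments above; `is_reachable` and `report` are hypothetical helpers, and treating any HTTP response (including 4xx or 5xx) as "reachable" is an assumption, since the README does not document health-check endpoints.

```python
import urllib.error
import urllib.request

# URLs copied from the "Access points" comments in the README.
ENDPOINTS = {
    "Frontend": "http://localhost:8081",
    "Backend": "http://localhost:8080/api",
    "DataAgent Backend": "http://localhost:8900",
    "Portal MCP": "http://localhost:8801/mcp",
}


def is_reachable(url, timeout=3.0):
    """Return True if the server answers at all, False if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded with an error status, so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


def report(endpoints, probe=is_reachable):
    """Probe every endpoint and return a name -> reachable mapping."""
    return {name: probe(url) for name, url in endpoints.items()}
```

Calling `report(ENDPOINTS)` once the containers are running returns a dictionary such as `{"Frontend": True, ...}`. The `probe` parameter exists so the logic can be exercised without a running stack.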

### Production and Offline Deployment

See the [deployment guide](deploy/README.md) for production deployment and offline package instructions.

## Quick Start

Follow the [quick start guide](https://opendataworks.vercel.app/guide/quick-start.html) to deploy and run OpenDataWorks locally.

## Documentation

Full documentation is available at: **https://opendataworks.vercel.app/**

- [Quick Start](https://opendataworks.vercel.app/guide/quick-start.html)
- [Architecture](https://opendataworks.vercel.app/architecture/overview.html)
- [Configuration](https://opendataworks.vercel.app/guide/configuration.html)
- [FAQ](https://opendataworks.vercel.app/guide/faq.html)

## Community

- Join the [OpenDataWorks Slack community](https://opendataworkshq.slack.com/) to discuss usage, deployment, roadmap ideas, and contributions.
- Open a [GitHub Issue](https://github.com/opendata-lab/opendataworks/issues) for bugs, feature requests, or documentation feedback.

## Contributing

Contributions are welcome. Please read the [contribution guide](https://opendataworks.vercel.app/guide/contribution.html) before opening a pull request.

## License

OpenDataWorks is licensed under the [GNU General Public License v3.0 only](LICENSE).