{"id":18258819,"url":"https://github.com/datavane/datavines","last_synced_at":"2025-04-09T05:08:41.989Z","repository":{"id":36994548,"uuid":"477096938","full_name":"datavane/datavines","owner":"datavane","description":"Know your data better！Datavines is Next-gen Data Observability Platform, support metadata manage and data quality.","archived":false,"fork":false,"pushed_at":"2024-09-16T02:16:47.000Z","size":22844,"stargazers_count":429,"open_issues_count":49,"forks_count":143,"subscribers_count":11,"default_branch":"dev","last_synced_at":"2024-09-16T03:28:49.592Z","etag":null,"topics":["dataobservability","dataprofile","dataquality","datascience","doris","metadata","spark"],"latest_commit_sha":null,"homepage":"https://datavane.github.io/datavines-website/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datavane.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-02T15:37:20.000Z","updated_at":"2024-09-16T02:16:51.000Z","dependencies_parsed_at":"2024-03-02T04:20:30.358Z","dependency_job_id":"866df934-f15f-46cd-b332-2903cffe57c2","html_url":"https://github.com/datavane/datavines","commit_stats":null,"previous_names":["datavane/datavines"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datavane%2Fdatavines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datavane%2Fdatavines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datavane%2Fdatavines/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datavane%2Fdatavines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datavane","download_url":"https://codeload.github.com/datavane/datavines/tar.gz/refs/heads/dev","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247980837,"owners_count":21027808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataobservability","dataprofile","dataquality","datascience","doris","metadata","spark"],"created_at":"2024-11-05T10:34:49.860Z","updated_at":"2025-04-09T05:08:41.971Z","avatar_url":"https://github.com/datavane.png","language":"Java","funding_links":[],"categories":["数据科学"],"sub_categories":[],"readme":"\u003c!--\n  ~ Licensed to the Apache Software Foundation (ASF) under one or more\n  ~ contributor license agreements.  See the NOTICE file distributed with\n  ~ this work for additional information regarding copyright ownership.\n  ~ The ASF licenses this file to You under the Apache License, Version 2.0\n  ~ (the \"License\"); you may not use this file except in compliance with\n  ~ the License.  
You may obtain a copy of the License at\n  ~\n  ~    http://www.apache.org/licenses/LICENSE-2.0\n  ~\n  ~ Unless required by applicable law or agreed to in writing, software\n  ~ distributed under the License is distributed on an \"AS IS\" BASIS,\n  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n  ~ See the License for the specific language governing permissions and\n  ~ limitations under the License.\n  ~\n  --\u003e\n\n# Datavines\n[![EN doc](https://img.shields.io/badge/document-English-blue.svg)](README.md)\n[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](README.zh-CN.md)\n\n---\n\nData quality checks ensure the accuracy of data during integration and processing, and they are a core component of DataOps. DataVines is an easy-to-use data quality service platform that supports multiple metrics.\n\n## Architecture Design\n![DataVinesArchitecture](docs/img/architecture.jpg)\n\n## Install\n\nRequires Maven 3.6.1 or later.\n\n```sh\n$ mvn clean package -Prelease -DskipTests\n```\n\n## Features\n\n### Data Catalog\n\n- Obtain **data source metadata** regularly to build a data catalog\n- Monitor **metadata changes** regularly\n- **Tag management** for metadata\n\n![Data Catalog](docs/img/data-catalog.jpg)\n\n### Data Quality\n\n- **27** built-in data quality check rules\n- **4** types of data quality check rules\n    - Single Table-Column Check\n    - Single Table Custom `SQL` Check\n    - Cross Table Accuracy Check\n    - Two Table Value Comparison Check\n- Support scheduled check tasks\n- Support `SLA` for **check result alerts**\n\n![Data Quality](docs/img/data-quality.jpg)\n\n### Data Profile\n\n- Support scheduled data profiling that outputs a **data profile report**\n- **Automatically identify** column types and match appropriate data profile indicators\n- Support **table row number trend** monitoring\n- Support **data distribution** 
view\n\n![Data Profile](docs/img/data-profile.jpg)\n\n### Plug-in Design\n\nThe platform is built on a plug-in design, and the following modules can be extended with user-defined plug-ins:\n\n- **Data Source**: `MySQL`, `Impala`, `StarRocks`, `Doris`, `Presto`, `Trino`, `ClickHouse`, and `PostgreSQL` are supported\n- **Check Rules**: 27 built-in check rules, such as null value check, non-null check, and enumeration check\n- **Job Execution Engine**: Two execution engines, `Spark` and `Local`, are supported. The `Spark` engine currently supports only `Spark2.4`; the `Local` engine is a local execution engine built on `JDBC` that does not rely on any other execution engine.\n- **Alert Channel**: **Email** is supported\n- **Error Data Storage**: `MySQL` and **local files** are supported (`Local` execution engine only)\n- **Registry**: `MySQL`, `PostgreSQL`, and `ZooKeeper` are supported\n\n### Multiple Execution Modes\n\n- Provide a **Web page** to configure check jobs, run jobs, and view job execution logs, error data, and check results\n- Support **online generation** of job running scripts; jobs are submitted through `datavines-submit.sh`, which can be used together with a scheduling system\n\n![Job Script](docs/img/data-job-script.jpg)\n\n### Easy Deployment \u0026 High Availability\n\n- Few platform dependencies, easy to deploy\n- At minimum, only `MySQL` is required to start the project and run data quality checks\n- Support horizontal scaling and automatic fault tolerance\n- **Decentralized design**: `Server` nodes scale horizontally to improve performance\n- **Automatic fault tolerance** for jobs, ensuring that jobs are neither lost nor repeated\n\n## Environment Dependencies\n\n1. Java runtime environment: JDK 8\n2. If the data volume is small, or you only want to verify functionality, you can use the JDBC engine\n3. 
If you want to run DataVines on Spark, make sure Spark is installed on your server\n\n## Quick Start\n\nSee the [documentation](https://datavane.github.io/datavines-website/docs/user-guide/quick-start) for more information\n\n## Development\n\nSee the [documentation](https://datavane.github.io/datavines-website/docs/development/environment-preparation) for more information\n\n## Contribution\n\n[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](https://github.com/datavane/datavines/pulls)\n\nYou can submit any ideas as [pull requests](https://github.com/datavane/datavines/pulls) or as [GitHub issues](https://github.com/datavane/datavines/issues/new/choose).\n\n\u003e If you're new to posting issues, we ask that you read [*How To Ask Questions The Smart Way*](http://www.catb.org/~esr/faqs/smart-questions.html) (**This guide does not provide actual support services for this project!**) and [How to Report Bugs Effectively](http://www.chiark.greenend.org.uk/~sgtatham/bugs.html) prior to posting. Well-written bug reports help us help you!\n\nThank you to all the people who have already contributed to Datavines!\n\n[![contrib graph](https://contrib.rocks/image?repo=datavane/datavines)](https://github.com/datavane/datavines/graphs/contributors)\n\n## License\n\nDatavines is licensed under the [Apache License 2.0](LICENSE). Datavines relies on some third-party components, and their open source licenses are also Apache License 2.0 or compatible with it. In addition, Datavines directly references or modifies some code from Apache DolphinScheduler, SeaTunnel and Dubbo, all of which are licensed under Apache License 2.0. 
Thanks to the contributors of these projects.\n\n## Social Media\n\n- WeChat Official Account (in Chinese, scan the QR code to follow)\n\n![wechat-qrcode](docs/img/wechat-qrcode-en.jpg)\n\n## Contact Author\n\n- Mention \"Datavines\" when adding me on WeChat\n\n![wechat-author-qrcode](docs/img/wechat-author-qrcode.jpg)\n\n## Donation\n\n![wechat-donation-qrcode](docs/img/wechat-donation-qrcode.jpg)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatavane%2Fdatavines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatavane%2Fdatavines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatavane%2Fdatavines/lists"}