{"id":14982286,"url":"https://github.com/hexnn/stark","last_synced_at":"2026-02-22T18:03:22.612Z","repository":{"id":255137913,"uuid":"848593705","full_name":"hexnn/Stark","owner":"hexnn","description":"An easy-to-use, high-performance big data governance engine built on Spark, Spark MLlib, Debezium, and Deequ. It targets unified batch and stream data integration and data analysis, and supports real-time CDC data capture, machine learning models, data quality checks, data labeling, sensitive data detection, data modeling, algorithm modeling, and OLAP analysis","archived":false,"fork":false,"pushed_at":"2025-08-18T10:02:30.000Z","size":7642,"stargazers_count":36,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-18T11:41:53.087Z","etag":null,"topics":["cdc","dataworks","datax","debezium","deequ","etl","flink","hadoop","kettle","mllib","seatunnel","spark","sparkml","sparkmllib"],"latest_commit_sha":null,"homepage":"https://github.com/hexnn/Stark","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hexnn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-28T03:39:59.000Z","updated_at":"2025-08-18T10:02:33.000Z","dependencies_parsed_at":"2024-10-11T23:43:06.887Z","dependency_job_id":"d03a889a-0943-40e2-b4fe-77c96ea9962d","html_url":"https://github.com/hexnn/Stark","commit_stats":{"total_commits":18,"total_committers":1,"mean_commits":18.0,"dds":0.0,"last_synced_commit":"5500c5b17df468d6871494ef873f8a3388ddac5c"},"previous_names":["hexnn/stark"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/hexnn/Stark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hexnn%2FStark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/hexnn%2FStark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hexnn%2FStark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hexnn%2FStark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hexnn","download_url":"https://codeload.github.com/hexnn/Stark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hexnn%2FStark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29721057,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-22T15:10:41.462Z","status":"ssl_error","status_checked_at":"2026-02-22T15:10:04.636Z","response_time":110,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdc","dataworks","datax","debezium","deequ","etl","flink","hadoop","kettle","mllib","seatunnel","spark","sparkml","sparkmllib"],"created_at":"2024-09-24T14:05:04.694Z","updated_at":"2026-02-22T18:03:22.599Z","avatar_url":"https://github.com/hexnn.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Stark: One-Stop Big Data Governance Engine\n## An easy-to-use, high-performance engine for unified batch and stream data integration and data analysis, built on the Spark core\n* 【Out of the box】No coding required: data ingestion, data modeling, algorithm modeling, and data analysis tasks are all driven by rule files\n* 【Rich sources】Supports dozens of data sources, including relational databases, NoSQL stores, data warehouses, graph databases, and files, to cover a wide range of integration needs\n* 【Unified batch and stream】Supports three read/write modes (offline, real-time, and mixed offline plus real-time), connecting offline and real-time data end to end\n* 【Change data capture】Supports CDC: monitors data changes in real time and applies them to the target, and supports merging offline and real-time data from heterogeneous sources\n* 
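【Domestic platform support】Deploys on China-made CPUs and operating systems on x86 and ARM architectures, and supports domestic (Xinchuang) data sources such as Dameng and Kingbase\n* 【Machine learning】Supports dozens of machine learning algorithms, integrated end to end with all data sources; complex model analysis needs only simple configuration\n* 【Algorithm tuning】Supports parameter tuning and dynamic hyperparameter optimization for machine learning algorithms, with observable training and prediction, making data smarter\n* 【Quality checks】Ships with 50+ built-in data quality rules, monitors the quality of offline and real-time data, and generates check reports\n* 【Data labeling】Supports automated data labeling and automatically detects 100+ sensitive data types, for data security control and dynamic masking\n* 【Process observability】Supports step-by-step debugging: monitor task execution at every stage, follow the data processing flow, and verify results and data quality\n* 【High performance】Supports cross-source ingestion and analysis of single tables, multiple tables, multi-table joins, and whole databases, and can use cluster resources to process tens of billions of records\n* 【Easy to extend】Wrapping the Stark rule engine in a UI makes it possible to build a complete data platform product (data ingestion, data modeling, data analysis, and more) in a very short time\n\n## Engine Architecture\n![Architecture](技术架构.png)\n\n## Data Source Read/Write Support (more being added)\n|Type                 |Data source  |Batch (R) |Batch (W) |Stream (R)|Stream (W)|CDC (R)             |CDC (W)             |\n|:--------------------|:------------|:--------:|:--------:|:--------:|:--------:|:------------------:|:------------------:|\n|Relational database  |MySQL        |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |MariaDB      |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |Oracle       |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |PostgreSQL   |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |SQLServer    |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |DB2          |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|NoSQL database       |HBase        |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |Phoenix      |√         |√         |          |√         |                    |insert,delete,update|\n|                     |Cassandra    |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |MongoDB      |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |Redis        |√         |√         |√         |√         |insert              |insert,delete,update|\n|                     |Elasticsearch|√         |√         |√         |√         |insert              |insert,delete,update|\n|MPP database         |Impala       |√         |√         |          |√         |                    |insert              |\n|                     |StarRocks    |√         |√         |          |√         |                    |insert,delete,update|\n|                     |Doris        |√         |√         |          |√         |                    |insert,delete,update|\n|                     |ClickHouse   |√         |√         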
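|√         |√         |insert,delete,update|insert,delete,update|\n|                     |Greenplum    |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|Data warehouse       |Hive         |√         |√         |          |√         |                    |insert              |\n|Data lake            |Iceberg      |√         |√         |√         |√         |insert              |insert,delete,update|\n|                     |Hudi         |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |DeltaLake    |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |Paimon       |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|Message middleware   |Kafka        |√         |√         |√         |√         |insert              |insert              |\n|Graph database       |Neo4j        |√         |√         |√         |√         |insert              |insert,delete,update|\n|Spatial database     |PostGIS      |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|File source          |Text         |√         |√         |√         |√         |insert              |insert              |\n|                     |CSV          |√         |√         |√         |√         |insert              |insert              |\n|                     |Excel        |√         |√         |√         |√         |insert              |insert              |\n|                     |JSON         |√         |√         |√         |√         |insert              |insert              |\n|                     |XML          |√         |√         |√         |√         |insert              |insert              |\n|                     |ORC          |√         |√         |√         |√         |insert              |insert              |\n|                     |Parquet      |√         |√         |√         |√         |insert              |insert              |\n|                     |Avro         |√         |√         |√         |√         |insert              |insert              |\n|Domestic (Xinchuang) |OceanBase    |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |GaussDB      |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |Dameng       |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|                     |Kingbase     |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|Embedded database    |SQLite       |√         |√         |√         |√         |insert,delete,update|insert,delete,update|\n|Cloud-native database|Snowflake    |√         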
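|√         |          |√         |                    |insert,delete,update|\n\n## Machine Learning Algorithms (more being added)\n|Category      |Algorithm                      |Short name  |Description                         |Use cases|\n|:-----------:|:------------------------------:|:----------:|:------------------|:---------------------|\n|Statistics    |Correlation                    |CORR        |Correlation test                    |Preprocessing, data exploration, feature selection|\n|              |ChiSquareTest                  |CST         |Chi-square test                     |Preprocessing, data exploration, feature selection|\n|              |Summarizer                     |SUMMARY     |Summary statistics                  |Preprocessing, data exploration, feature selection|\n|Classification|DecisionTreeClassifier         |DTC         |Decision tree classification        |Binary and multiclass classification|\n|              |FMClassifier                   |FMC         |Factorization machine classification|Binary classification|\n|              |GBTClassifier                  |GBTC        |Gradient-boosted tree classification|Binary classification|\n|              |LogisticRegression             |LRC         |Logistic regression classification  |Binary classification|\n|              |MultilayerPerceptronClassifier |MLPC        |Multilayer perceptron classification|Binary and multiclass classification|\n|              |NaiveBayes                     |NBC         |Naive Bayes classification          |Binary and multiclass classification|\n|              |RandomForestClassifier         |RFC         |Random forest classification        |Binary and multiclass classification|\n|              |LinearSVC                      |SVC         |Linear SVM classification           |Binary classification|\n|Regression    |AFTSurvivalRegression          |AFTSR       |Accelerated failure time regression |Prediction|\n|              |DecisionTreeRegressor          |DTR         |Decision tree regression            |Prediction|\n|              |FMRegressor                    |FMR         |Factorization machine regression    |Prediction|\n|              |GBTRegressor                   |GBTR        |Gradient-boosted tree regression    |Prediction|\n|              |GeneralizedLinearRegression    |GLMR        |Generalized linear regression       |Prediction|\n|              |IsotonicRegression             |IR          |Isotonic regression                 |Prediction|\n|              |LinearRegression               |LR          |Linear regression                   |Prediction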
|\n|              |RandomForestRegressor          |RFR         |Random forest regression            |Prediction|\n|Clustering    |KMeans                         |KMEANS      |K-means clustering                  |Clustering|\n|              |GaussianMixture                |GM          |Gaussian mixture model              |Clustering|\n|              |LDA                            |LDA         |Latent Dirichlet allocation         |Clustering|\n|Recommendation|AlternatingLeastSquares        |ALS         |Alternating least squares           |Recommendation|\n\n## Data Quality Check Rules (more being added)\n|Quality dimension               |Rule parameter           |Rule description                            |Example                                                    |\n|:------------------------------:|:-----------------------:|:------------------------------------------:|:---------------------------------------------------------:|\n|Completeness (checkCompleteness)|isComplete               |Single column is non-null                   |\"isComplete\": [\"id\"]                                       |\n|                                |areComplete              |All listed columns are non-null             |\"areComplete\": [\"id,name\"]                                 |\n|                                |areAnyComplete           |At least one listed column is non-null      |\"areAnyComplete\": [\"age,birthday\"]                         |\n|                                |hasCompleteness          |Non-null ratio of a single column           |\"hasCompleteness\": [\"age:ratio\u003e0.2\"]                       |\n|                                |haveCompleteness         |Non-null ratio across all listed columns    |\"haveCompleteness\": [\"name,age:ratio\u003e0.2\"]                 |\n|                                |haveAnyCompleteness      |Non-null ratio across any listed column     |\"haveAnyCompleteness\": [\"name,age:ratio\u003e0.2\"]              |\n|Uniqueness (checkUniqueness)    |isUnique                 |Column values are unique                    |\"isUnique\": [\"id\"]                                         |\n|                                |isPrimaryKey             |Columns form a primary key                  |\"isPrimaryKey\": 
[\"id,name\"]                                |\n|                                |hasUniqueness            |Uniqueness ratio of a single column         |\"hasUniqueness\": [\"id:ratio==1\"]                           |\n|                                |haveUniqueness           |Uniqueness ratio of combined columns        |\"haveUniqueness\": [\"name,age:ratio\u003e0.5\"]                   |\n|                                |hasDistinctness          |Distinctness ratio of a single column       |\"hasDistinctness\": [\"id:ratio==1\"]                         |\n|                                |haveDistinctness         |Distinctness ratio of combined columns      |\"haveDistinctness\": [\"name,age:ratio\u003e0.5\"]                 |\n|                                |hasUniqueValueRatio      |Unique-value ratio of a single column       |\"hasUniqueValueRatio\": [\"id:ratio==1\"]                     |\n|                                |haveUniqueValueRatio     |Unique-value ratio of combined columns      |\"haveUniqueValueRatio\": [\"name,age:ratio\u003e0.5\"]             |\n|                                |hasNumberOfDistinctValues|Number of distinct values in a column       |\"hasNumberOfDistinctValues\": [\"name:number\u003e0 \u0026\u0026 number\u003c10\"]|\n|Accuracy (checkAccuracy)        |hasSize                  |Row count                                   |\"hasSize\": [\"size\u003e0 \u0026\u0026 size\u003c100\"]                          |\n|                                |hasColumnCount           |Column count                                |\"hasColumnCount\": [\"count\u003e0 \u0026\u0026 count\u003c10\"]                  |\n|                                |hasMin                   |Minimum of a numeric column                 |\"hasMin\": [\"age:min\u003e=18 \u0026\u0026 min\u003c=35\"]                       |\n|                                |hasMax                   |Maximum of a numeric column                 |\"hasMax\": [\"age:max\u003e=60 \u0026\u0026 max\u003c120\"]                       |\n|                                |hasMean                  |Mean of a numeric column                    |\"hasMean\": [\"age:mean\u003e=18 \u0026\u0026 mean\u003c=35\"]          
          |\n|                                |hasSum                   |Sum of a numeric column                     |\"hasSum\": [\"salary:sum\u003e0 \u0026\u0026 sum\u003c100000\"]                   |\n|                                |isNonNegative            |Numeric column is non-negative              |\"isNonNegative\": [\"age\"]                                   |\n|                                |isPositive               |Numeric column is strictly positive         |\"isPositive\": [\"age\"]                                      |\n|                                |hasMinLength             |Minimum length of a text column             |\"hasMinLength\": [\"name:length==2\"]                         |\n|                                |hasMaxLength             |Maximum length of a text column             |\"hasMaxLength\": [\"name:length==4\"]                         |\n|Validity (checkEffectiveness)   |containsFullName         |Ratio of values that are full names         |\"containsFullName\": [\"fullName:ratio\u003e0.8\"]                 |\n|                                |containsGender           |Ratio of values that are genders            |\"containsGender\": [\"gender:ratio\u003e0.8\"]                     |\n|                                |containsIdCard           |Ratio of values that are ID card numbers    |\"containsIdCard\": [\"idCard:ratio\u003e0.8\"]                     |\n|                                |containsMobilePhone      |Ratio of values that are mobile numbers     |\"containsMobilePhone\": [\"mobile:ratio\u003e0.8\"]                |\n|                                |containsTelePhone        |Ratio of values that are telephone numbers  |\"containsTelePhone\": [\"tele:ratio\u003e0.8\"]                    |\n|                                |containsEmail            |Ratio of values that are email addresses    |\"containsEmail\": [\"email:ratio\u003e0.8\"]                       |\n|                                |containsBankCard         |Ratio of values that are bank card numbers  |\"containsBankCard\": [\"bankCard:ratio\u003e0.8\"]                 |\n|                                |containsAddress          |Ratio of values that are addresses          |\"containsAddress\": 
[\"address:ratio\u003e0.8\"]                   |\n|                                |containsLongitude        |Ratio of values that are longitudes         |\"containsLongitude\": [\"longitude:ratio\u003e0.8\"]               |\n|                                |containsLatitude         |Ratio of values that are latitudes          |\"containsLatitude\": [\"latitude:ratio\u003e0.8\"]                 |\n|                                |containsCarNumber        |Ratio of values that are license plates     |\"containsCarNumber\": [\"carNumber:ratio\u003e0.8\"]               |\n|                                |containsURL              |Ratio of values that are URLs               |\"containsURL\": [\"url:ratio\u003e0.8\"]                           |\n|                                |containsIP               |Ratio of values that are IP addresses       |\"containsIP\": [\"ip:ratio\u003e0.8\"]                             |\n|                                |containsPort             |Ratio of values that are port numbers       |\"containsPort\": [\"port:ratio\u003e0.8\"]                         |\n|                                |hasDataType              |Column matches the given data type          |\"hasDataType\": [\"id:Numeric\"]                              |\n|                                |hasPattern               |Column matches the given regular expression |\"hasPattern\": [\"mobile:^1[3-9]\\d{9}$\"]                     |\n|Consistency (checkConsistency)  |isLessThan               |Each row has c1 less than c2                |\"isLessThan\": [\"c1,c2\"]                                    |\n|                                |isLessThanOrEqualTo      |Each row has c1 less than or equal to c2    |\"isLessThanOrEqualTo\": [\"c1,c2\"]                           |\n|                                |isGreaterThan            |Each row has c1 greater than c2             |\"isGreaterThan\": [\"c1,c2\"]                                 |\n|                                |isGreaterThanOrEqualTo   |Each row has c1 greater than or equal to c2 |\"isGreaterThanOrEqualTo\": [\"c1,c2\"]                        |\n|                                |isContainedIn            |Column values come from a fixed set         |\"isContainedIn\": [\"sex:男,女\"]        
                     |\n|                                |hasMutualInformation     |Mutual information between columns c1 and c2|\"hasMutualInformation\": [\"city,address:ratio\u003e0.5\"]         |\n\n## Sensitive Data Type Detection (more being added)\n|Category          |Subcategory          |Sensitive type ID                   |Description|\n|:----------:|:-------------------:|:----------------------------------:|:--------------------:|\n|PEP (person)      |PEP_STD (basic info) |PEP_STD_FULLNAME                    |Full name|\n|                  |                     |PEP_STD_GENDER                      |Gender|\n|                  |                     |PEP_STD_IDCARD                      |ID card number|\n|                  |                     |PEP_STD_ETHNICITY                   |Ethnicity|\n|                  |                     |PEP_STD_AGE                         |Age|\n|                  |                     |PEP_STD_HEIGHT                      |Height|\n|                  |                     |PEP_STD_WEIGHT                      |Weight|\n|                  |                     |PEP_STD_BLOOD                       |Blood type|\n|                  |                     |PEP_STD_MARITAL                     |Marital status|\n|                  |                     |PEP_STD_PASSPORT                    |Passport number|\n|                  |                     |PEP_STD_PERMANENT_RESIDENCE_PERMIT  |Foreign permanent resident ID number|\n|                  |                     |PEP_STD_OFFICER                     |Military officer ID number|\n|                  |PEP_CTC (contact)    |PEP_CTC_MOBILEPHONE                 |Mobile phone number|\n|                  |                     |PEP_CTC_TELEPHONE                   |Telephone number|\n|                  |                     |PEP_CTC_EMAIL                       |Email address|\n|                  |                     |PEP_CTC_WECHAT                      |WeChat ID|\n|                  |                     |PEP_CTC_QQ                          |QQ number|\n|                  
|PEP_EDU (education)  |PEP_EDU_BACKGROUND                  |Education level|\n|                  |                     |PEP_EDU_SCHOOL                      |School graduated from|\n|                  |                     |PEP_EDU_MAJOR                       |Major|\n|                  |PEP_JOB (occupation) |PEP_JOB_POSITION                    |Occupation|\n|                  |PEP_FIN (finance)    |PEP_FIN_BANKCARD                    |Bank card number|\n|LOC (location)    |LOC_STD (basic info) |LOC_STD_REGIONALISM_CODE            |Administrative division code|\n|                  |                     |LOC_STD_PROVINCE                    |Province level|\n|                  |                     |LOC_STD_CITY                        |Prefecture level|\n|                  |                     |LOC_STD_COUNTY                      |County level|\n|                  |                     |LOC_STD_TOWN                        |Township level|\n|                  |LOC_POS (position)   |LOC_POS_ADDRESS                     |Address|\n|                  |                     |LOC_POS_POSTAL                      |Postal code|\n|                  |                     |LOC_POS_LONGITUDE                   |Longitude|\n|                  |                     |LOC_POS_LATITUDE                    |Latitude|\n|OBJ (object)      |OBJ_CAR (car)        |OBJ_CAR_VIN                         |Vehicle identification number (VIN)|\n|                  |                     |OBJ_CAR_NUMBER                      |License plate number|\n|                  |OBJ_TRN (train)      |OBJ_TRN_NUMBER                      |Train number|\n|                  |                     |OBJ_TRN_SEATS                       |Seat class|\n|NET (network)     |NET_STD (basic info) |NET_STD_URL                         |URL|\n|                  |                     |NET_STD_IP                          |IP address|\n|                  |                     |NET_STD_PORT                        |Port number|\n|                  |                     
|NET_STD_MASK                        |Subnet mask|\n|                  |                     |NET_STD_MAC                         |MAC address|\n|                  |                     |NET_STD_PROTOCOL                    |Protocol type|\n|                  |NET_DEV (dev info)   |NET_DEV_LINUX_PATH                  |Linux path|\n|                  |                     |NET_DEV_WINDOWS_PATH                |Windows path|\n|                  |                     |NET_DEV_MD5                         |MD5|\n|                  |                     |NET_DEV_UUID                        |GUID/UUID|\n|                  |                     |NET_DEV_BASE64                      |Base64|\n|ENV (environment) |ENV_WEA (weather)    |ENV_WEA_TEMPERATURE                 |Temperature|\n|                  |ENV_STK (stocks)     |ENV_STK_CODE                        |Stock code|\n|ORG (organization)|ORG_BIZ (business)   |ORG_BIZ_NAME                        |Company name|\n|                  |                     |ORG_BIZ_UNIFIED_SOCIAL_CREDIT_CODE  |Unified social credit code|\n|                  |                     |ORG_BIZ_REGISTRATION_STATUS         |Company registration status|\n\n## Sample Rule File: Data Integration + Data Development + Data Modeling (out of the box, unified batch and stream)\n```\n{\n  \"env\": {\n    \"param\": \"hdfs://cluster/stark/params/test.json\",\n    \"udf\": [\n      {\n        \"name\": \"maps\",\n        \"class\": \"cn.hex.bricks.udf.Maps\",\n        \"jar\": \"hdfs://cluster/stark/udf/bricks.jar\",\n        \"temporary\": \"true\"\n      }\n    ]\n  },\n  \"source\": [\n    {\n      \"identifier\": \"ss001\",\n      \"name\": \"User basic info table (existing MYSQL data)\",\n      \"type\": \"MYSQL\",\n      \"mode\": \"BATCH\",\n      \"connection\": {\n        \"url\": \"jdbc:mysql://127.0.0.1:3306/test\",\n        \"driver\": \"com.mysql.cj.jdbc.Driver\",\n        \"user\": \"root\",\n        \"password\": \"root\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"ss002\",\n      \"name\": \"User detail info table (existing HIVE data)\",\n    
  \"type\": \"HIVE\",\n      \"mode\": \"BATCH\",\n      \"connection\": {\n        \"url\": \"thrift://127.0.0.1:9083\",\n        \"database\": \"test\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"ss003\",\n      \"name\": \"User dimension table (real-time CSV data)\",\n      \"type\": \"CSV\",\n      \"mode\": \"STREAM\",\n      \"connection\": {\n        \"url\": \"hdfs://cluster/test/\"\n      }\n    }\n  ],\n  \"transform\": [\n    {\n      \"identifier\": \"tf001\",\n      \"name\": \"Join and merge user basic info and detail info, driven by the real-time user dimension data in CSV\",\n      \"source\": [\n        \"ss001\",\n        \"ss002\",\n        \"ss003\"\n      ],\n      \"sql\": \"select ss001.*, ss002.detail as detail from ss001 inner join ss002 on ss001.id = ss002.id inner join ss003 on ss001.id = ss003.id\",\n      \"transout\": [\n        \"ts001\"\n      ]\n    }\n  ],\n  \"transout\": [\n    {\n      \"identifier\": \"ts001\",\n      \"transform\": [\n        \"tf001\"\n      ],\n      \"sink\": [\n        \"sk_jdbc_mysql\",\n        \"sk_jdbc_mariadb\",\n        \"sk_jdbc_oracle\",\n        \"sk_jdbc_postgresql\",\n        \"sk_jdbc_sqlserver\",\n        \"sk_jdbc_db2\",\n        \"sk_jdbc_hive\",\n        \"sk_jdbc_impala\",\n        \"sk_jdbc_doris\",\n        \"sk_jdbc_starrocks\",\n        \"sk_jdbc_phoenix\",\n        \"sk_jdbc_dameng\",\n        \"sk_jdbc_kingbase\",\n        \"sk_file_excel\",\n        \"sk_file_json\",\n        \"sk_file_text\",\n        \"sk_file_csv\",\n        \"sk_file_orc\",\n        \"sk_file_parquet\",\n        \"sk_file_xml\",\n        \"sk_hive\",\n        \"sk_iceberg\",\n        \"sk_kafka\",\n        \"sk_hbase\",\n        \"sk_mongodb\",\n        \"sk_elasticsearch\",\n        \"sk_redis\"\n      ]\n    }\n  ],\n  \"sink\": [\n    {\n      \"identifier\": \"sk_jdbc_mysql\",\n      \"name\": \"Write to MYSQL over JDBC (real-time update)\",\n      \"type\": \"MYSQL\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": 
\"jdbc:mysql://127.0.0.1:3306/stark\",\n        \"driver\": \"com.mysql.cj.jdbc.Driver\",\n        \"user\": \"stark\",\n        \"password\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_mariadb\",\n      \"name\": \"Write to MARIADB over JDBC (real-time update)\",\n      \"type\": \"MARIADB\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:mariadb://127.0.0.1:3306/stark\",\n        \"driver\": \"org.mariadb.jdbc.Driver\",\n        \"user\": \"stark\",\n        \"password\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_oracle\",\n      \"name\": \"Write to ORACLE over JDBC (real-time update)\",\n      \"type\": \"ORACLE\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:oracle:thin:@127.0.0.1:1521:XE\",\n        \"driver\": \"oracle.jdbc.OracleDriver\",\n        \"user\": \"stark\",\n        \"password\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_postgresql\",\n      \"name\": \"Write to POSTGRESQL over JDBC (real-time update)\",\n      \"type\": \"POSTGRESQL\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:postgresql://127.0.0.1:5432/stark\",\n        \"driver\": \"org.postgresql.Driver\",\n        \"user\": \"stark\",\n        \"password\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_sqlserver\",\n      \"name\": \"Write to SQLSERVER over JDBC (real-time update)\",\n      \"type\": \"SQLSERVER\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:sqlserver://;serverName=127.0.0.1;port=1433;databaseName=stark\",\n        \"driver\": \"com.microsoft.sqlserver.jdbc.SQLServerDriver\",\n        \"user\": \"sa\",\n        \"password\": \"password\",\n        \"schema\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_db2\",\n      \"name\": 
\"Write to DB2 over JDBC (real-time update)\",\n      \"type\": \"DB2\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:db2://127.0.0.1:50000/stark\",\n        \"driver\": \"com.ibm.db2.jcc.DB2Driver\",\n        \"user\": \"stark\",\n        \"password\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_hive\",\n      \"name\": \"Write to HIVE over JDBC (real-time update)\",\n      \"type\": \"HIVEJDBC\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:hive2://127.0.0.1:10000/stark\",\n        \"driver\": \"org.apache.hive.jdbc.HiveDriver\",\n        \"user\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_impala\",\n      \"name\": \"Write to IMPALA over JDBC (offline job)\",\n      \"type\": \"IMPALA\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:impala://127.0.0.1:21050/stark\",\n        \"driver\": \"com.cloudera.impala.jdbc.Driver\",\n        \"user\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_doris\",\n      \"name\": \"Write to DORIS over JDBC (real-time update)\",\n      \"type\": \"DORIS\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:mysql://127.0.0.1:3306/stark\",\n        \"driver\": \"com.mysql.cj.jdbc.Driver\",\n        \"user\": \"stark\",\n        \"password\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_starrocks\",\n      \"name\": \"Write to STARROCKS over JDBC (real-time update)\",\n      \"type\": \"STARROCKS\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:mysql://127.0.0.1:3306/stark\",\n        \"driver\": \"com.mysql.cj.jdbc.Driver\",\n        \"user\": \"stark\",\n        \"password\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_phoenix\",\n      \"name\": \"Write to PHOENIX over JDBC (real-time update)\",\n      \"type\": 
\"PHOENIX\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:phoenix:node01,node02,node03:2181\",\n        \"driver\": \"org.apache.phoenix.jdbc.PhoenixDriver\",\n        \"schema\": \"STARK\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_dameng\",\n      \"name\": \"Write to DAMENG over JDBC (real-time update)\",\n      \"type\": \"DAMENG\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:dm://127.0.0.1:5236/STARK\",\n        \"driver\": \"dm.jdbc.driver.DmDriver\",\n        \"user\": \"STARK\",\n        \"password\": \"STARK\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_jdbc_kingbase\",\n      \"name\": \"Write to KINGBASE over JDBC (real-time update)\",\n      \"type\": \"KINGBASE\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:kingbase8://127.0.0.1:54321/stark\",\n        \"driver\": \"com.kingbase8.Driver\",\n        \"user\": \"kingbase\",\n        \"password\": \"kingbase\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_file_excel\",\n      \"name\": \"Write to an EXCEL file (real-time update)\",\n      \"type\": \"EXCEL\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"hdfs://cluster/stark/users.xlsx\"\n      }\n    },\n    {\n      \"identifier\": \"sk_file_json\",\n      \"name\": \"Write to a JSON file (real-time update)\",\n      \"type\": \"JSON\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"hdfs://cluster/stark/users.json\"\n      }\n    },\n    {\n      \"identifier\": \"sk_file_text\",\n      \"name\": \"Write to a TXT file (real-time update)\",\n      \"type\": \"TEXT\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"hdfs://cluster/stark/users.txt\"\n      }\n    },\n    {\n      \"identifier\": \"sk_file_csv\",\n      \"name\": \"Write to a CSV file (real-time update)\",\n      \"type\": \"CSV\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": 
\"hdfs://cluster/stark/users.csv\"\n      }\n    },\n    {\n      \"identifier\": \"sk_file_orc\",\n      \"name\": \"Write to an ORC file (real-time update)\",\n      \"type\": \"ORC\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"hdfs://cluster/stark/users.orc\"\n      }\n    },\n    {\n      \"identifier\": \"sk_file_parquet\",\n      \"name\": \"Write to a PARQUET file (real-time update)\",\n      \"type\": \"PARQUET\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"hdfs://cluster/stark/users.parquet\"\n      }\n    },\n    {\n      \"identifier\": \"sk_file_xml\",\n      \"name\": \"Write to an XML file (real-time update)\",\n      \"type\": \"XML\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"hdfs://cluster/stark/users.xml\"\n      }\n    },\n    {\n      \"identifier\": \"sk_hive\",\n      \"name\": \"Write to HIVE over ThriftServer (real-time update)\",\n      \"type\": \"HIVE\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"thrift://127.0.0.1:9083\",\n        \"database\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_iceberg\",\n      \"name\": \"Write to ICEBERG over ThriftServer (real-time update)\",\n      \"type\": \"ICEBERG\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"thrift://127.0.0.1:9083\",\n        \"database\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_kafka\",\n      \"name\": \"Write to a Kafka message queue (real-time update)\",\n      \"type\": \"KAFKA\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"node01:9092,node02:9092,node03:9092\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_hbase\",\n      \"name\": \"Write to the HBase column store (real-time update)\",\n      \"type\": \"HBASE\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"node01,node02,node03\",\n        \"port\": \"2181\",\n        \"schema\": \"stark\",\n        \"dataset\": \"users\",\n        \"primaryKey\": 
\"id\"\n      }\n    },\n    {\n      \"identifier\": \"sk_mongodb\",\n      \"name\": \"Output to the MongoDB document database (real-time updates)\",\n      \"type\": \"MONGODB\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"mongodb://127.0.0.1:27017\",\n        \"database\": \"stark\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_elasticsearch\",\n      \"name\": \"Output to the ElasticSearch full-text search database (real-time updates)\",\n      \"type\": \"ELASTICSEARCH\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"127.0.0.1\",\n        \"port\": \"9200\",\n        \"dataset\": \"users\"\n      }\n    },\n    {\n      \"identifier\": \"sk_redis\",\n      \"name\": \"Output to the Redis cache database (real-time updates)\",\n      \"type\": \"REDIS\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"127.0.0.1\",\n        \"port\": \"6379\",\n        \"dataset\": \"users\"\n      }\n    }\n  ]\n}\n```\n\n## Machine learning rule file example (supports incremental prediction on streaming data)\n```\n{\n  \"env\": {},\n  \"source\": [\n    {\n      \"identifier\": \"user_movie_training\",\n      \"name\": \"Sample dataset of user preference records for movie genres\",\n      \"type\": \"JSON\",\n      \"mode\": \"BATCH\",\n      \"connection\": {\n        \"url\": \"hdfs://cluster/stark/ml/data/user_movie_training.json\"\n      },\n      \"options\": {\n        \"multiLine\": \"true\"\n      }\n    },\n    {\n      \"identifier\": \"user_movie_prediction\",\n      \"name\": \"Prediction dataset of user preferences for movie genres\",\n      \"type\": \"JSON\",\n      \"mode\": \"STREAM\",\n      \"connection\": {\n        \"url\": \"hdfs://cluster/stark/ml/data/user_movie_prediction\"\n      },\n      \"options\": {\n        \"multiLine\": \"true\"\n      }\n    }\n  ],\n  \"transform\": [\n    {\n      \"identifier\": \"tf001\",\n      \"name\": \"Train a classification model on user movie-genre preference records, used to predict each user's preferred movie genres\",\n      \"source\": [\n        \"user_movie_training\"\n      ],\n      \"ml\": {\n        \"training\": {\n          \"type\": \"GBTC\",\n          \"path\": \"hdfs://cluster/stark/ml/gbtc\",\n          \"params\": 
{\n            \"labelCol\": \"preference\"\n          }\n        }\n      },\n      \"transout\": [\n        \"ts001\"\n      ]\n    },\n    {\n      \"identifier\": \"tf002\",\n      \"name\": \"Predict the degree of preference for a movie genre from user and movie-genre recommendation records\",\n      \"source\": [\n        \"user_movie_prediction\"\n      ],\n      \"ml\": {\n        \"prediction\": {\n          \"type\": \"GBTC\",\n          \"path\": \"hdfs://cluster/stark/ml/gbtc\"\n        }\n      },\n      \"transout\": [\n        \"ts002\"\n      ]\n    }\n  ],\n  \"transout\": [\n    {\n      \"identifier\": \"ts001\",\n      \"transform\": [\n        \"tf001\"\n      ]\n    },\n    {\n      \"identifier\": \"ts002\",\n      \"transform\": [\n        \"tf002\"\n      ],\n      \"sink\": [\n        \"sk_gbtc_prediction\"\n      ]\n    }\n  ],\n  \"sink\": [\n    {\n      \"identifier\": \"sk_gbtc_prediction\",\n      \"name\": \"Output the preference results predicted from user and movie genre\",\n      \"type\": \"MYSQL\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:mysql://127.0.0.1:3306/stark\",\n        \"driver\": \"com.mysql.cj.jdbc.Driver\",\n        \"user\": \"stark\",\n        \"password\": \"stark\",\n        \"dataset\": \"sk_gbtc_prediction\"\n      }\n    }\n  ]\n}\n```\n\n## Data quality check rule file example (supports incremental checks on streaming data and generates a check report)\n```\n{\n  \"env\": {},\n  \"source\": [\n    {\n      \"identifier\": \"ss001\",\n      \"name\": \"Data quality check test table\",\n      \"type\": \"MYSQL\",\n      \"mode\": \"BATCH\",\n      \"connection\": {\n        \"url\": \"jdbc:mysql://127.0.0.1:3306/stark\",\n        \"driver\": \"com.mysql.cj.jdbc.Driver\",\n        \"user\": \"root\",\n        \"password\": \"root\",\n        \"dataset\": \"dqtest\"\n      }\n    }\n  ],\n  \"transform\": [\n    {\n      \"identifier\": \"tf001\",\n      \"name\": \"Data quality check: completeness, uniqueness, accuracy, timeliness, validity, consistency\",\n      \"source\": [\n        \"ss001\"\n      ],\n      \"check\": {\n        \"checkAccuracy\": {\n          \"hasSize\": [\"size\u003e0 \u0026\u0026 size\u003c100\"],\n          
\"hasColumnCount\": [\"count\u003e0 \u0026\u0026 count\u003c10\"],\n          \"hasMin\": [\"age:min\u003e=18 \u0026\u0026 min\u003c=35\"],\n          \"hasMax\": [\"age:max\u003e=60 \u0026\u0026 max\u003c120\"],\n          \"hasMean\": [\"age:mean\u003e=18 \u0026\u0026 mean\u003c=35\"],\n          \"hasSum\": [\"salary:sum\u003e0 \u0026\u0026 sum\u003c100000\"],\n          \"isNonNegative\": [\"age\"],\n          \"isPositive\": [\"age\"],\n          \"hasMinLength\": [\"name:length==2\"],\n          \"hasMaxLength\": [\"name:length==4\"]\n        },\n        \"checkCompleteness\": {\n          \"isComplete\": [\"id\"],\n          \"areComplete\": [\"id,name\"],\n          \"areAnyComplete\": [\"age,birthday\"],\n          \"hasCompleteness\": [\"age:ratio\u003e0.2\"],\n          \"haveCompleteness\": [\"name,age:ratio\u003e0.2\"],\n          \"haveAnyCompleteness\": [\"name,age:ratio\u003e0.2\"]\n        },\n        \"checkConsistency\": {\n          \"isLessThan\": [\"c1,c2\"],\n          \"isLessThanOrEqualTo\": [\"c1,c2\"],\n          \"isGreaterThan\": [\"c1,c2\"],\n          \"isGreaterThanOrEqualTo\": [\"c1,c2\"],\n          \"isContainedIn\": [\"sex:男,女\"],\n          \"hasMutualInformation\": [\"city,address:ratio\u003e0.5\"]\n        },\n        \"checkEffectiveness\": {\n          \"containsIdCard\": [\"idCard:ratio\u003e0.8\"],\n          \"containsMobilePhone\": [\"mobile:ratio\u003e0.8\"],\n          \"containsTelePhone\": [\"tele:ratio\u003e0.8\"],\n          \"containsBankCard\": [\"bankcard:ratio\u003e0.8\"],\n          \"containsEmail\": [\"email:ratio\u003e0.8\"],\n          \"containsURL\": [\"url:ratio\u003e0.8\"],\n          \"containsIP\": [\"ip:ratio\u003e0.8\"],\n          \"containsLongitude\": [\"longitude:ratio\u003e0.8\"],\n          \"containsLatitude\": [\"latitude:ratio\u003e0.8\"],\n          \"hasDataType\": [\"id:Numeric\"],\n          \"hasPattern\": [\"idcard:pattern\"]\n        },\n        \"checkUniqueness\": {\n          
\"isUnique\": [\"id\"],\n          \"isPrimaryKey\": [\"id,name\"],\n          \"hasUniqueness\": [\"id:ratio==1\"],\n          \"haveUniqueness\": [\"name,age:ratio\u003e0.5\"],\n          \"hasDistinctness\": [\"id:ratio==1\"],\n          \"haveDistinctness\": [\"name,age:ratio\u003e0.5\"],\n          \"hasUniqueValueRatio\": [\"id:ratio==1\"],\n          \"haveUniqueValueRatio\": [\"name,age:ratio\u003e0.5\"],\n          \"hasNumberOfDistinctValues\": [\"name:number\u003e0 \u0026\u0026 number\u003c10\"]\n        }\n      },\n      \"transout\": [\n        \"ts001\"\n      ]\n    }\n  ],\n  \"transout\": [\n    {\n      \"identifier\": \"ts001\",\n      \"transform\": [\n        \"tf001\"\n      ],\n      \"sink\": [\n        \"sk_mysql\"\n      ]\n    }\n  ],\n  \"sink\": [\n    {\n      \"identifier\": \"sk_mysql\",\n      \"name\": \"Output the data quality check report\",\n      \"type\": \"MYSQL\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:mysql://127.0.0.1:3306/stark\",\n        \"driver\": \"com.mysql.cj.jdbc.Driver\",\n        \"user\": \"root\",\n        \"password\": \"root\",\n        \"dataset\": \"dqreport\"\n      }\n    }\n  ]\n}\n```\n\n## Sensitive type detection rule file example (supports automatic detection of 100+ sensitive types)\n```\n{\n  \"env\": {},\n  \"source\": [\n    {\n      \"identifier\": \"ss001\",\n      \"name\": \"Sensitive type detection\",\n      \"type\": \"MYSQL\",\n      \"mode\": \"BATCH\",\n      \"connection\": {\n        \"url\": \"jdbc:mysql://127.0.0.1:3306/stark\",\n        \"driver\": \"com.mysql.cj.jdbc.Driver\",\n        \"user\": \"root\",\n        \"password\": \"root\",\n        \"dataset\": \"dttest\"\n      }\n    }\n  ],\n  \"transform\": [\n    {\n      \"identifier\": \"tf001\",\n      \"name\": \"Sensitive data detection policy configuration\",\n      \"source\": [\n        \"ss001\"\n      ],\n      \"detect\": {\n        \"match\": {\n          \"ratio\": \"0.5\"\n        }\n      },\n      \"transout\": [\n        \"ts001\"\n      ]\n    }\n  ],\n  \"transout\": [\n    {\n      \"identifier\": \"ts001\",\n    
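  \"transform\": [\n        \"tf001\"\n      ],\n      \"sink\": [\n        \"sk_mysql\"\n      ]\n    }\n  ],\n  \"sink\": [\n    {\n      \"identifier\": \"sk_mysql\",\n      \"name\": \"Output the sensitive type detection results\",\n      \"type\": \"MYSQL\",\n      \"mode\": \"APPEND\",\n      \"connection\": {\n        \"url\": \"jdbc:mysql://127.0.0.1:3306/stark\",\n        \"driver\": \"com.mysql.cj.jdbc.Driver\",\n        \"user\": \"root\",\n        \"password\": \"root\",\n        \"dataset\": \"dtreport\"\n      }\n    }\n  ]\n}\n```\n\n## Stark big data governance engine `[2.2.0]` major update! Ships as a binary package: no configuration needed, just extract and run!\n* All features are free to use. Supports unified batch and stream data integration and data analysis, including real-time CDC data capture, machine learning models, data quality checks, data labeling, sensitive data detection, data modeling, algorithm modeling, and OLAP data analysis\n* Zero coding and no technical barrier: configuring a rule file is all it takes to run an end-to-end big data governance task, so anyone can become a big data governance expert\n* Ships with a built-in cluster core; tasks can be submitted in local mode or cluster mode, cluster nodes can be scaled out dynamically, and the engine can handle multi-source heterogeneous data at the scale of tens of billions of records\n* Supports 30+ data sources, covering relational databases, NoSQL databases, MPP databases, data lakes, message middleware, graph databases, spatial databases, time-series databases, distributed files, and more\n* 20+ built-in machine learning algorithms, including classification, regression, clustering, and recommendation algorithms; deep learning and natural language processing algorithms are planned for the future\n* 50+ built-in data quality check rules covering six dimensions, `[completeness, uniqueness, accuracy, timeliness, validity, consistency]`; supports data quality monitoring of both offline and real-time data and generates a check report\n* Seven built-in categories of data elements, `[people, places, events, objects, networks, intelligence, organizations]`; supports automated data labeling and automatic detection of 100+ sensitive data types, for scenarios such as data security control and dynamic masking\n* Download the installation package: [stark-2.2.0.tgz](https://github.com/hexnn/Stark/releases/download/2.2.0/stark-2.2.0.tgz)\n* Upload the package to the server and run `tar -zxvf stark-2.2.0.tgz` to extract it, which completes the installation. The extracted directory layout is described below\n```\nstark-2.2.0\n  /bin            # Command-line tools; entry point for starting the Stark engine\n  /conf           # Engine configuration files\n  /connect        # CDC data capture plugins\n  /data           # Sample data, including machine learning training and prediction samples\n  /examples       # Rule file examples covering offline, real-time, unified batch/stream, machine learning, data quality checks, sensitive data detection, etc.\n  /jars           # Dependency packages\n  /kafka-logs     # Kafka data directory\n  /logs           # Engine execution logs\n  /rule           # Rule file directory\n  /sbin           # Administration tools; Stark cluster management commands\n  /stark-events   # Event execution logs\n  /zkdata         # ZooKeeper data directory\n```\n* Edit the `rule/rule.json` rule file to specify the data source connection details under `source` and `sink`, then run `bin/stark-run` to start the task\n* Several task submission modes are supported; choose whichever fits your needs. `stark-run` command-line examples follow\n```\nExamples:\n  1. Run with the default configuration file and rule file\n  $ stark-run\n\n  2. Use a custom rule file\n  $ stark-run --rule ../rule/rule.json\n\n  3. Use a custom configuration file and rule file\n  $ stark-run --config ../conf/stark.properties --rule 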
../rule/rule.json\n\n  4. Submit the task in local mode\n  $ stark-run --master local[*]\n\n  5. Submit the task to a Spark standalone cluster\n  $ stark-run --master spark://host:port --deploy-mode cluster\n\n  6. Submit the task to a YARN cluster\n  $ stark-run --master yarn --deploy-mode cluster --queue default\n```\n* After the task finishes, inspect the data connection and output specified by the `sink` node to verify that the data was written successfully\n* New: `stark-check`, a validation tool for engine rule files that verifies their configuration. Usage is shown below\n```\nExamples:\n  1. Validate the default engine rule file\n  $ stark-check\n\n  2. Validate a custom engine rule file\n  $ stark-check --rule ../rule/rule.json\n\nSuccessful validation example: $ stark-check --rule ../rule/batch.json\nOutput:\nEngine rule file to validate: /opt/stark-2.2.0/rule/batch.json\n=======================================================================\nINFO: Stark engine rule file [/opt/stark-2.2.0/rule/batch.json] is configured correctly\n=======================================================================\n\nFailed validation example: $ stark-check --rule ../rule/batch.json\nOutput:\nEngine rule file to validate: /opt/stark-2.2.0/rule/batch.json\n=======================================================================\nERROR: Stark engine rule file [/opt/stark-2.2.0/rule/batch.json] is misconfigured; check the format against the following messages\n- $.source[0]: required property “identifier” not found\n- $.source[0]: required property “name” not found\n- $.source[0]: required property “type” not found\n- $.source[0]: required property “mode” not found\n=======================================================================\n```\n\n## Contact\n* Use the channels below to learn more about the Stark big data governance engine. Technical collaboration and custom project development requests are welcome\n* WeChat: xxx-hx-xxx (潇湘夜雨)\n* Email: hexing_xx@163.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhexnn%2Fstark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhexnn%2Fstark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhexnn%2Fstark/lists"}