{"id":14987956,"url":"https://github.com/apache/doris-spark-connector","last_synced_at":"2025-05-16T09:04:10.559Z","repository":{"id":36986898,"uuid":"457619173","full_name":"apache/doris-spark-connector","owner":"apache","description":"Spark Connector for Apache Doris","archived":false,"fork":false,"pushed_at":"2025-05-09T01:51:39.000Z","size":983,"stargazers_count":91,"open_issues_count":37,"forks_count":100,"subscribers_count":33,"default_branch":"master","last_synced_at":"2025-05-10T17:16:24.946Z","etag":null,"topics":["apache","connector","data-warehousing","dbms","doris","mpp","olap","spark"],"latest_commit_sha":null,"homepage":"https://doris.apache.org/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE-dependencies.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-02-10T03:40:38.000Z","updated_at":"2025-05-09T01:51:44.000Z","dependencies_parsed_at":"2023-01-17T12:16:51.538Z","dependency_job_id":"94baac2e-ffce-4203-9347-fb3cd7e57852","html_url":"https://github.com/apache/doris-spark-connector","commit_stats":{"total_commits":191,"total_committers":51,"mean_commits":"3.7450980392156863","dds":0.7225130890052356,"last_synced_commit":"8e3e514a2699661603505abbe91d372908e64313"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fdoris-spark-connector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fdoris-spa
rk-connector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fdoris-spark-connector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fdoris-spark-connector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","download_url":"https://codeload.github.com/apache/doris-spark-connector/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253850678,"owners_count":21973663,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache","connector","data-warehousing","dbms","doris","mpp","olap","spark"],"created_at":"2024-09-24T14:15:48.662Z","updated_at":"2025-05-16T09:04:10.552Z","avatar_url":"https://github.com/apache.png","language":"Java","funding_links":[],"categories":["大数据"],"sub_categories":[],"readme":"\u003c!--\nLicensed to the Apache Software Foundation (ASF) under one\nor more contributor license agreements.  See the NOTICE file\ndistributed with this work for additional information\nregarding copyright ownership.  The ASF licenses this file\nto you under the Apache License, Version 2.0 (the\n\"License\"); you may not use this file except in compliance\nwith the License.  You may obtain a copy of the License at\n\n  http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing,\nsoftware distributed under the License is distributed on an\n\"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\nKIND, either express or implied.  
See the License for the\nspecific language governing permissions and limitations\nunder the License.\n--\u003e\n\n# Spark Connector for Apache Doris\n\n[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)\n[![Join the Doris Community at Slack](https://img.shields.io/badge/chat-slack-brightgreen)](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-11jb8gesh-7IukzSrdea6mqoG0HB4gZg)\n\n### Spark Doris Connector\n\nFor more information about compilation and usage, see the [Spark Doris Connector](https://doris.apache.org/docs/ecosystem/spark-doris-connector) documentation.\n\n## License\n\n[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)\n\n## How to Build\n\nBefore building, copy customer_env.sh.tpl to customer_env.sh and configure it.\n```shell\ngit clone git@github.com:apache/doris-spark-connector.git\ncd doris-spark-connector/spark-doris-connector\n./build.sh\n```\n\n### QuickStart\n\n1. Download and compile Spark Doris Connector from https://github.com/apache/doris-spark-connector. We suggest compiling it with the official Doris build image.\n\n```bash\n$ docker pull apache/doris:build-env-ldb-toolchain-latest\n```\n\n2. The build produces a jar with a name such as spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar.\n\n3. Download Spark from https://spark.apache.org/downloads.html. If you are in China, the Tencent mirror is a good alternative: https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/\n\n```bash\n# download\nwget https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz\n# extract\ntar -xzvf spark-3.1.2-bin-hadoop3.2.tgz\n```\n\n4. Configure the Spark environment.\n\n```shell\n# open /etc/profile and append the two export lines below\nvim /etc/profile\nexport SPARK_HOME=/your_path/spark-3.1.2-bin-hadoop3.2\nexport PATH=$PATH:$SPARK_HOME/bin\n# reload the profile so the variables take effect\nsource /etc/profile\n```\n\n5. 
Copy spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar to the Spark jars directory.\n\n```shell\ncp /your_path/spark-doris-connector/target/spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar $SPARK_HOME/jars\n```\n\n6. Create the Doris database and table.\n\n   ```sql\n   create database mongo_doris;\n   use mongo_doris;\n   CREATE TABLE data_sync_test_simple\n    (\n            _id VARCHAR(32) DEFAULT '',\n            id VARCHAR(32) DEFAULT '',\n            user_name VARCHAR(32) DEFAULT '',\n            member_list VARCHAR(32) DEFAULT ''\n    )\n    DUPLICATE KEY(_id)\n    DISTRIBUTED BY HASH(_id) BUCKETS 10\n    PROPERTIES(\"replication_num\" = \"1\");\n   INSERT INTO data_sync_test_simple VALUES ('1','1','alex','123');\n   ```\n\n7. Enter this code in spark-shell.\n\n```scala\nimport org.apache.doris.spark._\n// read the table created in step 6 as an RDD\nval dorisSparkRDD = sc.dorisRDD(\n  tableIdentifier = Some(\"mongo_doris.data_sync_test_simple\"),\n  cfg = Some(Map(\n    \"doris.fenodes\" -\u003e \"127.0.0.1:8030\",\n    \"doris.request.auth.user\" -\u003e \"root\",\n    \"doris.request.auth.password\" -\u003e \"\"\n  ))\n)\ndorisSparkRDD.collect()\n```\n\n- mongo_doris: the Doris database name\n- data_sync_test_simple: the Doris table name\n- doris.fenodes: the Doris FE address, in the form IP:http_port\n- doris.request.auth.user: the Doris user name\n- doris.request.auth.password: the Doris password\n\n8. If Spark runs in cluster mode, upload the jar to HDFS and add the HDFS URL of the doris-spark-connector jar to spark.yarn.jars.\n\n```bash\nspark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar\n```\n\nLink: https://github.com/apache/doris/discussions/9486\n\n9. 
In PySpark, enter this code in the pyspark shell.\n\n```python\n# wrap the chained calls in parentheses so they can span several lines\ndorisSparkDF = (\n    spark.read.format(\"doris\")\n    .option(\"doris.table.identifier\", \"mongo_doris.data_sync_test_simple\")\n    .option(\"doris.fenodes\", \"127.0.0.1:8030\")\n    .option(\"user\", \"root\")\n    .option(\"password\", \"\")\n    .load()\n)\n# show the first 5 rows\ndorisSparkDF.show(5)\n```\n\n## Type conversion for writing to Doris using Arrow\n| Doris | Spark |\n|---|---|\n| BOOLEAN | BooleanType |\n| TINYINT | ByteType |\n| SMALLINT | ShortType |\n| INT | IntegerType |\n| BIGINT | LongType |\n| LARGEINT | StringType |\n| FLOAT | FloatType |\n| DOUBLE | DoubleType |\n| DECIMAL(M,D) | DecimalType(M,D) |\n| DATE | DateType |\n| DATETIME | TimestampType |\n| CHAR(L) | StringType |\n| VARCHAR(L) | StringType |\n| STRING | StringType |\n| ARRAY | ARRAY |\n| MAP | MAP |\n| STRUCT | STRUCT |\n\n\n\n## Report issues or submit a pull request\n\nIf you find any bugs, feel free to file a [GitHub issue](https://github.com/apache/doris/issues) or fix them by submitting a [pull request](https://github.com/apache/doris/pulls).\n\n## Contact Us\n\nContact us through the following mailing list.\n\n| Name                                                                          | Scope                           |                                                                 |                                                                     |                                                                              |\n|:------------------------------------------------------------------------------|:--------------------------------|:----------------------------------------------------------------|:--------------------------------------------------------------------|:-----------------------------------------------------------------------------|\n| [dev@doris.apache.org](mailto:dev@doris.apache.org)     | Development-related discussions | [Subscribe](mailto:dev-subscribe@doris.apache.org)   | 
[Unsubscribe](mailto:dev-unsubscribe@doris.apache.org)   | [Archives](https://mail-archives.apache.org/mod_mbox/doris-dev/)   |\n\n## Links\n\n* Doris official site - \u003chttps://doris.apache.org\u003e\n* Developer mailing list - \u003cdev@doris.apache.org\u003e. Send mail to \u003cdev-subscribe@doris.apache.org\u003e and follow the reply to subscribe to the mailing list.\n* Slack channel - [Join the Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-11jb8gesh-7IukzSrdea6mqoG0HB4gZg)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fdoris-spark-connector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Fdoris-spark-connector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fdoris-spark-connector/lists"}