{"id":28401921,"url":"https://github.com/griddb/griddb_spark","last_synced_at":"2025-10-07T07:44:45.664Z","repository":{"id":65218803,"uuid":"95866806","full_name":"griddb/griddb_spark","owner":"griddb","description":"GridDB connector for Apache Spark","archived":false,"fork":false,"pushed_at":"2022-12-26T01:34:10.000Z","size":191,"stargazers_count":4,"open_issues_count":0,"forks_count":4,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-06-25T17:43:49.336Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/griddb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-30T08:25:27.000Z","updated_at":"2022-12-26T01:34:15.000Z","dependencies_parsed_at":"2023-01-15T15:30:23.563Z","dependency_job_id":null,"html_url":"https://github.com/griddb/griddb_spark","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/griddb/griddb_spark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griddb%2Fgriddb_spark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griddb%2Fgriddb_spark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griddb%2Fgriddb_spark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griddb%2Fgriddb_spark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/griddb","download_url":"https://codeload.github.com/griddb/griddb_spark/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/griddb%2Fgriddb_spark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278740830,"owners_count":26037480,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-01T14:06:40.730Z","updated_at":"2025-10-07T07:44:45.652Z","avatar_url":"https://github.com/griddb.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"GridDB connector for Apache Spark\n\n## Overview\n\nGridDB connector for [Apache Spark](https://spark.apache.org/) is a module supporting connection between GridDB and Apache Spark. \nThis uses GridDB server, GridDB Java client, and GridDB connector for [Apache Hadoop](http://hadoop.apache.org/) MapReduce.\nWe can create DataFrame from an existing GridDB container and operate with it.\n\n## Operating environment\n\nLibrary building and program execution are checked in the environment below.\n\n    OS:             CentOS6.7(x64)\n    Java:           JDK 1.8.0_101\n\tApache Hadoop:  Version 2.6.5\n\tApache Spark:   Version 2.1.0\n\tScala:          Version 2.11.8\n\t\n    GridDB server and Java client:                3.0 CE\n    GridDB connector for Apache Hadoop MapReduce: 1.0\n\n## QuickStart\n### Preparations\n1. 
Install Hadoop and Spark\n\n\t\t$ cd [INSTALL_FOLDER]\n\t\t$ wget http://archive.apache.org/dist/hadoop/core/hadoop-2.6.5/hadoop-2.6.5.tar.gz\n\t\t$ tar xvfz hadoop-2.6.5.tar.gz\n\t\t$ wget http://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.6.tgz\n\t\t$ tar xvfz spark-2.1.0-bin-hadoop2.6.tgz\n\n    Note: [INSTALL_FOLDER] is the folder in which Spark, Hadoop, and the GridDB connector for Spark are installed.\n\n2. Add the following environment variables to .bashrc\n\t\t\n\t\t$ vi ~/.bashrc\n\t\texport JAVA_HOME=/usr/lib/jvm/[JDK folder]\n\t\texport HADOOP_HOME=[INSTALL_FOLDER]/hadoop-2.6.5\n\t\texport SPARK_HOME=[INSTALL_FOLDER]/spark-2.1.0-bin-hadoop2.6\n\t\texport GRIDDB_SPARK=[INSTALL_FOLDER]/griddb_spark\n\t\texport GRIDDB_SPARK_PROPERTIES=$GRIDDB_SPARK/gd-config.xml\n\t\t\n\t\texport PATH=$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$PATH\n\t\t\n\t\texport HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native\n\t\texport HADOOP_OPTS=\"$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native\"\n\n\t\t$ source ~/.bashrc\n\n3. 
Modify the file \"gd-config.xml\"\n\n\t\t$ cd [INSTALL_FOLDER]/griddb_spark\n\t\t$ vi gd-config.xml\n\t\t\n\t\t\u003c!-- GridDB properties --\u003e\n\t\t\u003cproperty\u003e\n\t\t\t\u003cname\u003egs.user\u003c/name\u003e\n\t\t\t\u003cvalue\u003e[GridDB user]\u003c/value\u003e\n\t\t\u003c/property\u003e\n\t\t\u003cproperty\u003e\n\t\t\t\u003cname\u003egs.password\u003c/name\u003e\n\t\t\t\u003cvalue\u003e[GridDB password]\u003c/value\u003e\n\t\t\u003c/property\u003e\n\t\t\u003cproperty\u003e\n\t\t\t\u003cname\u003egs.cluster.name\u003c/name\u003e\n\t\t\t\u003cvalue\u003e[GridDB cluster name]\u003c/value\u003e\n\t\t\u003c/property\u003e\n\t\t\u003c!-- Define address and port for multicast method, leave it blank if using other method --\u003e\n\t\t\u003cproperty\u003e\n\t\t\t\u003cname\u003egs.notification.address\u003c/name\u003e\n\t\t\t\u003cvalue\u003e[GridDB notification address(default is 239.0.0.1)]\u003c/value\u003e\n\t\t\u003c/property\u003e\n\t\t\u003cproperty\u003e\n\t\t\t\u003cname\u003egs.notification.port\u003c/name\u003e\n\t\t\t\u003cvalue\u003e[GridDB notification port(default is 31999)]\u003c/value\u003e\n\t\t\u003c/property\u003e\n\nRefer to [Configuration](Configuration.md) for details on the GridDB properties.\n\n4. Build the GridDB Java client and the GridDB connector for Hadoop MapReduce, and  \n   place the following files under the griddb_spark/gs-spark-datasource/lib directory.\n\n    gridstore.jar  \n    gs-hadoop-mapreduce-client-1.0.0.jar\n\n5. 
Add SPARK_CLASSPATH to \"spark-env.sh\"\n\t\t\n\t\t$ cd [INSTALL_FOLDER]/spark-2.1.0-bin-hadoop2.6\n\t\t$ vi conf/spark-env.sh\n\t\tSPARK_CLASSPATH=.:$GRIDDB_SPARK/gs-spark-datasource/target/gs-spark-datasource.jar:\n\t\t\t$GRIDDB_SPARK/gs-spark-datasource/lib/gridstore.jar:\n\t\t\t$GRIDDB_SPARK/gs-spark-datasource/lib/gs-hadoop-mapreduce-client-1.0.0.jar\n\n### Build the connector and an example\n\nRun the mvn command as follows:\n\n\t$ cd [INSTALL_FOLDER]/griddb_spark\n\t$ mvn package\n\nThis creates the following jar files. \n\n\tgs-spark-datasource/target/gs-spark-datasource.jar\n\tgs-spark-datasource-example/target/example.jar\n\n### Run the example program\n\nThe GridDB cluster must be started in advance.\n\n1. Put data on the server with the GridDB Java client\n\n\t\t$ cd [INSTALL_FOLDER]/griddb_spark\n\t\t$ java -cp ./gs-spark-datasource-example/target/example.jar:gs-spark-datasource/lib/gridstore.jar \n\t\t\tInit \u003cGridDB notification address\u003e \u003cGridDB notification port\u003e\n\t\t\t\t\t\t\u003cGridDB cluster name\u003e \u003cGridDB user\u003e \u003cGridDB password\u003e\n\n2. Run some queries with the GridDB connector for Spark\n\n\t\t$ spark-submit --class Query ./gs-spark-datasource-example/target/example.jar\n\n## API\n\nWith a SparkSession, applications can create DataFrames from an existing GridDB container in the form below.\n\n    var df = session.read.format(\"com.toshiba.mwcloud.gs.spark.datasource\").load(containerName)\n\n## Community\n\n  * Issues  \n    Use the GitHub issue function if you have any requests, questions, or bug reports. 
\n  * PullRequest  \n    Use the GitHub pull request function if you want to contribute code.\n    You'll need to agree to the GridDB Contributor License Agreement (CLA_rev1.1.pdf).\n    By using the GitHub pull request function, you shall be deemed to have agreed to the GridDB Contributor License Agreement.\n\n## License\n  \n  The GridDB connector source is licensed under the Apache License, version 2.0.\n  \n## Trademarks\n  \n  Apache Spark, Apache Hadoop, Spark, and Hadoop are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgriddb%2Fgriddb_spark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgriddb%2Fgriddb_spark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgriddb%2Fgriddb_spark/lists"}