{"id":13795039,"url":"https://github.com/orientechnologies/spark-orientdb","last_synced_at":"2025-05-12T21:33:11.682Z","repository":{"id":49598994,"uuid":"72446612","full_name":"orientechnologies/spark-orientdb","owner":"orientechnologies","description":"Apache Spark datasource for OrientDB","archived":false,"fork":false,"pushed_at":"2021-06-12T15:05:25.000Z","size":148,"stargazers_count":19,"open_issues_count":17,"forks_count":11,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-08-04T23:09:11.051Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/orientechnologies.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-10-31T14:51:23.000Z","updated_at":"2022-08-03T07:26:45.000Z","dependencies_parsed_at":"2022-09-19T04:21:35.697Z","dependency_job_id":null,"html_url":"https://github.com/orientechnologies/spark-orientdb","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orientechnologies%2Fspark-orientdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orientechnologies%2Fspark-orientdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orientechnologies%2Fspark-orientdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orientechnologies%2Fspark-orientdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/orientechnologies","download_url":"https://codeload.github.com/orientechnologies/sp
ark-orientdb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225157000,"owners_count":17429698,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T23:00:51.501Z","updated_at":"2024-11-18T09:31:36.263Z","avatar_url":"https://github.com/orientechnologies.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"spark-orientdb\r\n==============\r\n[![Build Status](https://travis-ci.org/sbcd90/spark-orientdb.svg?branch=master)](https://travis-ci.org/sbcd90/spark-orientdb)    [ ![Download](https://api.bintray.com/packages/sbcd90/org.apache.spark/spark-orientdb-1.6.2_2.10/images/download.svg) ](https://bintray.com/sbcd90/org.apache.spark/spark-orientdb-1.6.2_2.10/_latestVersion)\r\n\r\nApache Spark datasource for OrientDB\r\n\r\nOrientDB documentation\r\n======================\r\n\r\nHere is the latest documentation on [OrientDB](http://orientdb.com/orientdb/)\r\n\r\nCompatibility\r\n=============\r\n\r\n`Spark`: 1.6+\r\n`OrientDB`: 2.2.0+\r\n\r\nGetting Started\r\n===============\r\n\r\n- Add the repository\r\n\r\n```\r\n\u003crepository\u003e\r\n   \u003cid\u003ebintray\u003c/id\u003e\r\n   \u003cname\u003ebintray\u003c/name\u003e\r\n   \u003curl\u003ehttps://dl.bintray.com/sbcd90/org.apache.spark/\u003c/url\u003e\r\n\u003c/repository\u003e\r\n```\r\n\r\n### For Spark 1.6\r\n\r\n- Add the datasource as a maven dependency\r\n\r\n```\r\n\u003cdependency\u003e\r\n   \u003cgroupId\u003eorg.apache.spark\u003c/groupId\u003e\r\n   
\u003cartifactId\u003espark-orientdb-1.6.2_2.10\u003c/artifactId\u003e\r\n   \u003cversion\u003e1.3\u003c/version\u003e\r\n\u003c/dependency\u003e\r\n```\r\n\r\n### For Spark 2.0\r\n\r\n- Add the datasource as a maven dependency\r\n\r\n```\r\n\u003cdependency\u003e\r\n   \u003cgroupId\u003eorg.apache.spark\u003c/groupId\u003e\r\n   \u003cartifactId\u003espark-orientdb-2.0.0_2.10\u003c/artifactId\u003e\r\n   \u003cversion\u003e1.4\u003c/version\u003e\r\n\u003c/dependency\u003e\r\n```\r\n\r\n### For Spark 2.1\r\n\r\n```\r\n\u003cdependency\u003e\r\n   \u003cgroupId\u003eorg.apache.spark\u003c/groupId\u003e\r\n   \u003cartifactId\u003espark-orientdb-2.1.1_2.11\u003c/artifactId\u003e\r\n   \u003cversion\u003e1.4\u003c/version\u003e\r\n\u003c/dependency\u003e\r\n```\r\n\r\n### For Spark 2.2\r\n\r\n```\r\n\u003cdependency\u003e\r\n   \u003cgroupId\u003eorg.apache.spark\u003c/groupId\u003e\r\n   \u003cartifactId\u003espark-orientdb-2.2.1_2.11\u003c/artifactId\u003e\r\n   \u003cversion\u003e1.4\u003c/version\u003e\r\n\u003c/dependency\u003e\r\n```\r\n\r\nScala api\r\n=========\r\n\r\n### OrientDB Documents\r\n\r\n#### Write api:\r\n\r\n```\r\nimport org.apache.spark.sql.{Row, SaveMode, SQLContext}\r\nimport org.apache.spark.sql.types.{IntegerType, StructField, StructType}\r\n\r\nval sqlContext = new SQLContext(sc)\r\n// createDataFrame with an explicit schema expects an RDD[Row]\r\nsqlContext.createDataFrame(sc.parallelize(Array(1, 2, 3, 4, 5)).map(Row(_)),\r\n      StructType(Seq(StructField(\"id\", IntegerType))))\r\n      .write\r\n      .format(\"org.apache.spark.orientdb.documents\")\r\n      .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n      .option(\"user\", ORIENTDB_USER)\r\n      .option(\"password\", ORIENTDB_PASSWORD)\r\n      .option(\"class\", test_table)\r\n      .mode(SaveMode.Overwrite)\r\n      .save()\r\n```\r\n\r\n#### Read api:\r\n\r\n```\r\nimport org.apache.spark.sql.SQLContext\r\n\r\nval sqlContext = new SQLContext(sc)\r\nval loadedDf = sqlContext.read\r\n      .format(\"org.apache.spark.orientdb.documents\")\r\n      .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n      .option(\"user\", ORIENTDB_USER)\r\n      
.option(\"password\", ORIENTDB_PASSWORD)\r\n      .option(\"class\", test_table)\r\n      .option(\"query\", s\"select * from $test_table where teststring = 'asdf'\")\r\n      .load()\r\n```\r\n\r\n#### Query using OrientDB SQL:\r\n\r\n```\r\nimport org.apache.spark.sql.SQLContext\r\n\r\nval sqlContext = new SQLContext(sc)\r\nval loadedDf = sqlContext.read\r\n      .format(\"org.apache.spark.orientdb.documents\")\r\n      .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n      .option(\"user\", ORIENTDB_USER)\r\n      .option(\"password\", ORIENTDB_PASSWORD)\r\n      .option(\"class\", test_table)\r\n      .option(\"query\", s\"select * from $test_table where teststring = 'asdf'\")\r\n      .load()\r\n```\r\n\r\n#### Support for Embedded Types( Since Spark 2.1 release):\r\n\r\n```\r\nval testSchemaForEmbeddedUDTs: StructType = {\r\n    StructType(Seq(\r\n      StructField(\"embeddedlist\", EmbeddedListType),\r\n      StructField(\"embeddedset\", EmbeddedSetType),\r\n      StructField(\"embeddedmap\", EmbeddedMapType)\r\n    ))\r\n  }\r\n```\r\n\r\n```\r\nval expectedDataForEmbeddedUDTs: Seq[Row] = Seq(\r\n    Row(EmbeddedList(Array(1, 1.toByte, true, TestUtils.toDate(2015, 6, 1), 1234152.12312498,\r\n      1.0f, 42, 1239012341823719L, 23.toShort, \"Unicode's樂趣\",\r\n      TestUtils.toTimestamp(2015, 6, 1, 0, 0, 0, 1))),\r\n      EmbeddedSet(Array(1, 1.toByte, true, TestUtils.toDate(2015, 6, 1), 1234152.12312498,\r\n        1.0f, 42, 1239012341823719L, 23.toShort, \"Unicode's樂趣\",\r\n        TestUtils.toTimestamp(2015, 6, 1, 0, 0, 0, 1))),\r\n      EmbeddedMap(Map(1 -\u003e 1, 2 -\u003e 1.toByte, 3 -\u003e true, 4 -\u003e TestUtils.toDate(2015, 6, 1), 5 -\u003e 1234152.12312498,\r\n        6 -\u003e 1.0f, 7 -\u003e 42, 8 -\u003e 1239012341823719L, 9 -\u003e 23.toShort, 10 -\u003e \"Unicode's樂趣\", 11 -\u003e TestUtils.toTimestamp(2015, 6, 1, 0, 0, 0, 1))))\r\n  )\r\n```\r\n\r\n#### Support for Link Types( Since Spark 2.1 release):\r\n\r\n```\r\nval 
testSchemaForLinkUDTs: StructType = {\r\n    StructType(Seq(\r\n      StructField(\"linklist\", LinkListType),\r\n      StructField(\"linkset\", LinkSetType),\r\n      StructField(\"linkmap\", LinkMapType),\r\n      StructField(\"linkbag\", LinkBagType)\r\n    ))\r\n  }\r\n```\r\n\r\n```\r\nval expectedDataForLinkUDTs: Seq[Row] = Seq(\r\n    Row(LinkList(Array(oDocument1)), LinkSet(Array(oDocument1)), LinkMap(Map(\"1\" -\u003e oDocument1)), LinkBag(Array(oRid1))),\r\n    Row(LinkList(Array(oDocument2)), LinkSet(Array(oDocument2)), LinkMap(Map(\"1\" -\u003e oDocument2)), LinkBag(Array(oRid2))),\r\n    Row(LinkList(Array(oDocument3)), LinkSet(Array(oDocument3)), LinkMap(Map(\"1\" -\u003e oDocument3)), LinkBag(Array(oRid3))),\r\n    Row(LinkList(Array(oDocument4)), LinkSet(Array(oDocument4)), LinkMap(Map(\"1\" -\u003e oDocument4)), LinkBag(Array(oRid4))),\r\n    Row(LinkList(Array(oDocument5)), LinkSet(Array(oDocument5)), LinkMap(Map(\"1\" -\u003e oDocument5)), LinkBag(Array(oRid5)))\r\n  )\r\n```\r\n\r\n### OrientDB Graphs:\r\n\r\n#### Create Vertex api:\r\n\r\n```\r\nimport org.apache.spark.sql.{Row, SaveMode, SQLContext}\r\nimport org.apache.spark.sql.types.{IntegerType, StructField, StructType}\r\n\r\nval sqlContext = new SQLContext(sc)\r\n// createDataFrame with an explicit schema expects an RDD[Row]\r\nsqlContext.createDataFrame(sc.parallelize(Array(1, 2, 3, 4, 5)).map(Row(_)),\r\n      StructType(Seq(StructField(\"id\", IntegerType))))\r\n      .write\r\n      .format(\"org.apache.spark.orientdb.graphs\")\r\n      .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n      .option(\"user\", ORIENTDB_USER)\r\n      .option(\"password\", ORIENTDB_PASSWORD)\r\n      .option(\"vertextype\", test_vertex_type2)\r\n      .mode(SaveMode.Overwrite)\r\n      .save()\r\n```\r\n\r\n#### Create Edge api:\r\n\r\n```\r\nimport org.apache.spark.sql.{Row, SaveMode, SQLContext}\r\nimport org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}\r\n\r\nval sqlContext = new SQLContext(sc)\r\nsqlContext.createDataFrame(\r\n      sc.parallelize(Seq(\r\n            Row(1, 2, \"friends\"),\r\n            Row(2, 3, \"enemy\"),\r\n            Row(3, 4, \"friends\"),\r\n            Row(4, 1, \"enemy\")\r\n      )),\r\n      StructType(Seq(\r\n    
        StructField(\"src\", IntegerType),\r\n            StructField(\"dst\", IntegerType),\r\n            StructField(\"relationship\", StringType)\r\n          )))\r\n      .write\r\n      .format(\"org.apache.spark.orientdb.graphs\")\r\n      .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n      .option(\"user\", ORIENTDB_USER)\r\n      .option(\"password\", ORIENTDB_PASSWORD)\r\n      .option(\"vertextype\", test_vertex_type2)\r\n      .option(\"edgetype\", test_edge_type2)\r\n      .mode(SaveMode.Overwrite)\r\n      .save()\r\n```\r\n\r\n#### Read Vertex api:\r\n\r\n```\r\nimport org.apache.spark.sql.SQLContext\r\n\r\nval sqlContext = new SQLContext(sc)\r\nval loadedDf = sqlContext.read\r\n                    .format(\"org.apache.spark.orientdb.graphs\")\r\n                    .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n                    .option(\"user\", ORIENTDB_USER)\r\n                    .option(\"password\", ORIENTDB_PASSWORD)\r\n                    .option(\"vertextype\", test_vertex_type2)\r\n                    .load()\r\n```\r\n\r\n#### Read edge api:\r\n\r\n```\r\nimport org.apache.spark.sql.SQLContext\r\n\r\nval sqlContext = new SQLContext(sc)\r\nval loadedDf = sqlContext.read\r\n                   .format(\"org.apache.spark.orientdb.graphs\")\r\n                   .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n                   .option(\"user\", ORIENTDB_USER)\r\n                   .option(\"password\", ORIENTDB_PASSWORD)\r\n                   .option(\"edgetype\", test_edge_type2)\r\n                   .load()\r\n```\r\n\r\n#### Query using OrientDB Graph SQL:\r\n\r\n```\r\nimport org.apache.spark.sql.SQLContext\r\n\r\nval sqlContext = new SQLContext(sc)\r\nval loadedVerticesDf = sqlContext.read\r\n                 .format(\"org.apache.spark.orientdb.graphs\")\r\n                 .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n                 .option(\"user\", ORIENTDB_USER)\r\n                 .option(\"password\", ORIENTDB_PASSWORD)\r\n   
              .option(\"vertextype\", test_vertex_type2)\r\n                 .option(\"query\", s\"select * from $test_vertex_type2 where teststring = 'asdf'\")\r\n                 .load()\r\n                 \r\nval loadedEdgesDf = sqlContext.read\r\n                 .format(\"org.apache.spark.orientdb.graphs\")\r\n                 .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n                 .option(\"user\", ORIENTDB_USER)\r\n                 .option(\"password\", ORIENTDB_PASSWORD)\r\n                 .option(\"edgetype\", test_edge_type2)\r\n                 .option(\"query\", s\"select * from $test_edge_type2 where relationship = 'friends'\")\r\n                 .load()\r\n```\r\n\r\n#### Support for embedded types \u0026 link types( Since Spark 2.1 release)\r\n\r\nThe Spark UDTs are also available for the OrientDB Graph datasource.\r\nUsage is very similar to that documented for the OrientDB Document datasource.\r\nExamples can be found in the integration tests.\r\n\r\n### Integration with GraphFrames\r\n\r\n```\r\nimport org.apache.spark.sql.SQLContext\r\nimport org.graphframes.GraphFrame\r\n\r\nval sqlContext = new SQLContext(sc)\r\nval loadedVerticesDf = sqlContext.read\r\n                 .format(\"org.apache.spark.orientdb.graphs\")\r\n                 .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n                 .option(\"user\", ORIENTDB_USER)\r\n                 .option(\"password\", ORIENTDB_PASSWORD)\r\n                 .option(\"vertextype\", test_vertex_type2)\r\n                 .option(\"query\", s\"select * from $test_vertex_type2 where teststring = 'asdf'\")\r\n                 .load()\r\n                 \r\nval loadedEdgesDf = sqlContext.read\r\n                 .format(\"org.apache.spark.orientdb.graphs\")\r\n                 .option(\"dburl\", ORIENTDB_CONNECTION_URL)\r\n                 .option(\"user\", ORIENTDB_USER)\r\n                 .option(\"password\", ORIENTDB_PASSWORD)\r\n                 .option(\"edgetype\", test_edge_type2)\r\n                 
.option(\"query\", s\"select * from $test_edge_type2 where relationship = 'friends'\")\r\n                 .load()\r\n                 \r\nval g = GraphFrame(loadedVerticesDf, loadedEdgesDf)                 \r\n```\r\n\r\nA full example can be found in directory `src/main/examples`","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forientechnologies%2Fspark-orientdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Forientechnologies%2Fspark-orientdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forientechnologies%2Fspark-orientdb/lists"}