https://github.com/coreyauger/agent-smith

Neo4J/Titan dbpedia loader.

connector neo4j scala titan titan-loader

Host: GitHub
URL: https://github.com/coreyauger/agent-smith
Owner: coreyauger
License: mit
Created: 2014-10-28T23:06:37.000Z (about 10 years ago)
Default Branch: master
Last Pushed: 2016-02-09T05:43:24.000Z (almost 9 years ago)
Last Synced: 2023-06-03T23:55:15.879Z (over 1 year ago)
Topics: connector, neo4j, scala, titan, titan-loader
Language: Scala
Homepage:
Size: 15.4 MB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

agent-smith
===========

### What is it?

Scala Spark dbpedia loader for both Neo4J and Titan.  

### Goals

* Read in a csv dbpedia dump and extract features from the file

* Convert the features into case classes that we can use to load the data into other formats.

* Load data into a Neo4J database

* Load data into a Titan database.
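
The first two goals can be sketched in plain Scala as follows. This is a minimal illustration, not the project's actual parser: the `Row` case class, the `parseLine` and `parseAll` helpers, and the three-column layout (uri, label, comment) are assumptions made for the example. A real dbpedia csv dump has more columns and quoting rules that this naive comma split does not handle.

```scala
object CsvSketch {
  // Hypothetical, simplified record; the project's real type is DbPediaThing.
  final case class Row(uri: String, label: String, comment: String)

  // Parse one line of an assumed three-column csv: uri,label,comment.
  // Returns None when the line does not have exactly three fields
  // or when the uri field is empty.
  def parseLine(line: String): Option[Row] =
    line.split(",", -1).map(_.trim) match {
      case Array(uri, label, comment) if uri.nonEmpty =>
        Some(Row(uri, label, comment))
      case _ => None
    }

  // Keep only the lines that parse; malformed lines are silently dropped.
  def parseAll(lines: Seq[String]): Seq[Row] =
    lines.flatMap(l => parseLine(l).toList)
}
```

Returning an `Option` keeps malformed lines from aborting the whole load, which matters when the input is a large dump.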

### Overview

For my purposes it was sufficient to convert the data into the following structure:

```scala
// One dbpedia resource: its core fields plus a map of extracted properties.
case class DbPediaThing(
  uri: String, label: String, wikiPageID: String, wikiPageRevisionID: String,
  comment: String, properties: Map[String, SchemaProperty]
) extends DbPediaBaseThing

// One property of a resource: its name, type, and value, each with a label.
case class SchemaProperty(
  propertyName: String, propertyNameLabel: String,
  propertyType: String, propertyTypeLabel: String,
  propertyValue: PropertyValue, propertyValueLabel: Option[PropertyValue]
) extends com.nxtwv.graphs.common.Thing.Property
```

From here we can load the data using the Neo4J loader or the Titan loader.

### Neo4j

Note the ugly `Await` :(

Simply doing the work async will overflow the connector and lead to errors and failed commits. There are obviously better ways to do this than `Await`, which I may add in the future.

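```scala
import scala.concurrent.Await
import scala.concurrent.duration._

// Render each thing as a single Cypher script.
val cypher = things.map(a => DbpediaCypherLoader.toCypher(a).mkString("\n"))

// Spread the scripts over `factor` batches, then block on each batch so the
// connector is never flooded. If `things` is a Spark RDD, these are lazy
// transformations and only run once an action is called on the result.
cypher.zipWithIndex().map { case (s, i) => (i % factor, s) }.groupByKey.map {
  case (k, xs) =>
    println(s"working batch: $k of $factor")
    Await.ready(DbpediaCypherLoader.batchCypher(xs.toList), 30.minutes)
    "."
}
```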
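
One alternative to blocking inside every batch is to chain the batches so that each one starts only after the previous one has completed. The sketch below illustrates that idea with plain `scala.concurrent.Future`; it is not the project's code. `runBatch` is a hypothetical stand-in for a call like `DbpediaCypherLoader.batchCypher`, whose real signature is not shown here, and the batching by index modulo `factor` mirrors the snippet above using ordinary collections instead of Spark.

```scala
import scala.concurrent.{ExecutionContext, Future}

object SequentialBatches {
  // Group items into `factor` batches by index modulo, as in the Spark snippet.
  def toBatches[A](items: Seq[A], factor: Int): Seq[Seq[A]] =
    items.zipWithIndex
      .groupBy { case (_, i) => i % factor }
      .toSeq
      .sortBy { case (k, _) => k }
      .map { case (_, xs) => xs.map(_._1) }

  // Run the batches one after another: batch n+1 is only submitted once the
  // future for batch n has completed, so at most one batch is in flight.
  // `runBatch` is a hypothetical stand-in for the real loader call.
  def runSequentially[A, B](batches: Seq[Seq[A]])(runBatch: Seq[A] => Future[B])(
      implicit ec: ExecutionContext): Future[Seq[B]] =
    batches.foldLeft(Future.successful(Vector.empty[B])) { (acc, batch) =>
      acc.flatMap(done => runBatch(batch).map(done :+ _))
    }
}
```

With this shape, a single `Await` at the very end of the program (or none at all in an application that is already asynchronous) can replace the per-batch blocking. A failed batch fails the resulting future and stops later batches from being submitted.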

### Titan

Again we use a blocking connection that makes sure we don't hammer the connector.  This needs to be fixed :(

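```scala
// Render each thing as a single Gremlin script.
val gremlin = things.map(a => DbpediaTitanLoader.toGremlin(a).mkString("\n"))

gremlin.zipWithIndex.foreach { case (gg, ind) =>
  // Progress in percent; `count` is assumed to hold the total number of things.
  println((ind.toDouble / count.toDouble) * 100.0)
  // Split the script on blank lines and execute the pieces one at a time.
  gg.split("\n\n").foreach { g =>
    val rs = DbpediaTitanLoader.execGremlin(g)
    print(".")
  }
}
```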
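
To make the data flow of the Titan snippet above concrete, here is a small plain-Scala sketch of its two helper steps: splitting a script into statements on blank lines and computing the progress percentage. The object and function names are made up for illustration and are not part of the project, and any strings passed in would stand in for real Gremlin statements.

```scala
object GremlinScriptSketch {
  // Split a script into statements on blank lines, dropping empty chunks.
  def splitStatements(script: String): Seq[String] =
    script.split("\n\n").toSeq.map(_.trim).filter(_.nonEmpty)

  // Percentage of work done after `index` of `total` items (0.0 when total is 0).
  def progressPercent(index: Long, total: Long): Double =
    if (total == 0) 0.0 else index.toDouble / total.toDouble * 100.0
}
```

Filtering out empty chunks avoids sending blank statements to the server when a script ends with trailing newlines.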

### Product

![http://www.coreyauger.com/images/agent-smith.png](http://www.coreyauger.com/images/agent-smith.png)