Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/sarmbruster/neo4j-csv-firehose


https://github.com/sarmbruster/neo4j-csv-firehose

Last synced: 3 months ago
JSON representation

Awesome Lists containing this project

README

        

# neo4j-csv-firehose

image::https://travis-ci.org/sarmbruster/neo4j-csv-firehose.svg?branch=master[alt="Travis CI Status", link="https://travis-ci.org/sarmbruster/neo4j-csv-firehose"]

neo4j-csv-firehose enables Neo4j's `LOAD CSV` Cypher command to load other from other datasources as well. It provides on-the-fly conversion of the other datasource to csv - and can therefore act as input for `LOAD CSV`.

## Installation

This project uses [gradle](http://www.gradle.org) as build system. Since the gradle wrapper in included, there is no need to download anything beforehand.

There are three distinct ways to install neo4j-csv-firehose:

* directly inside Neo4j server as unmanaged extension
* as external server with undertow
* as URLStreamHandler hooking directly into to JVM

### Installation within a Neo4j server instance (unmanaged extension)

To build the project just type:

./gradlew jar

Copy (or symlink) the resulting file `./build/libs/neo4j-csv-firehose-0.1-SNAPSHOT.jar` to Neo4j's `plugins` folder.

Copy any JDBC driver jar files for your relational databases into the `plugins` folder as well. E.g. for mysql use http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.36/mysql-connector-java-5.1.36.jar.

Change configuration in `$NEO4J/conf/neo4j-server.properties`:

org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.extension.firehose=/csv

If your Neo4j server uses authentication (which it does by default), amend to `neo4j.properties`:

extension.csv.http.username=neo4j
extension.csv.http.password=

After a restart using `$NEO4J_HOME/bin/neo4j restart` firehose is active.

### Installation as a separate server

Under the hoods a undertow server with jaxrs support is start. Before running the server be sure to add your jdbc drivers to the runtime dependencies of `build.gradle`. To start up the server:

./gradlew run

This starts up the server on localhost:8080.

### Installation as URLStreamHandler

Type

./gradlew jar

to build and copy (or symlink) the resulting file `./build/libs/neo4j-csv-firehose-0.1-SNAPSHOT.jar` to Neo4j's `plugins` folder.

Copy any JDBC driver jar files for your relational databases into the `plugins` folder as well. E.g. for mysql use http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.36/mysql-connector-java-5.1.36.jar.

In `neo4j-wrapper.conf` configure URL handlers:

wrapper.java.additional=-Djava.protocol.handler.pkgs=org.neo4j.extension.firehose

and restart Neo4j.

## Usage

A new REST endpoint for rendering JDBC datasource as csv is added:

http://localhost:7474/csv/jdbc?parameter1=value1&parameter2=value2....

For separate server deployment the URL is (hint `csv` is omitted):

http://localhost:8080/jdbc?parameter1=value1&parameter2=value2....

For URLStreamHandler style deployment the URLs look like

jdbc:mysql://localhost/mydb?parameter=value&...

The following query parameters are available - remember to use proper url encoding for their values:

|===
| parameter name | description | mandatory

| url | the jdbc connection string, syntax depends on your jdbc driver, e.g. `jdbc:mysql://localhost/mydb` | yes, not used for URLStreamHandler variant
| user | database login for jdbc connection | no
| password | database password for jdbc connection | no
| table | name of the database table to convert to csv. All columns and rows are used | either `table` or `sql`
| sql | a sql statement to be rendered to csv | either `table` or `sql`
|===

This endpoint can then be used to run LOAD CSV in Cypher. We assume having a mysql database on localhost, dbname is `mydb`, credentials are `mydb:123`. We want to import from table `person` which has 2 columns: `lastName` and `firstName`. Using the following Cypher statement hits directly the mysql database and creates a node for each person. Here's the version for running via unmanaged extensions:

neo4j-sh (?)$ load csv with headers from "http://localhost:7474/csv/jdbc?url=jdbc%3Amysql%3A%2F%2Flocalhost%2Fmydb&table=person&user=mydb&password=123" as line create (:Person {firstname: line.firstName, lastname: line.lastName});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2142
Properties set: 4284
Labels added: 2142
1422 ms

When running as separate server use:

neo4j-sh (?)$ load csv with headers from "http://localhost:8080/jdbc?url=jdbc%3Amysql%3A%2F%2Flocalhost%2Fmydb&table=person&user=mydb&password=123" as line create (:Person {firstname: line.firstName, lastname: line.lastName});

Running as URLStreamHandler:

neo4j-sh (?)$ load csv with headers from "jdbc:mysql://localhost/mydb?table=person&user=mydb&password=123" as line create (:Person {firstname: line.firstName, lastname: line.lastName});

## MetaData Endpoint

There is an endpoint which returns the metadata from the database schema:

----
curl -i 'http://localhost:8080/jdbc/meta?url=jdbc:mysql://localhost/northwind?user=root&database=northwind'
----

It uses SchemaCrawler to access the information and returns it as JSON in this format:

----
{
"nodes": [
{
"filename": "Categories",
"labels": [
"Categories"
],
"properties": [
{
"headerKey": "CategoryID",
"dataType": "string",
"neoKey": "CategoryID",
"primaryKey": true,
"index": true
},
....
{
"headerKey": "Picture",
"dataType": "string",
"neoKey": "Picture",
"primaryKey": false,
"index": false
}
]
},
....
],
"relationships": [
{
"filename": "CustomerCustomerDemo",
"name": "CUSTOMER_TYPE_ID",
"from": {
"fileKey": "CustomerTypeID",
"neoKey": "CustomerTypeID",
"filename": "CustomerCustomerDemo",
"label": "CustomerCustomerDemo"
},
"to": {
"fileKey": "CustomerTypeID",
"neoKey": "CustomerTypeID",
"filename": "CustomerDemographics",
"label": "CustomerDemographics"
}
},
....
]
}
----