Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jexp/neo4j-shell-tools
A bunch of import/export tools for the neo4j-shell
https://github.com/jexp/neo4j-shell-tools
Last synced: 3 months ago
JSON representation
A bunch of import/export tools for the neo4j-shell
- Host: GitHub
- URL: https://github.com/jexp/neo4j-shell-tools
- Owner: jexp
- Archived: true
- Created: 2013-07-05T07:54:49.000Z (over 11 years ago)
- Default Branch: 3.0
- Last Pushed: 2019-01-04T22:44:26.000Z (about 6 years ago)
- Last Synced: 2024-08-02T07:11:05.253Z (6 months ago)
- Language: Java
- Size: 268 KB
- Stars: 288
- Watchers: 29
- Forks: 55
- Open Issues: 65
-
Metadata Files:
- Readme: readme.md
Awesome Lists containing this project
- awesome-neo4j - neo4j-shell-tools - A bunch of import/export tools for the neo4j-shell. (REST API / Other)
README
# Import data into your neo4j database from the neo4j-shell command
`neo4j-shell-tools` adds a number of commands to [neo4j-shell](http://neo4j.com/docs/stable/shell.html) which easily allow import and export data into running [Neo4j](http://www.neo4j.com/) database.
### Installation
Download [neo4j-shell-tools_3.0.1.zip](http://dist.neo4j.org/jexp/shell/neo4j-shell-tools_3.0.1.zip) and extract it in your
Neo4j Server's lib directory e.g.```
cd /path/to/neo4j-community-3.0.1
curl http://dist.neo4j.org/jexp/shell/neo4j-shell-tools_3.0.1.zip -o neo4j-shell-tools.zip
unzip neo4j-shell-tools.zip -d lib
```#### Easier installation on Unix
The following script does the above installation automatically, and sets the `$NEO4J_HOME` path; works on Unix:
```
NEO4J_HOME=$(neo4j-shell -c 'dbinfo -g Kernel StoreDirectory' | grep -oE '\/.*lib.*?\/') && curl -Lk -o neo4j-shell-tools.zip http://dist.neo4j.org/jexp/shell/neo4j-shell-tools_3.0.1.zip && unzip neo4j-shell-tools.zip -d ${NEO4J_HOME}lib
```### Before you start
Restart neo4j and then launch the neo4j-shell:
```
cd /path/to/neo4j-community-3.0.1
./bin/neo4j restart
./bin/neo4j-shell
```That assumes a default neo4j instance running on port 7474. You can call `./bin/neo4j-shell --help` to get a list of other ways to connect to a neo4j instance.
### Importing workflow
Before importing data, make sure that you create the necessary [indexes or constraints](http://neo4j.com/docs/stable/cypher-schema.html) for your data, so that you can quickly access the starting points for your graph patterns afterwards.
For instance:```
create constraint on (p:Person) assert p.email is unique;
create index on :Person(age);
```### Import
Then choose a suitable import command, depending on how your data is structured.
* If your data is formatted as CSV and you want to use [cypher](http://neo4j.com/docs/stable/cypher-query-lang.html) statements for importing it, use the [Cypher Import](#cypher-import) command.
* If your data is in [GraphML](http://graphml.graphdrawing.org/) format, use the [GraphML Import](#graphml-import) command.
* If your data is in [Geoff](http://nigelsmall.com/geoff) format, use the [Geoff Import](#geoff-import) command.
* If your data is in Binary format, use the [Binary Import](#binary-import) command.#### Cypher Import
Populate your database with [write clauses](http://neo4j.com/docs/stable/query-write.html) in the [cypher](http://neo4j.com/docs/stable/cypher-query-lang.html) query language.
```
$ import-cypher [-i in.csv] [-o out.csv] [-d ,] [-q] [-b 10000] create (n:#{label} {name: {name}, age: {age}}) return id(n) as id, n.name as name
```- -i file.csv: tab or comma separated input data file (or URL), with header. Header names are used as param-names. The cypher statement will be executed one per row
- -o file.csv: tab or comma separated output data file, all cypher result rows will be written to file, column labels become headers
- -q: input/output file with quotes
- -d delim: delim used to separate files (e.g. `-d " ", -d \t -d ,` )
- -b size: batch size for intermediate commitsExample input file: [in.csv](examples/in.csv)
```
name age
Michael 38
Selina 15
Rana 8
Selma 5
```Usage:
```
$ import-cypher -d"\t" -i in.csv -o out.csv create (n {name: {name}, age: {age}}) return id(n) as id, n.name as name
Query: create (n {name: {name}, age: {age}}) return id(n) as id, n.name as name; infile in.csv delim ' ' quoted false outfile out.csv batch-size 20000
Import statement execution created 4 rows of output.
```Output file: out.csv
```
id name
1 Michael
2 Selina
3 Rana
4 Selma
```Optionally support types for column headers: just use `prop:type` as header in your csv, e.g. `name:string,age:int`
Supported Types
* `int`
* `long`
* `double`
* `float`
* `boolean`
* `string`
* `byte`
* and arrays thereof with `_array`, e.g. `int_array`#### Geoff Import
Populate your database with [geoff](http://nigelsmall.com/geoff) - a declarative notation for representing graph data in a human-readable format.
```
$ import-geoff [-i in.geoff]
```- -i in.geoff: newline separated geoff rule file (or URL)
Example input file: in.geoff
```
(A) {"name": "Alice"}
(B) {"name": "Bob"}
(A)-[r:KNOWS]->(B)
```Usage:
```
$ import-geoff -i in.geoff
Geoff import of in.geoff created 3 entities.
```#### Binary Import
Populate your database from a binary dump of a neo4j database. Internally, Kyro is used to serialize the graph.
```
$ import-binary [-i in.bin] [-r REL_TYPE] [-b 20000] [-c]
```- -i in.bin: binary file from previous database export
- -r REL_TYPE default relationship-type for relationships without a label attribute
- -b batch-size batch-commit size
- -c uses a cache that spills to disk for very large importsUsage:
```
$ import-binary -b 10000 -i /tmp/export.bin -c
```#### GraphML Import
Populate your database from [GraphML](http://graphml.graphdrawing.org/) files. GraphML is an XML file format used to describe graphs.
```
$ import-graphml [-i in.xml] [-r REL_TYPE] [-b 20000] [-c] [-t]
```- -i in.xml: graphml file (or URL)
- -r REL_TYPE default relationship-type for relationships without a label attribute
- -t also import node labels, see the export format
- -b batch-size batch-commit size
- -c uses a cache that spills to disk for very large importsExample input file: in.xml
```xml
yellow
green
1.0
```
Usage:
```
$ import-graphml -i in.xml
GraphML-Import file in.xml rel-type RELATED_TO batch-size 40000 use disk-cache false
0. 100%: nodes = 1 rels = 0 properties = 0 time 11 ms
Finished: nodes = 2 rels = 1 properties = 2 total time 16 ms
GraphML import created 3 entities.
```### Export
Choose a suitable export command, depending on your requirement for the exported data
* To export your data as Cypher statements, use, use the [Cypher Export](#cypher-export) command.
* To export your data as CSV, use the [Cypher Import](#cypher-import) command with the `-o file` option which will output the results of your queries into a CSV file.
* To export your data as [GraphML](http://graphml.graphdrawing.org/), use the [GraphML Export](#graphml-export) command.
* To export your data as a binary file, use the [Export Binary](#binary-export) command.#### Cypher Export
If you have a populated database, you can dump it completely or partially (by using a Cypher statement) to a script-file which uses Cypher and neo4j-shell commands to control the import.
The export order is:
1. Nodes batched in transactions
1. Schema information like indexes and constraints
1. Relationships by looking up nodes by primary keys and connecting them, also batched in transactions```
$ export-cypher [-o export.cypher] [-r] [-b 10000] [MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN *]
```- -o file.cypher: output file for dump, if left blank the output is sent to the neo4j-shell output and has to be redirected
- -b size: batch size for commit batches for nodes and relationships
- -r add all nodes of selected relationships, also add relationships between selected nodes**Note:**
For nodes that don't have a unique constraint, a temporary label (`UNIQUE IMPORT LABEL`) and property (`UNIQUE IMPORT ID`) are added to allow later lookup in the relationship-generation section.
A constraint for `UNIQUE IMPORT LABEL`(`UNIQUE IMPORT ID`) is added too. All, the constraint, the label and the property are removed at the end of the import script again.Example dump file:
```
$ export-cypher -r -o test.cypher match (n)-[r]->() return n,r// create nodes
begin
CREATE (:`UNIQUE IMPORT LABEL` {`UNIQUE IMPORT ID`:0});
CREATE (:`User` {`age`:43, `name`:"User1"});
commit// add schema
begin
CREATE INDEX ON :`User`(`age`);
CREATE CONSTRAINT ON (node:`User`) ASSERT node.`name` IS UNIQUE;
CREATE CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT node.`UNIQUE IMPORT ID` IS UNIQUE;
commit
schema await// create relationships
begin
MATCH (n1:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`:0}), (n2:`User`{`name`:"User1"}) CREATE (n1)-[:`KNOWS` {`since`:2011}]->(n2);
commit// clean up temporary import keys
begin
MATCH (n:`UNIQUE IMPORT LABEL`) WITH n LIMIT 1000 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
commit
begin
DROP CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT node.`UNIQUE IMPORT ID` IS UNIQUE;
commit
```#### Binary Export
Export your Neo4j graph database to a binary file.
```
$ export-binary -o out.bin
```- -o out.bin
Usage:
```
$ export-binary -o out.bin
```#### GraphML Export
Export your Neo4j graph database to [GraphML](http://graphml.graphdrawing.org/) files. GraphML is an XML file format used to describe graphs. Can be used to import and visualize your graph in [Gephi](http://gephi.org).
```
$ export-graphml [-t] [-r] [-o out.graphml] [match (n:Foo)-[r]->() return n,r]
```- -t write types, do a first pass over the data to determine property-types and write them to the graphml header
- -r add all nodes of selected relationships
- -o out.graphml: graphml file to write to
- optional cypher query to select a subgraph to exportExample output file: out.graphml
```xml
:FOOJohn Doe
:FOOJane Doe
KNOWS0
KNOWS1```
Usage:
```
$ export-graphml -o out.graphml
```### Prerequisites
An up and running neo4j database which you can [download from here](http://www.neo4j.org/download).
### Other Technical Details
#### Libraries used
* Cypher Import uses [opencsv-2.3.jar](http://opencsv.sourceforge.net/) for parsing CSV files.
* GraphML Import uses [mapdb-0.9.3.jar](http://www.mapdb.org/) as part of the cache (-c) flag for very large imports
* Geoff Import uses [neo4j-geoff-1.7-SNAPSHOT.jar](http://nigelsmall.com/geoff)#### More on GraphML
The 'import-graphml' command supports attributes, supports only single pass parsing, optimization for `parse.nodeids="canonical"`
An import of [@chrisdiehl](https://twitter.com/chrisdiehl)'s [Enron Dataset](http://www.infochimps.com/datasets/enron-email-data-with-manager-subordinate-relationship-metadata) took 5 minutes on a MBA:
`Finished: nodes = 343266 rels = 1903201 properties = 8888993 total time 313491 ms`
### Manual Build & Install
```
git clone [email protected]:jexp/neo4j-shell-tools.git
cd neo4j-shell-tools
mvn clean package dependency:copy-dependencies
```Then copy the jars that get generated into the neo4j lib directory:
```
cp target/import-tools-*.jar target/dependency/opencsv-*.jar target/dependency/geoff-*.jar target/dependency/mapdb-*.jar target/dependency/kryo-*.jar target/dependency/reflectasm-*.jar target/dependency/minlog-*.jar target/dependency/objenesis-*.jar /path/to/neo4j-community/lib
```or make those two files available on your neo4j-shell classpath
#### Setup auto indexing
**NOTE: Using the "old" auto-index is discouraged, except if you know what you're doing. Use the label-based schema indexes instead.**
The auto index command is used to automatically create indexes on certain properties defined on nodes or relationships. This is in addition to the properties defined in 'conf/neo4j.properties'.
```
auto-index [-t Node|Relationship] [-r] name age title
```- -r stops indexing those properties
Usage:
```
$ auto-index name age title
Enabling auto-indexing of Node properties: [name, age, title]
```