= Neo4j Data Import Handler
Benoit Simard <github@bsimard.com>
V1.0
:experimental:
:toc:
:toc-placement: preamble
:toc-title: pass:[<b>Table of Contents</b>]
:outfilesuffix-old: {outfilesuffix}
ifdef::env-github[:outfilesuffix: .adoc]
ifndef::env-github[]
:idprefix:
:idseparator: -
endif::[]

Synchronise Neo4j with an external datasource.

Project site: http://sim51.github.io/neo4j-dih/

== Description

This project is inspired by https://wiki.apache.org/solr/DataImportHandler[Solr's DIH].
With a simple XML file that describes the import, you can synchronise your Neo4j database with an external datasource such as a SQL database or a CSV, XML or JSON file.

== How to install

There are only three steps to install this extension:

 * Unzip the whole content of the zip into the `NEO4J_HOME/plugins` folder.
 * Edit `NEO4J_HOME/conf/neo4j-server.properties` and add the following line:
+
[source,properties]
----
org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.dih.DataImportHandlerExtension=/dih
----
 * Restart your Neo4j server.

If you now open http://localhost:7474/dih/api/ping[] in your browser, you should receive a `pong` message.

IMPORTANT: The DIH plugin does not bundle any datasource driver.
For example, to connect to a MySQL database, put the MySQL JDBC driver jar in `NEO4J_HOME/plugins`.

== How to write an import

An import is an XML file that must be placed in the `NEO4J_HOME/conf/dih` directory.
Some example import files are available https://github.com/sim51/neo4j-dih/tree/master/src/test/resources/conf/dih[here].

This XML file must conform to the following schema:

[source,xml]
----
include::src/main/resources/schema/dataConfig.xsd[]
----

The next section describes every element.

You can also move some variables of your import script into a properties file.
It must be in the same directory and have the same name, with the `.xml` extension replaced by `.properties`.
For example, if the properties file contains a key `db.user`, you can use it in the XML import file as `${db.user}`.

If this file does not exist, it is created automatically after the first successful import, because DIH stores the date and time of the last successful import in it, under the `last_index_time` property.
This is what makes *delta-import* possible: an import can retrieve from the datasource only the data updated since the last execution.

=== Definition of XML elements

==== Element: dataConfig

This is the root XML element of the file.

*List of attributes:*

 * *clean (optional)*: the Cypher query used by the clean option.

==== Element: dataSource

This element defines a datasource, i.e. an external resource from which data is retrieved.

The available datasource types and their attributes are:

 * *CSVDataSource*: defines a CSV file (intended for testing only; for real imports, Cypher's `LOAD CSV` feature is a better choice).
  ** type (mandatory): must be `CSVDataSource`
  ** name (mandatory): name of the datasource, unique within the file.
  ** url (mandatory): URL of the file. It can be local (`file://`) or remote (`http://`); any Java URL works.
  ** encoding: encoding of the CSV file (default: "UTF-8").
  ** separator: character used as the separator (default: ";").
  ** timeout: timeout (default: 10000).
  ** withHeaders: `true`/`false`, whether the CSV file has a header row (default: false).

 * *XMLDataSource*: defines an XML file that you can query with *XPath*.
  ** type (mandatory): must be `XMLDataSource`
  ** name (mandatory): name of the datasource, unique within the file.
  ** url (mandatory): URL of the file. It can be local (`file://`) or remote (`http://`); any Java URL works.
  ** encoding: encoding of the XML file (default: "UTF-8").
  ** timeout: timeout (default: 10000).

 * *JDBCDataSource*: defines a JDBC connection that you can query with *SQL*.
  ** type (mandatory): must be `JDBCDataSource`
  ** name (mandatory): name of the datasource, unique within the file.
  ** url (mandatory): JDBC connection URL.
  ** user: user used to connect to the database.
  ** password: password of that user.

 * *JSONDataSource*: defines a JSON file that you can query with *JSONPath* (see http://goessner.net/articles/JsonPath/).
  ** type (mandatory): must be `JSONDataSource`
  ** name (mandatory): name of the datasource, unique within the file.
  ** url (mandatory): URL of the file. It can be local (`file://`) or remote (`http://`); any Java URL works.
  ** encoding: encoding of the JSON file (default: "UTF-8").
  ** timeout: timeout (default: 10000).

NOTE: To support another kind of datasource, see the section 'How to write a new datasource type'.

==== Element: graph

This element describes an import process. A file can contain several sibling `graph` elements, which lets you define several independent import phases.

*List of attributes:*

 * *periodicCommit (optional)*: commits the transaction every N iterations of the `cypher` part. By default, the whole import runs in a single transaction.

==== Element: entity

This element defines a list of objects retrieved from a datasource, i.e. a result set.
The import process iterates over it.

There are two mandatory attributes:

 * *datasource*: name of the datasource used by this entity.
 * *name*: name of this entity, used to access its values and fields.

The other characteristics depend on the datasource. In the examples below, `entity` stands for the value of the entity's `name` attribute.

* *CSVDataSource*:
 ** There are no additional attributes, because a CSV file cannot be queried.
 ** Each item is an array of strings (`String[]`), so in your Cypher script you access a value with `${entity[0]}`, or with `${entity["columnName"]}` when the file has a header row.

* *XMLDataSource*:
 ** xpath (mandatory): an XPath query, for example `xpath="/users/user[@id='${people.ID}']"`. This example uses a value of a previous entity, `${people.ID}`: entities can be linked together.
 ** Each item is an `org.neo4j.dih.datasource.file.xml.XMLResult`, so in your Cypher script you access a value with `${entity.xpath("description")}`.

* *JDBCDataSource*:
 ** sql (mandatory): the SQL query.
 ** Each item is a `Map<String, Object>`, so in your Cypher script you access a value with `${entity.myColName}`.

* *JSONDataSource*:
 ** xpath (mandatory): a JSONPath query (http://goessner.net/articles/JsonPath/), for example `xpath="$..book[*]"`. The attribute is named `xpath` even for JSON.
 ** Each item is a `Map<String, Object>`, so in your Cypher script you access a value with `${entity["key"]}`.

==== Element: cypher

This element contains your Cypher template script.
Inside it, you can reference every parent entity by its name, as described above.

NOTE: The Cypher script is processed by Apache Velocity, so you can use any Velocity construct.

IMPORTANT: If your Cypher script declares variables (as it does 99% of the time) and `periodicCommit` is not equal to 1,
suffix every variable with `$i`, like this: `MATCH (n$i:MyNode)`.
Otherwise, the generated Cypher script, which groups several iterations, would define the same variable several times.

== How to execute an import

To run an import, call the following REST endpoint:

 * URL: `http://localhost:7474/dih/api/import`
 * method: `POST`
 * form parameters:
 ** name (mandatory): the name of your import file (e.g. `example_csv.xml`)
 ** clean (optional): true / **false (default)**
 ** debug (optional): true / **false (default)**

NOTE: Any import file can be run with the `clean` and `debug` options, described below.

This makes it easy to schedule an import job (particularly useful with delta-import) with cron and `curl`:

[source,bash]
----
curl http://localhost:7474/dih/api/import -d "name=example_csv.xml&clean=true&debug=false"
----

=== Clean

This runs a clean-up Cypher query before the import starts.
By default, the query is `MATCH (n) OPTIONAL MATCH (n)-[r]-(m) DELETE n,r,m;`, which resets the database.
You can specify your own query with the `clean` attribute of the `dataConfig` element, as in this example:

[source, xml]
----
include::src/test/resources/conf/dih/csv/example_csv.xml[lines=2]
----

=== Debug

Debug performs a dry run of the import: the database is not modified, even with the clean option.
It also produces some additional debug output.

=== Administration interface

Open `http://localhost:7474/dih/index.html` in your browser to see the administration interface.

image::administration-interface.png[Administration interface]

From this interface you can:

 * see all configuration files
 * run an import

== How to write a new datasource type

To synchronise your database with a datasource other than CSV, XML, JSON or SQL (MongoDB, for example), you have to write a new datasource type.

This takes three steps:

 * Modify the XML schema to add attributes to the `dataSource` and/or `entity` elements.
 * Create a Java class that connects to and queries the new type of datasource.
 * Create a Java class that iterates over its results.

=== XML Schema

A new datasource may need additional attributes on the *dataSource* element.
To add one:

 * Edit the file `src/main/resources/schema/dataConfig.xsd`.
 * Find the XSD definition of *DataSourceType* (the line `<xs:complexType name="DataSourceType">`).
 * Add your new attribute (for example `myAttribute`) to that element: `<xs:attribute name="myAttribute" type="xs:string"/>`, replacing `xs:string` with the type of your attribute.
 * Save the file and recompile the project.

Do the same for the *entity* element:

 * Edit the file `src/main/resources/schema/dataConfig.xsd`.
 * Find the XSD definition of *EntityType* (the line `<xs:complexType name="EntityType">`).
 * Add your new attribute (for example `myAttribute`) to that element: `<xs:attribute name="myAttribute" type="xs:string"/>`, replacing `xs:string` with the type of your attribute.
 * Save the file and recompile the project.

NOTE: The project uses *JAXB* to parse XML files.

NOTE: The Java beans matching the XML elements are generated at compile time by the **jaxb2 plugin**, which is why the project must be recompiled after a schema change.

=== Java class: how to connect & query

Three rules apply:

 * the class must extend `org.neo4j.dih.datasource.AbstractDataSource`
 * the class must be in the `org.neo4j.dih.datasource` package (the bundled datasources live in its sub-packages)
 * the class name is the value you put in the *type* attribute of the *dataSource* element

Here is the definition of `AbstractDataSource`:

[source,java]
----
include::src/main/java/org/neo4j/dih/datasource/AbstractDataSource.java[]
----

For examples, see the following datasources:

 * `src/main/java/org/neo4j/dih/datasource/jdbc/JDBCDataSource.java`
 * `src/main/java/org/neo4j/dih/datasource/file/xml/XMLDataSource.java`

=== Java class: result format

The result class must extend `org.neo4j.dih.datasource.AbstractResultList`.

Here is the definition of `AbstractResultList`:

[source,java]
----
include::src/main/java/org/neo4j/dih/datasource/AbstractResultList.java[]
----

As you can see, it is simply an auto-closeable iterator. For examples, see:

 * `src/main/java/org/neo4j/dih/datasource/jdbc/JDBCResultList.java`
 * `src/main/java/org/neo4j/dih/datasource/file/xml/XMLResultList.java`

== Project todo list

* Add statistics on the number of queries per datasource and the number of iterations.
* Debug mode: return some debug values.
 ** Return the generated Cypher script?
 ** Return the datasource values and queries?
* Should the update be sync or async?
 ** Sync helps keep ACID, but how would a rollback work? There is no two-phase commit, so for ACID...
 ** Sync keeps a web thread open until the end of the import. This is the current implementation, but calling the import endpoint 100 times starts 100 imports.
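Because the import endpoint currently runs synchronously (see the last todo item above), a client simply blocks until the `POST` returns. The sketch below reproduces the `curl` call from 'How to execute an import' in plain Java. It is a hypothetical helper, not part of the plugin. The class and method names are illustrative. The URL assumes the default `localhost:7474` address and the `/dih` mount point configured during installation. The README does not document the response format, so the sketch returns the body as raw text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

/** Hypothetical client that triggers a DIH import (Java equivalent of the curl command). */
public class DihImportClient {

    // Assumption: default Neo4j address and the /dih mount point set in neo4j-server.properties.
    public static final String IMPORT_URL = "http://localhost:7474/dih/api/import";

    /** Builds the URL-encoded form body with the name, clean and debug parameters. */
    public static String formBody(String name, boolean clean, boolean debug) {
        try {
            return "name=" + URLEncoder.encode(name, "UTF-8")
                    + "&clean=" + clean
                    + "&debug=" + debug;
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this branch is never taken in practice.
            throw new IllegalStateException(e);
        }
    }

    /** POSTs the form, blocks until the server replies, and returns "HTTP <status>: <body>". */
    public static String runImport(String name, boolean clean, boolean debug) throws IOException {
        byte[] body = formBody(name, clean, debug).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(IMPORT_URL).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            int status = conn.getResponseCode();
            // 4xx/5xx bodies come from the error stream, which may be null.
            InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream();
            String text = "";
            if (in != null) {
                // "\\A" makes the Scanner read the whole stream as a single token.
                try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
                    text = scanner.hasNext() ? scanner.next() : "";
                }
            }
            return "HTTP " + status + ": " + text;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Dry run (debug=true, no clean) of the import file used in the curl example.
        System.out.println(runImport("example_csv.xml", false, true));
    }
}
```

Each call runs a full import, so a scheduler built on this helper (cron or a Java `ScheduledExecutorService`) should avoid overlapping calls. As noted in the todo list, 100 calls start 100 imports.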