{"id":13529604,"url":"https://github.com/graphaware/neo4j-importer","last_synced_at":"2025-07-11T06:33:17.141Z","repository":{"id":32875011,"uuid":"36469208","full_name":"graphaware/neo4j-importer","owner":"graphaware","description":"Java importer skeleton for complicated, business-logic-heavy high-performance Neo4j imports directly from SQL databases, CSV files, etc.","archived":false,"fork":false,"pushed_at":"2017-04-30T15:31:04.000Z","size":205,"stargazers_count":26,"open_issues_count":1,"forks_count":8,"subscribers_count":33,"default_branch":"master","last_synced_at":"2024-11-02T16:35:03.614Z","etag":null,"topics":["java","neo4j","neo4j-graphaware-framework"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/graphaware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-05-28T22:17:33.000Z","updated_at":"2024-07-30T10:22:41.000Z","dependencies_parsed_at":"2022-08-25T23:22:16.628Z","dependency_job_id":null,"html_url":"https://github.com/graphaware/neo4j-importer","commit_stats":null,"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphaware%2Fneo4j-importer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphaware%2Fneo4j-importer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphaware%2Fneo4j-importer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphaware%2Fneo4j-importer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/graphaware","download_url":"https://codeload.github.com/graphaware/neo4j-importer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225700876,"owners_count":17510448,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","neo4j","neo4j-graphaware-framework"],"created_at":"2024-08-01T07:00:37.790Z","updated_at":"2024-11-21T09:14:56.560Z","avatar_url":"https://github.com/graphaware.png","language":"Java","funding_links":[],"categories":["Import","REST API"],"sub_categories":["REST API","Other"],"readme":"GraphAware Neo4j Importer - RETIRED\n=========================\n\nImporter Has Been Retired\n-------------------------\n\nAs of April 1st 2017, this module is retiring in favour of [GraphAware Databridge](https://neo4j.com/blog/graphaware-databridge-graph-data-import/). This means it will no longer be maintained and released together with new versions of the GraphAware Framework and Neo4j. 
The last compatible Neo4j version is 3.1.0.\n\nThis repository will remain public.\n\nIntroduction\n============\n\n[![Build Status](https://travis-ci.org/graphaware/neo4j-importer.png)](https://travis-ci.org/graphaware/neo4j-importer) | \u003ca href=\"http://graphaware.com/products/\" target=\"_blank\"\u003eProducts\u003c/a\u003e | Latest Release: 3.1.0.44.3\n\nGraphAware Importer is a high-performance tool for importing data from any data source into Neo4j. It is intended\nfor initial one-off imports of large amounts of data (millions to billions of nodes and relationships) that need\nto be cleansed, normalised, or transformed during the import. Depending on many factors (connection speed, database speed,\nquery complexity, data quality,...), you'll be able to import millions of nodes and relationships in minutes.\n\n## Another Importer?\n\nThere are a number of ways of getting data into Neo4j.\n\n* If you have small amounts of CSV data, use [Neo4j's LOAD CSV](http://neo4j.com/docs/stable/query-load-csv.html)\n* If you have large amounts of clean CSV data where you can separate nodes and relationships into different files, use [Neo4j's Import Tool](http://neo4j.com/docs/stable/import-tool.html)\n* If you have large amounts of ready-to-be-imported (i.e. not too dirty) data in any tabular form and don't want to code, use GraphAware's Neo4j DataBridge (coming soon)\n* For all other scenarios, especially if you have large volumes of data from any source (CSV, MySQL, Oracle, HBase, you name it!) that need to be cleansed, normalised or transformed in some way, use this importer. **You will need to code** in Java.\n\n## Tutorial\n\nThis tutorial will guide you through writing an efficient one-off importer of data into Neo4j in a short amount of time.\n**You need to be able to write some Java.** What you will get at the end of the process is a standalone Java application\nthat you can invoke from the command line. It will import data from a data source of your choice and create a brand new,\nfully usable Neo4j database on disk. It uses [Neo4j's BatchInserter](http://neo4j.com/docs/stable/batchinsert.html)\nunder the hood.\n\nThis tool **will not** be able to import into an existing database or a running Neo4j instance (yet).\n\n### Step 0: Get Data\n\nYou need some data, of course. For this tutorial, we're going to be importing from two CSV files:\n\npeople-file.csv:\n```\n\"id\",\"name\",\"location\",\"age\"\n\"1\",\"Michal Bachman\",\"1\",30\n\"2\",\"Adam George\",\"2\",29\n\"\",\"PersonWithNoId\",\"2\",99\n\"4\",\"  \",\"2\",100\n```\n\nlocations-file.csv:\n```\n\"id\",\"name\"\n\"1\",\"London\"\n\"2\",\"Watnall\"\n\"3\",\"Prague\"\n```\n\nIn practice, these could be tables from (or queries against) a relational database, column families from Cassandra, you name it.\n\nThe graph we're looking to get by importing the files above is:\n```\nCREATE\n(m:Person {id:1, name:'Michal Bachman', age:30}),\n(a:Person {id:2, name:'Adam George', age:29}),\n(l:Location {id:1, name:'London'}),\n(w:Location {id:2, name:'Watnall'}),\n(p:Location {id:3, name:'Prague'}),\n(m)-[:LIVES_IN]-\u003e(l),\n(a)-[:LIVES_IN]-\u003e(w)\n```\n\nNote that the last two lines in people-file.csv are bad data; we don't want to import them.\n\n### Step 1: New Project\n\nCreate a brand new Java project and add this project as a dependency. 
Assuming you're using Maven, declare the following\ndependency in your pom.xml:\n\n```\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.graphaware.neo4j\u003c/groupId\u003e\n    \u003cartifactId\u003eprogrammatic-importer\u003c/artifactId\u003e\n    \u003cversion\u003e3.1.0.44.2\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nYou will also need to make sure that the .jar file produced at the end of the process is a \"fat jar\", i.e. that it contains\nall the needed dependencies. For this to happen, you need something like this in your pom.xml:\n\n```\n\u003cbuild\u003e\n    \u003cplugins\u003e\n        \u003cplugin\u003e\n            \u003cartifactId\u003emaven-assembly-plugin\u003c/artifactId\u003e\n            \u003cversion\u003e2.4.1\u003c/version\u003e\n            \u003cexecutions\u003e\n                \u003cexecution\u003e\n                    \u003cphase\u003epackage\u003c/phase\u003e\n                    \u003cgoals\u003e\n                        \u003cgoal\u003eattached\u003c/goal\u003e\n                    \u003c/goals\u003e\n                \u003c/execution\u003e\n            \u003c/executions\u003e\n            \u003cconfiguration\u003e\n                \u003cfinalName\u003emy-importer\u003c/finalName\u003e\n                \u003cdescriptorRefs\u003e\n                    \u003cdescriptorRef\u003ejar-with-dependencies\u003c/descriptorRef\u003e\n                \u003c/descriptorRefs\u003e\n                \u003cappendAssemblyId\u003efalse\u003c/appendAssemblyId\u003e\n            \u003c/configuration\u003e\n        \u003c/plugin\u003e\n    \u003c/plugins\u003e\n\u003c/build\u003e\n```\n
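With the assembly plugin bound to the `package` phase as above, a standard Maven build produces the runnable fat jar, named after `finalName` (here `target/my-importer.jar`):\n\n```\nmvn clean package\n```\n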
### Step 2: Data Reader\n\nImplement a `DataReader` that is able to read from your data source. Most readers will be `TabularDataReader`s. If you're\nimporting from a CSV file, you can skip this step and use the provided `CsvDataReader`. If you're importing from a relational database,\nyou can save some time by extending `DbDataReader` or `QueueDbDataReader` (recommended).\n\nFor example, if you're reading from Oracle, your data reader will look something like this:\n\n```java\n/**\n * {@link com.graphaware.importer.data.access.DbDataReader} for Oracle.\n */\npublic class OracleDataReader extends QueueDbDataReader {\n\n    private final String db;\n    private final int prefetchSize;\n    private final int fetchSize;\n\n    public OracleDataReader(String dbHost, String dbPort, String user, String password, String db, int prefetchSize, int fetchSize) {\n        super(dbHost, dbPort, user, password);\n        this.db = db;\n        this.prefetchSize = prefetchSize;\n        this.fetchSize = fetchSize;\n    }\n\n    @Override\n    protected String getDriverClassName() {\n        return \"oracle.jdbc.OracleDriver\";\n    }\n\n    @Override\n    protected String getUrl(String host, String port) {\n        return \"jdbc:oracle:thin:@//\" + host + \":\" + port + \"/\" + db;\n    }\n\n    @Override\n    protected void additionalConfig(JdbcTemplate template) {\n        template.setFetchSize(fetchSize);\n    }\n\n    @Override\n    protected void additionalConfig(DataSource dataSource) {\n        ((BasicDataSource) dataSource).addConnectionProperty(\"defaultRowPrefetch\", String.valueOf(prefetchSize));\n        ((BasicDataSource) dataSource).setInitialSize(10);\n    }\n}\n```\n\nNote that you will have to add the driver (the Oracle JDBC driver in this case) to your Maven dependencies.
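A reader for another relational database follows the same pattern. For instance, a minimal MySQL sketch might look like the following (the class name is illustrative; it assumes the MySQL JDBC driver is on the classpath and that no extra connection tuning is needed):\n\n```java\n/**\n * {@link com.graphaware.importer.data.access.DbDataReader} for MySQL (illustrative sketch).\n */\npublic class MySqlDataReader extends QueueDbDataReader {\n\n    private final String db;\n\n    public MySqlDataReader(String dbHost, String dbPort, String user, String password, String db) {\n        super(dbHost, dbPort, user, password);\n        this.db = db;\n    }\n\n    @Override\n    protected String getDriverClassName() {\n        //Connector/J driver class; add the mysql:mysql-connector-java dependency\n        return \"com.mysql.jdbc.Driver\";\n    }\n\n    @Override\n    protected String getUrl(String host, String port) {\n        return \"jdbc:mysql://\" + host + \":\" + port + \"/\" + db;\n    }\n}\n```\n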
If you're writing an importer for a non-relational database, for example HBase, you will need to do a bit more work. An\nexample HBase data reader would look like this:\n\n```java\nimport com.graphaware.importer.data.access.DataReader;\nimport org.apache.hadoop.conf.Configuration;\nimport org.apache.hadoop.hbase.TableName;\nimport org.apache.hadoop.hbase.client.*;\nimport org.apache.hadoop.hbase.util.Bytes;\n\nimport java.io.IOException;\nimport java.util.*;\n\npublic class HbaseDataReader implements DataReader\u003cMap\u003cString, Collection\u003cString\u003e\u003e\u003e {\n\n    private final Configuration configuration;\n    private final String columnFamily;\n    private Connection connection;\n    private ResultScanner scanner;\n    private Iterator\u003cResult\u003e results = null;\n    private Result result = null;\n    private int row = 0;\n\n    public HbaseDataReader(Configuration configuration, String columnFamily) {\n        this.configuration = configuration;\n        this.columnFamily = columnFamily;\n    }\n\n    @Override\n    public void initialize() {\n\n    }\n\n    @Override\n    public Map\u003cString, Collection\u003cString\u003e\u003e readObject(String columnFamily) {\n        Set\u003cString\u003e cells = new HashSet\u003c\u003e();\n\n        for (byte[] cell : result.getFamilyMap(Bytes.toBytes(columnFamily)).keySet()) {\n            cells.add(Bytes.toString(cell));\n        }\n\n        String key = Bytes.toString(result.getRow());\n\n        return Collections.\u003cString, Collection\u003cString\u003e\u003esingletonMap(key, cells);\n    }\n\n    @Override\n    public void read(String connectionString, String hint) {\n        if (results != null) {\n            throw new IllegalStateException(\"Previous reader hasn't been closed\");\n        }\n\n        try {\n            connection = ConnectionFactory.createConnection(configuration);\n            Table table = connection.getTable(TableName.valueOf(connectionString));\n            Scan scan = new Scan();\n            scan.addFamily(Bytes.toBytes(columnFamily));\n            scanner = table.getScanner(scan);\n            results = scanner.iterator();\n        } catch (IOException e) {\n            throw new RuntimeException(e);\n        }\n    }\n\n    @Override\n    public void close() {\n        scanner.close();\n\n        try {\n            connection.close();\n        } catch (IOException e) {\n            throw new RuntimeException(e);\n        }\n\n        results = null;\n        result = null;\n    }\n\n    @Override\n    public int getRow() {\n        return row;\n    }\n\n    @Override\n    public boolean readRecord() {\n        if (!results.hasNext()) {\n            return false;\n        }\n        result = results.next();\n        row++;\n\n        return true;\n    }\n\n    @Override\n    public String getRawRecord() {\n        return result.toString();\n    }\n}\n```\n\n### Step 3: Domain\n\nYou now need to define some Java classes that represent the things you are going to be importing. The data from the reader\nwill be translated to these classes. Validation, normalization, and transformation can be applied to these classes before\nthey are translated into Neo4j nodes and relationships.\n\nIf you don't need to apply much logic to the data, you can choose to go with `Map\u003cString, Object\u003e` instead of concrete objects.\nThe `String` in the map is some property key and the `Object` is that property's value.\n\nLet's assume location data is clean, so we'll go with the `Map` approach. For importing people, we choose to create a class like this:\n\n```java\npublic class Person extends Neo4jPropertyContainer {\n\n    @Neo4jProperty\n    private final Long id;\n    @Neo4jProperty\n    private final String name;\n    @Neo4jProperty\n    private final Integer age;\n\n    private final Long location;\n\n    public Person(Long id, String name, Integer age, Long location) {\n        this.id = id;\n        this.name = name;\n        this.age = age;\n        this.location = location;\n    }\n\n    public Long getId() {\n        return id;\n    }\n\n    public String getName() {\n        return name;\n    }\n\n    public Integer getAge() {\n        return age;\n    }\n\n    public Long getLocation() {\n        return location;\n    }\n}\n```\n\nIn this case, we're expecting each row from the data source to contain four pieces of information (id, name, age, location).\nThe ones that we want to become a node's properties in Neo4j, we annotate with `@Neo4jProperty`. The `location` property will not\nbe stored in Neo4j; it will be used to link the person to a location, so it is not annotated. Choose the names of the properties\naccording to how they will be called in Neo4j - it doesn't matter at this point what they are called in your source database.
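To see what the annotation buys us: a `Neo4jPropertyContainer` exposes its annotated fields as a property map, which is what we will hand to the batch inserter in the next step (an illustrative sketch; the exact contents of the map are determined by the framework):\n\n```java\nPerson person = new Person(1L, \"Michal Bachman\", 30, 1L);\n\n//only fields annotated with @Neo4jProperty appear in the map,\n//e.g. {id=1, name=Michal Bachman, age=30}; \"location\" is absent\nMap\u003cString, Object\u003e properties = person.getProperties();\n```\n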
### Step 4: Importers\n\nNow you define the actual import logic. For each domain class from the previous step, there should be one `Importer`.\nImporters should extend `BaseImporter`. If using `TabularDataReader`, you can extend `TabularImporter` instead.\nFor locations and people, we will write two importers. Don't worry; we will explain all aspects of\nwriting such importers step by step.\n\n```java\npublic class LocationImporter extends TabularImporter\u003cMap\u003cString, Object\u003e\u003e {\n\n    @InjectCache(name = \"locations\", creator = true)\n    private Cache\u003cLong, Long\u003e locationCache;\n\n    @Override\n    public Data inputData() {\n        return DynamicData.withName(\"locations\");\n    }\n\n    @Override\n    public Map\u003cString, Object\u003e produceObject(TabularDataReader record) {\n        Map\u003cString, Object\u003e result = new HashMap\u003c\u003e();\n\n        result.put(\"id\", record.readLong(\"id\"));\n        result.put(\"name\", record.readObject(\"name\"));\n\n        return result;\n    }\n\n    @Override\n    public void processObject(Map\u003cString, Object\u003e object) {\n        locationCache.put((Long) object.get(\"id\"), context.inserter().createNode(object, Label.label(\"Location\")));\n    }\n\n    @Override\n    protected void createCache(Caches caches, String name) {\n        if (\"locations\".equals(name)) {\n            caches.createCache(name, Long.class, Long.class);\n        }\n        else {\n            super.createCache(caches, name);\n        }\n    }\n}\n```\n\nLet's start with the `LocationImporter` above. We decided earlier not to create a dedicated \"domain\" object for locations.\nWe're importing from tabular data (CSV), so we will extend `TabularImporter\u003cMap\u003cString, Object\u003e\u003e`.\n\nThere are two important methods that need to be implemented first. `produceObject(..)` will produce a \"domain\" object from\na tabular record. `processObject(..)` should validate and normalize the object and insert it into Neo4j.\n\nProducing the object should be a trivial mapping exercise, reading values from the (database/csv) record and populating\nour object with them. Populating it with dirty data is fine at this point, but `null` can be returned if we don't want\nto produce an object from the record at all, because it is clearly invalid.\n\nProcessing the object means a couple of things. The minimum we should do is create a Location node from the object\nby writing: `context.inserter().createNode(object, Label.label(\"Location\"))`. This will create a new node with label \"Location\"\nand properties in the `Map` - \"id\" and \"name\" in this case. This method call returns the Neo4j node ID of the newly created\nnode.\n\nSince we will need to link people to locations later on, we should remember which Neo4j node ID was assigned to each\nlocation. Remember that the \"id\" property of the location comes from our source data. For this reason, we need to have\nan (off-heap) `Cache` in place:\n\n```java\n@InjectCache(name = \"locations\", creator = true)\nprivate Cache\u003cLong, Long\u003e locationCache;\n```\n\nThis tells the importer infrastructure that a cache called \"locations\" is used by this importer and that the key (the location's own ID)\nis a `Long`. The value is usually a `Long`, because it is the Neo4j node ID. Moreover, `creator = true` tells the infrastructure\nthat this importer creates this cache. That means other importers that need this cache will need to run after this one\nhas finished. For each cache, there can only ever be a single creator.\n\nWhen an importer is a cache creator, it needs to actually create the cache by implementing the `createCache(..)` method.\nIt should check that it is asked to create the right one. 
If not, it should delegate to the superclass, e.g.:\n\n```java\n@Override\nprotected void createCache(Caches caches, String name) {\n    if (\"locations\".equals(name)) {\n        caches.createCache(name, Long.class, Long.class);\n    }\n    else {\n        super.createCache(caches, name);\n    }\n}\n```\n\nWith the caches explained, we will refine our node-creating method to populate the cache with each new location:\n\n```java\n@Override\npublic void processObject(Map\u003cString, Object\u003e object) {\n    locationCache.put((Long) object.get(\"id\"), context.inserter().createNode(object, Label.label(\"Location\")));\n}\n```\n\nFinally, each importer needs to implement the `inputData()` method to indicate what sort of input data it works with.\nThis is later used to actually find the data. So \"locations\" here could represent a CSV file called \"locations-file.csv\", or\na SQL query \"SELECT * FROM locations\", etc.\n\nWith this in mind, let's have a look at the slightly more complicated implementation of `PersonImporter`:\n\n```java\npublic class PersonImporter extends TabularImporter\u003cPerson\u003e {\n\n    @InjectCache(name = \"people\", creator = true)\n    private Cache\u003cLong, Long\u003e personCache;\n\n    @InjectCache(name = \"locations\")\n    private Cache\u003cLong, Long\u003e locationCache;\n\n    @Override\n    public Data inputData() {\n        return DynamicData.withName(\"people\");\n    }\n\n    @Override\n    public Person produceObject(TabularDataReader record) {\n        //for demo purposes, let's say we can't construct a person without ID\n        if (record.readLong(\"id\") == null) {\n            return null;\n        }\n        return new Person(record.readLong(\"id\"), record.readObject(\"name\"), record.readInt(\"age\"), record.readLong(\"location\"));\n    }\n\n    @Override\n    public void processObject(Person person) {\n        //for demo purposes, let's say people with blank names are invalid\n        if (StringUtils.isBlank(person.getName())) {\n            throw new RuntimeException(\"Person has blank name\");\n        }\n\n        personCache.put(person.getId(), context.inserter().createNode(person.getProperties(), Label.label(\"Person\")));\n        context.inserter().createRelationship(personCache.get(person.getId()), locationCache.get(person.getLocation()), withName(\"LIVES_IN\"), Collections.\u003cString, Object\u003eemptyMap());\n    }\n\n    @Override\n    protected void createCache(Caches caches, String name) {\n        if (\"people\".equals(name)) {\n            caches.createCache(name, Long.class, Long.class);\n        } else {\n            super.createCache(caches, name);\n        }\n    }\n\n    @Override\n    public void createIndices() {\n        createIndex(Label.label(\"Person\"), \"name\");\n    }\n}\n```\n\nThis importer produces a person cache and uses the location cache to create relationships between people and locations.\nIt also overrides the `createIndices()` method to create an index on people's names.\n\n### Step 5: Wiring it all together\n\nFinally, we need to create the actual main importer class that will be called when data is to be imported. 
In our simple\ncase, it will look as follows:\n\n```java\npublic class MyBatchImporter extends FileBatchImporter {\n\n    public static void main(String[] args) {\n        new MyBatchImporter().run(args);\n    }\n\n    @Override\n    protected Set\u003cImporter\u003e createImporters() {\n        //list all importers, order does not matter\n        return new HashSet\u003c\u003e(Arrays.\u003cImporter\u003easList(\n                new PersonImporter(),\n                new LocationImporter()\n        ));\n    }\n\n    @Override\n    protected Map\u003cData, String\u003e input() {\n        //map logical input names to physical ones (file names, queries,...)\n\n        Map\u003cData, String\u003e map = new HashMap\u003c\u003e();\n        map.put(DynamicData.withName(\"people\"), \"people-file\");\n        map.put(DynamicData.withName(\"locations\"), \"locations-file\");\n        return map;\n    }\n}\n```\n\n### Step 6: Tests\n\nWe should now test our importer. This isn't hard. We will use GraphUnit to do that, so you should have it in your test\ndependencies:\n\n```\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.graphaware.neo4j\u003c/groupId\u003e\n    \u003cartifactId\u003etests\u003c/artifactId\u003e\n    \u003cversion\u003e3.1.0.44\u003c/version\u003e\n    \u003cscope\u003etest\u003c/scope\u003e\n\u003c/dependency\u003e\n```\n\nThe test runs the importer on our CSV data and verifies the contents of the produced database:\n\n```java\n@Test\npublic void testImport() throws IOException, InterruptedException {\n    TemporaryFolder temporaryFolder = new TemporaryFolder();\n    temporaryFolder.create();\n    String tmpFolder = temporaryFolder.getRoot().getAbsolutePath();\n\n    String cp = new ClassPathResource(\"people-file.csv\").getFile().getAbsolutePath();\n    String path = cp.substring(0, cp.length() - \"people-file.csv\".length());\n\n    try {\n        MyBatchImporter.main(new String[]{\"-g\", tmpFolder + \"/graph.db\", \"-i\", path, \"-o\", tmpFolder, \"-r\", \"neo4j.properties\"});\n    } catch (Throwable t) {\n        fail();\n    }\n\n    GraphDatabaseService database = new GraphDatabaseFactory().newEmbeddedDatabase(tmpFolder + \"/graph.db\");\n\n    GraphUnit.assertSameGraph(database, \"CREATE \" +\n                    \"(p1:Person {id: 1, name: 'Michal Bachman', age:30}),\" +\n                    \"(p2:Person {id: 2, name: 'Adam George', age:29}),\" +\n                    \"(l1:Location {id: 1, name: 'London'}),\" +\n                    \"(l2:Location {id: 2, name: 'Watnall'}),\" +\n                    \"(l3:Location {id: 3, name: 'Prague'}),\" +\n                    \"(p1)-[:LIVES_IN]-\u003e(l1),\" +\n                    \"(p2)-[:LIVES_IN]-\u003e(l2)\"\n    );\n\n    database.shutdown();\n    temporaryFolder.delete();\n}\n```\n\n### Step 7: Use\n\n`java -cp ./path/to/importer/importer.jar com.graphaware.importer.MyBatchImporter`\n\nusage:\n\n```\n -g,--graph \u003carg\u003e        use given directory to output the graph\n -i,--input \u003carg\u003e        use given directory to find input files\n -o,--output \u003carg\u003e       use given directory to output auxiliary files, such as statistics\n -r,--properties \u003carg\u003e   use given file as neo4j properties\n -c,--cachefile \u003carg\u003e    use given file as temporary on-disk cache\n```\n
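For example, to run the importer from this tutorial (the paths are illustrative; the input directory is wherever `people-file.csv` and `locations-file.csv` live):\n\n```\njava -cp ./target/my-importer.jar com.graphaware.importer.MyBatchImporter -g /tmp/import/graph.db -i /data/csv -o /tmp/import -r neo4j.properties\n```\n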
### Step 8: Further Customization\n\n#### Custom Config\n\nThe import process can be further customised. First of all, if additional configuration needs to be passed into the process,\nit is possible to implement a custom `CommandLineParser`. Typically, this is needed to customise the data-reading\ncomponents. Depending on where you're importing from and what configuration you need, you may choose to extend\n`BaseCommandLineParser`, `FileCommandLineParser`, or `DbCommandLineParser`.\n\nClosely tied to `CommandLineParser` is the `ImportConfig` that it produces. Again, for custom import configuration, you\ncan implement `ImportConfig` by extending `BaseImportConfig`, `FileImportConfig`, or `DbImportConfig`.\n`ImportConfig` then produces a `DataReader`.\n\nLet's illustrate using an example. If we were importing from Oracle and wanted the user to specify the fetchSize for the\nJdbcTemplate and prefetchSize for the Oracle connection, we would need to implement the following classes (note that we\nalso pass through the database name that `OracleDataReader` from Step 2 expects):\n\n```java\npublic class OracleCommandLineParser extends DbCommandLineParser {\n\n    @Override\n    protected DbImportConfig doProduceConfig(CommandLine line, String graphDir, String outputDir, String props, String host, String port, String user, String password) throws ParseException {\n        int prefetchSize = Integer.valueOf(getOptionalValue(line, \"pfs\", \"10000\"));\n        int fetchSize = Integer.valueOf(getOptionalValue(line, \"fs\", \"10000\"));\n        //the Oracle service name is needed by OracleDataReader (see Step 2)\n        String db = getOptionalValue(line, \"db\", \"ORCL\");\n\n        return new OracleImportConfig(\n                graphDir,\n                outputDir,\n                props,\n                host,\n                port,\n                user,\n                password,\n                db,\n                prefetchSize,\n                fetchSize);\n    }\n\n    @Override\n    protected void addOptions(Options options) {\n        super.addOptions(options);\n\n        options.addOption(new Option(\"db\", \"database\", true, \"Oracle service name (default ORCL)\"));\n        options.addOption(new Option(\"pfs\", \"prefetchSize\", true, \"Oracle row prefetch size (default 10000)\"));\n        options.addOption(new Option(\"fs\", \"fetchSize\", true, \"JDBC driver row fetch size (default 10000)\"));\n    }\n}\n```\n\n```java\npublic class OracleImportConfig extends DbImportConfig {\n\n    private final String db;\n    private final int prefetchSize;\n    private final int fetchSize;\n\n    public OracleImportConfig(String graphDir, String outputDir, String props, String dbHost, String dbPort, String user, String password, String db, int prefetchSize, int fetchSize) {\n        super(graphDir, outputDir, props, dbHost, dbPort, user, password);\n        this.db = db;\n        this.prefetchSize = prefetchSize;\n        this.fetchSize = fetchSize;\n    }\n\n    @Override\n    public DataReader createReader() {\n        return new OracleDataReader(getDbHost(), getDbPort(), getUser(), getPassword(), db, prefetchSize, fetchSize);\n    }\n}\n```\n\nOnce we have these two classes, we can wire them into the top-level importer by overriding a single method:\n\n```java\n@Override\nprotected CommandLineParser\u003cDbImportConfig\u003e commandLineParser() {\n    return new OracleCommandLineParser();\n}\n```\n\n#### Custom Context\n\nThroughout the import process, an `ImportContext` is available to the `Importer`s by accessing the protected `context`\nfield. This context provides access to the actual `BatchInserter` used for creating nodes and relationships, to `caches()`,\netc. In case more context is needed, for example an external validator (e.g. some JSR-303 validator implementation), you\ncan implement a custom `ImportContext` by extending the default `SimpleImportContext`.\n\n```java\npublic class MyImportContext extends SimpleImportContext {\n\n    private ObjectNormalizer normalizer;\n    private ObjectValidator validator;\n\n    public MyImportContext(ImportConfig config, Caches caches, DataLocator inputLocator, DataLocator outputLocator) {\n        super(config, caches, inputLocator, outputLocator);\n    }\n\n    public ObjectNormalizer normalizer() {\n        return normalizer;\n    }\n\n    public ObjectValidator validator() {\n        return validator;\n    }\n\n    @Override\n    protected void postBootstrap() {\n        super.postBootstrap();\n\n        normalizer = createNormalizer();\n        validator = createValidator();\n    }\n\n    protected ObjectNormalizer createNormalizer() {\n        return new AnnotationObjectNormalizer();\n    }\n\n    protected ObjectValidator createValidator() {\n        return new StandardObjectValidator();\n    }\n}\n```\n\nAgain, this custom context is wired into the import process in the top-level `BatchImporter`:\n\n```java\n@Override\nprotected ImportContext createContext(T config) {\n    return new MyImportContext(config, createCaches(), createInputDataLocator(config), createOutputDataLocator(config));\n}\n```\n
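An importer can then reach these extra collaborators through the protected `context` field. A hypothetical sketch (the cast matches the wiring above; the `isValid(..)` method on `ObjectValidator` is assumed here, so check the actual interface before copying this):\n\n```java\n@Override\npublic void processObject(Person person) {\n    //assumption: MyImportContext was wired in via createContext(..), so the cast is safe\n    MyImportContext ctx = (MyImportContext) context;\n\n    //assumed validator API, for illustration only\n    if (!ctx.validator().isValid(person)) {\n        throw new RuntimeException(\"Invalid person: \" + person.getId());\n    }\n\n    personCache.put(person.getId(), ctx.inserter().createNode(person.getProperties(), Label.label(\"Person\")));\n}\n```\n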
For further customisations, please have a look at the [Javadoc](http://graphaware.com/site/importer/latest/apidocs) or the code in this repo.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraphaware%2Fneo4j-importer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraphaware%2Fneo4j-importer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraphaware%2Fneo4j-importer/lists"}