{"id":18725175,"url":"https://github.com/vesoft-inc/nebula-importer","last_synced_at":"2025-04-06T20:12:59.905Z","repository":{"id":36057305,"uuid":"215484250","full_name":"vesoft-inc/nebula-importer","owner":"vesoft-inc","description":"Nebula Graph Importer with Go","archived":false,"fork":false,"pushed_at":"2024-04-19T12:20:20.000Z","size":838,"stargazers_count":92,"open_issues_count":31,"forks_count":60,"subscribers_count":30,"default_branch":"master","last_synced_at":"2025-03-30T18:09:38.966Z","etag":null,"topics":["csv","csv-import","golang","nebula-graph"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vesoft-inc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-16T07:25:32.000Z","updated_at":"2025-02-10T01:25:58.000Z","dependencies_parsed_at":"2023-10-16T16:37:33.303Z","dependency_job_id":"93013e93-de96-4aa9-aa81-c7a9fa0fa728","html_url":"https://github.com/vesoft-inc/nebula-importer","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vesoft-inc%2Fnebula-importer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vesoft-inc%2Fnebula-importer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vesoft-inc%2Fnebula-importer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vesoft-inc%2Fnebula-importer/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/vesoft-inc","download_url":"https://codeload.github.com/vesoft-inc/nebula-importer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247543593,"owners_count":20955865,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","csv-import","golang","nebula-graph"],"created_at":"2024-11-07T14:09:22.022Z","updated_at":"2025-04-06T20:12:59.888Z","avatar_url":"https://github.com/vesoft-inc.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![codecov.io](https://codecov.io/gh/vesoft-inc/nebula-importer/branch/master/graph/badge.svg)](https://codecov.io/gh/vesoft-inc/nebula-importer)\n[![Go Report Card](https://goreportcard.com/badge/github.com/vesoft-inc/nebula-importer)](https://goreportcard.com/report/github.com/vesoft-inc/nebula-importer)\n[![GolangCI](https://golangci.com/badges/github.com/vesoft-inc/nebula-importer.svg)](https://golangci.com/r/github.com/vesoft-inc/nebula-importer)\n[![GoDoc](https://godoc.org/github.com/vesoft-inc/nebula-importer?status.svg)](https://godoc.org/github.com/vesoft-inc/nebula-importer)\n\n# What is NebulaGraph Importer?\n\n**NebulaGraph Importer** is a tool to import data into [NebulaGraph](https://github.com/vesoft-inc/nebula).\n\n## Features\n\n* Support multiple data sources, currently supports `local`, `s3`, `oss`, `ftp`, `sftp`, `hdfs`, and `gcs`.\n* Support multiple file formats, currently only `csv` files are supported.\n* Support files containing multiple tags, multiple edges, and a mixture of 
both.\n* Support data transformations.\n* Support record filtering.\n* Support multiple modes, including `INSERT`, `UPDATE`, `DELETE`.\n* Support connecting to multiple Graph services with automatic load balancing.\n* Support retry after failure.\n* Humanized status printing.\n\n_See configuration instructions for more features._\n\n## How to Install\n\n### From Releases\n\nDownload a package from the [Releases page](https://github.com/vesoft-inc/nebula-importer/releases) and give it execute permissions.\n\nChoose the package that fits your needs. The following installation packages are supported:\n\n* binary\n* archives\n* apk\n* deb\n* rpm\n\n### From go install\n\n```shell\n$ go install github.com/vesoft-inc/nebula-importer/cmd/nebula-importer@latest\n```\n\n### From docker\n\n```shell\n$ docker pull vesoft/nebula-importer:\u003cversion\u003e\n$ docker run --rm -ti \\\n      --network=host \\\n      -v \u003cconfig_file\u003e:\u003cconfig_file\u003e \\\n      -v \u003cdata_dir\u003e:\u003cdata_dir\u003e \\\n      vesoft/nebula-importer:\u003cversion\u003e \\\n      --config \u003cconfig_file\u003e\n\n# config_file: the absolute path to the configuration file.\n# data_dir: the absolute path to the data directory, ignore if not a local file.\n# version: the version of NebulaGraph Importer.\n```\n\n### From Source Code\n\n```shell\n$ git clone https://github.com/vesoft-inc/nebula-importer\n$ cd nebula-importer\n$ make build\n```\n\nYou can find a binary named `nebula-importer` in the `bin` directory.\n\n## Configuration Instructions\n\n`NebulaGraph Importer`'s configuration file is in YAML format. 
You can find some examples in [examples](examples/).\n\nConfiguration options are divided into four groups:\n\n* `client` contains the options related to the NebulaGraph connection client.\n* `manager` contains the global control options for NebulaGraph Importer.\n* `log` contains the options related to printing logs.\n* `sources` contains the data source configuration items.\n\n### client\n\n```yaml\nclient:\n  version: v3\n  address: \"127.0.0.1:9669\"\n  user: root\n  password: nebula\n  ssl:\n    enable: true\n    certPath: \"your/cert/file/path\"\n    keyPath: \"your/key/file/path\"\n    caPath: \"your/ca/file/path\"\n    insecureSkipVerify: false\n  concurrencyPerAddress: 16\n  reconnectInitialInterval: 1s\n  retry: 3\n  retryInitialInterval: 1s\n```\n\n* `client.version`: **Required**. Specifies the NebulaGraph version. Currently only `v3` is supported.\n* `client.address`: **Required**. The address of the Graph service in NebulaGraph.\n* `client.user`: **Optional**. The user of NebulaGraph. The default value is `root`.\n* `client.password`: **Optional**. The password of NebulaGraph. The default value is `nebula`.\n* `client.ssl`: **Optional**. SSL related configuration.\n* `client.ssl.enable`: **Optional**. Specifies whether to enable SSL authentication. The default value is `false`.\n* `client.ssl.certPath`: **Required**. Specifies the path of the certificate file.\n* `client.ssl.keyPath`: **Required**. Specifies the path of the private key file.\n* `client.ssl.caPath`: **Required**. Specifies the path of the certification authority file.\n* `client.ssl.insecureSkipVerify`: **Optional**. Specifies whether the client skips verifying the server's certificate chain and host name. The default value is `false`.\n* `client.concurrencyPerAddress`: **Optional**. The number of client connections to each Graph service in NebulaGraph. The default value is `10`.\n* `client.reconnectInitialInterval`: **Optional**. The initial interval for reconnecting to NebulaGraph. 
The default value is `1s`.\n* `client.retry`: **Optional**. The number of times a failed nGQL query is retried in the NebulaGraph client. The default value is `3`.\n* `client.retryInitialInterval`: **Optional**. The initial interval between retries. The default value is `1s`.\n\n### manager\n\n```yaml\nmanager:\n  spaceName: basic_int_examples\n  batch: 128\n  readerConcurrency: 50\n  importerConcurrency: 512\n  statsInterval: 10s\n  hooks:\n    before:\n      - statements:\n          - UPDATE CONFIGS storage:wal_ttl=3600;\n          - UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = true };\n      - statements:\n          - |\n            DROP SPACE IF EXISTS basic_int_examples;\n            CREATE SPACE IF NOT EXISTS basic_int_examples(partition_num=5, replica_factor=1, vid_type=int);\n            USE basic_int_examples;\n        wait: 10s\n    after:\n      - statements:\n          - |\n            UPDATE CONFIGS storage:wal_ttl=86400;\n            UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = false };\n```\n\n* `manager.spaceName`: **Required**. Specifies which space the data is imported into.\n* `manager.batch`: **Optional**. Specifies the batch size for all sources of the inserted data. The default value is `128`.\n* `manager.readerConcurrency`: **Optional**. Specifies the concurrency of the readers that read from sources. The default value is `50`.\n* `manager.importerConcurrency`: **Optional**. Specifies the concurrency for generating the nGQL insert statements and calling the client to import them. The default value is `512`.\n* `manager.statsInterval`: **Optional**. Specifies the interval at which statistics are printed. The default value is `10s`.\n* `manager.hooks.before`: **Optional**. Configures the statements to execute before the import begins.\n  * `manager.hooks.before.[].statements`: Defines the list of statements.\n  * `manager.hooks.before.[].wait`: **Optional**. 
Defines the waiting time after executing the above statements.\n* `manager.hooks.after`: **Optional**. Configures the statements to execute after the import is complete.\n  * `manager.hooks.after.[].statements`: **Optional**. Defines the list of statements.\n  * `manager.hooks.after.[].wait`: **Optional**. Defines the waiting time after executing the above statements.\n\n### log\n\n```yaml\nlog:\n  level: INFO\n  console: true\n  files:\n    - logs/nebula-importer.log\n```\n\n* `log.level`: **Optional**. Specifies the log level. Valid values are `DEBUG`, `INFO`, `WARN`, `ERROR`, `PANIC` or `FATAL`. The default value is `INFO`.\n* `log.console`: **Optional**. Specifies whether to print logs to the console. The default value is `true`.\n* `log.files`: **Optional**. Specifies which files to print logs to.\n\n### sources\n\n`sources` configures the list of data sources. Each data source contains data source information, data processing and schema mapping.\n\nThe following are the relevant configuration items.\n\n* `batch` specifies the batch size for this source of the inserted data. It takes precedence over `manager.batch`.\n* `path`, `s3`, `oss`, `ftp`, `sftp`, `hdfs`, and `gcs` configure the various kinds of data sources, and only one of them can be configured.\n* `csv` describes the csv file format information.\n* `tags` describes the schema definition for tags.\n* `edges` describes the schema definition for edges.\n\n#### path\n\nIt only needs to be configured for local file data sources.\n\n```yaml\npath: ./person.csv\n```\n\n* `path`: **Required**. Specifies the path where the data files are stored. A relative path is resolved against the directory of the current configuration file. 
Wildcard filenames are also supported, for example `./follower-*.csv`. Make sure that all matching files have the same schema.\n\n#### s3\n\nIt only needs to be configured for s3 data sources.\n\n```yaml\ns3:\n  endpoint: \u003cendpoint\u003e\n  region: \u003cregion\u003e\n  bucket: \u003cbucket\u003e\n  key: \u003ckey\u003e\n  accessKeyID: \u003cAccess Key ID\u003e\n  accessKeySecret: \u003cAccess Key Secret\u003e\n```\n\n* `endpoint`: **Optional**. The endpoint of the s3 service. It can be omitted when using AWS S3.\n* `region`: **Required**. The region of the s3 service.\n* `bucket`: **Required**. The bucket of the file in the s3 service.\n* `key`: **Required**. The object key of the file in the s3 service.\n* `accessKeyID`: **Optional**. The `Access Key ID` of the s3 service. Not required for public data.\n* `accessKeySecret`: **Optional**. The `Access Key Secret` of the s3 service. Not required for public data.\n\n#### oss\n\nIt only needs to be configured for oss data sources.\n\n```yaml\noss:\n  endpoint: \u003cendpoint\u003e\n  bucket: \u003cbucket\u003e\n  key: \u003ckey\u003e\n  accessKeyID: \u003cAccess Key ID\u003e\n  accessKeySecret: \u003cAccess Key Secret\u003e\n```\n\n* `endpoint`: **Required**. The endpoint of the oss service.\n* `bucket`: **Required**. The bucket of the file in the oss service.\n* `key`: **Required**. The object key of the file in the oss service.\n* `accessKeyID`: **Required**. The `Access Key ID` of the oss service.\n* `accessKeySecret`: **Required**. The `Access Key Secret` of the oss service.\n\n#### ftp\n\nIt only needs to be configured for ftp data sources.\n\n```yaml\nftp:\n  host: 192.168.0.10\n  port: 21\n  user: \u003cuser\u003e\n  password: \u003cpassword\u003e\n  path: \u003cpath of file\u003e\n```\n\n* `host`: **Required**. The host of the ftp service.\n* `port`: **Required**. The port of the ftp service.\n* `user`: **Required**. The user of the ftp service.\n* `password`: **Required**. The password of the ftp service.\n* `path`: **Required**. 
The path of the file in the ftp service.\n\n#### sftp\n\nIt only needs to be configured for sftp data sources.\n\n```yaml\nsftp:\n  host: 192.168.0.10\n  port: 22\n  user: \u003cuser\u003e\n  password: \u003cpassword\u003e\n  keyFile: \u003ckeyFile\u003e\n  keyData: \u003ckeyData\u003e\n  passphrase: \u003cpassphrase\u003e\n  path: \u003cpath of file\u003e\n```\n\n* `host`: **Required**. The host of the sftp service.\n* `port`: **Required**. The port of the sftp service.\n* `user`: **Required**. The user of the sftp service.\n* `password`: **Optional**. The password of the sftp service.\n* `keyFile`: **Optional**. The ssh key file path of the sftp service.\n* `keyData`: **Optional**. The ssh key file content of the sftp service.\n* `passphrase`: **Optional**. The ssh key passphrase of the sftp service.\n* `path`: **Required**. The path of the file in the sftp service.\n\n#### hdfs\n\nIt only needs to be configured for hdfs data sources.\n\n```yaml\nhdfs:\n  address: 192.168.0.10:8020\n  user: \u003cuser\u003e\n  servicePrincipalName: \u003cKerberos Service Principal Name\u003e\n  krb5ConfigFile: \u003cKerberos config file\u003e\n  ccacheFile: \u003cKerberos ccache file\u003e\n  keyTabFile: \u003cKerberos keytab file\u003e\n  password: \u003cKerberos password\u003e\n  dataTransferProtection: \u003cKerberos Data Transfer Protection\u003e\n  disablePAFXFAST: false\n  path: \u003cpath of file\u003e\n```\n\n* `address`: **Required**. The address of the hdfs service.\n* `user`: **Optional**. The user of the hdfs service.\n* `servicePrincipalName`: **Optional**. The Kerberos service principal name of the hdfs service when Kerberos is enabled.\n* `krb5ConfigFile`: **Optional**. The Kerberos config file of the hdfs service when Kerberos is enabled. The default value is `/etc/krb5.conf`.\n* `ccacheFile`: **Optional**. The ccache file of the hdfs service when Kerberos is enabled.\n* `keyTabFile`: **Optional**. The keytab file of the hdfs service when Kerberos is enabled.\n* `password`: **Optional**. 
The Kerberos password of the hdfs service when Kerberos is enabled.\n* `dataTransferProtection`: **Optional**. The data transfer protection of the hdfs service.\n* `disablePAFXFAST`: **Optional**. Whether to prevent the client from using PA_FX_FAST.\n* `path`: **Required**. The path of the file in the hdfs service.\n\n#### gcs\n\nIt only needs to be configured for gcs data sources.\n\n```yaml\ngcs:\n  endpoint: \u003cendpoint\u003e\n  bucket: \u003cbucket\u003e\n  key: \u003ckey\u003e\n  credentialsFile: \u003cService account or refresh token JSON credentials file\u003e\n  credentialsJSON: \u003cService account or refresh token JSON credentials\u003e\n  withoutAuthentication: \u003cfalse | true\u003e\n```\n\n* `endpoint`: **Optional**. The endpoint of the GCS service.\n* `bucket`: **Required**. The bucket of the file in the GCS service.\n* `key`: **Required**. The object key of the file in the GCS service.\n* `credentialsFile`: **Optional**. Path to the service account or refresh token JSON credentials file. Not required for public data.\n* `credentialsJSON`: **Optional**. Content of the service account or refresh token JSON credentials file. Not required for public data.\n* `withoutAuthentication`: **Optional**. Specifies that no authentication should be used. The default value is `false`.\n\n#### batch\n\n```yaml\nbatch: 256\n```\n\n* `batch`: **Optional**. Specifies the batch size for this source of the inserted data. It takes precedence over `manager.batch`.\n\n#### csv\n\n```yaml\ncsv:\n  delimiter: \",\"\n  withHeader: false\n  lazyQuotes: false\n  comment: \"\"\n```\n\n* `delimiter`: **Optional**. Specifies the delimiter for the CSV files. The default value is `\",\"`. Only a single-character delimiter is supported.\n* `withHeader`: **Optional**. Specifies whether to ignore the first record in the csv file. The default value is `false`.\n* `lazyQuotes`: **Optional**. 
If `lazyQuotes` is true, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.\n* `comment`: **Optional**. Specifies the comment character. Lines beginning with the comment character without preceding whitespace are ignored.\n\n#### tags\n\n```yaml\ntags:\n- name: Person\n  mode: INSERT\n  filter:\n    expr: (Record[1] == \"Mahinda\" or Record[1] == \"Michael\") and Record[3] == \"male\"\n  id:\n    type: \"STRING\"\n    function: \"hash\"\n    index: 0\n  ignoreExistedIndex: true\n  props:\n    - name: \"firstName\"\n      type: \"STRING\"\n      index: 1\n    - name: \"lastName\"\n      type: \"STRING\"\n      index: 2\n    - name: \"gender\"\n      type: \"STRING\"\n      index: 3\n      nullable: true\n      defaultValue: male\n    - name: \"birthday\"\n      type: \"DATE\"\n      index: 4\n      nullable: true\n      nullValue: _NULL_\n    - name: \"creationDate\"\n      type: \"DATETIME\"\n      index: 5\n    - name: \"locationIP\"\n      type: \"STRING\"\n      index: 6\n    - name: \"browserUsed\"\n      type: \"STRING\"\n      index: 7\n      nullable: true\n      alternativeIndices:\n        - 6\n\n# concatItems examples\ntags:\n- name: Person\n  id:\n    type: \"STRING\"\n    concatItems:\n      - \"abc\"\n      - 1\n    function: hash\n```\n\n* `name`: **Required**. The tag name.\n* `mode`: **Optional**. The mode for processing data. Valid values are `INSERT`, `UPDATE` or `DELETE`. The default value is `INSERT`.\n* `filter`: **Optional**. The data filtering configuration.\n  * `expr`: **Required**. The filter expression. See the [Filter Expression](docs/filter-expression.md) for details.\n* `id`: **Required**. Describes the tag ID information.\n  * `type`: **Optional**. The type for ID. The default value is `STRING`.\n  * `index`: **Optional**. The column number in the records. Required if `concatItems` is not configured.\n  * `concatItems`: **Optional**. The items to concatenate to generate the IDs. 
Each concat item can be a string or an int, and the two can be mixed. A string represents a constant, and an int represents a column index. All items are concatenated in order. If set, the `index` above has no effect.\n  * `function`: **Optional**. The function used to generate the IDs. Currently, only the `hash` function is supported.\n* `ignoreExistedIndex`: **Optional**. Specifies whether to enable `IGNORE_EXISTED_INDEX`. The default value is `true`.\n* `props`: **Required**. Describes the tag props definition.\n  * `name`: **Required**. The property name, which must be the same as the tag property in NebulaGraph.\n  * `type`: **Optional**. The property type. Currently `BOOL`, `INT`, `FLOAT`, `DOUBLE`, `STRING`, `TIME`, `TIMESTAMP`, `DATE`, `DATETIME`, `GEOGRAPHY`, `GEOGRAPHY(POINT)`, `GEOGRAPHY(LINESTRING)` and `GEOGRAPHY(POLYGON)` are supported. The default value is `STRING`.\n  * `index`: **Required**. The column number in the records.\n  * `nullable`: **Optional**. Whether this property can be `NULL`. Valid values are `true` or `false`. The default value is `false`.\n  * `nullValue`: **Optional**. Ignored when `nullable` is `false`. The value used to determine whether it is a `NULL`. The property is set to `NULL` when the value is equal to `nullValue`. The default value is `\"\"`.\n  * `alternativeIndices`: **Optional**. Ignored when `nullable` is `false`. The property is fetched from the records at these indices, in order, until a value not equal to `nullValue` is found.\n  * `defaultValue`: **Optional**. Ignored when `nullable` is `false`. 
The property default value, when all the values obtained by `index` and `alternativeIndices` are `nullValue`.\n\n#### edges\n\n```yaml\nedges:\n- name: KNOWS\n  mode: INSERT\n  filter:\n    expr: (Record[1] == \"Mahinda\" or Record[1] == \"Michael\") and Record[3] == \"male\"\n  src:\n    id:\n      type: \"INT\"\n      index: 0\n  dst:\n    id:\n      type: \"INT\"\n      index: 1\n  rank:\n    index: 0\n  ignoreExistedIndex: true\n  props:\n    - name: \"creationDate\"\n      type: \"DATETIME\"\n      index: 2\n      nullable: true\n      nullValue: _NULL_\n      defaultValue: 0000-00-00T00:00:00\n```\n\n* `name`: **Required**. The edge name.\n* `mode`: **Optional**. The `mode` here is similar to `mode` in the `tags` above.\n* `filter`: **Optional**. The `filter` here is similar to `filter` in the `tags` above.\n* `src`: **Required**. Describes the source definition for the edge.\n* `src.id`: **Required**. The `id` here is similar to `id` in the `tags` above.\n* `dst`: **Required**. Describes the destination definition for the edge.\n* `dst.id`: **Required**. The `id` here is similar to `id` in the `tags` above.\n* `rank`: **Optional**. Describes the rank definition for the edge.\n* `rank.index`: **Required**. The column number in the records.\n* `props`: **Optional**. Similar to the `props` in the `tags`, but for edges.\n\nSee the [Configuration Reference](docs/configuration-reference.md) for details on the configurations.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvesoft-inc%2Fnebula-importer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvesoft-inc%2Fnebula-importer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvesoft-inc%2Fnebula-importer/lists"}