https://github.com/igor-suhorukov/arrow_to_database
Import data from Arrow Dataset API into relational DB via JDBC
arrow h2-database jdbc-connector orc parquet postgresql questdb
- Host: GitHub
- URL: https://github.com/igor-suhorukov/arrow_to_database
- Owner: igor-suhorukov
- Created: 2022-08-18T13:51:06.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2023-11-27T17:27:30.000Z (over 2 years ago)
- Last Synced: 2026-02-13T09:52:51.223Z (4 months ago)
- Topics: arrow, h2-database, jdbc-connector, orc, parquet, postgresql, questdb
- Language: Java
- Homepage:
- Size: 184 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Import data from Arrow Dataset API into relational DB via JDBC
This CLI utility imports data from the file system, in PARQUET and ARROW_IPC file formats, into a relational database.
Apache Arrow dependencies and the PostgreSQL JDBC driver are included in the resulting executable jar file.
Build command:
mvn package
Import utility usage:
java -jar arrow_to_database-1.0-SNAPSHOT.jar --help
Options:
-dataset
file:// prefixed URI of the Arrow dataset
https://arrow.apache.org/docs/python/dataset.html
-dataset_format
Dataset binary file format: PARQUET, ARROW_IPC, ORC, CSV, JSON
Default: PARQUET
-batch_size
Number of rows to fetch and send to the DB in one batch
https://arrow.apache.org/docs/python/dataset.html#customizing-the-batch-size
Default: 10000
-db_dialect
Database dialect for Arrow->DB type mapping: POSTGRESQL, H2, QUESTDB
Default: POSTGRESQL
-jdbc_driver
JDBC driver class name. Provide it only if you get a 'No suitable driver
found' error. For example, org.h2.Driver for H2 or
org.postgresql.Driver for PostgreSQL
-jdbc_url
JDBC connection URL
-username
JDBC connection user name
-password
JDBC password. If no value follows the '-password' parameter, you can enter the password interactively in the console
-table_name
Name of the table to import the dataset into
Default: arrow_import
-create_table
If this parameter is provided, the table is created in the database before import.
The value 'temporary' creates a temporary table, which can be useful for testing
purposes such as a dry run.
-insert_sql_query
Custom SQL insert query
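To make the `-insert_sql_query` option more concrete, here is a minimal sketch of the kind of parameterized JDBC insert statement such an option could carry. The helper `buildInsertSql` and the column names `id`, `lat`, `lon` are hypothetical and used only for illustration; the query the utility generates by default is not documented here and may differ.

```java
import java.util.Collections;
import java.util.List;

class InsertSqlSketch {
    // Hypothetical helper: builds a parameterized INSERT with one '?' per column,
    // the form a JDBC PreparedStatement expects.
    static String buildInsertSql(String tableName, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String columnList = String.join(", ", columns);
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + tableName + " (" + columnList + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // "arrow_import" is the documented default table name; the columns are assumed.
        System.out.println(buildInsertSql("arrow_import", List.of("id", "lat", "lon")));
        // prints: INSERT INTO arrow_import (id, lat, lon) VALUES (?, ?, ?)
    }
}
```

A statement of this shape could be passed in quotes as the value of `-insert_sql_query`, assuming the option accepts standard JDBC `?` placeholders.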
Parameters example:
java -jar arrow_to_database-1.0-SNAPSHOT.jar -jdbc_url jdbc:postgresql://127.0.0.1:5432/osmworld -username postgres -dataset file:///home/build/dev/arrow/nodes -table_name arrow_import_nodes -create_table yes -password
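
The command above leaves `-batch_size` at its default of 10000, so rows are fetched and sent to the database in groups of that size. The following sketch shows the general batching idea in plain Java. It is not the utility's actual code: `forEachBatch` is a hypothetical helper, and the sink stands in for whatever sends a batch over JDBC.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class BatchingSketch {
    // Groups rows into chunks of at most batchSize, passes each chunk to the sink,
    // and returns the number of chunks delivered.
    static <T> int forEachBatch(Iterator<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> buffer = new ArrayList<>();
        while (rows.hasNext()) {
            buffer.add(rows.next());
            if (buffer.size() == batchSize) {
                sink.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
                buffer.clear();
                batches++;
            }
        }
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer)); // final, possibly smaller batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(1, 2, 3, 4, 5, 6, 7);
        // The sink here only prints; a real import would send the batch to the database.
        int count = forEachBatch(rows.iterator(), 3, batch -> System.out.println(batch));
        System.out.println(count + " batches");
    }
}
```

With 7 rows and a batch size of 3, the sink receives two full batches and one final batch holding the remaining row. A larger batch size generally means fewer round trips to the database at the cost of more memory per batch.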
