https://github.com/apache/doris-streamloader
Stream Loader for Apache Doris
- Host: GitHub
- URL: https://github.com/apache/doris-streamloader
- Owner: apache
- License: apache-2.0
- Created: 2024-01-17T09:47:53.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-08-08T11:39:30.000Z (6 months ago)
- Last Synced: 2024-10-01T01:08:42.225Z (4 months ago)
- Topics: bigquery, database, dbt, delta-lake, elt, etl, hadoop, hive, hudi, iceberg, lakehouse, olap, query-engine, real-time, redshift, snowflake, spark, sql
- Language: Go
- Homepage: https://doris.apache.org
- Size: 64.5 KB
- Stars: 14
- Watchers: 40
- Forks: 13
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Apache Doris Streamloader
A robust, high-performance and user-friendly alternative to the traditional curl-based Stream Load.
## Key Features
- **Parallel Loading**: Automatically splits data files and loads them in parallel
- **Support for Multiple Files and Directories**: Loads multiple files and directories in a single run
- **Path Traversal Support**: Traverses paths when the source files are in directories
- **Resilience and Continuity**: Resumes loading after previous failures and cancellations
- **Automatic Retry Mechanism**: Retries automatically on failure
- **Comprehensive and Concise Input Parameters**

## Usage
```shell
doris-streamloader --source_file={FILE_LIST} --url={FE_OR_BE_SERVER_URL}:{PORT} --header={STREAMLOAD_HEADER} --db={TARGET_DATABASE} --table={TARGET_TABLE}
```

- `FILE_LIST`: a directory or file list; the `*` wildcard is supported
- `FE_OR_BE_SERVER_URL` & `PORT`: hostname or IP address and HTTP port of a Doris FE or BE
- `STREAMLOAD_HEADER`: supports all headers that `curl` Stream Load does; multiple headers are separated by `?`
- `TARGET_DATABASE` & `TARGET_TABLE`: the target database and table into which the data will be loaded

For example:
```shell
doris-streamloader --source_file="data.csv" --url="http://localhost:8330" --header="column_separator:|?columns:col1,col2" --db="testdb" --table="testtbl"
```

For additional details and options, refer to the docs below.
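
The `--header` value in the example packs several `key:value` pairs into one string, separated by `?`. When the headers come from a script, it can be convenient to join them programmatically. The following is a minimal sketch, not part of doris-streamloader: `join_headers` is a hypothetical helper name, and it assumes that no header value itself contains a `?`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with doris-streamloader):
# joins "key:value" header pairs with '?', the separator used by --header.
# Assumption: no header value contains a literal '?'.
join_headers() {
  local IFS='?'        # "$*" joins the arguments with the first character of IFS
  printf '%s\n' "$*"
}

# Build the same header string used in the example command.
header="$(join_headers 'column_separator:|' 'columns:col1,col2')"
echo "$header"
```

The resulting string can then be passed as `--header="$header"` in place of a hand-written value.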
## Docs
- [User Guide](https://doris.apache.org/docs/ecosystem/doris-streamloader)
- [User Guide (Chinese)](https://doris.apache.org/zh-CN/docs/ecosystem/doris-streamloader)
## Build
To build Streamloader, ensure you have Go installed (version >= 1.19.9). For example, on CentOS:
```
yum install golang
```

Then, navigate to the doris-streamloader directory and run the build script:
```
cd doris-streamloader && sh build.sh
```

## License
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)