Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wgzhao/addax
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
clickhouse database etl excel hadoop hdfs hive impala influxdb kudu mysql oracle postgresql sqlserver trino
- Host: GitHub
- URL: https://github.com/wgzhao/addax
- Owner: wgzhao
- License: apache-2.0
- Created: 2019-07-17T13:58:01.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-10-29T08:43:40.000Z (3 months ago)
- Last Synced: 2024-10-29T09:52:20.327Z (3 months ago)
- Topics: clickhouse, database, etl, excel, hadoop, hdfs, hive, impala, influxdb, kudu, mysql, oracle, postgresql, sqlserver, trino
- Language: Java
- Homepage: https://wgzhao.github.io/Addax/
- Size: 38 MB
- Stars: 1,184
- Watchers: 33
- Forks: 300
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
- Security: SECURITY.md
- Support: support_data_sources.md
Addax is a versatile open-source ETL tool
The documentation describes in detail how to install and use the plugins, with instructions and a sample configuration for each one.
English | [简体中文](README_zh.md)
The project's initial code originated from Alibaba's [DataX](https://github.com/alibaba/datax) and has been greatly improved on that basis.
It also provides more reader and writer plugins. For more details, please refer to the [difference document](difference.md).

## Supported Data Sources
Addax supports more than 20 SQL and NoSQL [data sources](support_data_sources.md). It can also be extended to support more.
## Getting Started
### Use docker image
```shell
docker pull quay.io/wgzhao/addax:latest
docker run -ti --rm --name addax quay.io/wgzhao/addax:latest /opt/addax/bin/addax.sh /opt/addax/job/job.json
```
If you only need the common reader and writer plugins, you can pull the image whose name ends with `-lite`; it is very small.

```shell
docker pull quay.io/wgzhao/addax:latest-lite
docker run -ti --rm --name addax quay.io/wgzhao/addax:latest-lite /opt/addax/bin/addax.sh /opt/addax/job/job.json
```

See [here][lite-vs-default.md] for the differences between the default image and the lite image.
### Use install script
```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/wgzhao/Addax/master/install.sh)"
```

This script installs Addax to its preferred prefix (`/usr/local` for macOS Intel, `/opt/addax` for Apple Silicon, and `/opt/addax/` for Linux).
### Compile and Package
```shell
git clone https://github.com/wgzhao/addax.git addax
cd addax
mvn clean package
mvn package assembly:single
```

After successful compilation and packaging, an `addax-` folder will be created in the `target/datax` directory of the project directory.
```shell
$ bin/addax.sh job/job.json
  ___      _     _
 / _ \    | |   | |
/ /_\ \ __| | __| | __ ___  __
|  _  |/ _` |/ _` |/ _` \ \/ /
| | | | (_| | (_| | (_| |>  <
\_| |_/\__,_|\__,_|\__,_/_/\_\

:: Addax version ::    (v4.0.13-SNAPSHOT)
2023-05-14 11:43:38.040 [ main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-05-14 11:43:38.062 [ main] INFO Engine -
{
"setting":{
"speed":{
"byte":-1,
"channel":1,
"record":-1
}
},
"content":{
"reader":{
"name":"streamreader",
"parameter":{
"sliceRecordCount":10,
"column":[
{
"value":"addax",
"type":"string"
},
{
"value":19890604,
"type":"long"
},
{
"value":"1989-06-04 11:22:33 123456",
"type":"date",
"dateFormat":"yyyy-MM-dd HH:mm:ss SSSSSS"
},
{
"value":true,
"type":"bool"
},
{
"value":"test",
"type":"bytes"
}
]
}
},
"writer":{
"name":"streamwriter",
"parameter":{
"print":true,
"encoding":"UTF-8"
}
}
}
}
2023-05-14 11:43:38.092 [ main] INFO JobContainer - The jobContainer begins to process the job.
2023-05-14 11:43:38.107 [ job-0] INFO JobContainer - The Reader.Job [streamreader] perform prepare work .
2023-05-14 11:43:38.107 [ job-0] INFO JobContainer - The Writer.Job [streamwriter] perform prepare work .
2023-05-14 11:43:38.108 [ job-0] INFO JobContainer - Job set Channel-Number to 1 channel(s).
2023-05-14 11:43:38.108 [ job-0] INFO JobContainer - The Reader.Job [streamreader] is divided into [1] task(s).
2023-05-14 11:43:38.108 [ job-0] INFO JobContainer - The Writer.Job [streamwriter] is divided into [1] task(s).
2023-05-14 11:43:38.130 [ job-0] INFO JobContainer - The Scheduler launches [1] taskGroup(s).
2023-05-14 11:43:38.138 [ taskGroup-0] INFO TaskGroupContainer - The taskGroupId=[0] started [1] channels for [1] tasks.
2023-05-14 11:43:38.141 [ taskGroup-0] INFO Channel - The Channel set byte_speed_limit to -1, No bps activated.
2023-05-14 11:43:38.141 [ taskGroup-0] INFO Channel - The Channel set record_speed_limit to -1, No tps activated.
addax 19890604 1989-06-04 11:24:36 true test
addax 19890604 1989-06-04 11:24:36 true test
addax 19890604 1989-06-04 11:24:36 true test
addax 19890604 1989-06-04 11:24:36 true test
addax 19890604 1989-06-04 11:24:36 true test
addax 19890604 1989-06-04 11:24:36 true test
addax 19890604 1989-06-04 11:24:36 true test
addax 19890604 1989-06-04 11:24:36 true test
addax 19890604 1989-06-04 11:24:36 true test
addax 19890604 1989-06-04 11:24:36 true test
2023-05-14 11:43:41.157 [ job-0] INFO AbstractScheduler - The scheduler has completed all tasks.
2023-05-14 11:43:41.158 [ job-0] INFO JobContainer - The Writer.Job [streamwriter] perform post work.
2023-05-14 11:43:41.159 [ job-0] INFO JobContainer - The Reader.Job [streamreader] perform post work.
2023-05-14 11:43:41.162 [ job-0] INFO StandAloneJobContainerCommunicator - Total 10 records, 260 bytes | Speed 86B/s, 3 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-05-14 11:43:41.596 [ job-0] INFO JobContainer -
Job start at : 2023-05-14 11:43:38
Job end at : 2023-05-14 11:43:41
Job took secs : 3ss
Average bps : 86B/s
Average rps : 3rec/s
Number of rec : 10
Failed record : 0
```

Job configuration examples of all kinds are available [here](core/src/main/job) and [here](docs/assets/jobs).
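
As an illustration of the job structure echoed in the run log above, the following Python sketch builds a `streamreader` to `streamwriter` job and serializes it to JSON. The helper names `build_stream_job` and `to_job_file_text` are hypothetical. Nesting the blocks under a top-level `job` key is an assumption, since the log prints only `setting` and `content`, so compare the result with the linked examples before running it.

```python
import json


def build_stream_job(record_count=10, channels=1):
    """Return the setting/content blocks in the shape echoed by the run log."""
    return {
        # -1 means no byte or record speed limit, as in the logged config.
        "setting": {"speed": {"byte": -1, "channel": channels, "record": -1}},
        "content": {
            "reader": {
                "name": "streamreader",
                "parameter": {
                    # Number of generated records per slice.
                    "sliceRecordCount": record_count,
                    "column": [
                        {"value": "addax", "type": "string"},
                        {"value": 19890604, "type": "long"},
                        {"value": True, "type": "bool"},
                    ],
                },
            },
            "writer": {
                "name": "streamwriter",
                "parameter": {"print": True, "encoding": "UTF-8"},
            },
        },
    }


def to_job_file_text(job_body):
    """Serialize the job body as JSON text for a job file."""
    # Assumption: the on-disk file nests the blocks under a top-level "job" key.
    return json.dumps({"job": job_body}, indent=2)


if __name__ == "__main__":
    print(to_job_file_text(build_stream_job()))
```

Redirecting the printed text to a file gives a candidate job file to pass to `bin/addax.sh` in place of `job/job.json`. Generating the file from code keeps the reader and writer blocks easy to vary when testing different plugins.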
## Runtime Requirements
- JDK 1.8+
- Python 2.7+ / Python 3.7+ (Windows)

## Documentation
- [online](https://wgzhao.github.io/Addax/)
- [project](docs/index.md)

### Compile
First, install the following Python 3 module:

```shell
python3 -m pip install mkdocs-material
```

You can then use the `mkdocs` command to build or preview the documentation locally:
```shell
mkdocs build
mkdocs serve -a 0.0.0.0:8888
```

Use the following commands to publish the release documentation:
```shell
export version=4.1.5
git checkout $version
mike deploy $version
git checkout gh-pages
git push -u origin gh-pages
```

## Code Style
We recommend you use IntelliJ as your IDE. The code style template for the project can be found in the [codestyle](https://github.com/airlift/codestyle) repository along with our general programming and Java guidelines. In addition to those you should also adhere to the following:
* Alphabetize sections in the documentation source files (both in table of contents files and other regular documentation files). In general, alphabetize methods/variables/sections if such ordering already exists in the surrounding code.
* When appropriate, use the Java 8 stream API. However, note that the stream implementation does not perform well so avoid using it in inner loops or otherwise performance sensitive sections.
* Categorize errors when throwing exceptions. For example, AddaxException takes an error code and error message as arguments, `AddaxException(REQUIRE_VALUE, "lack of required item")`. This categorization lets you generate reports, so you can monitor the frequency of various failures.
* Ensure that all files have the appropriate license header; you can generate the license by running `mvn license:format`.
* Consider using String formatting (printf style formatting using the Java `Formatter` class): `format("Session property %s is invalid: %s", name, value)` (note that `format()` should always be statically imported). Sometimes, if you only need to append something, consider using the `+` operator.
* Avoid using the ternary operator except for trivial expressions.
* Use an assertion from Airlift's `Assertions` class if there is one that covers your case rather than writing the assertion by hand. Over time, we may move over to more fluent assertions like AssertJ.
* When writing a Git commit message, follow these [guidelines](https://chris.beams.io/posts/git-commit/).

## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=wgzhao/Addax&type=Date)](https://star-history.com/#wgzhao/Addax&Date)
## License
This software is free to use under the [Apache License 2.0](/LICENSE).
## Special Thanks
Special thanks to [JetBrains](https://jb.gg/OpenSource) for its support of this project.