https://github.com/dyrnq/cdc-vagrant
CDC(Change Data Capture) is made up of two components, the CDD and the CDT. CDD is stand for Change Data Detection and CDT is stand for Change Data Transfer.
https://github.com/dyrnq/cdc-vagrant
apache-doris cdc debezium etl flink flink-cdc flink-doris-connector flink-sql flink-stream-processing hadoop minio-cluster vagrantfile
Last synced: 3 months ago
JSON representation
CDC(Change Data Capture) is made up of two components, the CDD and the CDT. CDD is stand for Change Data Detection and CDT is stand for Change Data Transfer.
- Host: GitHub
- URL: https://github.com/dyrnq/cdc-vagrant
- Owner: dyrnq
- Created: 2022-11-26T13:19:59.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-30T08:51:51.000Z (6 months ago)
- Last Synced: 2024-11-30T09:27:31.892Z (6 months ago)
- Topics: apache-doris, cdc, debezium, etl, flink, flink-cdc, flink-doris-connector, flink-sql, flink-stream-processing, hadoop, minio-cluster, vagrantfile
- Language: Shell
- Homepage: https://nightlies.apache.org/flink/flink-cdc-docs-stable/
- Size: 136 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 29
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# cdc-vagrant
- [cdc-vagrant](#cdc-vagrant)
- [introduce](#introduce)
- [architecture](#architecture)
- [zookeeper cluster](#zookeeper-cluster)
- [flink cluster](#flink-cluster)
- [flink cdc](#flink-cdc)
- [flink-cdc version matrix](#flink-cdc-version-matrix)
- [ref](#ref)## introduce
This project is for experiment of flink-cdc and doris.
CDC(Change Data Capture) is made up of two components, the CDD and the CDT. CDD is stand for Change Data Detection and CDT is stand for Change Data Transfer.
Extract, Load, Transform (ELT) is a data integration process for transferring raw data from a source server to a data system (such as a data warehouse or data lake) on a target server and then preparing the information for downstream uses.
Streaming ETL (Extract, Transform, Load) is the processing and movement of real-time data from one place to another. ETL is short for the database functions extract, transform, and load.
## architecture
### zookeeper cluster
| vm | role | ip | xxx_home |
|-------|-----------|----------------|----------------|
| vm116 | zookeeper | 192.168.56.116 | /opt/zookeeper |
| vm117 | zookeeper | 192.168.56.117 | /opt/zookeeper |
| vm118 | zookeeper | 192.168.56.118 | /opt/zookeeper |```bash
cd /opt/zookeeper
./bin/zkServer.sh status
``````bash
echo stat | nc 127.0.0.1 2181
Zookeeper version: 3.8.4-9316c2a7a97e1666d8f4593f34dd6fc36ecc436c, built on 2024-02-12 22:16 UTC
Clients:
/127.0.0.1:39370[0](queued=0,recved=1,sent=0)Latency min/avg/max: 0/6.375/41
Received: 10
Sent: 9
Connections: 1
Outstanding: 0
Zxid: 0x100000003
Mode: follower
Node count: 5
```### flink cluster
| vm | role | ip | xxx_home |
|-------|--------|----------------|------------|
| vm116 | minio | 192.168.56.116 | /opt/minio |
| vm117 | minio | 192.168.56.117 | /opt/minio |
| vm118 | minio | 192.168.56.118 | /opt/minio |
| vm119 | minio | 192.168.56.119 | /opt/minio || vm | role | ip | xxx_home |
|-------|----------------------------------|----------------|------------|
| vm116 | sidekick, flink(masters+workers) | 192.168.56.116 | /opt/flink |
| vm117 | sidekick, flink(masters+workers) | 192.168.56.117 | /opt/flink |
| vm118 | sidekick, flink(masters+workers) | 192.168.56.118 | /opt/flink |
| vm119 | sidekick, flink(workers) | 192.168.56.119 | /opt/flink |> ssh init
```bash
vagrant ssh vm116
## vm116 vm117 vm118 vm119 四台分别执行
su -l root
bash /vagrant/scripts/ssh-copy-id.sh --iface enp0s8 --ips "192.168.56.116,192.168.56.117,192.168.56.118,192.168.56.119"
```> minio init
```bash
vagrant ssh vm116
## 只需要在vm116上执行,且执行一次即可
curl -o /usr/local/bin/mc -# -fSL https://files.m.daocloud.io/dl.min.io/client/mc/release/linux-amd64/mc
chmod +x /usr/local/bin/mcmc alias set myminio http://localhost:9000 minioadmin minioadmin
mc admin user svcacct add --access-key "u5SybesIDVX9b6Pk" --secret-key "lOpH1v7kdM6H8NkPu1H2R6gLc9jcsmWM" myminio minioadmin
# mc admin user svcacct add --access-key "myuserserviceaccount" --secret-key "myuserserviceaccountpassword" myminio minioadminmc mb myminio/flink
mc mb myminio/flink-state
``````bash
## flink集群启动
su -l hduser
cd /opt/flink## start-cluster
bin/start-cluster.sh
## stop-cluster
bin/stop-cluster.sh
## 测试一下
bin/flink run /opt/flink/examples/streaming/WordCount.jar
```### flink cdc
vagrant ssh vm211
```bash
cd /vagrant/doris
docker compose exec doris-fe mysql -uroot -P9030 -h127.0.0.1 -e "show backends; show frontends;"
``````bash
vagrant ssh vm116flink_cdc_home="/opt/flink-cdc"
pushd $flink_cdc_home || exit 1
./bin/flink-cdc.sh /vagrant/doris/mysql-to-doris.yaml --jar lib/mysql-connector-java-8.0.27.jar --flink-home /opt/flink
popd || exit 1```
### fink-sql-client
```bash
bin/sql-client.sh
SET execution.checkpointing.interval = 6000;```
### flink-cdc version matrix
see
## ref
- [Deprecated Properties](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/DeprecatedProperties.html)
- [HDFS High Availability](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html)
- [HDFS High Availability Using the Quorum Journal Manager](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html)
- [ResourceManager High Availability](https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html)
- [High Performance HTTP Sidecar Load Balancer](https://github.com/minio/sidekick)
- [https://github.com/apache/doris/tree/master/samples/doris-demo](https://github.com/apache/doris/tree/master/samples/doris-demo)
- [Streaming ELT from MySQL to Doris](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/get-started/quickstart/mysql-to-doris/)