https://github.com/yuhexiong/airflow-dag-kafka-flink-doris-python
# Airflow DAG Kafka/Flink/Doris
Use Airflow to monitor and maintain the operation of the data pipeline and databases.
## Overview
- Tool: Apache Airflow v2.9.1
- Language: Python v3.8

## Run
### Run Docker
```
docker compose up -d
```

### ENV
Set the SMTP options in config/airflow.cfg:
```
smtp_host = smtp.gmail.com
smtp_starttls = True
smtp_ssl = False
smtp_user = [email protected]
smtp_password = pass
smtp_port = 587
smtp_mail_from = [email protected]
```

### UI Page
Open http://localhost:8080 to trigger and monitor DAGs in the UI.
## DAGs
### 1. Check Kafka Updates
Confirm that new data arrives in the Kafka topic every day; a sketch of this check follows the variable list.

- set the following Airflow Variables:
```
bootstrap_servers: localhost:9093
topic_name: topic1
log_file: local log file path to collect errors and send them once.
```
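A minimal sketch of how this check could look, not the repository's actual code. It assumes the kafka-python client and uses offsets_for_times to ask the broker whether any message newer than 24 hours exists; failures are appended to log_file as described above:

```
from datetime import datetime, timedelta

from airflow.decorators import dag, task
from airflow.models import Variable
from kafka import KafkaConsumer, TopicPartition


@dag(schedule="@daily", start_date=datetime(2024, 5, 29), catchup=False)
def check_kafka_updates():
    @task
    def check_topic():
        servers = Variable.get("bootstrap_servers")
        topic = Variable.get("topic_name")
        log_file = Variable.get("log_file")

        consumer = KafkaConsumer(bootstrap_servers=servers)
        partitions = [TopicPartition(topic, p)
                      for p in consumer.partitions_for_topic(topic) or []]
        yesterday_ms = int((datetime.now() - timedelta(days=1)).timestamp() * 1000)

        # offsets_for_times() returns None for a partition when no message
        # has a timestamp at or after the given one, i.e. no fresh data.
        offsets = consumer.offsets_for_times({tp: yesterday_ms for tp in partitions})
        consumer.close()
        if all(v is None for v in offsets.values()):
            with open(log_file, "a") as f:
                f.write(f"{datetime.now()} no new data in topic {topic}\n")

    check_topic()


check_kafka_updates()
```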
### 2. Check Flink Job State
Check every day that all jobs on Flink are in the RUNNING state; a sketch of this check follows the variable list.

- set the following Airflow Variables:
```
flink_url: localhost:8081
log_file: local log file path to collect errors and send them once.
```
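A sketch of this check against Flink's REST API (GET /jobs/overview); using requests as the HTTP client is an assumption, not necessarily what the repository does:

```
from datetime import datetime

import requests
from airflow.decorators import dag, task
from airflow.models import Variable


@dag(schedule="@daily", start_date=datetime(2024, 5, 29), catchup=False)
def check_flink_job_state():
    @task
    def check_jobs():
        flink_url = Variable.get("flink_url")  # e.g. localhost:8081
        log_file = Variable.get("log_file")

        # /jobs/overview lists every job with its current state.
        resp = requests.get(f"http://{flink_url}/jobs/overview", timeout=10)
        resp.raise_for_status()
        for job in resp.json()["jobs"]:
            if job["state"] != "RUNNING":
                with open(log_file, "a") as f:
                    f.write(f"{datetime.now()} job {job['name']} is {job['state']}\n")

    check_jobs()


check_flink_job_state()
```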
### 3. Check Flink Job And Upload
Check every day that the jobs are running normally according to the YAML configuration file. If there are errors, cancel and rerun them; a sketch of this step follows the configuration example.

- set the following Airflow Variables:
```
log_file: local log file path to collect errors and send them once.
```

- flink_pipeline.yaml:
```
host: http://localhost:8081
jars:
- jar_id: xxx_xxx_flink-1.0.jar
jobs:
- entry_class: com.examples.entry.KafkaToKafka
program_file: https://storage.googleapis.com/xxx/KafkaToKafka.yaml
job_name: kafka(topic-1) -> kafka(topic-2)
- entry_class: com.examples.entry.KafkaToDoris
program_file: https://storage.googleapis.com/xxx/KafkaToDoris.yaml
job_name: kafka(topic-1) -> doris(table-1)
```
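A sketch of the reconcile step, assuming PyYAML and requests. Cancelling uses PATCH /jobs/:jobid and resubmitting uses POST /jars/:jarid/run from the Flink REST API; passing program_file as the job's program argument is an assumption about how the repository wires it up:

```
from datetime import datetime

import requests
import yaml
from airflow.decorators import dag, task
from airflow.models import Variable


@dag(schedule="@daily", start_date=datetime(2024, 5, 29), catchup=False)
def check_flink_job_and_upload():
    @task
    def reconcile():
        log_file = Variable.get("log_file")
        with open("flink_pipeline.yaml") as f:
            config = yaml.safe_load(f)

        host = config["host"]
        jobs_by_name = {
            j["name"]: j
            for j in requests.get(f"{host}/jobs/overview", timeout=10).json()["jobs"]
        }

        for jar in config["jars"]:
            for job in jar["jobs"]:
                current = jobs_by_name.get(job["job_name"])
                if current and current["state"] == "RUNNING":
                    continue
                if current:
                    # Cancel the stale instance before resubmitting.
                    requests.patch(f"{host}/jobs/{current['jid']}", timeout=10)
                body = {
                    "entryClass": job["entry_class"],
                    # assumption: the yaml URL is handed over as the program argument
                    "programArgs": job["program_file"],
                }
                requests.post(f"{host}/jars/{jar['jar_id']}/run", json=body, timeout=30)
                with open(log_file, "a") as f:
                    f.write(f"{datetime.now()} restarted {job['job_name']}\n")

    reconcile()


check_flink_job_and_upload()
```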
### 4. Check Doris Updates
Check every day that Doris has new data coming in; a sketch of this check follows the variable list.

- set the following Airflow Variables:
```
doris_host: localhost
username: root
password: password
database: database
doris_table: doris_table
doris_timestamp_column: timestamp
log_file: local log file path to collect errors and send them once.
```
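A sketch of the freshness check. Doris speaks the MySQL protocol, so the query goes through pymysql here; the FE query port 9030 and a DATETIME timestamp column are assumptions:

```
from datetime import datetime, timedelta

import pymysql
from airflow.decorators import dag, task
from airflow.models import Variable


@dag(schedule="@daily", start_date=datetime(2024, 5, 29), catchup=False)
def check_doris_updates():
    @task
    def check_table():
        table = Variable.get("doris_table")
        column = Variable.get("doris_timestamp_column")
        log_file = Variable.get("log_file")

        conn = pymysql.connect(
            host=Variable.get("doris_host"),
            port=9030,  # assumption: default Doris FE query port
            user=Variable.get("username"),
            password=Variable.get("password"),
            database=Variable.get("database"),
        )
        with conn.cursor() as cur:
            cur.execute(f"SELECT MAX(`{column}`) FROM `{table}`")
            (latest,) = cur.fetchone()
        conn.close()

        # assumes a DATETIME column, so pymysql returns a datetime object
        if latest is None or latest < datetime.now() - timedelta(days=1):
            with open(log_file, "a") as f:
                f.write(f"{datetime.now()} no new rows in {table} since {latest}\n")

    check_table()


check_doris_updates()
```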
### 5. Send Exception Email
Check the log file every day; if it contains any data, send an email. A sketch of this step follows the variable list.

- set the following Airflow Variables:
```
log_file: local log file path to collect errors and send them once.
```
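A sketch of the email step. airflow.utils.email.send_email uses the [smtp] settings configured above; [email protected] is a placeholder recipient, and clearing the log after sending is an assumption about how errors are "sent once":

```
import os
from datetime import datetime

from airflow.decorators import dag, task
from airflow.models import Variable
from airflow.utils.email import send_email


@dag(schedule="@daily", start_date=datetime(2024, 5, 29), catchup=False)
def send_exception_email():
    @task
    def send_if_errors():
        log_file = Variable.get("log_file")
        if not os.path.exists(log_file) or os.path.getsize(log_file) == 0:
            return  # nothing collected since the last run
        with open(log_file) as f:
            body = f.read()
        send_email(to="[email protected]",  # placeholder recipient
                   subject="Pipeline errors",
                   html_content=f"<pre>{body}</pre>")
        open(log_file, "w").close()  # clear so each error is reported once

    send_if_errors()


send_exception_email()
```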