https://github.com/kbo-data-portal/pipeline
Automates KBO data collection and deployment with Airflow.
- Host: GitHub
- URL: https://github.com/kbo-data-portal/pipeline
- Owner: kbo-data-portal
- License: mit
- Created: 2025-06-24T08:56:30.000Z (12 months ago)
- Default Branch: master
- Last Pushed: 2025-09-02T04:01:06.000Z (10 months ago)
- Last Synced: 2025-09-02T05:36:11.526Z (10 months ago)
- Topics: airflow, dbt, kbo, lightgbm, python, scikit-learn
- Language: Python
- Homepage: https://pf.kakao.com/_xnxeYBn
- Size: 110 KB
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# KBO Data Portal - Pipeline
This repository automates the collection and deployment of KBO data using Apache Airflow.
It manages the entire data flow from collection to visualization in the KBO Data Portal.
## Features
- Automates data collection and processing using Apache Airflow
- Docker Compose-based deployment for local development
- Integrates with Collector and API Server via Git submodules
- Manages ETL workflows and the data flow to the visualization layer
## Installation
1. **Clone the repository with submodules**
```bash
git clone --recurse-submodules https://github.com/kbo-data-portal/pipeline.git
cd pipeline
```
Place your GCP service account key in the `config` folder and name it `key.json`.
2. **Initialize submodules (if not cloned with `--recurse-submodules`)**
```bash
git submodule update --init --remote
```
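Before starting the services, it can help to confirm that the key from step 1 is in place. The following is a minimal sketch, not part of this repository: the function name `check_service_account_key` is hypothetical, and the field list assumes the standard layout of a GCP service account key file.

```python
import json
from pathlib import Path

# Fields found in a standard GCP service account key file (assumed layout).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found with the key file (empty if none)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{key_path} does not exist"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{key_path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{key_path} does not contain a JSON object"]
    # Report every required field that is absent.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    # A user-credential file would have a different "type" value.
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems
```

Calling `check_service_account_key("config/key.json")` from the repository root returns an empty list when the file looks usable. The check only inspects the file's structure; it does not verify that the key is valid with GCP.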
## Usage
1. **Start Airflow services using Docker Compose**
```bash
docker-compose up -d
```
2. **Access the Airflow Web UI**
- Open your browser and navigate to http://localhost:8080/
- Login credentials:
  - Username: `admin`
  - Password: `admin`
3. **Update submodules (if needed)**
```bash
git submodule update --remote
```
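The Web UI from step 2 may take a while to become reachable after `docker-compose up -d`. As a rough sketch, the script below polls the web server's health endpoint. It assumes an Airflow 2.x web server, which serves a JSON status document at `/health`; the Airflow version used by this repository is not stated here, and other versions may expose the endpoint under a different path. The function names are hypothetical.

```python
import json
import urllib.request


def unhealthy_components(health):
    """Return the names of components whose status is not "healthy"."""
    return sorted(
        name
        for name, info in health.items()
        if isinstance(info, dict) and info.get("status") != "healthy"
    )


def fetch_health(base_url="http://localhost:8080", timeout=5):
    """Fetch and decode the health payload from the Airflow web server.

    The "/health" path is an assumption based on Airflow 2.x.
    """
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)
```

With the services running, `unhealthy_components(fetch_health())` returns an empty list when every reported component is healthy. Components that are not deployed may report an empty status and would then appear in the result.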
## Submodules
This project uses Git submodules to manage external components:
- [`collector`](https://github.com/kbo-data-portal/collector): Handles data collection logic.
- [`api-server`](https://github.com/kbo-data-portal/api-server): Flask-based web application for data visualization.
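To see whether both submodules are initialized and at the recorded commit, the output of `git submodule status` can be inspected. The sketch below is illustrative only; the helper names are hypothetical, and it assumes submodule paths contain no spaces.

```python
import subprocess

# Prefix characters documented for `git submodule status`.
STATES = {
    " ": "matches the recorded commit",
    "-": "not initialized",
    "+": "checked-out commit differs from the recorded one",
    "U": "has merge conflicts",
}


def parse_submodule_status(output):
    """Map each submodule path to a human-readable state."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # The first character is the state prefix; the rest is "<sha> <path> [(describe)]".
        prefix, parts = line[0], line[1:].split()
        if len(parts) < 2:
            continue
        result[parts[1]] = STATES.get(prefix, "unknown")
    return result


def submodule_status(repo_dir="."):
    """Run `git submodule status` in repo_dir and parse its output."""
    completed = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_submodule_status(completed.stdout)
```

A submodule reported as "not initialized" can be fixed with the `git submodule update --init --remote` command from the installation steps.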
## License
This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.