https://github.com/eea/eea-crawler
EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).
airflow-dags crawler elasticsearch etl-pipeline indexing
- Host: GitHub
- URL: https://github.com/eea/eea-crawler
- Owner: eea
- Created: 2021-05-31T11:41:24.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2025-02-03T13:41:29.000Z (8 months ago)
- Last Synced: 2025-02-03T14:40:26.204Z (8 months ago)
- Topics: airflow-dags, crawler, elasticsearch, etl-pipeline, indexing
- Language: Python
- Homepage:
- Size: 483 KB
- Stars: 1
- Watchers: 4
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Airflow and Logstash configurations for EEA-Crawler
See https://github.com/eea/eea.docker.airflow for integration and deployment.
Building this Docker image creates a volume that contains the Python DAG code
required for the crawler operation.

[Documentation for Airflow EEA Semantic Search Indexer](./DOCUMENTATION.md)