https://github.com/aragonski97/fenrir-infra
IN DEVELOPMENT! Complete data infrastructure on Docker Swarm exposed on Tailscale network
- Host: GitHub
- URL: https://github.com/aragonski97/fenrir-infra
- Owner: Aragonski97
- Created: 2024-12-17T20:13:59.000Z (5 days ago)
- Default Branch: main
- Last Pushed: 2024-12-17T22:03:56.000Z (5 days ago)
- Last Synced: 2024-12-17T23:18:39.732Z (5 days ago)
- Topics: airflow, debezium-connector, docker, docker-swarm, kafdrop, kafka, kafka-connect, kafka-registry, metabase, portainer, postgresql, scrappy, spark, tailscale, zookeeper, zoonavigator
- Language: Python
- Homepage:
- Size: 41 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Fenrir
Welcome to the **Fenrir Data Platform**: a data infrastructure stack for real-time data ingestion, processing, and visualization. Built on **Docker Swarm** from widely used open-source data tools, the platform aims to simplify complex data workflows while remaining flexible, modular, and easy to deploy.
## 📦 **Technologies Used**
This platform integrates the following open-source technologies:
- **Docker Swarm**: Orchestrates and manages multiple containers across nodes.
- **Tailscale**: Private network overlay (built on WireGuard) that enables secure remote access.
- **Portainer**: Simple, visual container management.
- **Airflow**: Workflow orchestration and scheduling.
- **Kafka**: Real-time event streaming and message brokering.
- **Kafka Connect**: Enables data integration between Kafka and external systems.
- **Kafka Registry** (Schema Registry): Manages and enforces schema versions for Kafka topics.
- **Kafdrop**: A web-based UI for visualizing and monitoring Kafka topics.
- **Scrapy**: Web scraping framework used to ingest data.
- **Spark**: Distributed big data processing and analytics.
- **PostgreSQL**: Relational database for persistent storage.
- **Metabase**: Business intelligence and analytics dashboard for visualizing data.

## 🌐 **What Does This Platform Do?**
This platform is a full-featured **data infrastructure stack** that can:
- **Ingest data** from web scrapers (Scrapy), relational databases (e.g., PostgreSQL via the Debezium connector), and other third-party systems through Kafka Connect.
- **Process data** in real time using Kafka, Spark, and streaming workflows.
- **Schedule workflows** using Airflow, enabling batch and continuous processing.
- **Manage infrastructure** using Docker Swarm for orchestration and Portainer for visual container management.
- **Visualize data** with Metabase, providing a no-code way to explore and visualize processed data.

Whether you need to scrape, ingest, process, or visualize data, this platform is ready to support modern data engineering needs.
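The ingestion path through Kafka Connect and Debezium can be illustrated with a small hypothetical helper. The repository's actual connector configuration is not shown here, so everything below is an assumption: the helper name `pg_connector_json`, the PostgreSQL service hostname, database name, and topic prefix are placeholders, and the keys follow Debezium 2.x (which uses `topic.prefix`; 1.x used `database.server.name`). The `pgoutput` plugin also requires PostgreSQL to run with `wal_level=logical`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Fenrir repository.
# Prints a Kafka Connect connector definition for Debezium's PostgreSQL
# source connector.
# Usage: pg_connector_json NAME DB_HOST DB_NAME TOPIC_PREFIX
# Credentials come from PGUSER (default "postgres") and PGPASSWORD (required).
# Arguments are inserted verbatim, so they must not contain double quotes.
pg_connector_json() {
  local name="$1" host="$2" db="$3" prefix="$4"
  cat <<EOF
{
  "name": "${name}",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "${host}",
    "database.port": "${PGPORT:-5432}",
    "database.user": "${PGUSER:-postgres}",
    "database.password": "${PGPASSWORD:?PGPASSWORD must be set}",
    "database.dbname": "${db}",
    "topic.prefix": "${prefix}"
  }
}
EOF
}
```

The output can be posted to the Kafka Connect REST API, which listens on port 8083 by default. For example: `pg_connector_json fenrir-pg postgres fenrir fenrir | curl -sS -X POST -H 'Content-Type: application/json' --data @- http://localhost:8083/connectors`. Here the host `localhost` and all four arguments are placeholders for whatever the stack actually uses. Change events then appear on topics named `<prefix>.<schema>.<table>`, which Kafdrop can display.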
### **Prerequisites**
- Docker CE (Docker Engine)
- Tailscale, for secure remote access. Without Tailscale, modify `setup.sh` so that the Docker Swarm manager node advertises a different address.

### **IN DEVELOPMENT, NOT PRODUCTION-READY**
### **Deploy the Platform**
```bash
# Clone the repository
git clone https://github.com/Aragonski97/fenrir-infra.git ~/.fenrir

# Navigate to the project directory
cd ~/.fenrir

# Deploy the platform
source setup.sh
```
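The prerequisites say that, without Tailscale, `setup.sh` must be edited to change the address the Swarm manager advertises. Because `setup.sh` is not reproduced here, the following is only a hypothetical sketch. It assumes the manager is initialized with `docker swarm init --advertise-addr` and that the deployed services appear in `docker service ls`. `pick_advertise_addr` prefers the node's Tailscale IPv4 address (`tailscale ip -4`) and otherwise falls back to an address you supply. `report_unconverged` reads `docker service ls` output and lists services whose running replica count differs from the desired count. Both function names are illustrative and are not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical deployment helpers, not part of the Fenrir repository.

# pick_advertise_addr [FALLBACK_IP]
# Prints this node's Tailscale IPv4 address if the tailscale CLI is installed
# and connected; otherwise prints FALLBACK_IP. Returns 1 if neither exists.
pick_advertise_addr() {
  local addr
  if command -v tailscale >/dev/null 2>&1 &&
     addr="$(tailscale ip -4 2>/dev/null)" && [ -n "$addr" ]; then
    printf '%s\n' "$addr"
  elif [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  else
    echo "pick_advertise_addr: no Tailscale address and no fallback given" >&2
    return 1
  fi
}

# report_unconverged
# Reads "NAME RUNNING/DESIRED" lines on stdin (the format produced by
# docker service ls --format '{{.Name}} {{.Replicas}}'), prints every service
# whose running count differs from its desired count, and exits 1 if any do.
report_unconverged() {
  awk '{
    split($2, r, "/")
    if (r[1] != r[2]) { print $1 " " $2; bad = 1 }
  } END { exit bad + 0 }'
}
```

For example, on a fresh manager node you could run `docker swarm init --advertise-addr "$(pick_advertise_addr 192.168.1.20)"`, where `192.168.1.20` is a placeholder LAN address. If `setup.sh` also initializes the swarm, apply the address there instead of running both. After deployment, `docker service ls --format '{{.Name}} {{.Replicas}}' | report_unconverged` exits non-zero while any service is still starting or failing.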