Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists by Stefen-Taime
A curated list of projects in awesome lists by Stefen-Taime.
https://github.com/stefen-taime/scalable-rss-feed-pipeline
In this article, we'll walk through how to build a scalable ETL pipeline using Apache Airflow, Kafka, Python, MongoDB, and Flask.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/nifi-etl-data-pipeline
This post demonstrates how to create a containerized data engineering environment using Docker Stacks.
apache api big-data cloud data-analysis data-engineering docker-compose etl-pipeline machine-learning nifi postgresql-database slack zookeeper
Last synced: 16 Nov 2024
https://github.com/stefen-taime/projet_data
Using open source technologies to implement a data pipeline
Last synced: 16 Nov 2024
https://github.com/stefen-taime/kafka-pipeline
In the following post, we will learn how to build a data pipeline using a combination of open-source software (OSS), including Debezium, Apache Kafka, and Kafka Connect.
bash data docker elasticsearch etl-pipeline k kafka kafka-connect kafka-streams kafka-topic kibana ksqldb masking mongodb mysql pii pipeline postgresql
Last synced: 16 Nov 2024
https://github.com/stefen-taime/stream-ingestion-redpanda-minio
In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO, and Apache Spark.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/uber_projet
Unveiling the true cost of your ride-sharing and food delivery habits with an ELT data pipeline, PostgreSQL, dbt, and Power BI.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/investissement
Jenkins Delta pipeline
delta-lake jenkins-pipeline minio spark
Last synced: 13 Oct 2024
https://github.com/stefen-taime/build_api_devops_pipeline
cicd docker jenkins-pipeline kubernetes
Last synced: 13 Oct 2024
https://github.com/stefen-taime/airflow_etl
A pipeline for updating data between OLTP and OLAP environments
Last synced: 16 Nov 2024
https://github.com/stefen-taime/iceberg-dbt-trino-hive-modern-open-source-data-stack
To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a music streaming platform, let’s delve into the detailed workflow and benefits of each component.
data dbt hive iceberg modern trinodb
Last synced: 16 Nov 2024
https://github.com/stefen-taime/moderndataengineerpipeline
Building a Robust Data Pipeline: Integrating Proxy Rotation, Kafka, MongoDB, Redis, Logstash, Elasticsearch, and MinIO for Efficient Web Scraping
auth0 connect dataengineering docker-compose elasticsearch fastapi kafka logstash minio mongodb proxy redis
Last synced: 16 Nov 2024
https://github.com/stefen-taime/real-time-data-processing-and-analysis-with-kafka-connect-ksql-elasticsearch-and-flask
The project aims to demonstrate how to work with real-time data using Kafka, KSQL, Elasticsearch, and Flask. It shows how to perform joins on Kafka topics, ingest data into Elasticsearch using Kafka Connect, and build a REST API to provide real-time metrics to end-users.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/etl-data-pipeline-rdbms-to-hdfs-using-airflow-apache-sqoop-spark-postgres-and-hive
This project aims to move data from a relational database management system (RDBMS) to the Hadoop Distributed File System (HDFS).
airflow big-data data docker-compose etl-pipeline hdfs hive infrastructure-as-code rdbms spark sql sqoop
Last synced: 16 Nov 2024
https://github.com/stefen-taime/docsearch
Our project is a testament to this need, offering a comprehensive solution that combines modern technologies and architectures to create a powerful document search engine. This engine is not just a tool but a sophisticated ecosystem designed to handle complex data processing and retrieval tasks.
docker elasticsearch fastapi hdfs kafka logstash mongodb nifi sftp tika-server
Last synced: 16 Nov 2024
https://github.com/stefen-taime/stefen-taime
Config files for my GitHub profile.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/modern-data-pipeline
Creating a modern data pipeline using a combination of Terraform, AWS Lambda and S3, Snowflake, DBT, Mage AI, and Dash.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/master-airflow
In big data, the ability to extract, transform, and store data from the web in different storage systems such as MongoDB, PostgreSQL, MinIO, and Elasticsearch is a crucial skill for developers and data scientists.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/ia_data_pipeline
The goal is to develop an intuitive platform where users can search for Airbnb apartments based on a target city, budget, and duration of stay, all powered by the intelligent language model, GPT-3.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/free-real-time-flight-status-pipeline
A real-time flight status data pipeline using technologies such as Kafka, Schema Registry, Avro, GraphQL, Postgres, and React.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/real-time-data-pipeline-snake-game
Dynamic Snake Game: Unleashing Real-Time Streaming Analytics with Redis, Kafka, Flink, ClickHouse & Chart.js in an Online Snake Game via Flask API
chartjs clickhouse confluent-cloud data flask kafka-streams pipeline redis
Last synced: 16 Nov 2024
https://github.com/stefen-taime/us-election
Creating a Real-Time Election Monitoring System Using MongoDB, Spark, SMS Notifications, and Dash
Last synced: 16 Nov 2024
https://github.com/stefen-taime/sample_dbt_project
The goal of this dbt project is to analyze music streaming data to determine listening trends, user preferences and popular genres. The project consists of SQL templates, tests and macros to transform and verify the data
Last synced: 16 Nov 2024
https://github.com/stefen-taime/devops-bash-script
This repository contains a collection of bash scripts for common DevOps tasks, such as installing software, setting up environments, and managing resources.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/visualizing-bitcoin-pipeline
Visualize the exchange rate of Bitcoin to USD using FastAPI, Prometheus, Grafana, Docker, and Jenkins.
Last synced: 16 Nov 2024
https://github.com/stefen-taime/how-to-automatically-deploy-a-flask-application-on-an-ec2-instance-with-a-bash-script
The main motivation for this mini-project is to get familiar with using Bash Scripting and the AWS CLI to automate command line tasks. This particular repo contains a configuration script that automatically creates an EC2 instance, accesses it via SSH, installs dependencies and hosts a simple Flask application using the image taken from Docker Hub.
autonation bash-scripting cloud-computing devops docker-image ec2-instance flask-application iac
Last synced: 16 Nov 2024
https://github.com/stefen-taime/big-o-algorithm
We’ll explain Big O notation and use real-world Python examples to illustrate how it applies to various time complexities.
Last synced: 16 Nov 2024
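As a rough illustration of the kind of Python examples the big-o-algorithm entry describes, the sketch below pairs four common time complexities with small functions. None of this is taken from the repository; the function names are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_left


def first_item(items):
    """O(1): a single index lookup, regardless of list size."""
    return items[0]


def linear_search(items, target):
    """O(n): may inspect every element once before giving up."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """O(log n): bisect halves the search range each step (input must be sorted)."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


def has_duplicate_pair(items):
    """O(n^2): compares every pair of elements with two nested loops."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False
```

Doubling the input size leaves `first_item` unchanged, roughly doubles the worst-case work in `linear_search`, adds about one step to `binary_search`, and roughly quadruples the worst-case work in `has_duplicate_pair`.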
https://github.com/stefen-taime/ingest-data
Big data application for multi-source data ingestion
Last synced: 16 Nov 2024
https://github.com/stefen-taime/docker-stack
A directory of docker-compose files for quickly starting an infrastructure
Last synced: 16 Nov 2024
https://github.com/stefen-taime/eventmusic
EventMusic Producer is a Dockerized application designed to read data and output them to a Kafka topic, using Avro schemas for data serialization. It integrates seamlessly with Kafka and the Schema Registry to manage the flow of event data linked to music event information.
confluent-kafka docker events music open-source real-time
Last synced: 16 Nov 2024
https://github.com/stefen-taime/car-price-predictor
Predicting Car Prices with FastAPI, Streamlit, MLflow, Kafka, and Debezium: A Practical Demonstration
data data-science dataanalysis-projects engineering machine-learning mlops predictive-modeling
Last synced: 16 Nov 2024
https://github.com/stefen-taime/llm-rag-mtl-public-hospital
This project develops a Retrieve-Augment-Generate (RAG) model to answer questions using public data from Google reviews of hospitals in Montreal.
data google-reviews hopital hospital hub ia llm montreal open-source quebec rag
Last synced: 16 Nov 2024
https://github.com/stefen-taime/real-time-extraction-transformation-and-exposure-architecture-for-rail-data
We are thrilled to announce our new PoC project aimed at providing a complete real-time extraction, transformation, and exposure architecture for the new provincial transportation systems.
api backend datacontract dataengineering elasticsearch flink-stream-processing frontend kafka kibana microservices nextjs python3 reactjs schemas software-engineering sql
Last synced: 16 Nov 2024
https://github.com/stefen-taime/-google-analytics-360
Welcome to the Google Analytics 360 Dataset Project! This repository is designed for anyone interested in working with realistic Google Analytics data. Whether you're a data scientist, a student, or a marketing analyst
analytics datasets google googleanalytics
Last synced: 16 Nov 2024
https://github.com/stefen-taime/myubereats_datapipeline
Building a Modern Uber Eats Data Pipeline
airflow api data datawarehouse mongodb pipeline powerbi snowflake
Last synced: 16 Nov 2024
https://github.com/stefen-taime/azurepipeline
Azure Data Pipeline
azure databricks datalake http terraform vault
Last synced: 16 Nov 2024
https://github.com/stefen-taime/mongoelasticmigrator
This tool migrates data from MongoDB collections to Elasticsearch indices. It's built using Rust and supports configurable migrations.
migration-tool nosql-database rust
Last synced: 16 Nov 2024
https://github.com/stefen-taime/gmail-to-mongodb-script
This script automates fetching emails from a user's Gmail account and storing them in a MongoDB database. The fetched emails are filtered by specific labels such as Promotions, Social, Updates, and Forums. The script is intended to run continuously, checking for new emails every minute.
Last synced: 16 Nov 2024
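The label filtering and one-minute polling described for gmail-to-mongodb-script can be sketched as below. This is a minimal sketch under stated assumptions, not the repository's code: `filter_by_label`, `poll`, and the email dictionary shape are hypothetical, and the Gmail and MongoDB calls are replaced by caller-supplied functions so the loop can be run without either service.

```python
import time

# Labels named in the description; the exact label strings are an assumption.
WATCHED_LABELS = frozenset({"Promotions", "Social", "Updates", "Forums"})


def filter_by_label(emails, labels=WATCHED_LABELS):
    """Keep only emails that carry at least one watched label."""
    return [email for email in emails if labels & set(email.get("labels", []))]


def poll(fetch_emails, store_emails, interval_seconds=60,
         max_cycles=None, sleep=time.sleep):
    """Fetch, filter, and store on a fixed interval.

    fetch_emails and store_emails are hypothetical callables standing in
    for the Gmail and MongoDB clients. max_cycles=None runs indefinitely.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        matching = filter_by_label(fetch_emails())
        if matching:
            store_emails(matching)
        cycles += 1
        # Skip the final wait when a cycle limit has been reached.
        if max_cycles is None or cycles < max_cycles:
            sleep(interval_seconds)
    return cycles
```

Passing the fetch and store steps in as functions keeps the loop independent of any particular client library; a real script would also need authentication and a way to avoid storing the same email twice, which this sketch leaves out.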
https://github.com/stefen-taime/realtime-race-mapper
In this rendition, Elastic and Kibana have been replaced with the powerful Splunk, MQTT has been swapped out for ActiveMQ, and instead of the traditional Kafka, we’ve integrated Confluent Cloud.
Last synced: 16 Nov 2024