Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists by Stefen-Taime

A curated list of projects in awesome lists by Stefen-Taime.

https://github.com/stefen-taime/scalable-rss-feed-pipeline

In this article, we'll walk through how to build a scalable ETL pipeline using Apache Airflow, Kafka, Python, MongoDB, and Flask.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/nifi-etl-data-pipeline

This post demonstrates how to create a containerized data engineering environment using Docker Stacks.

apache api big-data cloud data-analysis data-engineering docker-compose etl-pipeline machine-learning nifi postgresql-database slack zookeeper

Last synced: 16 Nov 2024

https://github.com/stefen-taime/projet_data

Using open-source technologies to implement a data pipeline.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/kafka-pipeline

In the following post, we will learn how to build a data pipeline using a combination of open-source software (OSS), including Debezium, Apache Kafka, and Kafka Connect.

bash data docker elasticsearch etl-pipeline k kafka kafka-connect kafka-streams kafka-topic kibana ksqldb masking mongodb mysql pii pipeline postgresql

Last synced: 16 Nov 2024

https://github.com/stefen-taime/open-source-data

This repository contains structured datasets in various categories.

csv data json python3 xml

Last synced: 16 Nov 2024

https://github.com/stefen-taime/stream-ingestion-redpanda-minio

In this article, you will learn how to set up a real-time data processing and analytics environment using Docker, MySQL, Redpanda, MinIO, and Apache Spark.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/uber_projet

Unveiling the true cost of your ride-sharing and food delivery habits with an ELT data pipeline, PostgreSQL, dbt, and Power BI.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/airflow_etl

A pipeline for updating data between OLTP and OLAP environments.

Last synced: 16 Nov 2024
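For the OLTP-to-OLAP pipeline above, one common pattern is an incremental, watermark-based sync: copy only rows changed since the last run, upsert them into the analytical table, then advance the watermark. The sketch below is a hypothetical illustration using Python's built-in `sqlite3` for both databases; the tables `orders`, `fact_orders`, and `sync_state` and the `incremental_sync` function are assumptions, not the repository's actual schema or code (its name suggests Airflow handles the orchestration).

```python
import sqlite3


def incremental_sync(oltp: sqlite3.Connection, olap: sqlite3.Connection) -> int:
    """Copy rows changed since the last sync from OLTP to OLAP.

    Hypothetical schema: an `orders` table with an ISO-8601 `updated_at`
    column in the OLTP database, plus `fact_orders` and `sync_state`
    tables in the OLAP database. Returns the number of rows copied.
    """
    # Read the watermark (timestamp of the newest row already synced).
    row = olap.execute("SELECT last_synced FROM sync_state WHERE id = 1").fetchone()
    watermark = row[0] if row else ""  # "" sorts before any ISO timestamp

    changed = oltp.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()

    if changed:
        # Upsert so a re-synced row overwrites its previous version.
        olap.executemany(
            "INSERT INTO fact_orders (id, amount, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount, "
            "updated_at = excluded.updated_at",
            changed,
        )
        # Advance the watermark to the newest timestamp just copied.
        olap.execute(
            "INSERT INTO sync_state (id, last_synced) VALUES (1, ?) "
            "ON CONFLICT(id) DO UPDATE SET last_synced = excluded.last_synced",
            (changed[-1][2],),
        )
        olap.commit()
    return len(changed)
```

Filtering with a strict `>` keeps the sketch simple, but it can miss a row committed later with exactly the watermark timestamp; real pipelines often re-read a small overlapping window and let the upsert absorb the duplicates. The `ON CONFLICT` upsert syntax requires SQLite 3.24 or newer.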

https://github.com/stefen-taime/iceberg-dbt-trino-hive-modern-open-source-data-stack

To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a music streaming platform, let’s delve into the detailed workflow and benefits of each component.

data dbt hive iceberg modern trinodb

Last synced: 16 Nov 2024

https://github.com/stefen-taime/moderndataengineerpipeline

Building a Robust Data Pipeline: Integrating Proxy Rotation, Kafka, MongoDB, Redis, Logstash, Elasticsearch, and MinIO for Efficient Web Scraping

auth0 connect dataengineering docker-compose elasticsearch fastapi kafka logstash minio mongodb proxy redis

Last synced: 16 Nov 2024

https://github.com/stefen-taime/real-time-data-processing-and-analysis-with-kafka-connect-ksql-elasticsearch-and-flask

The project aims to demonstrate how to work with real-time data using Kafka, KSQL, Elasticsearch, and Flask. It shows how to perform joins on Kafka topics, ingest data into Elasticsearch using Kafka Connect, and build a REST API to provide real-time metrics to end-users.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/etl-data-pipeline-rdbms-to-hdfs-using-airflow-apache-sqoop-spark-postgres-and-hive

This project aims to move data from a relational database management system (RDBMS) to the Hadoop Distributed File System (HDFS).

airflow big-data data docker-compose etl-pipeline hdfs hive infrastructure-as-code rdbms spark sql sqoop

Last synced: 16 Nov 2024

https://github.com/stefen-taime/docsearch

This project offers a comprehensive solution that combines modern technologies and architectures to create a powerful document search engine. The engine is not just a tool but a complete ecosystem designed to handle complex data processing and retrieval tasks.

docker elasticsearch fastapi hdfs kafka logstash mongodb nifi sftp tika-server

Last synced: 16 Nov 2024

https://github.com/stefen-taime/uego_search_engine

UeGo_Search_Engine

Last synced: 16 Nov 2024

https://github.com/stefen-taime/stefen-taime

Config files for my GitHub profile.

config github-config

Last synced: 16 Nov 2024

https://github.com/stefen-taime/modern-data-pipeline

Creating a modern data pipeline using a combination of Terraform, AWS Lambda and S3, Snowflake, dbt, Mage AI, and Dash.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/master-airflow

In big data, the ability to extract, transform, and store data from the web into different storage systems such as MongoDB, PostgreSQL, MinIO, and Elasticsearch is a crucial skill for developers and data scientists.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/ia_data_pipeline

The goal is to develop an intuitive platform where users can search for Airbnb apartments based on a target city, budget, and length of stay, all powered by the GPT-3 language model.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/free-real-time-flight-status-pipeline

A real-time flight status data pipeline built with technologies such as Kafka, Schema Registry, Avro, GraphQL, Postgres, and React.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/real-time-data-pipeline-snake-game

Dynamic Snake Game: Unleashing Real-Time Streaming Analytics with Redis, Kafka, Flink, ClickHouse & Chart.js in an Online Snake Game via Flask API

chartjs clickhouse confluent-cloud data flask kafka-streams pipeline redis

Last synced: 16 Nov 2024

https://github.com/stefen-taime/pyjsoncsv

convert_json_to_csv

Last synced: 16 Nov 2024
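The pyjsoncsv entry names a JSON-to-CSV conversion. A minimal sketch with Python's standard `json` and `csv` modules, assuming the input is a JSON array of objects; the `json_to_csv` function name is hypothetical, and nested values are written as JSON strings rather than flattened into extra columns.

```python
import csv
import io
import json


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of objects into CSV text (hypothetical helper)."""
    records = json.loads(json_text)

    # Collect column names in first-seen order across all records,
    # so objects with differing keys still share one header row.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    # restval="" (the default) fills columns a record does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Serialize nested dicts/lists as JSON strings instead of flattening.
        row = {
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in record.items()
        }
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = '[{"name": "Ada", "age": 36}, {"name": "Alan", "city": "London"}]'
    print(json_to_csv(sample))
```

Building the header from the union of keys keeps rows aligned when records are heterogeneous; `csv.DictWriter` handles quoting of values that contain commas.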

https://github.com/stefen-taime/tdd

Test-driven development

Last synced: 16 Nov 2024
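The test-driven development cycle named in the tdd entry runs red, green, refactor: write a failing test, make it pass with the simplest code, then clean up while the tests stay green. A generic sketch with the standard `unittest` module; the `slugify` function is a hypothetical example, not code from the repository.

```python
import unittest


# Step 1 (red): write the tests first; they fail until slugify exists.
class TestSlugify(unittest.TestCase):
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_surrounding_whitespace(self):
        self.assertEqual(slugify("  Data Pipeline  "), "data-pipeline")


# Step 2 (green): the simplest implementation that passes both tests.
def slugify(text: str) -> str:
    # str.split() with no argument drops leading/trailing whitespace
    # and collapses runs of internal whitespace.
    return "-".join(text.lower().split())


# Step 3 (refactor): improve the code, re-running the tests after each change.
if __name__ == "__main__":
    # exit=False keeps unittest.main from calling sys.exit afterwards.
    unittest.main(argv=["slugify_tests"], exit=False)
```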

https://github.com/stefen-taime/us-election

Creating a Real-Time Election Monitoring System Using MongoDB, Spark, SMS Notifications, and Dash

Last synced: 16 Nov 2024

https://github.com/stefen-taime/sample_dbt_project

The goal of this dbt project is to analyze music streaming data to determine listening trends, user preferences, and popular genres. The project consists of SQL templates, tests, and macros to transform and verify the data.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/devops-bash-script

This repository contains a collection of bash scripts for common DevOps tasks, such as installing software, setting up environments, and managing resources.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/visualizing-bitcoin-pipeline

Visualize the Bitcoin-to-USD exchange rate using FastAPI, Prometheus, Grafana, Docker, and Jenkins.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/how-to-automatically-deploy-a-flask-application-on-an-ec2-instance-with-a-bash-script

The main motivation for this mini-project is to get familiar with using Bash scripting and the AWS CLI to automate command-line tasks. This repo contains a configuration script that automatically creates an EC2 instance, connects to it via SSH, installs dependencies, and hosts a simple Flask application using an image pulled from Docker Hub.

autonation bash-scripting cloud-computing devops docker-image ec2-instance flask-application iac

Last synced: 16 Nov 2024

https://github.com/stefen-taime/big-o-algorithm

We’ll explain Big O notation with real-world Python examples that illustrate how it applies to various time complexities.

Last synced: 16 Nov 2024
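The Big O entry can be made concrete with a few small Python functions, each annotated with its time complexity. These are generic illustrations (the function names are hypothetical), not code taken from the repository.

```python
from bisect import bisect_left


def get_first(items: list[int]) -> int:
    """O(1): a single index operation, independent of list size."""
    return items[0]


def contains_linear(items: list[int], target: int) -> bool:
    """O(n): in the worst case every element is inspected."""
    for item in items:
        if item == target:
            return True
    return False


def contains_sorted(items: list[int], target: int) -> bool:
    """O(log n): binary search halves the range at each step.

    Requires `items` to be sorted in ascending order.
    """
    index = bisect_left(items, target)
    return index < len(items) and items[index] == target


def has_duplicate_pairs(items: list[int]) -> bool:
    """O(n^2): compares every pair of elements."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_set(items: list[int]) -> bool:
    """O(n) on average: set membership checks are O(1) on average."""
    seen: set[int] = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

The two duplicate checks answer the same question; trading O(n) extra memory for the set turns a quadratic scan into a linear one, which is the kind of trade-off Big O analysis makes visible.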

https://github.com/stefen-taime/ingest-data

Big data application for multi-source data ingestion

Last synced: 16 Nov 2024

https://github.com/stefen-taime/docker-stack

A directory of docker-compose files for quickly starting different infrastructures.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/eventmusic

EventMusic Producer is a Dockerized application designed to read data and publish it to a Kafka topic, using Avro schemas for data serialization. It integrates seamlessly with Kafka and the Schema Registry to manage the flow of event data linked to music event information.

confluent-kafka docker events music open-source real-time

Last synced: 16 Nov 2024

https://github.com/stefen-taime/car-price-predictor

Predicting Car Prices with FastAPI, Streamlit, MLflow, Kafka, and Debezium: A Practical Demonstration

data data-science dataanalysis-projects engineering machine-learning mlops predictive-modeling

Last synced: 16 Nov 2024

https://github.com/stefen-taime/llm-rag-mtl-public-hospital

This project develops a Retrieval-Augmented Generation (RAG) model to answer questions using public data from Google reviews left for hospitals in Montreal.

data google-reviews hopital hospital hub ia llm montreal open-source quebec rag

Last synced: 16 Nov 2024

https://github.com/stefen-taime/real-time-extraction-transformation-and-exposure-architecture-for-rail-data

We are thrilled to announce our new PoC project, which aims to provide a complete real-time extraction, transformation, and exposure architecture for the new provincial transportation systems.

api backend datacontract dataengineering elasticsearch flink-stream-processing frontend kafka kibana microservices nextjs python3 reactjs schemas software-engineering sql

Last synced: 16 Nov 2024

https://github.com/stefen-taime/-google-analytics-360

Welcome to the Google Analytics 360 Dataset Project! This repository is designed for anyone interested in working with realistic Google Analytics data. Whether you're a data scientist, a student, or a marketing analyst

analytics datasets google googleanalytics

Last synced: 16 Nov 2024

https://github.com/stefen-taime/myubereats_datapipeline

Building a Modern Uber Eats Data Pipeline

airflow api data datawarehouse mongodb pipeline powerbi snowflake

Last synced: 16 Nov 2024

https://github.com/stefen-taime/mongoelasticmigrator

This tool migrates data from MongoDB collections to Elasticsearch indices. It's built using Rust and supports configurable migrations.

migration-tool nosql-database rust

Last synced: 16 Nov 2024

https://github.com/stefen-taime/gmail-to-mongodb-script

This script facilitates the automation of fetching emails from a user's Gmail account and storing them into a MongoDB database. The emails fetched are filtered by specific labels such as Promotions, Social, Updates, and Forums. The script is intended to run continuously, checking for new emails every minute.

Last synced: 16 Nov 2024

https://github.com/stefen-taime/realtime-race-mapper

In this rendition, Elastic and Kibana have been replaced with the powerful Splunk, MQTT has been swapped out for ActiveMQ, and instead of the traditional Kafka, we’ve integrated Confluent Cloud.

Last synced: 16 Nov 2024