Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists by san089
A curated list of projects in awesome lists by san089.
https://github.com/san089/Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
airflow airflow-operators aws aws-ec2 aws-s3 aws-sdk cassandra cassandra-database cloudformation cluster data data-engineering data-engineering-pipeline data-lake data-modeling data-warehouse etl-pipeline infrastructure postgres postgresql-database
Last synced: 08 Nov 2024
https://github.com/san089/udacity-data-engineering-projects
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
airflow airflow-operators aws aws-ec2 aws-s3 aws-sdk cassandra cassandra-database cloudformation cluster data data-engineering data-engineering-pipeline data-lake data-modeling data-warehouse etl-pipeline infrastructure postgres postgresql-database
Last synced: 29 Oct 2024
https://github.com/san089/goodreads_etl_pipeline
An end-to-end GoodReads data pipeline for building a Data Lake, Data Warehouse, and Analytics Platform.
airflow airflow-dag apache-airflow apache-spark data-engineering data-engineering-pipeline data-lake data-migration emr-cluster etl-framework etl-job etl-pipeline goodreads-data-pipeline livy python redshift s3 scheduler spark warehouse
Last synced: 12 Oct 2024
https://github.com/san089/optimizing-public-transportation
A real-time event pipeline built around the Kafka ecosystem for the Chicago Transit Authority.
faust kafka-api kafka-application kafka-cluster kafka-connect kafka-consumer kafka-producer kafka-schema-registry kafka-sql kafka-streams kafka-topic
Last synced: 28 Oct 2024
https://github.com/san089/cloudera_material
Cloudera_Material: Study material to help people prepare for the Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
big-data bigdata cca cca175 certification cloudera flume hadoop hive hive-metastore pyspark spark sqoop sqoop-export sqoop-import sqoop-session
Last synced: 12 Oct 2024
https://github.com/san089/big_data_project
Fake News Detection: feature extraction using vectorization (Count Vectorizer, TF-IDF Vectorizer, Hash Vectorizer), followed by an ensemble model that classifies whether the news is fake or not.
classifiers ensemble-model fakenewsdetection machine-learning news-classification scikit-learn text-mining textclassification vectorization vectorizers
Last synced: 12 Oct 2024
https://github.com/san089/sf-crime-statistics
A Kafka and Spark Streaming integration project: SF Crime Statistics with Spark Streaming.
kafka kafka-consumer kafka-producer kafka-python spark-sql spark-streaming
Last synced: 12 Oct 2024
https://github.com/san089/spark_packaged_project
This project contains PySpark jobs that create data pipelines and shows how to distribute the project package on a cluster.
data-pipeline etl etl-framework etl-pipeline job pyspark spark
Last synced: 12 Oct 2024
https://github.com/san089/ipl-analysis-with-python-pandas
This project provides an analysis of IPL (Indian Premier League) stats from 2008 to 2017.
Last synced: 12 Oct 2024
https://github.com/san089/yelp_project
This project creates a data lake for the Yelp dataset, then uses it to build an analytical sandbox for data science and a data warehouse for reporting.
data-lake data-pipeline etl etl-pipeline ingestion load pyspark recommender-system redshift
Last synced: 12 Oct 2024
https://github.com/san089/uppaal_model_checking
Model Checking For Automated Machine Learning Models
liveness machine-learning model-checking reachability safety uppaal
Last synced: 12 Oct 2024
https://github.com/san089/soen_6441
A multiplayer Risk board game.
coding-standards design-patterns documentation organizing programming-game risk-game testing
Last synced: 12 Oct 2024
https://github.com/san089/black-friday-sales-analysis
This project gives insight into a few statistics related to Black Friday sales.
custom data dataanalysis insights sales statistics
Last synced: 12 Oct 2024
https://github.com/san089/soen-6011
This repository is for the course SOEN 6011.
Last synced: 12 Oct 2024
https://github.com/san089/learning_machine_learning
Machine learning demo projects
Last synced: 12 Oct 2024