Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by san089

A curated list of projects in awesome lists by san089.

https://github.com/san089/cloudera_material

Cloudera_Material: Study material to help people prepare for the Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.

big-data bigdata cca cca175 certification cloudera flume hadoop hive hive-metastore pyspark spark sqoop sqoop-export sqoop-import sqoop-session

Last synced: 12 Oct 2024

https://github.com/san089/big_data_project

Fake News Detection: feature extraction using vectorizers such as Count Vectorizer, TF-IDF Vectorizer, and Hash Vectorizer, followed by an ensemble model that classifies whether the news is fake or not.

classifiers ensemble-model fakenewsdetection machine-learning news-classification scikit-learn text-mining textclassification vectorization vectorizers

Last synced: 12 Oct 2024

https://github.com/san089/sf-crime-statistics

A Kafka and Spark Streaming integration project: SF Crime Statistics with Spark Streaming.

kafka kafka-consumer kafka-producer kafka-python spark-sql spark-streaming

Last synced: 12 Oct 2024

https://github.com/san089/spark_packaged_project

This project contains PySpark jobs that create data pipelines and shows how to distribute the project package on a cluster.

data-pipeline etl etl-framework etl-pipeline job pyspark spark

Last synced: 12 Oct 2024

https://github.com/san089/ipl-analysis-with-python-pandas

This project provides an analysis of IPL (Indian Premier League) stats from 2008 to 2017.

Last synced: 12 Oct 2024

https://github.com/san089/yelp_project

This project creates a data lake for the Yelp dataset, then uses it to build an analytical sandbox for data science purposes and a data warehouse for reporting purposes.

data-lake data-pipeline etl etl-pipeline ingestion load pyspark recommender-system redshift

Last synced: 12 Oct 2024

https://github.com/san089/uppaal_model_checking

Model Checking For Automated Machine Learning Models

liveness machine-learning model-checking reachability safety uppaal

Last synced: 12 Oct 2024

https://github.com/san089/black-friday-sales-analysis

This project gives insight into a few statistics related to the Black Friday sale.

custom data dataanalysis insights sales statistics

Last synced: 12 Oct 2024

https://github.com/san089/soen-6011

This repository is for the course SOEN 6011.

Last synced: 12 Oct 2024

https://github.com/san089/learning_machine_learning

Machine learning demo projects

Last synced: 12 Oct 2024

https://github.com/san089/dbt_common_utils

dbt_common_utils

Last synced: 12 Oct 2024