Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Projects in Awesome Lists by divithraju

A curated list of projects in awesome lists by divithraju .

- Recently synced
- Stars

https://github.com/divithraju/divith-raju-openmetadata

Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

automation bigdata bigdataanalytics data data-structures dataengineering datascience hacktoberfest2022 metadata metadata-extraction

Last synced: 31 Dec 2024

https://github.com/divithraju/divith-raju-hadoop-connectors-master

apache bigdata bigquery cloud connector dataengineering gcs hadoop linux opensource software-engineering ubuntu

Last synced: 31 Dec 2024

https://github.com/divithraju/divith-raju-searchengine-wikipedia

search engine optimizationA complete search engine experience built on top of 75 GB Wikipedia corpus with subsecond latency for searches. Results contain wiki pages ordered by TF/IDF relevance based on given search word/s. From an optimized code to the K-Way mergesort algorithm, this project addresses latency, indexing, and big data challenges.

algorithms data dataengineering inverted-index linux merge-sort nlp project project-repository python3 serchengine software-engineering ubuntu wikipedia

Last synced: 31 Dec 2024

https://github.com/divithraju/divith-raju-immigration-data-engineering

A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)

apachespark bigdata bigdataprocessing bigdataproject capstone-project datacleaning dataengineering datalake datamodeling datapipeline dataprocessing dataschema dataset datawherehouse pandas sql

Last synced: 31 Dec 2024

https://github.com/divithraju/divith-raju-web-server-log-analysis-pyspark

Playground for pyspark (RDDs, DStreams) and Apache Airflow. Based on the example of parsing (including incorrectly formated strings) web server log data

Last synced: 31 Dec 2024

https://github.com/divithraju/divith-aju-hadoop-pyspark-pipeline

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

apache-hadoop-framework apache-spark bigdata client data database dataengineering dataingestionframework datapreprocessing documentation ecommerce-platform hdfs pipeline project project-repository pyspark python3 software-engineering

Last synced: 16 Nov 2024

https://github.com/divithraju/divith-raju-data-mining

This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behaviors. The goal is to analyze customer data and segment them into clusters for targeted marketing strategies and better customer relationship management.

algorthims analytics apache business client connector data dataarchitecture database dataengineering datamining datascience hadoop k-means-clustering mysql project project-repository pyspark python3 spark

Last synced: 16 Nov 2024