Projects in Awesome Lists tagged with sqoop
A curated list of projects in awesome lists tagged with sqoop.
https://github.com/WeBankFinTech/Exchangis
Exchangis is a lightweight, highly extensible data exchange platform that supports data transmission between structured and unstructured heterogeneous data sources.
dataspherestudio datax etl exchangis flink linkis sqoop transmission-engine wedatasphere
Last synced: 27 Mar 2025
https://github.com/dimajix/spark-training
Repository used for Spark Trainings
hadoop hadoop-training hive pyspark python scala spark spark-ml spark-streaming spark-training sqoop
Last synced: 21 Apr 2025
https://github.com/Cigna/ibis
IBIS is a workflow creation engine that abstracts the Hadoop internals of ingesting RDBMS data.
cigna hadoop hadoop-ecosystem hadoop-framework ibis ingestion oozie sqoop sqoop2 workflow workflow-automation workflow-scheduler
Last synced: 19 Jul 2025
https://github.com/san089/cloudera_material
Cloudera_Material: Study material to help people prepare for the Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
big-data bigdata cca cca175 certification cloudera flume hadoop hive hive-metastore pyspark spark sqoop sqoop-export sqoop-import sqoop-session
Last synced: 29 Oct 2025
https://github.com/stefen-taime/etl-data-pipeline-rdbms-to-hdfs-using-airflow-apache-sqoop-spark-postgres-and-hive
This project moves data from a relational database management system (RDBMS) to the Hadoop Distributed File System (HDFS).
airflow big-data data docker-compose etl-pipeline hdfs hive infrastructure-as-code rdbms spark sql sqoop
Last synced: 03 Jul 2025
https://github.com/tritondatacenter/hadoop-manta
Hadoop Filesystem Driver for Manta
drill hadoop hadoop-filesystem joyent manta sqoop triton
Last synced: 13 Jun 2025
https://github.com/dadananjesha/redshift-etl-project
The project covers the complete data pipeline, from importing data from an RDS source to HDFS using Sqoop and processing it with Spark to executing analytical queries on an AWS Redshift cluster.
apache-spark aws data-engineering-etl-assignment data-ingestion data-pipeline etl-processes hdfs rds redshift spark sqoop
Last synced: 08 May 2026
https://github.com/thdaraujo/cheat
A handful of cheatsheets and programming tips.
bash cheat-sheets cheatsheet dms hadoop postgresql spark sqoop
Last synced: 16 Apr 2026
https://github.com/vladimirzelenokor1/big-data-project---predicting-trip-fares-with-spark-hive
A CRISP-DM–based big data pipeline for predicting NYC ride-sharing trip fares: ingesting 2024 TLC data via Sqoop into HDFS/Hive, performing ETL and feature engineering with Spark & PySpark, training and tuning Linear Regression & Gradient Boosted Tree models, and outlining end-to-end deployment.
big-data data-engineering etl hadoop hive jupyter-notebook machine-learning predictive-modeling pyspark python spark spark-ml sql sqoop
Last synced: 06 May 2026
https://github.com/tejaswirupa/big-data-systems-project-hadoop-hive-mapreduce-sqoop-workflows
Designed and implemented scalable data workflows using Hadoop, Hive, and Sqoop. This project involved log aggregation, airline delay analysis, word frequency processing, and TF-IDF computation across multiple datasets using MapReduce, Hive queries, and Hadoop Streaming.
big-data hadoop log-processing mapreduce sqoop streaming tf-idf
Last synced: 27 Jan 2026
https://github.com/leisurelyleon/mastercard-lead-data-engineer
A tailored set of example files demonstrating the skills required for a career position at Mastercard Inc.
apache apache-spark big-data hive impala java kafka nifi nosql nosql-database nosql-databases object-oriented object-oriented-programming oozie postgresql python scala spark sqoop
Last synced: 08 Apr 2026
https://github.com/ankit21111/patient-alert-etl
The Patient Alert ETL 🚑 project creates a real-time data pipeline to monitor vital health parameters from IoT devices in hospitals. Using Apache Kafka, Spark, and HBase, it processes streaming data and sends immediate alerts via Amazon SNS when vitals exceed normal thresholds, enhancing patient care through timely interventions.
apache-kafka apache-spark awssns hadoop-hdfs hbase hive java-8 mysql python3 rdbms sqoop
Last synced: 18 Apr 2026
https://github.com/marco-gallegos/sqoopit
A Python package that lets you import data from an RDBMS into HDFS, Hive, or HBase using Sqoop.
hadoop hbase hdfs hive py python python3 sqoop sqoop-import
Last synced: 03 Oct 2025
https://github.com/raja9283/hadoopscd
A data pipeline on GCP Dataproc using Sqoop, HDFS, Hive, and PySpark to implement SCD Type 2 for an e-commerce use case. Tracks customer and product changes (e.g., address, price) and their impact on sales, demonstrating scalable data warehousing and processing.
hadoop hdfs hive scd spark sqoop
Last synced: 17 May 2026