An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with sqoop

A curated list of projects in awesome lists tagged with sqoop.

https://github.com/WeBankFinTech/Exchangis

Exchangis is a lightweight, highly extensible data exchange platform that supports data transmission between structured and unstructured heterogeneous data sources.

dataspherestudio datax etl exchangis flink linkis sqoop transmission-engine wedatasphere

Last synced: 27 Mar 2025

https://github.com/v5tech/cloud

Cloud computing: environment setup and configuration files for Hadoop, Hive, Hue, Oozie, Sqoop, HBase, and ZooKeeper

flume flume-ng hadoop hbase hive hue oozie pig sqoop zookeeper

Last synced: 30 Apr 2025

https://github.com/Cigna/ibis

IBIS is a workflow creation engine that abstracts the Hadoop internals of ingesting RDBMS data.

cigna hadoop hadoop-ecosystem hadoop-framework ibis ingestion oozie sqoop sqoop2 workflow workflow-automation workflow-scheduler

Last synced: 19 Jul 2025

https://github.com/san089/cloudera_material

Cloudera_Material: Study material to help people prepare for the Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.

big-data bigdata cca cca175 certification cloudera flume hadoop hive hive-metastore pyspark spark sqoop sqoop-export sqoop-import sqoop-session

Last synced: 29 Oct 2025

https://github.com/zenoyang/web-click-flow

Offline log analysis of website clickstreams

etl flume hadoop hive mapreduce sqoop

Last synced: 03 Jul 2025

https://github.com/stefen-taime/etl-data-pipeline-rdbms-to-hdfs-using-airflow-apache-sqoop-spark-postgres-and-hive

This project aims to move data from a relational database management system (RDBMS) to the Hadoop Distributed File System (HDFS).

airflow big-data data docker-compose etl-pipeline hdfs hive infrastructure-as-code rdbms spark sql sqoop

Last synced: 03 Jul 2025

https://github.com/tritondatacenter/hadoop-manta

Hadoop Filesystem Driver for Manta

drill hadoop hadoop-filesystem joyent manta sqoop triton

Last synced: 13 Jun 2025

https://github.com/lovnishverma/bigdataecosystem

Complete Big Data Ecosystem on Docker Desktop

bigdata docker flume hadoop hdfs hive mapreduce spark sqoop

Last synced: 19 Aug 2025

https://github.com/dadananjesha/redshift-etl-project

The project covers the complete data pipeline: importing data from an RDS source to HDFS using Sqoop, processing the data with Spark, and executing analytical queries on an AWS Redshift cluster.

apache-spark aws data-engineering-etl-assignment data-ingestion data-pipeline etl-processes hdfs rds redshift spark sqoop

Last synced: 08 May 2026

https://github.com/alokjani/bigdata-vagrant-devlab

Hadoop Software Development sandbox

centos flume hadoop hive pig sqoop zeppelin

Last synced: 09 Sep 2025

https://github.com/thdaraujo/cheat

A handful of cheatsheets and programming tips.

bash cheat-sheets cheatsheet dms hadoop postgresql spark sqoop

Last synced: 16 Apr 2026

https://github.com/vladimirzelenokor1/big-data-project---predicting-trip-fares-with-spark-hive

A CRISP-DM–based big data pipeline for predicting NYC ride-sharing trip fares: ingesting 2024 TLC data via Sqoop into HDFS/Hive, performing ETL and feature engineering with Spark & PySpark, training and tuning Linear Regression & Gradient Boosted Tree models, and outlining end-to-end deployment.

big-data data-engineering etl hadoop hive jupyter-notebook machine-learning predictive-modeling pyspark python spark spark-ml sql sqoop

Last synced: 06 May 2026

https://github.com/sebastianruizm/cca175-exam-preparation

Backup of my preparation for Cloudera's CCA175 exam

hdfs mysql python spark sqoop

Last synced: 30 Apr 2026

https://github.com/tejaswirupa/big-data-systems-project-hadoop-hive-mapreduce-sqoop-workflows

Designed and implemented scalable data workflows using Hadoop, Hive, and Sqoop. This project involved log aggregation, airline delay analysis, word frequency processing, and TF-IDF computation across multiple datasets using MapReduce, Hive queries, and Hadoop Streaming.

big-data hadoop log-processing mapreduce sqoop streaming tf-idf

Last synced: 27 Jan 2026

https://github.com/offthetab/vkapi-ml-dataharvester

Pipeline to harvest data via the VK API for ML analysis with Hadoop and Spark

hadoop hdfs hive linux mariadb python requests spark sqoop

Last synced: 31 Jan 2026

https://github.com/leisurelyleon/mastercard-lead-data-engineer

A tailored set of example files demonstrating the skills required for a career position at Mastercard Inc.

apache apache-spark big-data hive impala java kafka nifi nosql nosql-database nosql-databases object-oriented object-oriented-programming oozie postgresql python scala spark sqoop

Last synced: 08 Apr 2026

https://github.com/ankit21111/patient-alert-etl

The Patient Alert ETL 🚑 project creates a real-time data pipeline to monitor vital health parameters from IoT devices in hospitals. Using Apache Kafka, Spark, and HBase, it processes streaming data and sends immediate alerts via Amazon SNS when vitals exceed normal thresholds, enhancing patient care through timely interventions.

apache-kafka apache-spark awssns hadoop-hdfs hbase hive java-8 mysql python3 rdbms sqoop

Last synced: 18 Apr 2026

https://github.com/ccao-data/service-sqoop-iasworld

Service to continually import iasWorld backend data to Parquet using Apache Sqoop

docker hadoop service shell sqoop

Last synced: 07 May 2025

https://github.com/marco-gallegos/sqoopit

A Python package that lets you import data from an RDBMS into HDFS, Hive, or HBase using Sqoop

hadoop hbase hdfs hive py python python3 sqoop sqoop-import

Last synced: 03 Oct 2025

https://github.com/raja9283/hadoopscd

A data pipeline on GCP Dataproc using Sqoop, HDFS, Hive, and PySpark to implement SCD Type 2 for an e-commerce use case. Tracks customer and product changes (e.g., address, price) and their impact on sales, demonstrating scalable data warehousing and processing.

hadoop hdfs hive scd spark sqoop

Last synced: 17 May 2026