https://github.com/obenner/data-engineering-interview-questions

More than 2000 data engineer interview questions.

airflow avro aws azure cassandra data-engineering data-structures flink flume hadoop hadoop-hdfs hbase hive impala interview interview-questions kafka nifi spark sql

More than 2000 questions for preparing for a Data Engineer interview.


Full list of questions


Interview questions for Data Engineer




Databases and Data Warehouses


GitHub Repo
Official page
Questions
Description
Useful links


Cassandra
Cassandra
Apache Cassandra
Cassandra is a distributed, wide-column store, NoSQL database management system.
Awesome Cassandra


Greenplum
Greenplum
Greenplum
Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology.
Awesome Greenplum


MongoDB
MongoDB
MongoDB
MongoDB is a document-oriented database.
Awesome MongoDB


HBase
HBase
Apache HBase
HBase is an open-source non-relational distributed database.
Awesome HBase


Hive
Hive
Apache Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
Awesome Hive


Amazon DynamoDB
Amazon DynamoDB
Amazon DynamoDB is a fully managed proprietary NoSQL database service.
Awesome DynamoDB
Awesome AWS


Amazon Redshift
Amazon Redshift
Amazon Redshift is a data warehouse product.
Amazon Redshift Utilities
Awesome AWS


BigQuery
BigQuery GCP
BigQuery is a fully-managed, serverless data warehouse.
Awesome BigQuery


Bigtable
Bigtable GCP
Bigtable is a fully managed wide-column and key-value NoSQL database service.
Awesome Bigtable



Data Formats


Avro
Avro
Apache Avro
Avro is a row-oriented remote procedure call and data serialization framework.
Awesome Avro


Parquet
Parquet
Apache Parquet
Apache Parquet is a column-oriented data file format designed for efficient data storage and retrieval.
TODO


Delta
Delta
Delta
Delta Lake is a storage framework that enables building a Lakehouse architecture with compute engines.
Delta examples



Big Data Frameworks


Airflow
Airflow
Apache Airflow
Apache Airflow is a workflow management platform for data engineering pipelines.
Awesome Airflow


Flume
Flume
Apache Flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
TODO


Hadoop
Hadoop
Apache Hadoop
Apache Hadoop is a collection of software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation.
Awesome Hadoop


Impala
Impala
Apache Impala
Apache Impala is a parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop.
TODO


Kafka
Kafka
Apache Kafka
Apache Kafka is a distributed event store and stream-processing platform.
Awesome Kafka


NiFi
NiFi
Apache NiFi
Apache NiFi is a software project designed to automate the flow of data between software systems.
Awesome NiFi


Spark
Spark
Apache Spark
Apache Spark is a unified analytics engine for large-scale data processing.
Awesome Spark


Flink
Flink
Apache Flink
Apache Flink is a unified stream-processing and batch-processing framework.
Awesome Flink


Kubernetes
Kubernetes
Kubernetes
Kubernetes is a system for managing containerized applications across multiple hosts.
Awesome Kubernetes



Cloud providers


AWS
AWS
Amazon Web Services
Amazon Web Services is an online platform that provides scalable and cost-effective cloud computing solutions.
Awesome AWS


Azure
Azure
Microsoft Azure
Microsoft Azure is Microsoft's public cloud computing platform.
Awesome Azure


GCP
GCP
Google Cloud Platform
Google Cloud Platform is a suite of cloud computing services.
Awesome GCP



Theory


DWHA
DWH Architectures
A data warehouse architecture is a method of defining the overall architecture of data communication, processing, and presentation that exists for end-client computing within the enterprise.
Awesome databases


Data Structures
Data Structures
A data structure is a specialized format for organizing, processing, retrieving and storing data.
TODO


SQL
SQL
SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS).
Awesome SQL



Data visualization tools/BI


Tableau
Tableau
Tableau is a powerful data visualization tool used in business intelligence.
TODO

Looker
Looker
Looker is an enterprise platform for BI, data applications, and embedded analytics that helps you explore and share insights in real time.
TODO


Apache Superset
Apache Superset
Apache Superset
Superset is a modern data exploration and data visualization platform.
TODO



Contribution


Please contribute to this repository to help make it better. Any change, such as a new question, a code improvement, or a documentation improvement, is very welcome.