Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/OBenner/data-engineering-interview-questions
More than 2,000 data engineering interview questions.
airflow avro aws azure cassandra data-engineering data-structures flink flume hadoop hadoop-hdfs hbase hive impala interview interview-questions kafka nifi spark sql
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/OBenner/data-engineering-interview-questions
- Owner: OBenner
- Created: 2021-08-08T15:49:45.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-08-13T18:02:20.000Z (6 months ago)
- Last Synced: 2024-10-29T15:20:55.133Z (4 months ago)
- Topics: airflow, avro, aws, azure, cassandra, data-engineering, data-structures, flink, flume, hadoop, hadoop-hdfs, hbase, hive, impala, interview, interview-questions, kafka, nifi, spark, sql
- Homepage:
- Size: 1.02 MB
- Stars: 1,125
- Watchers: 21
- Forks: 404
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-ai-data-github-repos - Data Engineering Interview Questions
README
More than 2,000 questions to help you prepare for a Data Engineer interview.
Full list of questions
Interview questions for Data Engineer
Databases and Data Warehouses
GitHub Repo
Official page
Questions
Description
Useful links
Apache Cassandra
Cassandra is a distributed, wide-column NoSQL database management system.
Awesome Cassandra
Greenplum
Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology.
Awesome Greenplum
MongoDB
MongoDB is a document-oriented database.
Awesome MongoDB
Apache HBase
HBase is an open-source non-relational distributed database.
Awesome HBase
Apache Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
Awesome Hive
Amazon DynamoDB
Amazon DynamoDB is a fully managed proprietary NoSQL database service.
Awesome DynamoDB
Awesome AWS
Amazon Redshift
Amazon Redshift is a data warehouse product.
Amazon Redshift Utilities
Awesome AWS
BigQuery GCP
BigQuery is a fully-managed, serverless data warehouse.
Awesome BigQuery
Bigtable GCP
Bigtable is a fully managed wide-column and key-value NoSQL database service.
Awesome Bigtable
Data Formats
Apache Avro
Avro is a row-oriented remote procedure call and data serialization framework.
Awesome Avro
Apache Parquet
Apache Parquet is a column-oriented data file format designed for efficient data storage and retrieval.
TODO
Delta
Delta Lake is a storage framework that enables building a Lakehouse architecture with compute engines such as Apache Spark.
Delta examples
Big Data Frameworks
Apache Airflow
Apache Airflow is a workflow management platform for data engineering pipelines.
Awesome Airflow
Apache Flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
TODO
Apache Hadoop
Apache Hadoop is a collection of software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation.
Awesome Hadoop
Apache Impala
Apache Impala is a parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop.
TODO
Apache Kafka
Apache Kafka is a distributed event store and stream-processing platform.
Awesome Kafka
Apache NiFi
Apache NiFi is a software project designed to automate the flow of data between software systems.
Awesome NiFi
Apache Spark
Apache Spark is a unified analytics engine for large-scale data processing.
Awesome Spark
Apache Flink
Apache Flink is a unified stream-processing and batch-processing framework.
Awesome Flink
Kubernetes
Kubernetes is a system for managing containerized applications across multiple hosts.
Awesome Kubernetes
Cloud providers
Amazon Web Services
Amazon Web Services is an online platform that provides scalable and cost-effective cloud computing solutions.
Awesome AWS
Microsoft Azure
Microsoft Azure is Microsoft's public cloud computing platform.
Awesome Azure
Google Cloud Platform
Google Cloud Platform is a suite of cloud computing services.
Awesome GCP
Theory
DWH Architectures
A data warehouse architecture defines how data is communicated, processed, and presented to end users across the enterprise.
Awesome databases
Data Structures
A data structure is a specialized format for organizing, processing, retrieving and storing data.
TODO
SQL
SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS).
Awesome SQL
Data visualization tools/BI
Tableau
Tableau is a powerful data visualization tool used in business intelligence.
TODO
Looker
Looker is an enterprise platform for BI, data applications, and embedded analytics that helps you explore and share insights in real time.
TODO
Apache Superset
Superset is a modern data exploration and data visualization platform.
TODO
Contribution
Please contribute to this repository to help make it better. Any change, such as a new question, a code improvement, or a documentation improvement, is very welcome.