Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/OBenner/data-engineering-interview-questions
More than 2,000 data engineering interview questions.
airflow avro aws azure cassandra data-engineering data-structures flink flume hadoop hadoop-hdfs hbase hive impala interview interview-questions kafka nifi spark sql
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/OBenner/data-engineering-interview-questions
- Owner: OBenner
- Created: 2021-08-08T15:49:45.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-08-13T18:02:20.000Z (4 months ago)
- Last Synced: 2024-10-29T15:20:55.133Z (about 1 month ago)
- Topics: airflow, avro, aws, azure, cassandra, data-engineering, data-structures, flink, flume, hadoop, hadoop-hdfs, hbase, hive, impala, interview, interview-questions, kafka, nifi, spark, sql
- Homepage:
- Size: 1.02 MB
- Stars: 1,125
- Watchers: 21
- Forks: 404
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-ai-data-github-repos - Data Engineering Interview Questions
README
More than 2,000 questions for preparing for a Data Engineer interview.
Full list of questions
Interview questions for Data Engineers
Databases and Data Warehouses
GitHub Repo
Official page
Questions
Description
Useful links
Apache Cassandra
Cassandra is a distributed, wide-column store, NoSQL database management system.
Awesome Cassandra
Greenplum
Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology.
Awesome Greenplum
MongoDB
MongoDB is a document-oriented database.
Awesome MongoDB
Apache HBase
HBase is an open-source non-relational distributed database.
Awesome HBase
Apache Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.
Awesome Hive
Amazon DynamoDB
Amazon DynamoDB is a fully managed proprietary NoSQL database service.
Awesome DynamoDB
Awesome AWS
Amazon Redshift
Amazon Redshift is a data warehouse product.
Amazon Redshift Utilities
Awesome AWS
BigQuery GCP
BigQuery is a fully-managed, serverless data warehouse.
Awesome BigQuery
Bigtable GCP
Bigtable is a fully managed wide-column and key-value NoSQL database service.
Awesome Bigtable
Data Formats
Apache Avro
Avro is a row-oriented remote procedure call and data serialization framework.
Awesome Avro
Apache Parquet
Apache Parquet is a column-oriented data file format designed for efficient data storage and retrieval.
TODO
Delta
Delta Lake is a storage framework for building a Lakehouse architecture on top of compute engines such as Apache Spark and Apache Flink.
Delta examples
Big Data Frameworks
Apache Airflow
Apache Airflow is a workflow management platform for data engineering pipelines.
Awesome Airflow
Apache Flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
TODO
Apache Hadoop
Apache Hadoop is a collection of software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation.
Awesome Hadoop
Apache Impala
Apache Impala is a parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop.
TODO
Apache Kafka
Apache Kafka is a distributed event store and stream-processing platform.
Awesome Kafka
Apache NiFi
Apache NiFi is a software project designed to automate the flow of data between software systems.
Awesome NiFi
Apache Spark
Apache Spark is a unified analytics engine for large-scale data processing.
Awesome Spark
Apache Flink
Apache Flink is a unified stream-processing and batch-processing framework.
Awesome Flink
Kubernetes
Kubernetes is a system for managing containerized applications across multiple hosts.
Awesome Kubernetes
Cloud providers
Amazon Web Services
Amazon Web Services is an online platform that provides scalable and cost-effective cloud computing solutions.
Awesome AWS
Microsoft Azure
Microsoft Azure is Microsoft's public cloud computing platform.
Awesome Azure
Google Cloud Platform
Google Cloud Platform is a suite of cloud computing services.
Awesome GCP
Theory
DWH Architectures
A data warehouse architecture defines how data is communicated, processed, and presented to end clients within an enterprise.
Awesome databases
Data Structures
A data structure is a specialized format for organizing, processing, retrieving and storing data.
TODO
SQL
SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS).
Awesome SQL
Data visualization tools/BI
Tableau
Tableau is a powerful data visualization tool used in business intelligence.
TODO
Looker
Looker is an enterprise platform for BI, data applications, and embedded analytics that helps you explore and share insights in real time.
TODO
Apache Superset
Superset is a modern data exploration and data visualization platform.
TODO
Contribution
Please contribute to this repository to help make it better. Any change, such as a new question, a code improvement, or a documentation improvement, is very welcome.