https://github.com/dr-saad-la/data-engineer-tools
Data Engineer production tools
- Host: GitHub
- URL: https://github.com/dr-saad-la/data-engineer-tools
- Owner: dr-saad-la
- License: cc0-1.0
- Created: 2024-07-07T05:24:56.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-07-07T05:44:04.000Z (4 months ago)
- Last Synced: 2024-07-07T06:41:52.523Z (4 months ago)
- Size: 4.88 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Data Engineering Tools
![Data Engineer Tools](https://img.shields.io/badge/Data%20Engineer%20Tools-Resource-blue)
![Forks](https://img.shields.io/github/forks/dr-saad-la/Data-Engineer-Tools?style=social)

This repository lists tools commonly used in data engineering, grouped by function.

## Table of Contents
1. [Data Storage](#data-storage)
2. [Data Integration and ETL](#data-integration-and-etl)
3. [Data Processing](#data-processing)
4. [Data Orchestration](#data-orchestration)
5. [Data Quality and Governance](#data-quality-and-governance)
6. [Data Visualization](#data-visualization)
7. [Big Data Technologies](#big-data-technologies)
8. [Cloud Platforms](#cloud-platforms)
9. [Monitoring and Logging](#monitoring-and-logging)
10. [Development and Version Control](#development-and-version-control)

## Data Storage
- **Relational Databases**
  - MySQL
  - PostgreSQL
  - Oracle Database
  - Microsoft SQL Server
- **NoSQL Databases**
  - MongoDB
  - Cassandra
  - Redis
  - DynamoDB
- **Data Warehouses**
  - Amazon Redshift
  - Google BigQuery
  - Snowflake
  - Microsoft Azure Synapse
- **Data Lakes**
  - Apache Hadoop HDFS
  - Amazon S3
  - Azure Data Lake Storage
  - Google Cloud Storage

## Data Integration and ETL
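As a rough illustration of the extract-transform-load pattern that the tools in this section automate at scale, here is a minimal sketch using only the Python standard library (`csv` and `sqlite3`). It does not use any of the listed tools; the sample data, the `orders` table, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file, API, or queue.
RAW_CSV = """order_id,amount,currency
1,19.99,usd
2,5.00,USD
3,,usd
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, cast types, upper-case the currency."""
    return [
        (int(r["order_id"]), float(r["amount"]), r["currency"].upper())
        for r in rows
        if r["amount"]
    ]


def load(conn, records):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing touches disk
load(conn, transform(extract(RAW_CSV)))
total = conn.execute("SELECT COUNT(*), ROUND(SUM(amount), 2) FROM orders").fetchone()
print(total)
```

Keeping the three stages as separate functions mirrors how ETL tools typically let each stage be configured, retried, and monitored on its own.
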
- **ETL Tools**
  - Apache NiFi
  - Talend
  - Informatica
  - AWS Glue
  - Azure Data Factory
  - Google Dataflow
- **Data Integration Platforms**
  - Apache Camel
  - MuleSoft
  - Fivetran
  - Stitch

## Data Processing
- **Batch Processing**
  - Apache Spark
  - Apache Hadoop
  - Google Dataflow
  - Azure Synapse
- **Stream Processing**
  - Apache Kafka
  - Apache Flink
  - Apache Storm
  - Confluent Platform
- **Data Transformation**
  - dbt (Data Build Tool)
  - SQL
  - Pandas (Python Library)

## Data Orchestration
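Workflow orchestrators such as those in this section run tasks in dependency order. The sketch below shows that core idea with the standard-library `graphlib` module (Python 3.9+). It is not the API of any listed tool; the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"quality_check"},
}


def run_pipeline(deps, actions):
    """Run each task once, only after all of its dependencies have run."""
    executed = []
    for task in TopologicalSorter(deps).static_order():
        actions[task]()
        executed.append(task)
    return executed


log = []
# Stand-in task bodies that only record that they ran.
actions = {name: (lambda n=name: log.append(f"ran {n}")) for name in dependencies}
order = run_pipeline(dependencies, actions)
print(order)
```

Real orchestrators add scheduling, retries, and monitoring on top of this ordering step; `TopologicalSorter` also raises `CycleError` if the dependencies form a cycle.
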
- **Workflow Orchestration**
  - Apache Airflow
  - Prefect
  - Luigi
  - Dagster
- **Job Scheduling**
  - Apache Oozie
  - Kubernetes CronJobs

## Data Quality and Governance
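To give a sense of the kind of rule that data quality tools evaluate, here is a minimal plain-Python sketch of two common checks: required values and numeric ranges. It does not reflect the API of any tool in this section; `check_rows`, the column names, and the sample rows are hypothetical.

```python
def check_rows(rows, required, ranges):
    """Return a list of (row_index, column, problem) tuples for failed checks."""
    failures = []
    for i, row in enumerate(rows):
        # Required-value check: the column must be present and non-empty.
        for col in required:
            if row.get(col) in (None, ""):
                failures.append((i, col, "missing"))
        # Range check: only applied when a value is actually present.
        for col, (low, high) in ranges.items():
            value = row.get(col)
            if value is not None and value != "" and not (low <= value <= high):
                failures.append((i, col, "out of range"))
    return failures


# Hypothetical sample data with one out-of-range value and one missing id.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},
    {"id": None, "age": 40},
]
failures = check_rows(rows, required=["id"], ranges={"age": (0, 120)})
print(failures)
```

Returning the failures instead of raising on the first one lets a pipeline decide whether to stop, quarantine bad rows, or only report them.
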
- **Data Quality**
  - Great Expectations
  - Deequ (Amazon)
  - Talend Data Quality
- **Data Governance**
  - Apache Atlas
  - Collibra
  - Alation

## Data Visualization
- **Visualization Tools**
  - Tableau
  - Power BI
  - Looker
  - Google Data Studio (now Looker Studio)
  - Apache Superset

## Big Data Technologies
- **Big Data Frameworks**
  - Apache Hadoop
  - Apache Spark
  - Apache Flink
- **Data Serialization Formats**
  - Apache Avro
  - Apache Parquet
  - JSON
  - ORC

## Cloud Platforms
- **Amazon Web Services (AWS)**
  - S3
  - RDS
  - Redshift
  - Glue
  - EMR
- **Microsoft Azure**
  - Azure Data Lake Storage
  - Azure SQL Database
  - Azure Synapse
  - Azure Data Factory
- **Google Cloud Platform (GCP)**
  - Google Cloud Storage
  - BigQuery
  - Dataflow
  - Dataproc

## Monitoring and Logging
- **Monitoring Tools**
  - Prometheus
  - Grafana
  - Datadog
- **Logging Tools**
  - ELK Stack (Elasticsearch, Logstash, Kibana)
  - Splunk
  - Fluentd

## Development and Version Control
- **Version Control**
  - Git
  - GitHub
  - GitLab
  - Bitbucket
- **Integrated Development Environments (IDEs)**
  - PyCharm
  - VS Code
  - Jupyter Notebooks
  - IntelliJ IDEA

## Contributing
We welcome contributions! If you have suggestions for additional tools or improvements to this list, please open an issue or submit a pull request.
## License
This repository is licensed under the Creative Commons Attribution 4.0 International License. See the [LICENSE](LICENSE) file for more information.