awesome-data-pipeline

An awesome list for data pipelines
https://github.com/KennethanCeyer/awesome-data-pipeline

  • Components

    • Data Warehouse

      • IBM DB2 - (IBM / On-prem / SQL-friendly / Subscription fee).
      • Apache Hive - (Apache foundation / Hadoop-friendly / MapReduce / Free).
    • Workflow Management

      • Luigi - (Spotify / Open Source / Free).
      • Apache Airflow - (Apache foundation / Airbnb / Open Source / Free).
      • Apache NiFi - (Apache foundation / Dataflow / Free).
      • Argo - (CNCF foundation / Kubernetes-friendly / Open Source / Free).
    • Data Ingestion

      • Apache Flume - (Apache foundation / Data Ingestion / Open Source / Free).
      • Stitch - (Talend / ETL / Subscription fee).
      • Filebeat - (Elastic / Data Ingestion / Cloud or On-prem / Hybrid fee).
      • Fluentd - (CNCF foundation / Open Source / Free or License fee).
      • Datadog - (Datadog / Cloud / APM / Subscription fee).
      • New Relic - (New Relic / Cloud / APM / Subscription fee).
    • Data Lake

    • Data Store

      • Apache Druid - (Apache foundation / Real-time datastore / Free).
      • Apache Pinot - (Apache foundation / Real-time datastore / Free).
      • GCP Cloud Spanner - (Google Cloud / Globally distributed, strongly consistent HA datastore / Subscription fee).
      • Azure Cosmos DB - (Azure Cloud / NoSQL datastore / Subscription fee).
    • Query Engine

      • Presto - (Facebook / Open Source / SQL-friendly / Free or License fee).
      • Apache Impala - (Apache foundation / Cloudera / Open Source / SQL-friendly / Free or License fee).
      • AWS Redshift Spectrum - (AWS Cloud / SQL-friendly / On-demand fee).
    • Streaming

      • Apache Kafka - (Apache foundation / Confluent / LinkedIn / Message Broker / Open Source / Free or License fee).
      • RabbitMQ - (VMware / Messaging Queue / Free or License fee).
      • Azure Event Hubs - (Azure Cloud / Message Broker / Subscription fee).
    • Data Transformation

      • Apache Spark - (Apache foundation / Databricks / In-memory processing / Open Source / Free or License fee).
      • Apache Beam - (Apache foundation / Google / Data processing / Open Source / Free or License fee).
      • Apache Storm - (Apache foundation / Backtype / Twitter / Stream processing / Open Source / Free).
      • Apache Flink - (Apache foundation / Stream processing / Open Source / Free).
    • Data Analysis

      • Apache Superset - (Apache foundation / Airbnb / Business Intelligence (BI) / Open Source / Free).
      • Airpal - (Airbnb / Query Editor / Open Source / Free).
      • Hue - (Cloudera / Query Editor / Open Source / Free).
      • Databricks Notebook - (Databricks / Notebook / Hybrid fee).
      • Jupyter Notebook - (Jupyter / Notebook / Open Source / Free).
      • Pandas - (NumFOCUS / Data processing / Open Source / Free).
      • Plotly - (Plotly / Data visualization / Hybrid fee).
    • Data Format

      • Apache Parquet - (Apache foundation / Data Format / Open Source / Free).
      • Apache ORC - (Apache foundation / Hortonworks / Facebook / Data Format / Open Source / Free).
      • Apache Avro - (Apache foundation / Data Format / Open Source / Free).
      • Apache Kudu - (Apache foundation / Cloudera / Data Format / Open Source / Free).
      • Apache Arrow - (Apache foundation / Data Format / Open Source / Free).
      • Delta - (Databricks / Data Format / Free or License fee).
      • JSON - (Data Format / Free).
      • CSV - (Data Format / Free).
      • TSV - (Data Format / Free).
      • HDF5 - (The HDF Group / Data Format / Open Source (licensed by [HDF5](https://www.hdfgroup.org/licenses.)) / Free).
    • Business Intelligence

      • Apache Zeppelin - (Apache foundation / Business Intelligence (BI) / Open Source / Free or License fee).
      • Tableau - (Salesforce / Business Intelligence (BI) / Hybrid fee).
      • Redash - (Redash Inc / Databricks / Business Intelligence (BI) / Hybrid fee).
      • Data Studio - (Google Cloud / Business Intelligence (BI) / Free).
    • AI/ML

      • Feast - (Tecton / Gojek / Feature Store / Open Source / Free).
      • Vertex AI - (Google Cloud / Hybrid Features for AI / Subscription fee).
      • DataRobot - (DataRobot Inc / Feature Engineering / Subscription fee).
  • Community

  • Materials