awesome-hadoop

A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources
https://github.com/eric-erki/awesome-hadoop

  • Libraries and Tools

  • SQL on Hadoop

    • Apache Drill - Schema-free SQL Query Engine
    • Apache Impala - Apache Impala is an open-source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala has been described as the open-source equivalent of Google F1, which inspired its development in 2012.
    • Presto - Distributed SQL Query Engine for Big Data. Open sourced by Facebook.
  • Websites

  • Books

  • Hadoop and Big Data Events

  • NoSQL

    • Apache Accumulo - The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high-performance data storage and retrieval system.
    • OpenTSDB - The Scalable Time Series Database
  • Data Management

    • Apache Kudu - Kudu provides a combination of fast inserts/updates and efficient columnar scans to enable multiple real-time analytic workloads across a single storage layer, complementing HDFS and Apache HBase.
    • Apache Calcite - A Dynamic Data Management Framework
  • Realtime Data Processing

    • Apache Flink - Apache Flink is a platform for efficient, distributed, general-purpose data processing. It supports exactly-once stream processing.
    • Apache Storm
  • Distributed Computing and Programming

    • Apache Livy (incubating) - Apache Livy (incubating) is a web service that exposes a REST interface for managing long-running Apache Spark contexts in your cluster. With Livy, you can build applications on top of Apache Spark that require fine-grained interaction with many Spark contexts.
    • SparkHub - A community site for Apache Spark
  • Security

    • Apache Knox Gateway - A REST API Gateway for interacting with Hadoop clusters.
    • Project Rhino - Intel's open-source effort to enhance the existing data protection capabilities of the Hadoop ecosystem, address security and compliance challenges, and contribute the code back to Apache.
  • Benchmark

  • Machine learning and Big Data analytics

    • MLlib - MLlib is Apache Spark's scalable machine learning library.
    • R - R is a free software environment for statistical computing and graphics.
    • RHadoop
    • Apache SINGA (incubating) - SINGA is a general distributed deep learning platform for training big deep learning models over large datasets.
    • BigDL - BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters.
  • Misc.