Apache Spark Guide
https://github.com/mikeroyal/Apache-Spark-Guide


  • Scala Learning Resources

  • Scala Tools and Libraries

    • Azure Databricks - Apache Spark-based big data analytics service designed for data science and data engineering. Azure Databricks sets up your Apache Spark environment in minutes, autoscales, and lets you collaborate on shared projects in an interactive workspace. Azure Databricks supports Python, Scala, R, Java, and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch, and scikit-learn.
    • Scala.js
    • Polynote
    • Gitbucket
    • Gatling - load testing tool with support for Server-Sent Events and JMS.
    • Scalatra - high-performance, async web framework, inspired by [Sinatra](https://www.sinatrarb.com/).
    • Scala Native - ahead-of-time compiler and lightweight managed runtime designed specifically for Scala.
    • Play Framework
    • AWScala
    • Dotty
  • SQL/NoSQL Learning Resources

  • SQL/NoSQL Tools and Databases

    • MSSQL for Visual Studio Code
    • SQL Server Migration Assistant
    • SQL Server Business Intelligence (BI)
    • Tableau - acquired by Salesforce in 2019.
    • DataGrip - context-sensitive code completion, helping you to write SQL code faster. Completion is aware of the table structure, foreign keys, and even database objects created in the code you're editing.
    • MySQL - cloud-native applications using the world's most popular open source database.
    • PostgreSQL - object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.
    • Amazon DynamoDB - key-value and document database that delivers single-digit millisecond performance at any scale. It is a fully managed, multi-region, multi-master, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications.
    • Apache HBase™ - open-source, NoSQL, distributed big data store. It enables random, strictly consistent, real-time access to petabytes of data. HBase is very effective for handling large, sparse datasets. HBase serves as a direct input and output to the Apache MapReduce framework for Hadoop, and works with Apache Phoenix to enable SQL-like queries over HBase tables.
    • Elasticsearch - multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java.
    • Trino - can speed up ETL processes, allow them all to use standard SQL statements, and work with numerous data sources and targets, all in the same system.
    • Extract, transform, and load (ETL)
    • Redis (REmote DIctionary Server) - in-memory data structure store, used as a database, cache, and message broker. It provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams.
    • FoundationDB - key-value store that employs ACID transactions for all operations. It is especially well-suited for read/write workloads but also has excellent performance for write-intensive workloads. FoundationDB was acquired by [Apple in 2015](https://techcrunch.com/2015/03/24/apple-acquires-durable-database-company-foundationdb/).
    • MongoDB - document database that stores data in JSON-like documents.
    • OracleDB - database for mission-critical data with the highest availability, reliability, and security.
    • MariaDB - database for mission-critical applications.
    • SQLite - C-language library that implements a small, fast, self-contained, high-reliability, full-featured SQL database engine. SQLite is the most used database engine in the world. SQLite is built into all mobile phones and most computers and comes bundled inside countless other applications that people use every day.
    • SQLite Database Browser
    • InfluxDB - time series platform used for ETL or monitoring and alerting purposes, user dashboards, Internet of Things sensor data, and visualizing and exploring data. It also has support for processing data from [Graphite](http://graphiteapp.org/).
    • CouchbaseDB - [multi-model NoSQL document-oriented database](https://en.wikipedia.org/wiki/Multi-model_database). It creates a key-value store with managed cache for sub-millisecond data operations, with purpose-built indexers for efficient queries and a powerful query engine for executing SQL queries.
    • dbWatch - database monitoring and management for on-premise, hybrid/cloud database environments.
    • Cosmos DB Profiler - real-time visual debugger allowing a development team to gain valuable insight and perspective into their usage of the Cosmos DB database. It identifies over a dozen suspicious behaviors from your application’s interaction with Cosmos DB.
    • Toad - SQL management tool with built-in expertise. It resolves issues, manages change, and promotes the highest levels of code quality for both relational and non-relational databases.
    • Sequel Pro
    • Netdata - high-fidelity infrastructure monitoring and troubleshooting. Its real-time monitoring Agent collects thousands of metrics from systems, hardware, containers, and applications with zero configuration. It runs permanently on all your physical/virtual servers, containers, cloud deployments, and edge/IoT devices, and is perfectly safe to install on your systems mid-incident without any preparation.
    • Azure Data Studio
    • Hadoop Distributed File System (HDFS)
    • Azure Synapse Analytics
    • Azure SQL Managed Instance - moves on-premises applications to the cloud with very few application and database changes. Managed Instance has split compute and storage components.
    • Logstash
    • Kibana
    • Atlas - in-memory dimensional [time series database](https://en.wikipedia.org/wiki/Time_series_database).
    • Azure SQL Database - AI-powered and automated features that optimize performance and durability for you. Serverless compute and Hyperscale storage options automatically scale resources on demand, so you can focus on building new applications without worrying about storage size or resource management.
    • Adminer