Distributed-Systems-Guide
Distributed Systems Guide
https://github.com/mikeroyal/Distributed-Systems-Guide
-
Apache Spark Learning Resources
- Apache Spark™ - A unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
- Apache Spark Quick Start
- Apache Spark 3.0: For Analytics & Machine Learning | NVIDIA
- Apache Spark Basics | MATLAB & Simulink
- MATLAB Hadoop and Spark | MATLAB & Simulink
- Top Apache Spark Courses Online | Udemy
- Apache Spark In-Depth (Spark with Scala) | Udemy
- Learn Apache Spark with Online Courses | edX
- Apache Spark Training Courses | NobleProg
- Cloudera Developer Training for Apache Spark™ and Hadoop | Cloudera
- Databricks Certified Associate Developer for Apache Spark 3.0 certification | Databricks
-
Apache Spark Tools, Libraries, and Frameworks
- Spark SQL
- Spark Streaming - A scalable and fault-tolerant stream processing engine built on the Spark SQL engine. It lets you express a streaming computation the same way you would express a batch computation on static data, with sources including [Apache Kafka](https://kafka.apache.org/), [Apache Flume](https://flume.apache.org/), and [Amazon Kinesis](https://aws.amazon.com/kinesis/).
- MLlib - Spark's machine learning library. It consists of common learning algorithms and utilities, as well as lower-level optimization primitives and higher-level pipeline APIs.
- GraphX - Spark's component for graphs and graph-parallel computation. At a high level, GraphX extends the [Spark RDD](https://spark.apache.org/docs/latest/rdd-programming-guide.html) by introducing the Resilient Distributed Property Graph: a directed multigraph with properties attached to each vertex and edge.
- PySpark
- MLflow
- Tracking component
- Projects component
- Models component
- Model Registry
- Apache PredictionIO
- BigDL
- Apache Flume
- Apache Arrow - A language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs.
- Neo4j - An enterprise-strength graph database that combines native graph storage, advanced security, scalable speed-optimized architecture, and ACID compliance to ensure predictability and integrity of relationship-based queries.
- Apache Spark Connector for SQL Server and Azure SQL - A high-performance connector that enables you to use transactional data in big data analytics and persist results for ad-hoc queries or reporting. The connector allows you to use any SQL database, on-premises or in the cloud, as an input data source or output data sink for Spark jobs.
- Koalas - A project that implements the pandas DataFrame API on top of [Apache Spark](https://spark.apache.org/).
- Cluster Manager for Apache Kafka (CMAK)
- Azure Databricks - A fast and collaborative Apache Spark-based big data analytics service designed for data science and data engineering. Azure Databricks sets up your Apache Spark environment in minutes, autoscales, and lets you collaborate on shared projects in an interactive workspace. Azure Databricks supports Python, Scala, R, Java, and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch, and scikit-learn.
- Hadoop Distributed File System (HDFS) - A distributed file system for large data sets running on commodity hardware. It is one of the major components of Apache Hadoop, alongside MapReduce and YARN.
- Logstash
- Kibana
- PlaidML
- OpenCV - A highly optimized library with a focus on real-time computer vision applications. The C++, Python, and Java interfaces support Linux, macOS, Windows, iOS, and Android.
- Caffe
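- Theano - A Python library that lets you define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently, including tight integration with NumPy.
- AutoGluon - An AutoML toolkit for training and deploying high-accuracy deep learning models on tabular, image, and text data.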
-
Bioinformatics Learning Resources
- Bioinformatics
- European Bioinformatics Institute
- Online Courses in Bioinformatics | ISCB - International Society for Computational Biology
- Bioinformatics | Coursera
- Top Bioinformatics Courses | Udemy
- Biometrics Courses | Udemy
- Learn Bioinformatics with Online Courses and Lessons | edX
- Bioinformatics Graduate Certificate | Harvard Extension School
- Introduction to Biometrics course - Biometrics Institute
- Bioinformatics and Biostatistics | UC San Diego Extension
-
Bioinformatics Tools, Libraries, and Frameworks
- Bioconductor - An open source project that provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the [R statistical programming language](https://www.r-project.org/about.html), and is open source and open development. It has two releases each year, and an active user community. Bioconductor is also available as an [AMI (Amazon Machine Image)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) and [Docker images](https://docs.docker.com/engine/reference/commandline/images/).
- UniProt - A high-quality and freely accessible set of protein sequences annotated with functional information.
- Bowtie 2 - An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (mammalian) genomes.
- Biopython
- BioRuby
- BioJava
- BioPHP
- Avogadro - An advanced molecule editor and visualizer designed for cross-platform use in computational chemistry, molecular modeling, bioinformatics, materials science, and related areas. It offers flexible, high-quality rendering and a powerful plugin architecture.
- Ascalaph Designer
- Galaxy - An open source, web-based platform for accessible, reproducible, and transparent computational biomedical research. It allows users without programming experience to easily specify parameters and run individual tools as well as larger workflows. It also captures run information so that any user can repeat and understand a complete computational analysis.
- Orange
- Basic Local Alignment Search Tool
- OSIRIS - A public-domain, free, and open source STR analysis software designed for clinical, forensic, and research use. It has been validated for use as an expert system for single-source samples.
- NCBI BioSystems
- Anduril - An open source workflow framework for scientific data analysis. Its primary focus is high-throughput data in biomedical research, and the platform is fully extensible by third parties. Ready-made tools support data visualization, DNA/RNA/ChIP-sequencing, DNA/RNA microarrays, cytometry, and image analysis.
-
C/C++ Learning Resources
- C - A general-purpose, high-level language that was originally developed by Dennis M. Ritchie to develop the UNIX operating system at Bell Labs. It supports structured programming, lexical variable scope, and recursion, with a static type system. C also provides constructs that map efficiently to typical machine instructions, which makes it one of the most widely used programming languages today.
- Embedded C - A set of language extensions for the C programming language, defined by the C Standards Committee, to address issues that exist between C extensions for different [embedded systems](https://en.wikipedia.org/wiki/Embedded_system). The extensions help enhance microprocessor features such as fixed-point arithmetic, multiple distinct memory banks, and basic I/O operations. This makes Embedded C the most popular embedded software language in the world.
- C & C++ Developer Tools from JetBrains
- Open source C++ libraries on cppreference.com
- C++ Graphics libraries
- C++ Libraries in MATLAB
- Google C++ Style Guide
- Introduction C++ Education course on Google Developers
- C++ style guide for Fuchsia
- Chromium C++ Style Guide
- C++ Core Guidelines
- C++ Style Guide for ROS
- Learn C++
- Learn C : An Interactive C Tutorial
- C++ Online Training Courses on LinkedIn Learning
- C++ Tutorials on W3Schools
- Learn C Programming Online Courses on edX
- Learn C++ with Online Courses on edX
- Learn C++ on Codecademy
- Coding for Everyone: C and C++ course on Coursera
- C++ For C Programmers on Coursera
- C++ Online Courses on Udemy
- Top C Courses on Udemy
- Basics of Embedded C Programming for Beginners on Udemy
- C++ For Programmers Course on Udacity
- C++ Fundamentals Course on Pluralsight
- C++ - A cross-platform language that can be used to build high-performance applications. It was developed by Bjarne Stroustrup as an extension to the C language.
- C++ Tools and Libraries Articles
-
C/C++ Tools and Frameworks
- Maven
- AWS SDK for C++
- Visual Studio - An integrated development environment (IDE) from Microsoft; a feature-rich application that can be used for many aspects of software development. Visual Studio makes it easy to edit, debug, build, and publish your app using Microsoft software development platforms such as Windows API, Windows Forms, Windows Presentation Foundation, and Windows Store.
- ReSharper C++
- AppCode - An IDE that constantly monitors the quality of your code. It warns you of errors and smells and suggests quick-fixes to resolve them automatically. AppCode provides many code inspections for Objective-C, Swift, and C/C++, and a number of code inspections for other supported languages. All code inspections are run on the fly.
- CLion - A cross-platform IDE for C and C++ developers, developed by JetBrains.
- Code::Blocks
- Conan
- High Performance Computing (HPC) SDK
- Boost - A set of peer-reviewed C++ libraries and an educational opportunity focused on cutting-edge C++. Boost has been a participant in the annual Google Summer of Code since 2007, in which students develop their skills by working on Boost Library development.
- Automake
- CMake - An open-source, cross-platform family of tools designed to build, test, and package software. CMake is used to control the software compilation process using simple platform- and compiler-independent configuration files, and to generate native makefiles and workspaces that can be used in the compiler environment of your choice.
- GDB
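- GCC - A compiler collection that includes front ends for C, C++, Objective-C, Fortran, Ada, Go, and D, as well as libraries for these languages.
- GSL - A numerical library for C and C++ programmers. It provides a wide range of mathematical routines such as random number generators, special functions, and least-squares fitting. There are over 1000 functions in total with an extensive test suite.
- OpenGL Extension Wrangler Library (GLEW) - A cross-platform open-source C/C++ extension loading library. GLEW provides efficient run-time mechanisms for determining which OpenGL extensions are supported on the target platform.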
- Libtool
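- TAU (Tuning And Analysis Utilities) - A performance analysis toolkit capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements as well as event-based sampling. All C++ language features are supported including templates and namespaces.
- Clang - A production-quality C, Objective-C, C++ and Objective-C++ compiler when targeting X86-32, X86-64, and ARM (other targets may have caveats, but are usually easy to fix). Clang is used in production to build performance-critical software like Google Chrome or Firefox.
- OpenCV - A highly optimized library with a focus on real-time applications. Cross-platform C++, Python, and Java interfaces support Linux, macOS, Windows, iOS, and Android.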
- ANTLR (ANother Tool for Language Recognition)
- Oat++ - A light C++ web framework for highly scalable and resource-efficient web applications. It is zero-dependency and easily portable.
- Cython
- Infer - A static analysis tool for Java, C++, Objective-C, and C. Infer is written in [OCaml](https://ocaml.org/).
- Azure SDK for C++
- Azure SDK for C
- C++ Client Libraries for Google Cloud Services
- Vcpkg
- CppSharp
- JavaCPP
- Spdlog - A very fast, header-only/compiled C++ logging library.
-
Cloud Native Learning Resources
- CNCF Cloud Native Interactive Landscape
- Cloud-Native application development for Google Cloud
- Cloud-Native development for Amazon Web Services
- Cloud Foundry Developer Training and Certification Program
- Cloud-Native Architecture Course on Pluralsight
- AWS Fundamentals: Going Cloud-Native on Coursera
- Developing Cloud-Native Apps w/ Microservices Architectures course on Udemy
- How load balancing works for cloud native applications with Azure Application Gateway on LinkedIn Learning
- Developing Cloud Native Applications course on edX
-
Computer Vision Learning Resources
- Top Computer Vision Courses Online | Udemy
- Learn Computer Vision with Online Courses and Lessons | edX
- Computer Vision Nanodegree program | Udacity
- Computer Vision Training Courses | NobleProg
- Visual Computing Graduate Program | Stanford Online
- Computer Vision
- OpenCV Courses
- Computer Vision and Image Processing Fundamentals | edX
- Introduction to Computer Vision Courses | Udacity
-
Computer Vision Tools, Libraries, and Frameworks
- Microsoft AirSim - A simulator for drones, cars, and more, built on Unreal Engine. It is open-source, cross platform, and supports [software-in-the-loop simulation](https://www.mathworks.com/help///ecoder/software-in-the-loop-sil-simulation.html) with popular flight controllers such as PX4 & ArduPilot and [hardware-in-loop](https://www.ni.com/en-us/innovations/white-papers/17/what-is-hardware-in-the-loop-.html) with PX4 for physically and visually realistic simulations. It is developed as an Unreal plugin that can simply be dropped into any Unreal environment. AirSim is being developed as a platform for AI research to experiment with deep learning, computer vision, and reinforcement learning algorithms for autonomous vehicles.
- Automated Driving Toolbox™ - A MATLAB toolbox that provides algorithms and tools for designing, simulating, and testing ADAS and autonomous driving systems. Visualization tools include a bird's-eye-view plot and scope for sensor coverage, detections and tracks, and displays for video, lidar, and maps. The toolbox lets you import and work with HERE HD Live Map data and OpenDRIVE® road networks. It also provides reference application examples for common ADAS and automated driving features, including FCW, AEB, ACC, LKA, and parking valet. The toolbox supports C/C++ code generation for rapid prototyping and HIL testing, with support for sensor fusion, tracking, path planning, and vehicle controller algorithms.
- Data Acquisition Toolbox™
- LRSLibrary - Low-Rank and Sparse Tools for Background Modeling and Subtraction in Videos. The library was designed for moving object detection in videos, but it can also be used for other computer vision and machine learning problems.
-
Containers
- Kubernetes - An open-source container-orchestration system for automating application deployment, scaling, and management. It was originally designed by Google, and is now maintained by the Cloud Native Computing Foundation.
- Docker - A set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries, and configuration files; they can communicate with each other through well-defined channels. All containers are run by a single operating-system kernel and are thus more lightweight than virtual machines.
- Rook - An open source cloud-native storage orchestrator for Kubernetes that turns distributed storage systems into self-managing, self-scaling, self-healing storage services. It automates the tasks of a storage administrator: deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management.
- Open Container Initiative
- Buildah
- Podman
- Rancher
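- Containerd - A daemon that manages the complete container lifecycle of its host system, from image transfer and storage to container execution and supervision to low-level storage to network attachments and beyond. It is available for Linux and Windows.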
-
Continuous Integration/Continuous Delivery
-
CUDA Learning Resources
- CUDA - A parallel computing platform and programming model developed by NVIDIA for general computing on GPUs. In GPU-accelerated applications, the sequential part of the workload runs on the CPU, which is optimized for single-threaded performance, while the compute-intensive portion of the application runs on thousands of GPU cores in parallel. When using CUDA, developers can program in popular languages such as C, C++, Fortran, Python, and MATLAB.
- CUDA Toolkit Documentation
- CUDA Quick Start Guide
- CUDA on WSL
- NVIDIA Deep Learning cuDNN Documentation
- CUDA GPU support for TensorFlow
-
CUDA Tools, Libraries, and Frameworks
- NVIDIA cuDNN - A GPU-accelerated library of primitives for [deep neural networks](https://developer.nvidia.com/deep-learning). cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN accelerates widely used deep learning frameworks, including [Caffe2](https://caffe2.ai/), [Chainer](https://chainer.org/), [Keras](https://keras.io/), [MATLAB](https://www.mathworks.com/solutions/deep-learning.html), [MxNet](https://mxnet.incubator.apache.org/), [PyTorch](https://pytorch.org/), and [TensorFlow](https://www.tensorflow.org/).
- Chainer - A Python-based deep learning framework aiming at flexibility. It provides automatic differentiation APIs based on the define-by-run approach (dynamic computational graphs) as well as object-oriented high-level APIs to build and train neural networks. It also supports CUDA/cuDNN using [CuPy](https://github.com/cupy/cupy) for high performance training and inference.
- CUDA Toolkit - A development environment for creating high-performance GPU-accelerated applications. The CUDA Toolkit allows you to develop, optimize, and deploy your applications on GPU-accelerated embedded systems, desktop workstations, enterprise data centers, cloud-based platforms, and HPC supercomputers. The toolkit includes GPU-accelerated libraries, debugging and optimization tools, a C/C++ compiler, and a runtime library to build and deploy your application on major architectures including x86, Arm, and POWER.
- CUDA-X HPC - A collection of libraries, tools, compilers, and APIs. CUDA-X HPC includes highly tuned kernels essential for high-performance computing (HPC).
- CuPy - An implementation of a NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it. It supports a subset of the numpy.ndarray interface.
- cuDF - A GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. It provides a pandas-like API that will be familiar to data engineers and data scientists, so they can use it to accelerate their workflows without going into the details of CUDA programming.
- ArrayFire - A general-purpose library that simplifies the process of developing software that targets parallel and massively-parallel architectures including CPUs, GPUs, and other hardware acceleration devices.
- AresDB - A GPU-powered real-time analytics storage and query engine. It features low query latency, high data freshness, and highly efficient in-memory and on-disk storage management.
- NVIDIA Container Toolkit - A collection of tools and libraries for building and running GPU-accelerated containers. The toolkit includes a container runtime library and utilities to automatically configure containers to leverage NVIDIA GPUs.
- CUTLASS - A collection of CUDA C++ template abstractions for implementing high-performance matrix-multiplication (GEMM) at all levels and scales within CUDA. It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement cuBLAS.
- CUB
- Thrust - A C++ parallel programming library that resembles the C++ Standard Library. Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs.
- Arraymancer - A tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, CUDA, and OpenCL ndarray library on which to build a scientific computing ecosystem.
- Kintinuous - A real-time dense visual SLAM system capable of producing high-quality, globally consistent point and mesh reconstructions over hundreds of metres in real time with only a low-cost commodity RGB-D sensor.
-
Deep Learning Learning Resources
-
Deep Learning Tools, Libraries, and Frameworks
- AMD FidelityFX Super Resolution (FSR) - An open source, high-quality solution for producing high resolution frames from lower resolution inputs. It uses a collection of cutting-edge algorithms with a particular emphasis on creating high-quality edges, giving large performance improvements compared to rendering at native resolution directly. FSR enables “practical performance” for costly render operations, such as hardware ray tracing for the AMD RDNA™ and AMD RDNA™ 2 architectures.
- Intel Xe Super Sampling (XeSS) - An AI-based image upscaling technology. Intel Arc GPUs feature dedicated Xe-cores to run XeSS, with Xe Matrix Extensions (XMX) engines for hardware-accelerated AI processing. XeSS can also run on devices without XMX, including integrated graphics, though its performance is lower on non-Intel graphics cards because it is powered by the [DP4a instruction](https://www.intel.com/content/dam/www/public/us/en/documents/reference-guides/11th-gen-quick-reference-guide.pdf).
- CARLA - An open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely.
- ROS/ROS2 bridge for CARLA (package) - A bridge that enables two-way communication between ROS and CARLA. The information from the CARLA server is translated to ROS topics. In the same way, the messages sent between nodes in ROS get translated to commands to be applied in CARLA.
- Image Processing Toolbox™ - A toolbox that provides a comprehensive set of reference-standard algorithms and workflow apps for image processing, analysis, visualization, and algorithm development. You can perform image segmentation, image enhancement, noise reduction, geometric transformations, image registration, and 3D image processing.
-
DevOps
- OpenStack - A free and open-source software platform for cloud computing, mostly deployed as infrastructure-as-a-service. It controls large pools of compute, storage, and networking resources throughout a datacenter, managed through a dashboard or via the OpenStack API. OpenStack works with popular enterprise and open source technologies, making it ideal for heterogeneous infrastructure.
- Chef
- Salt - Python-based, open-source software for event-driven IT automation, remote task execution, and configuration management. It supports the "Infrastructure as Code" approach to data center system and network deployment and management, configuration automation, SecOps orchestration, vulnerability remediation, and hybrid cloud control.
- Microsoft Azure - A cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services through Microsoft-managed data centers.
- Google Cloud Platform - A set of cloud computing services that combines industry-leading tools (data management, hybrid & multi-cloud, and AI & ML) with Cloud Storage, supporting everything from security and data transfer to data backup and archive. It also covers backup, archival, and disaster recovery, along with file systems and gateways.
- Cloud Foundry
- Terraform - An open-source infrastructure as code software tool created by HashiCorp. It enables users to define and provision datacenter infrastructure using a high-level configuration language known as HashiCorp Configuration Language (HCL), or optionally JSON.
- Amazon Web Services (AWS) - A cloud platform that offers easy-to-use and cost-effective cloud computing solutions. The AWS platform is developed with a combination of infrastructure as a service (IaaS), platform as a service (PaaS), and packaged software as a service (SaaS) offerings.