Apache-Spark-Guide
Apache Spark Guide
https://github.com/mikeroyal/Apache-Spark-Guide
-
ML Frameworks, Libraries, and Tools
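- nGraph - An open source C++ library, compiler, and runtime for deep learning that aims to give AI developers freedom, performance, and ease of use.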
- Tensorman
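- cuML - A suite of GPU-accelerated machine learning libraries from the RAPIDS project, with a Python API that closely matches scikit-learn.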
- IBM
- Recurrent neural networks (RNNs)
- Random forest - A commonly used machine learning algorithm that combines the output of multiple decision trees to reach a single result. Each tree is typically trained on a random sample of the data and left unpruned, and the forest aggregates the trees' predictions. Its ease of use and flexibility have fueled its adoption, as it handles both classification and regression problems.
- Support Vector Machine (SVM) - A supervised machine learning model that uses classification algorithms for two-group classification problems.
- AutoGluon - A toolkit for training and deploying high-accuracy deep learning models on tabular, image, and text data.
- TensorFlow - An end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state of the art in ML and developers easily build and deploy ML-powered applications.
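The Random forest entry above describes combining the output of several decision trees into a single result. The following is a minimal, standard-library-only sketch of that idea, not the API of any library listed here. It makes several simplifying assumptions: each "tree" is a one-split stump, each stump looks at one randomly chosen feature, and the names `train_stump`, `train_forest`, and `predict` are hypothetical helpers invented for this illustration.

```python
import random
from collections import Counter


def train_stump(samples, labels, feature):
    """Fit a one-split decision tree (a "stump") on a single feature.

    Every observed value is tried as a threshold; the one with the
    fewest training errors is kept.
    """
    best = None  # (errors, threshold, left_label, right_label)
    for threshold in sorted({s[feature] for s in samples}):
        left = [y for s, y in zip(samples, labels) if s[feature] <= threshold]
        right = [y for s, y in zip(samples, labels) if s[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda s: left_label if s[feature] <= threshold else right_label


def train_forest(samples, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, n_features = len(samples), len(samples[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random feature for this tree
        forest.append(train_stump([samples[i] for i in idx],
                                  [labels[i] for i in idx],
                                  feature))
    return forest


def predict(forest, sample):
    """Aggregate the trees' outputs by majority vote."""
    votes = Counter(tree(sample) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated classes.
samples = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.1), (8.0, 9.0), (8.5, 9.5), (9.0, 8.7)]
labels = ["a", "a", "a", "b", "b", "b"]
forest = train_forest(samples, labels)
print(predict(forest, (0.5, 0.5)), predict(forest, (9.5, 9.9)))
```

For regression, the same structure would average the trees' numeric outputs instead of taking a vote. Production libraries grow full trees rather than stumps and add many refinements that this sketch omits.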
-
NLP Learning Resources
- Natural Language Processing With Python's NLTK Package
- Cognitive Services—APIs for AI Developers | Microsoft Azure
- Artificial Intelligence Services - Amazon Web Services (AWS)
- Google Cloud Natural Language API
- Top Natural Language Processing Courses Online | Udemy
- Introduction to Natural Language Processing (NLP) | Udemy
- Top Natural Language Processing Courses | Coursera
- Natural Language Processing | Coursera
- Natural Language Processing in TensorFlow | Coursera
- Learn Natural Language Processing with Online Courses and Lessons | edX
- Build a Natural Language Processing Solution with Microsoft Azure | Pluralsight
- Natural Language Processing (NLP) Training Courses | NobleProg
- Natural Language Processing with Deep Learning Course | Stanford Online
- Advanced Natural Language Processing - MIT OpenCourseWare
- Certified Natural Language Processing Expert Certification | IABAC
- Natural Language Processing Course - Intel
- Natural Language Processing (NLP) - A branch of artificial intelligence that combines computational linguistics (rule-based modeling of human language) with statistical, machine learning, and deep learning models.
-
NLP Tools, Libraries, and Frameworks
- PyTorch
- Apache OpenNLP - An open-source library providing a machine learning based toolkit for processing natural language text. It features an API for use cases like [Named Entity Recognition](https://en.wikipedia.org/wiki/Named-entity_recognition), Sentence Detection, [POS (Part-Of-Speech) tagging](https://en.wikipedia.org/wiki/Part-of-speech_tagging), [Tokenization](https://en.wikipedia.org/wiki/Lexical_analysis), [Feature extraction](https://en.wikipedia.org/wiki/Feature_extraction), [Chunking](https://en.wikipedia.org/wiki/Shallow_parsing), [Parsing](https://en.wikipedia.org/wiki/Parsing), and [Coreference resolution](https://en.wikipedia.org/wiki/Coreference).
- Open Neural Network Exchange (ONNX) - An open ecosystem for interoperable AI models that defines an extensible computation graph model along with built-in operators and standard data types.
- Anaconda
- Scikit-Learn
- NVIDIA cuDNN - A GPU-accelerated library of primitives for [deep neural networks](https://developer.nvidia.com/deep-learning). cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN accelerates widely used deep learning frameworks, including [Caffe2](https://caffe2.ai/), [Chainer](https://chainer.org/), [Keras](https://keras.io/), [MATLAB](https://www.mathworks.com/solutions/deep-learning.html), [MxNet](https://mxnet.incubator.apache.org/), [PyTorch](https://pytorch.org/), and [TensorFlow](https://www.tensorflow.org/).
- Apache Spark - A unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
- Apache PredictionIO
- BigDL
- Eclipse Deeplearning4J (DL4J) - A set of projects intended to support all the needs of a JVM-based (Scala, Kotlin, Clojure, and Groovy) deep learning application. This means starting with the raw data, loading and preprocessing it from wherever and whatever format it is in, and then building and tuning a wide variety of simple and complex deep learning networks.
- Chainer - A Python-based deep learning framework aiming at flexibility. It provides automatic differentiation APIs based on the define-by-run approach (dynamic computational graphs) as well as object-oriented high-level APIs to build and train neural networks. It also supports CUDA/cuDNN using [CuPy](https://github.com/cupy/cupy) for high performance training and inference.
- Natural Language Toolkit (NLTK) - A platform for building Python programs that work with human language data. It provides easy-to-use interfaces to over [50 corpora and lexical resources](https://nltk.org/nltk_data/) such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries.
- Tensorflow_macOS - A Mac-optimized version of TensorFlow and TensorFlow Addons for macOS 11.0+ accelerated using Apple's ML Compute framework.
- PlaidML
- Caffe
- Theano - A Python library that lets you define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently, including tight integration with NumPy.
- Apache Spark Connector for SQL Server and Azure SQL - A high-performance connector that enables you to use transactional data in big data analytics and persist results for ad-hoc queries or reporting. The connector allows you to use any SQL database, on-premises or in the cloud, as an input data source or output data sink for Spark jobs.
- Numba - An open source, NumPy-aware optimizing compiler for Python sponsored by Anaconda, Inc. It uses the LLVM compiler project to generate machine code from Python syntax. Numba can compile a large subset of numerically-focused Python, including many NumPy functions. Additionally, Numba has support for automatic parallelization of loops, generation of GPU-accelerated code, and creation of ufuncs and C callbacks.
- CoreNLP
- NLPnet - A Python library for Natural Language Processing tasks based on neural networks. It performs part-of-speech tagging, semantic role labeling and dependency parsing.
- Flair - A simple framework for applying state-of-the-art Natural Language Processing (NLP) models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS), special support for biomedical data, sense disambiguation and classification, with support for a rapidly growing number of languages.
- Catalyst - A C# Natural Language Processing library that provides pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
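Several entries above (Apache OpenNLP, NLTK) mention sentence detection and tokenization as basic NLP steps. As a rough illustration of what those two steps do, here is a small regex-based sketch that uses only the Python standard library. It is not how OpenNLP or NLTK implement them; the helper names `split_sentences` and `tokenize` and the splitting rules are assumptions made for this example, and the rules will mis-handle abbreviations such as "Dr. Smith".

```python
import re

# Assumed rule: a sentence ends at '.', '!' or '?' when followed by
# whitespace and an uppercase letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Assumed rule: a token is either a word (optionally containing internal
# hyphens or apostrophes) or a single punctuation character.
TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_sentences(text):
    """Split text into sentences at the assumed boundary pattern."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Split one sentence into word and punctuation tokens."""
    return TOKEN.findall(sentence)


text = "Spark scales out. It doesn't require a cluster!"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

The libraries listed in this section replace hand-written rules like these with trained models or curated rule sets, which is why they cope better with abbreviations, numbers, and languages other than English.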
-
Python Frameworks and Tools
- Python Package Index (PyPI)
- PyCharm
- Django - A high-level Python Web framework that encourages rapid development and clean, pragmatic design.
- Web2py - An open-source web application framework written in Python that allows web developers to program dynamic web content. One web2py instance can run multiple web sites using different databases.
- Falcon - A high-performance Python web framework for building large-scale app backends and microservices.
- Pillow
- IPython
- Pandas
- Matplotlib - A Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
- Python Tools for Visual Studio(PTVS)
- Pylance
- Pyright
- AWS Chalice
- HTTPie
- Pipenv
- Python Fire
- Bottle - A fast, simple and lightweight WSGI micro web-framework for Python. It is distributed as a single file module and has no dependencies other than the [Python Standard Library](https://docs.python.org/library/).
- Neural Network Intelligence(NNI)
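- Luigi - A Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, and visualization, and comes with Hadoop support built in.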
- Locust
- spaCy
- PuLP
- Sanic
- GraphLab Create - A Python library, backed by a C++ engine, for quickly building large-scale, high-performance machine learning models.
- Sentry
-
Python Learning Resources
- CheckiO
- PCPP – Certified Professional in Python Programming 2
- Getting Started with Python in Visual Studio Code
- Google's Python Style Guide
- Google's Python Education Class
- Intro to Python for Data Science
- Intro to Python by W3schools
- Codecademy's Python 3 course
- Learn Python with Online Courses and Classes from edX
- Python Courses Online from Coursera
- The Python Open Source Computer Science Degree by Forrest Knight
- Real Python
-
Reinforcement Learning Learning Resources
- Top Deep Learning Courses Online | Coursera
- Top Deep Learning Courses Online | Udemy
- Learn Deep Learning with Online Courses and Lessons | edX
- Deep Learning Online Course Nanodegree | Udacity
- Machine Learning Engineering for Production (MLOps) course by Andrew Ng | Coursera
- Data Science: Deep Learning and Neural Networks in Python | Udemy
- Understanding Machine Learning with Python | Pluralsight
- How to Think About Machine Learning Algorithms | Pluralsight
- Deep Learning Courses | Stanford Online
- Deep Learning - UW Professional & Continuing Education
- Deep Learning Online Courses | Harvard University
- Machine Learning for Everyone Courses | DataCamp
- Artificial Intelligence Expert Course: Platinum Edition | Udemy
- Top Artificial Intelligence Courses Online | Coursera
- Learn Artificial Intelligence with Online Courses and Lessons | edX
- Professional Certificate in Computer Science for Artificial Intelligence | edX
- Artificial Intelligence Nanodegree program
- Artificial Intelligence (AI) Online Courses | Udacity
- Intro to Artificial Intelligence Course | Udacity
- Reasoning: Goal Trees and Rule-Based Expert Systems | MIT OpenCourseWare
- Expert Systems and Applied Artificial Intelligence
- Autonomous Systems - Microsoft AI
- Introduction to Microsoft Project Bonsai
- Autonomous Maritime Systems Training | AMC Search
- Top Autonomous Cars Courses Online | Udemy
- Applied Control Systems 1: autonomous cars: Math + PID + MPC | Udemy
- Learn Autonomous Robotics with Online Courses and Lessons | edX
- Autonomous Systems Online Courses & Programs | Udacity
- Autonomous Systems MOOC and Free Online Courses | MOOC List
- Robotics and Autonomous Systems Graduate Program | Stanford Online
- Mobile Autonomous Systems Laboratory | MIT OpenCourseWare
- Top Reinforcement Learning Courses | Coursera
- Top Reinforcement Learning Courses | Udemy
- Top Reinforcement Learning Courses | Udacity
- Reinforcement Learning Courses | Stanford Online
- Edge AI for IoT Developers Course | Udacity
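- Reinforcement Learning - A machine learning paradigm in which an agent learns by trial and error from reward feedback, as distinct from supervised, [semi-supervised](https://en.wikipedia.org/wiki/Semi-supervised_learning) or [unsupervised](https://en.wikipedia.org/wiki/Unsupervised_learning) learning.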
- Machine teaching with the Microsoft Autonomous Systems platform
- Deep Learning Online Courses | NVIDIA
-
Reinforcement Learning Tools, Libraries, and Frameworks
- Apache MXNet
- AutoGluon - A toolkit for training and deploying high-accuracy deep learning models on tabular, image, and text data.
- Jupyter Notebook - An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Jupyter is used widely in industries that do data cleaning and transformation, numerical simulation, statistical modeling, data visualization, data science, and machine learning.
- XGBoost
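- LIBSVM - An integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). It supports multi-class classification.
- Microsoft Project Bonsai - A low-code AI platform that speeds AI-powered automation development and is part of the Autonomous Systems suite from Microsoft. Bonsai is used to build AI components that can provide operator guidance or make independent decisions to optimize process variables, improve production efficiency, and reduce downtime.
- Predictive Maintenance Toolbox™ - A MATLAB toolbox for managing sensor data, designing condition indicators, and estimating the remaining useful life of a machine. It supports exploring, extracting, and ranking features using data-based and model-based techniques, including statistical, spectral, and time-series analysis.
- Navigation Toolbox™ - A MATLAB toolbox that provides algorithms and analysis tools for motion planning, simultaneous localization and mapping (SLAM), and inertial navigation. It includes customizable search and sampling-based path planners, as well as metrics for validating and comparing paths. You can create 2D and 3D map representations, generate maps using SLAM algorithms, and interactively visualize and debug map generation with the SLAM map builder app.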
- OpenAI
- ReinforcementLearning.jl
- AWS RoboMaker - A service that provides a fully managed, scalable infrastructure for simulation that customers use for multi-robot simulation and CI/CD integration with regression testing in simulation.
- Cluster Manager for Apache Kafka(CMAK)
- CARLA - An open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely.
- ROS/ROS2 bridge for CARLA (package) - A bridge that enables two-way communication between ROS and CARLA. The information from the CARLA server is translated to ROS topics. In the same way, the messages sent between nodes in ROS get translated to commands to be applied in CARLA.
- Azure Databricks - A fast and collaborative Apache Spark-based big data analytics service designed for data science and data engineering. Azure Databricks sets up your Apache Spark environment in minutes, autoscales, and lets you collaborate on shared projects in an interactive workspace. Azure Databricks supports Python, Scala, R, Java, and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch, and scikit-learn.
-
R Learning Resources
- R
- An Introduction to R
- Google's R Style Guide
- R developer's guide to Azure
- Running R on AWS
- RStudio Server Pro for AWS
- Learn R by Codecademy
- Learn R Programming with Online Courses and Lessons by edX
- R Language Courses by Coursera
- Learn R For Data Science by Udacity
- Running R at Scale on Google Compute Engine
-
R Tools, Libraries, and Frameworks
- CatBoost
- Visual Studio Code
- Code Server
- R Debugger
- Rmarkdown
- Plotly
- Metaflow - A human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.
- LightGBM
- MLR
- Plumber
- Drake - An R-focused pipeline toolkit for reproducibility and high-performance computing.
- DiagrammeR
- Knitr - A general-purpose literate programming engine in R, with lightweight APIs designed to give users full control of the output without heavy coding work.
- Broom
- VSCode-R - A VS Code extension for the [R Programming Language](https://www.r-project.org/), including features such as extended syntax highlighting, R language service based on code analysis, interacting with R terminals, viewing data, plots, workspace variables, help pages, managing packages, and working with [R Markdown](https://rmarkdown.rstudio.com/) documents.
- Language Server Protocol (LSP)
- R Host
- Rplugin
- ML workspace - An all-in-one web-based IDE specialized for machine learning and data science. It is simple to deploy and gets you started within minutes to productively build ML solutions on your own machines. It comes preloaded with a variety of popular data science libraries (TensorFlow, PyTorch, Keras, and MXNet) and dev tools (Jupyter, VS Code, and TensorBoard), configured and integrated to work together.
- Shiny
-
Scala Learning Resources
- Scala - A language that combines object-oriented and functional programming in one concise, high-level language. Scala's static types help avoid bugs in complex applications, and its JVM and JavaScript runtimes let you build high-performance systems with easy access to huge ecosystems of libraries.
- Scala Style Guide
- Creating a Scala Maven application for Apache Spark in HDInsight using IntelliJ
- Using Scala to Program AWS Glue ETL Scripts
- Using Flink Scala shell with Amazon EMR clusters
- AWS EMR and Spark 2 using Scala from Udemy
- Using the Google Cloud Storage connector with Apache Spark
- Write and run Spark Scala jobs on Cloud Dataproc for Google Cloud
- Scala Courses and Certifications from edX
- Top Scala Courses from Udemy
- Scala Courses from Coursera
- Intro to Spark DataFrames using Scala with Azure Databricks