awesome-oneapi
An Awesome list of oneAPI projects
https://github.com/uxlfoundation/awesome-oneapi
AI - Computer Vision
- BMW-IntelOpenVINO-Detection-Inference-API - This is a repository for an object detection inference API using OpenVINO, supporting both Windows and Linux operating systems
- Certiface Anti-Spoofing - Certiface AntiSpoofing uses oneAPI for fast video decoding to perform liveness detection with inference. The system is capable of spotting fake faces and performing anti-face spoofing in face recognition systems.
- diffusers - Pyke Diffusers is a modular Rust library for pretrained diffusion model inference to generate images using ONNX Runtime as a backend for accelerated generation on both CPUs and GPUs, including features like low memory usage and quantization. It offers an interactive stable diffusion demo and instructions on how to install and use the tool.
- DPCPP-image-Blurring-with-SYCL - A program developed with DPC++ SYCL for parallelizing the Image Blurring process.
- Fast_Human_Pose_Estimation_Pytorch - This is an unofficial implementation for the paper "Fast Human Pose Estimation". The code mainly comes from the PyTorch implementation for Stacked Hourglass Network.
- gocv - The gocv package is a set of Go bindings for the OpenCV 4 computer vision library that supports the latest releases of Go and OpenCV v4.7.0 on Linux, macOS, and Windows.
- RapidOCR - This is the README for RapidOCR, a project that provides OCR tools and models for detecting text in images.
- smart-retail-analytics - The retail analytics application uses video or camera resources to monitor activity and keep track of inventory.
- Stable Diffusion - This repository contains Stable Diffusion models trained from scratch and will be continuously updated with new checkpoints.
- stable_diffusion_arc - The project guide provides instructions on how to set up and run the stable diffusion inference model on Intel Arc GPUs.
- stable-diffusion-webui-arc-directml - The project involves a web UI for stable diffusion on Intel ARC with DirectML.
- stable_diffusion.openvino - This GitHub project provides an implementation of text-to-image generation using stable diffusion on Intel CPU or GPU. It requires Python 3.9.0 and is compatible with OpenVINO.
- visionicpp - A machine vision library written in SYCL and C++ that shows performance-portable implementation of graph algorithms
- yolov5_export_cpu - The project provides documentation on exporting YOLOv5 models for fast CPU inference using Intel's OpenVINO framework
-
AI - Data Science
- root-experimental - Jolly Chen's fork of root.cern demonstrating porting RDataFrame to SYCL from CUDA.
- Boosting epistasis detection on Intel CPU+GPU systems - This work focuses on exploring the architecture of Intel CPUs and Integrated Graphics and their heterogeneous computing potential to boost performance and energy-efficiency of epistasis detection. This will be achieved by making use of the OpenCL, Data Parallel C++, and OpenMP programming models.
- HIAS TassAI Facial Recognition Agent - Security is an important issue for hospitals and medical centers to consider. Today's Facial Recognition can provide ways of automating security in the medical industry reducing staffing costs and making medical facilities safer for both patients and staff.
- daal4py - A simplified API to Intel® DAAL that allows for fast usage of the framework suited for Data Scientists or Machine Learning users. Built to help provide an abstraction to Intel® DAAL for either direct usage or integration into one's own framework.
-
AI - Frameworks and Toolkits
- AI Personal Identifiable Information Data Protection - Provides anonymization functions, which include methods for masking, hashing and encrypting/decrypting the PII data in large datasets. Can be used to protect the privacy and security of individuals in a dataset.
- AI Structured Data Generation - Generate structured synthetic data for training and inferencing.
- AI based transcribing - A reference solution showing how to use speech to text conversion to convert audio session tapes into digital notes in a psychologist's office.
- BMW-Anonymization-API - The BMW Anonymization API is a privacy tool designed to obfuscate sensitive information in images and videos to preserve individual anonymity. Its features include agnostic localization techniques, modular sensitive information training, scalable anonymization techniques, and compatibility with deep learning models
- Credit Card Fraud Detection - Uses Intel AI Analytics Toolkit and scikit-learn to train an AI algorithm to detect credit card fraud.
- Customer Chatbot - A PyTorch-based conversational AI chatbot for customer care.
- Customer Churn Prediction - Using historical customer churn data along with service details, a machine learning model is built to predict whether the customer is going to churn. Reducing churn is key in the telecommunications industry to attract new customers and avoid contract terminations.
- Customer Segmentation for Online Retailers - Demonstrates how machine learning can aid in building a deeper understanding of a business's clientele by segmenting customers into clusters that can be used to implement personalized and targeted campaigns.
- Data Streaming Anomaly Detection - Uses TensorFlow and oneAPI to build a deep learning model that detects anomalies in data collected from an IoT device, in order to monitor equipment condition and prevent issues from cascading.
- deeplearning4j - The Eclipse DeepLearning4J ecosystem supports all the needs for JVM-based deep learning applications with various libraries.
- deeplearning4j-examples - The Eclipse Deeplearning4j (DL4J) ecosystem is a set of projects that supports all the needs of a JVM-based deep learning application.
- DeepRec - DeepRec is a recommendation deep learning framework based on TensorFlow, which has been developed since 2016 and supports core businesses such as Taobao search recommendation and advertising.
- Demand Forecasting - Builds and trains a deep learning CNN-LSTM time series model that predicts the next day's demand for every item based on 130 days' worth of sales data.
- Digital Twin for Design Exploration - A model that can be used to test digital replicas of real world products or devices for faults.
- Disease Prediction - Demonstrates using a deep learning based NLP pipeline to train a document classifier that takes in notes on a patient's symptoms and predicts the diagnosis among a set of known diseases.
- dlstreamer - The Intel Deep Learning Streamer is an open source streaming media analytics framework based on the GStreamer multimedia framework. It is optimized for performance and functional interoperability between GStreamer plugins built on various backend libraries, with support for over 70 pre-trained models for various use cases.
- Documentation Automation - Based on the TensorFlow BERT transfer learning NER model, builds a deep learning model to predict the named entity tags for a given sentence.
- Drone Navigation Inspection - Finds safe drone landing zones without damaging property or injuring people using oneAPI and TensorFlow.
- Engineering Design Optimizations - Train a model to create new bicycle designs with unique frames and handles, and generalize rare novelties to a broad set of designs, completely automatically and without requiring human intervention.
- flashlight - Flashlight is a machine learning library written in C++ and created by Facebook AI Research. It features internal APIs for tensor computation, high performance defaults using just-in-time kernel compilation, and scalability.
- Historical Assets Document Processing (OCR) - Allows you to process large amounts of structured, semi-structured and unstructured content in documents. Using image processing, analysis, text region detection and OCR-based text extraction, the results can then be stored in a database.
- Image Data Generation - An AI-enabled image generator that aids in generating accurate image and image segmentation datasets where availability of such datasets is limited.
- intel-extension-for-tensorflow - Intel Extension for TensorFlow is a plugin based on TensorFlow PluggableDevice, which aims to bring devices such as Intel XPU, GPU, and CPU into TensorFlow.
- intel-extension-for-transformers - Intel Extension for Transformers is a toolkit designed to efficiently accelerate transformer-based models on Intel platforms, optimized for 4th gen Intel Xeon Scalable Processor (codename Sapphire Rapids).
- intel-extension-for-pytorch - Intel Extension for PyTorch provides feature optimizations for an extra performance boost on Intel hardware, including CPUs and discrete GPUs, and offers easy GPU acceleration for Intel discrete GPUs with PyTorch.
- Invoice To Cash Automation - AI toolkit to extract information from claim documents to categorize the claims. Helps develop models to accelerate the resolution of accounts receivable claims for trade promotion deductions.
- Intelligent Indexing - A reference kit to build an AI-based Natural Language Processing solution for classifying documents.
- KernelAbstractions.jl - KernelAbstractions (KA) is a package that enables you to write GPU-like kernels targeting different execution backends.
- Loan Default Risk Prediction - Train and utilize an AI model using XGBoost to predict the probability of a loan default from client characteristics and the type of loan obligation.
- Medical Imaging Diagnostics - Using machine learning and deep learning, train an AI algorithm that identifies images that warrant further attention to classify abnormalities.
- models - The ONNX Model Zoo is a collection of pre-trained, state-of-the-art machine learning models in the ONNX format. These models are contributed by community members and accompanied by Jupyter notebooks for model training and running inference with the trained model.
- Network Intrusion Detection - A pattern based network intrusion system using oneAPI and machine learning.
- neural-compressor - Intel Neural Compressor is an open-source Python library for applying popular model compression techniques, such as pruning, quantization, sparsity, and distillation, on all mainstream deep learning frameworks and Intel extensions.
- nnfusion - A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
- optimum - Optimum is an extension of Transformers and Diffusers that provides optimization tools for efficiency to train and run machine learning models on targeted hardware, while also being easy to use.
- optimum-intel - Optimum Intel is an interface between the Transformers and Diffusers libraries and Intel's different tools and libraries that help accelerate end-to-end pipelines on Intel architectures.
- Order to Delivery Time Forecasting - A machine learning based predictive model that provides delivery time forecasting for e-commerce platforms.
- pipeline-server - Intel Deep Learning Streamer (DL Streamer) is a Python package and microservice that supports the deployment of optimized media analytics pipelines. It includes customizable media analytics containers, APIs to monitor pipelines, no-code pipeline definitions, and deep learning model integration with openvino.
- portDNN - portDNN is a library implementing neural network algorithms written using SYCL.
- Power Line Fault Detection - Process and analyze signals from a 3-phase power supply system used in power lines to predict whether or not a signal has a partial discharge using SciPy and NumPy calculations.
- Predictive Asset Maintenance - Shows an alternative method of using the oneAPI AI Analytics Toolkit in place of the stock versions of packages such as XGBoost.
- Product Recommendation - A reference kit that demonstrates one way AI can be used to build a recommendation system for an e-commerce business using scikit-learn and oneAPI.
- Purchase Prediction - A oneAPI based reference AI model that uses machine learning to predict purchases of customers.
- pycaret - PyCaret is an open-source, low-code machine learning library in Python that automates the machine learning workflow. It is an end-to-end machine learning and model management tool that replaces hundreds of lines of code with a few lines to make experiments exponentially fast and efficient.
- pynufft - The pynufft library is a Python package for non-uniform fast Fourier transform, based on a min-max interpolator, with experimental support for CuPy, PyTorch, and TensorFlow Eager mode
- shumai - The Shumai project is a differentiable tensor library for TypeScript and JavaScript built with Bun and Flashlight. It provides standard array utilities, gradients, and supported operators.
- Structural Damage Assessment - A PyTorch-based AI model that works on satellite-captured images to assess the severity of damage in the aftermath of a natural disaster.
- Synthetic Voice/Audio Generation - Generate synthetic voices and speeches - can be used in chatbots, virtual assistants, and is applicable in a host of applications. Voice synthesis technology is increasingly used to create more natural sounding virtual assistants.
- Text Data Generation - Creates artificially generated synthetic data. This reference kit uses a pre-trained GPT2 model provided by Hugging Face to generate synthetic data applicable to product testing and training machine learning algorithms without running into privacy issues.
- Traffic Camera Object Detection - A reference kit demonstrating how to improve traffic using a number of different technologies and oneAPI.
- Vertical Search Engine - Demonstrates a possible reference implementation of a deep learning based NLP pipeline for semantic search of an organization's documents using a pre-trained model.
- Visual Process Discovery - A reference kit implementing visual process discovery. VPDs can be used to enhance customer experience by providing personalized solutions knowing their needs as they navigate through a company's website.
- Visual Quality Inspection - Build a computer vision model for visual quality inspection based on a dataset from the pharma industry.
- webnn-native - WebNN Native is an implementation of the Web Neural Network API, providing building blocks, headers, and backends for ML platforms including DirectML, OpenVINO, and XNNPACK.
- ZenDNN - The Zen Deep Neural Network Library (ZenDNN) is a library for deep learning inference applications on AMD CPUs. It includes APIs for basic neural network building blocks and is optimized for AMD CPUs.
- PPLNN - PPLNN, which is short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine for efficient AI inferencing. It can run various ONNX models and has better support for OpenMMLab.
- InfiniTensor - InfiniTensor is a high-performance inference engine tailored for GPUs and AI accelerators. Its design focuses on effective deployment and swift academic validation.
- ThundeRiNG - An FPGA-based pseudo random number generator (PRNG) that can concurrently generate a massive number of independent sequences of random numbers.
- AmgT - AmgT is a new AMG solver that utilizes the tensor core and mixed precision abilities of the latest GPUs during multiple phases of the AMG algorithm.
- nndeploy - nndeploy is an easy-to-use, high-performance, multi-platform AI inference deployment framework.
- Neurenix - Neurenix is an AI framework optimized for embedded devices (Edge AI), with support for multiple GPUs and distributed clusters. The framework specializes in AI agents, with native support for multi-agent, reinforcement learning, and autonomous AI.
- LBANN: Livermore Big Artificial Neural Network Toolkit - The Livermore Big Artificial Neural Network toolkit (LBANN) is an open-source, HPC-centric, deep learning training framework that is optimized to compose multiple levels of parallelism. LBANN provides model-parallel acceleration through domain decomposition to optimize for strong scaling of network training. It also allows for composition of model-parallelism with both data parallelism and ensemble training methods for training large neural networks with massive amounts of data. LBANN is able to take advantage of tightly-coupled accelerators, low-latency high-bandwidth networking, and high-bandwidth parallel file systems.
- DiffKt - A Differentiable Programming Framework for Kotlin - DiffKt is a general-purpose, functional, differentiable programming framework for Kotlin. It can automatically differentiate through functions of tensors, scalars, and user-defined types. It supports forward-mode and reverse-mode differentiation including Jacobian-vector and vector-Jacobian products, which can be composed for higher-order differentiation.
- MegEngine - MegEngine is a fast, scalable and easy-to-use deep learning framework, with auto-differentiation.
- OneFlow - OneFlow is a performance-centered and open-source deep learning framework.
- MagmaDNN - A neural network library in C++ aimed at providing a simple, modularized framework for deep learning that is accelerated for heterogeneous architectures.
- scikit-learn-intelex - Intel® Extension for Scikit-learn is a free AI accelerator that can accelerate existing scikit-learn code without the need to change it. It works by patching, replacing the stock scikit-learn algorithms with the optimized versions provided by the extension, which results in 10-100x acceleration across a variety of applications.
-
AI - Machine Learning
- DQRM - Deep Quantized Recommendation Model (DQRM) is a recommendation framework that is small, powerful in inference, and efficient to train.
- ort - ort is an (unofficial) ONNX Runtime 1.15 wrapper for Rust based on the now inactive onnxruntime-rs. ONNX Runtime accelerates ML inference on both CPU & GPU.
- Performance and Portability Evaluation of the K-Means Algorithm on SYCL with CPU-GPU architectures - This work uses the k-means algorithm to assess the performance portability of one of the most advanced implementations in the literature (He-Vialle) over different programming models (DPC++, CUDA, OpenMP) and multi-vendor CPU-GPU architectures.
- dpcpp-svm - A DPC++ version of ThunderSVM. The mission of ThunderSVM is to help users easily and efficiently apply SVMs to solve problems. ThunderSVM exploits GPU and multi-core CPUs to achieve high efficiency.
- PLSSVM - Implementation of a parallel least squares support vector machine using multiple backends for different GPU vendors.
- HETU - Hetu is a high-performance distributed deep learning system targeting the training of DL models with trillions of parameters, developed by DAIR Lab at Peking University. It takes account of both high availability in industry and innovation in academia.
- lc0 - Lc0 is a UCI-compliant chess engine designed to play chess via neural network, specifically those of the LeelaChessZero project.
- Singa - Apache SINGA is an Apache Top Level Project, focusing on distributed training of deep learning and machine learning models.
- PaddlePaddle - PaddlePaddle, as the first independent R&D deep learning platform in China, has been officially open-sourced to professional communities since 2016. It is an industrial platform with advanced technologies and rich features that cover core deep learning frameworks, basic model libraries, end-to-end development kits, tools & components as well as service platforms.
- XLA - XLA (Accelerated Linear Algebra) is an open-source machine learning (ML) compiler for GPUs, CPUs, and ML accelerators. The XLA compiler takes models from popular ML frameworks such as PyTorch, TensorFlow, and JAX, and optimizes them for high-performance execution across different hardware platforms including GPUs, CPUs, and ML accelerators.
- TPU-MLIR - TPU-MLIR is an open-source machine-learning compiler based on MLIR for TPU. This project provides a complete toolchain, which can convert pre-trained neural networks from different frameworks into binary files bmodel that can be efficiently operated on TPUs.
- Px0 - Px0 is a UCI-compliant xiangqi engine designed to play xiangqi via neural network, specifically those of the PikaXiangqiZero project.
- OAP MLlib - OAP MLlib is an optimized package to accelerate machine learning algorithms in Apache Spark MLlib. It is compatible with Spark MLlib and leverages the open source Intel® oneAPI Data Analytics Library (oneDAL) to provide highly optimized algorithms and get the most out of CPU and GPU capabilities. It also takes advantage of the open source Intel® oneAPI Collective Communications Library (oneCCL) to provide efficient communication patterns in multi-node multi-GPU clusters.
-
AI - Natural Language Processing
- Census
- Language Identification
- ChatGPTCLIBot - The chatgpt cli bot allows the user to run GPT models such as GPT 3.5 and GPT 4, and switch between them using the config.json file.
- CTranslate2 - CTranslate2 is a C++ and Python library that optimizes inference with transformer models, supporting models trained in various frameworks. It implements performance optimization techniques such as weights quantization, layers fusion, and batch reordering to accelerate transformer models on CPU and GPU.
- fastRAG - Build and explore efficient retrieval-augmented generative models and applications. Its main goal is to make retrieval augmented generation as efficient as possible through the use of state-of-the-art and efficient retrieval and generative models.
- Gavin AI - Gavin AI is a project created by Scot_Survivor (Joshua Shiells) and ShmarvDogg which aims to have human-like conversations in English through the use of AI and ML. Gavin works on the Transformer architecture; however, Performer and FNet architectures are being investigated for better scaling.
- hachi - Hachi is a locally hosted web app that enables natural language search for videos and images, using an AI-based machine learning model powered by OpenAI CLIP.
- whisper-ctranslate2 - Whisper ctranslate2 is a command-line client based on ctranslate2, compatible with original OpenAI client.
- ik_llama.cpp - This repository is a fork of llama.cpp with better CPU and hybrid GPU/CPU performance, new SOTA quantization types, first-class Bitnet support, better DeepSeek performance via MLA, FlashMLA, fused MoE operations and tensor overrides for hybrid GPU/CPU inference, row-interleaved quant packing, etc.
-
Autonomous Systems
- Alice - We are writing a tutorial for an open source project on how we build an AI to work on the open source project as if she were a remote developer. Bit of a self fulfilling prophecy but who doesn't love an infinite loop now and again.
- FastChat - FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
-
Data Visualization and Rendering
- Blender - Blender is the free and open source 3D creation suite. It supports the entirety of the 3D pipeline: modeling, rigging, animation, simulation, rendering, compositing, motion tracking and video editing.
- Atrc - The ATRC offline rendering lab includes various features such as path tracing, photon mapping, and many material models. It has an optional integrated OIDN and Embree library and an interactive scene editor.
- Brayns - Brayns is a large scientific visualization platform based on CPU ray tracing, using an extension plugin architecture. It comes with several pre-made plugins, such as CircuitExplorer and MoleculeExplorer, and requires several dependencies to build
- ChameleonRT - ChameleonRT is an example path tracer that runs on multiple ray tracing backends including Embree, SYCL, DXR, Optix, Vulkan, Metal, and Ospray.
- embree - Embree is a high performance ray tracing library developed by Intel that targets graphics application developers to improve the performance of photo-realistic rendering applications. It includes various primitive types such as triangles, quads, grids, and curve primitives, and supports dynamic scenes. Embree also offers support for both CPUs and GPUs, while maintaining one code base to improve productivity and eliminate inconsistencies between the two versions of the renderer.
- fresnel - Fresnel is a Python library for path tracing that can be used to generate high quality images in real time.
- f3d - F3D is a fast and minimalist 3D viewer that supports multiple file formats and can show animations, supporting thumbnails and many rendering and texturing options including real-time physically based rendering and raytracing.
- hdospray - The ospray for hydra is an open-source plugin for Pixar's USD to extend the hydra rendering framework with Intel Ospray. It is highly optimized for Intel CPU architectures ranging from laptops to large-scale distributed HPC systems.
- ml-hypersim - The HyperSim dataset is a photorealistic synthetic dataset for indoor scene understanding that includes dense per-pixel semantic instance segmentations and complete camera information for every image.
- openpgl - The Intel Open Path Guiding Library (Open PGL) implements path guiding into a renderer, offering implementations of current state-of-the-art path guiding methods which increase the sampling quality and renderer efficiency.
- ospray - Ospray is an open source, scalable and portable ray tracing engine designed for high fidelity visualization on Intel architecture CPUs. It allows users to easily build interactive applications using ray-tracing based rendering for both surface and volume-based visualizations.
- ospray_studio - Ospray Studio is an open-source, interactive visualization and ray tracing application that utilizes Intel Ospray as its core rendering engine. Users can create scene graphs to render complex scenes with high-fidelity or very large scenes requiring supercomputing resources.
- point-cloud-utils - Point Cloud Utils (PCU) is an easy-to-use Python library for processing and manipulating 3D point clouds and meshes. It provides several algorithms for generating point samples on meshes, downsampling point clouds, and computing distances between point clouds.
- redner - Redner is a differentiable renderer that can compute correct rendering gradients stochastically without approximation. It can simulate photons and produce realistic lighting phenomena, and handle the derivatives of these features correctly.
- SORT - Sort is a cross platform physically based renderer that can be used as a standalone ray tracing program or as a renderer plugin for Blender.
- Substrate - A toolset to help developers create and deploy cloud-based VaaS services (Visualization as a Service). Deployment targets include any platforms capable of running Docker Swarm, such as Amazon AWS, institutional clusters and even personal servers. Native for Python environment (pip installable).
- tracer - Tracer is a renderer that uses Embree and USD to produce photorealistic images using path tracing on the CPU, with features like subpixel jitter antialiasing, depth of field, and a variety of integrators.
- vistle - Vistle is a modular data-parallel visualization system. It requires a C++14 compatible compiler that supports ISO/IEC 14882:2014, alongside compiling requirements of Boost, CMake and MPI. Additionally, it supports Covise, OpenCover, OpenSceneGraph and Qt 5 libraries, and also provides support code, rendering libraries, controlling code for Vistle session and visualization algorithm modules.
- volppm - Volppm is a volumetric progressive photon mapping project that features homogeneous mediums for chromatic absorption and scattering coefficients.
- yocto-gl - Yocto GL is a collection of small C++17 libraries for building physically based graphics algorithms. Each library is split into smaller ones, making code navigation easier.
- Accelerating 3D Gaussian Splatting Rendering through Level-of-Detail Structure - The 3D Gaussian Splatting method for 3D environment reconstruction from images brought significant advancements to photorealistic novel-view synthesis. It combines the advantages of primitive-based rendering with a differentiable renderer, thus obtaining state-of-the-art image quality and surpassing neural methods for scene representation in optimization and rendering speed.
- Hyperspectral imaging parallelization - Hyperspectral imaging parallelization with different programming models such as OpenMP, SYCL or Kokkos.
- oidn - Intel Open Image Denoise is an open-source library for image denoising in ray tracing rendering applications with high quality and performance, thanks to efficient deep learning-based filters that can be trained using the included toolkit and user-provided image datasets.
-
Energy
- A DPC++ Backend for the OCCA Portability Framework - OCCA—an open source portable and vendor neutral framework for parallel programming on heterogeneous platforms—is used by mission critical computational science and engineering applications of public and private sector organizations including the U.S. Department of Energy and Shell.
-
Gaming
- NovelRT - NovelRT is a cross-platform game engine for visual novels and 2D games. It is still in the early alpha stage, but currently supports graphics and audio.
-
Manufacturing
- S3_DeformFDM - The S3 Slicer is a framework for achieving support-free strength reinforcement and surface quality in multi-axis 3D printing by computing the rotation-driven deformation for the input model.
-
Mathematics and Science
- Odd Even Merge and Sorting - (C++ based, from Intel) Demonstrates how to use the odd-even mergesort algorithm (also known as "Batcher's odd–even mergesort"), which may be beneficial when working with batches of short-sized to mid-sized (key, value) array pairs. Shows how to migrate CUDA based code to SYCL.
- 3D Wave Simulation - (C++ based, from Intel) The ISO3DFD sample refers to Three-Dimensional Finite-Difference Wave Propagation in Isotropic Media; it is a three-dimensional stencil to simulate a wave propagating in a 3D isotropic medium. Starts with a simple serial implementation and shows how to use SYCL to offload to the GPU. Then shows how to optimize.
- Amber - performance molecular dynamics (MD) code used by thousands of scientists in academia, national labs, and industry for computational drug discovery and related research.
- Discrete Cosine Transform Imeage Compression - (C++ based, from Intel) The Discrete Cosine Transform (DCT) sample demonstrates how DCT and Quantizing stages can be implemented to run faster using SYCL* by offloading image processing work to a GPU or other device.
- GROMACS - Open-source software suite for high-performance molecular dynamics and output analysis.
- Jacobi Iterative Solver for Multi-GPU - (C++ based, from Intel) Illustrates how to use the Jacobi iterative method to solve linear equations. This sample starts with a CPU-oriented application and shows how to use SYCL to offload regions of the code to a GPU. The sample walks through developing an optimization strategy by iteratively optimizing the code and ultimately targeting multiple GPUs if available.
- Monte Carlo Based Financial Simulation for Multi-GPU - (C++ based, from Intel) Evaluates the fair call price for a given set of European options using the Monte Carlo approach. Monte Carlo simulation is one of the most important algorithms in quantitative finance. This sample uses a single CPU thread to control multiple GPUs. Shows how to migrate CUDA-based code to SYCL.
- NAMD - High-performance simulation of large biomolecular systems.
- Lightwave Explorer - Lightwave Explorer is an open source nonlinear optics simulator, intended to be fast, visual, and flexible for students and researchers to play with ultrashort laser pulses and nonlinear optics without having to buy a laser first.
- ACTS GPU Ramp - Demonstrator tracking chain on accelerators
- arpack-ng - ARPACK-NG is a collection of Fortran77 subroutines designed to solve large-scale eigenvalue problems and is a community project maintained by volunteers.
- ATLAS Charged Particle Seed Finding with DPC++ - The ATLAS Experiment is one of the general-purpose particle physics experiments built at the Large Hadron Collider (LHC) at CERN in Geneva. Its goal is to study the behavior of elementary particles at the highest energies ever produced in a laboratory, to help us better understand the universe.
- bfs-sycl-fpga - The Breadth-First Search algorithm implementations _memoryBFS_ and _streamingBFS_ using Intel oneAPI (SYCL 2020) on Intel FPGAs.
- Direction Field Visualization with Python - This project demonstrates the visualization of a direction field with Python using the differential equation of a falling object as a case study. The effectiveness of heterogeneous computing is also shown by exploring the optimized libraries and added functionality in the Intel® Distribution for Python.
- GinkgoOneAPI - In this project we want to explore the potential of having an Intel oneAPI backend for the Ginkgo software package: https://ginkgo-project.github.io/
- Grid - Data parallel C++ mathematical object library.
- GeometricTools - The Geometric Tools Engine (GTE) is a collection of source code for high-performance computing in mathematics, geometry, graphics, image analysis, and physics, using CPU multithreading and GPU programming.
- gptoolbox - This is a toolbox of useful MATLAB functions for geometry processing, constrained optimization and image processing. It contains several features such as mesh deformation, mesh parameterization, and discrete differential geometry operators for triangle and tetrahedral meshes.
- gtensor - gtensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. It was inspired by xtensor, and designed to support the GPU port of the GENE fusion code.
- Homogeneous and Heterogeneous Implementations of a tridiagonal solver on Intel® Xeon® E-2176G with oneMKL getrs - Homogeneous and Heterogeneous implementations of a tridiagonal solver with oneMKL getrs
- LAMMPS - LAMMPS is a classical molecular dynamics simulation code designed to run efficiently on parallel computers. It was developed at Sandia National Laboratories, a US Department of Energy facility, with funding from the DOE. It is an open-source code, distributed freely under the terms of the GNU General Public License (GPL) version 2.
- mapmap_cpu - MapMap CPU is a massively parallel generic MRF map solver with minimal input assumptions, capable of solving a large class of MRF problems.
- MF-LBM - This is a lattice Boltzmann code designed for direct numerical simulation of flow in porous media. It is written in Fortran 90 and optimized for vectorization and parallel programming.
- mt-kahypar - MT-KaHyPar is a multi-threaded algorithm for partitioning graphs and hypergraphs. It aims to minimize an objective function defined on the hyperedges while balancing block sizes and optimizing connectivity. It can partition extremely large graphs and hypergraphs with comparable solution quality to the best sequential graph partitioners while being more than an order of magnitude faster with only ten threads.
- NWGraph - The Northwest Graph Library (NWGraph) is a high-performance header-only generic C++ graph library based on C++20 concepts and ranges. It includes multiple graph algorithms for well-known graph kernels and supporting data structures.
- octotiger - Octo-Tiger is an astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees. It was implemented using high-level C++ libraries, specifically HPX and Vc, which allows its use on different hardware platforms.
- portBLAS - An implementation of BLAS using the SYCL open standard.
- PyPardisoProject - PyPardiso is a Python package for solving large sparse linear systems of equations using the Intel oneAPI Math Kernel Library PARDISO solver. It provides the same functionality as SciPy's spsolve but is faster in many cases.
- qmckl_sycl - SYCL GPU port of the [QMCkl: Quantum Monte Carlo Kernel Library](https://github.com/TREX-CoE/qmckl).
- SPHinXsys - SPHinXsys provides C++ APIs for physically accurate simulation and optimization. It aims to handle coupled industrial dynamic systems including fluid, solid, multi-body dynamics and beyond. The multi-physics library is based on a unique and unified computational framework by which strong couplings have been achieved for all involved physics.
- suanPan - suanPan is a finite element method (FEM) simulation platform for applications in fields such as solid mechanics and civil/structural/seismic engineering. The name suanPan (in some places, such as suffixes, it is abbreviated as suPan) comes from the term Suan Pan (算盤), the Chinese abacus.
- sycl-collision-sim - Demo 3D simulation of rigid body physics with different shapes bouncing off each other while confined in a box. Two implementations are provided: a sequential one in standard C++ compiled for the CPU, and a parallel SYCL implementation that can be compiled for any target device (e.g. a GPU) supported by a SYCL compiler.
- repulsive-surfaces - A numerical framework for optimization of surface geometry while avoiding (self-)collision.
- PW-DFT - Plane-wave density-functional theory (DFT) development for the NWChemEx electronic structure software. An easy way to generate input decks, check your output decks against a large database of calculations, perform simple thermochemistry calculations, and calculate the NMR and IR spectra of modest-size molecules using NWChem.
- xpm - xpm (Extensive Pore Modelling) is software for predicting flow properties in multi-scale porous media. It uses a pore network model derived from image data, specifically using Pnextract to extract this network.
- stan-dev - The Stan Math Library is a C++, reverse-mode automatic differentiation library designed to be usable, extensive and extensible, efficient, scalable, stable, portable, and redistributable in order to facilitate the construction and utilization of algorithms that utilize derivatives.
- COGENT - COGENT is a continuum (Eulerian) plasma simulation code. It is primarily focused on tokamak edge plasma geometries, but includes options for, and is extensible to, other configurations. This repository contains the COGENT code (COGENT/) as well as Chombo (Chombo/), the adaptive mesh refinement application framework from Lawrence Berkeley National Laboratory upon which COGENT is built.
- Trilinos - The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.
- LAPACK - The Linear Algebra PACKage (LAPACK) is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares problems, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky, etc. LAPACK was originally written in FORTRAN 77, and moved to Fortran 90 in version 3.2 (2008). LAPACK provides routines for handling both real and complex matrices in both single and double precision.
- elpa - The computation of selected or all eigenvalues and eigenvectors of a symmetric (Hermitian) matrix has high relevance for various scientific disciplines. For the calculation of a significant part of the eigensystem, direct eigensolvers are typically used. The ELPA project was initiated with the aim to develop and implement an efficient eigenvalue solver for petaflop applications.
- esys-escript - esys-escript is a module for implementing mathematical models in Python using the finite element method (FEM). As users do not access the underlying data structures, it is very easy to use, and scripts can run on desktop computers as well as massively parallel supercomputers without changes. Application areas for esys-escript include geophysical inversion, earthquakes, porous media flow, reactive transport, plate subduction, erosion, earth mantle convection, and tsunamis.
- code_saturne - The basic capabilities of code_saturne enable the handling of either incompressible or expandable flows with or without heat transfer and turbulence. Dedicated modules are available for specific physics such as radiative heat transfer, combustion (gas, coal, heavy fuel oil, ...), magneto-hydrodynamics, compressible flows, two-phase flows (Euler-Lagrange approach with two-way coupling), or atmospheric flows.
- TASMANIAN - The Toolkit for Adaptive Stochastic Modeling and Non-Intrusive ApproximatioN is a collection of robust libraries for high dimensional integration and interpolation as well as parameter calibration.
- COSMA - COSMA is a parallel, high-performance, GPU-accelerated, matrix-matrix multiplication algorithm that is communication-optimal for all combinations of matrix dimensions, number of processors and memory sizes, without the need for any parameter tuning.
- Apache MXNet - Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scalable to many GPUs and machines.
- HeavyDB - HeavyDB is an open source SQL-based, relational, columnar database engine that leverages the full performance and parallelism of modern hardware (both CPUs and GPUs) to enable querying of multi-billion row datasets in milliseconds, without the need for indexing, pre-aggregation, or downsampling.
- MFLib - A Matched Filtering Library. In principle, the algorithm is quite simple in that it computes a Pearson correlation coefficient at every sample in a time series corresponding to a template. However, the actual implementation in a compiled language is tedious.
- RMGDFT - RMG is an Open Source code for electronic structure calculations and modeling of materials and molecules. It is based on density functional theory and uses a real space basis and pseudopotentials.
- ITPP - IT++ is a C++ library of mathematical, signal processing and communication classes and functions. Its main use is in simulation of communication systems and for performing research in the area of communications. The kernel of the library consists of generic vector and matrix classes, and a set of accompanying routines. Such a kernel makes IT++ similar to MATLAB or GNU Octave.
- WarpX - WarpX is an advanced electromagnetic & electrostatic Particle-In-Cell code. It supports many features including Perfectly-Matched Layers (PML), mesh refinement, and the boosted-frame technique.
- Highly Efficient FFT for Exascale - The Highly Efficient FFT for Exascale (HeFFTe) library is being developed as part of the Exascale Computing Project (ECP), which is a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration (NNSA). HeFFTe delivers algorithms for distributed fast Fourier transforms on heterogeneous systems, targeting the upcoming exascale machines.
- NESO (Neptune Exploratory SOftware) - This is a work-in-progress repository for exploring the implementation of a series of tokamak exhaust relevant models combining high order finite elements with particles, written in C++ and SYCL.
- Ewald-Splitting-with-Prolates - This fork includes custom modifications for the ESP (Ewald summation with prolate spheroidal wave functions) method.
- CUTLASS - CUTLASS is a collection of CUDA C++ template abstractions for implementing high-performance matrix-matrix multiplication (GEMM) and related computations at all levels and scales within CUDA. It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement cuBLAS and cuDNN.
-
Programming Languages
Sub Categories
Keywords
- deep-learning: 33
- machine-learning: 29
- cuda: 23
- pytorch: 20
- gpu: 19
- sycl: 17
- python: 13
- tensorflow: 12
- cpp: 10
- intel: 10
- hpc: 10
- scikit-learn: 9
- oneapi: 9
- opencl: 9
- ai-starter-kit: 9
- openvino: 8
- gpu-computing: 8
- inference: 8
- onnx: 7
- raytracing: 7
- openmp: 7
- neural-network: 7
- parallel-computing: 7
- computer-vision: 6
- hip: 6
- rendering: 6
- path-tracing: 6
- ray-tracing: 5
- c-plus-plus: 5
- quantization: 5
- gpgpu: 4
- high-performance-computing: 4
- kokkos: 4
- visualization: 4
- rust: 4
- object-detection: 4
- distributed-training: 4
- java: 4
- computer-graphics: 4
- onnxruntime: 4
- simulation: 4
- linear-algebra: 4
- path-tracer: 3
- deep-neural-networks: 3
- numpy: 3
- intel-xpu: 3
- arrayfire: 3
- cpu: 3
- ai: 3
- volume-rendering: 3