awesome-real-time-AI
This is a list of awesome edge AI inference-related papers.
https://github.com/Kyrie-Zhao/awesome-real-time-AI
Papers
Survey
- Machine Learning in Real-Time Internet of Things (IoT) Systems: A Survey
- Edge Intelligence: Architectures, Challenges, and Applications
- A Survey of Multi-Tenant Deep Learning Inference on GPU
- AI Augmented Edge and Fog Computing: Trends and Challenges
- Multi-DNN Accelerators for Next-Generation AI Systems - Christos-Savvas Bouganis and Nicholas D. Lane, arXiv 2022
- A Survey of GPU Multitasking Methods Supported by Hardware Architecture
- The Future of Consumer Edge-AI Computing
- DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack
- Enable deep learning on mobile devices: Methods, systems, and applications
HPC and Archs
- Addressing GPU on-chip shared memory bank conflicts using elastic pipeline
- FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs - Albelda, Bernabé, et al., The Journal of Supercomputing 2022
- Real-time high performance computing using a Jetson Xavier AGX
- GPU scheduling on the NVIDIA TX2: Hidden details revealed
- Nimble: Lightweight and parallel GPU task scheduling for deep learning
- A study of persistent threads style GPU programming for GPGPU workloads
- Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks
- Online Thread Auto-Tuning for Performance Improvement and Resource Saving
- Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming
- Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling
- Neither More Nor Less: Optimizing Thread-level Parallelism for GPGPUs
- Orion: A framework for GPU occupancy tuning
- Efficient performance estimation and work-group size pruning for OpenCL kernels on GPUs
- Autotuning GPU kernels via static and predictive analysis
- Fractional GPUs: Software-based compute and memory bandwidth reservation for GPUs
- Automatic thread-block size adjustment for memory-bound BLAS kernels on GPUs
- Simultaneous multikernel GPU: Multi-tasking throughput processors via fine-grained sharing
- Optimum: Runtime Optimization for Multiple Mixed Model Deployment Deep Learning Inference
- Making Powerful Enemies on NVIDIA GPUs
- Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent
- VectorVisor: A Binary Translation Scheme for Throughput-Oriented GPU Acceleration
- Arbitor: A Numerically Accurate Hardware Emulation Tool for DNN Accelerators
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
Other Cool Ideas
- Discovering faster matrix multiplication algorithms with reinforcement learning
- Understanding and Optimizing Deep Learning Cold-Start Latency on Edge Devices
- EDGEWISE: A Better Stream Processing Engine for the Edge
- LiteFlow: towards high-performance adaptive neural networks for kernel datapath
- CoCoPIE: Making Mobile AI Sweet As PIE--Compression-Compilation Co-Design Goes a Long Way
- Beyond Data and Model Parallelism for Deep Neural Networks
- Gemel: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge
- RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics
- Ekya: Continuous learning of video analytics models on edge compute servers
- Towards efficient vision transformer inference: a first study of transformers on mobile devices
- EdgeBERT: Sentence-level energy optimizations for latency-aware multi-task NLP inference
DNN Compiler
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
- Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
- Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance
- Ansor: Generating High-Performance Tensor Programs for Deep Learning
- TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers
- Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs
- IOS: Inter-operator scheduler for CNN acceleration
- Chameleon: Adaptive code optimization for expedited deep neural network compilation
- AutoGTCO: Graph and Tensor Co-Optimize for Image Recognition with Transformers on GPU
- DietCode: Automatic Optimization for Dynamic Tensor Programs
- ROLLER: Fast and Efficient Tensor Compilation for Deep Learning
- FamilySeer: Towards Optimized Tensor Codes by Exploiting Computation Subgraph Similarity
- Reusing Auto-Schedules for Efficient DNN Compilation
- Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs
- Cortex: A Compiler for Recursive Deep Learning Models
- SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction
- ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations
- AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization
- Enabling Data Movement and Computation Pipelining in Deep Learning Compiler
- Automatic Horizontal Fusion for GPU Kernels
- Compiler Framework for Optimizing Dynamic Parallelism on GPUs
- Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation
- CMLCompiler: A Unified Compiler for Classical Machine Learning
- TinyIREE: An ML Execution Environment for Embedded Systems from Compilation to Deployment - Hsin-I Cindy Liu, et al., arXiv 2022
- High Performance GPU Code Generation for Matrix-Matrix Multiplication using MLIR: Some Early Results
- Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform
- FlashAttention: Fast and memory-efficient exact attention with IO-awareness
- SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning
- AutoMap: Automatic Mapping of Neural Networks to Deep Learning Accelerators for Edge Devices
- Optimizing Dynamic Neural Networks with Brainstorm
- EINNET: Optimizing Tensor Programs with Derivation-Based Transformations
- Welder: Scheduling Deep Learning Memory Access via Tile-graph
- TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs
- Transfer Learning Across Heterogeneous Features For Efficient Tensor Program Generation
- Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain-Specific Accelerators
- Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations
- Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning
- Revisiting the Evaluation of Deep Learning-Based Compiler Testing
- Learning Compiler Pass Orders using Coreset and Normalized Value Prediction
- AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures
- Moses: Exploiting Cross-device Transferable Features for On-device Tensor Program Optimization
- TASO: The Tensor Algebra SuperOptimizer for Deep Learning
- AsyMo: Scalable and efficient deep-learning inference on asymmetric mobile CPUs
- DeepCuts: A Deep Learning Optimization Framework for Versatile GPU Workloads
- CASE: a compiler-assisted SchEduling framework for multi-GPU systems
- Analytical characterization and design space exploration for optimization of CNNs
- DNNFusion: accelerating deep neural networks execution with advanced operator fusion
- Seastar: Vertex-Centric Programming for Graph Neural Networks
- NNSmith: Generating diverse and valid test cases for deep learning compilers
- VeGen: A Vectorizer Generator for SIMD and Beyond
- Triton: an intermediate language and compiler for tiled neural network computations - Philippe Tillet, Hsiang-Tsung Kung, and David Cox, SIGPLAN Workshop 2019
- Heron: Automatically Constrained High-Performance Library Generation for Deep Learning Accelerators
- FlexTensor: An automatic schedule exploration and optimization framework for tensor computation on heterogeneous system
- On Optimizing the Communication of Model Parallelism
- Composable and Modular Code Generation in MLIR
- Graphene: An IR for Optimized Tensor Computations on GPUs
- TensorIR: An abstraction for automatic tensorized program optimization
- Codon: A Compiler for High-Performance Pythonic Applications and DSLs
- Autotuning convolutions is easier than you think
DNN Extraction
- DnD: A Cross-Architecture Deep Neural Network Decompiler
- Decompiling x86 Deep Neural Network Executables
- LibSteal: Model Extraction Attack towards Deep Learning Compilers by Reversing DNN Binary Library
- Cache telepathy: Leveraging shared resource attacks to learn DNN architectures
- DeepSteal: Advanced model extractions leveraging efficient weight stealing in memories
- Hermes attack: Steal DNN models with lossless inference accuracy
- Stealing machine learning models via prediction APIs
- SoK: Demystifying Binary Lifters Through the Lens of Downstream Applications
Edge-Cloud Collaborative Inference
- EdgeML: An AutoML framework for real-time deep learning on the edge
- AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments
- Mistify: Automating DNN model porting for on-device inference at the edge
Concurrent DNN Inference
- Horus: Interference-aware and prediction-based scheduling in deep learning systems
- Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU
- Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences
- Planaria: Dynamic architecture fission for spatial multi-tenant acceleration of deep neural networks
- NeuOS: A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems
- Multi-Neural Network Acceleration Architecture
- Pipelined data-parallel CPU/GPU scheduling for multi-DNN real-time inference
- Layerweaver: Maximizing resource utilization of neural processing units via layer-wise scheduling
- Accelerating deep learning workloads through efficient multi-model execution
- MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks
- Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices - Hsuan, et al., ATC 2023
Heterogeneous Platforms
- LaLaRAND: Flexible layer-by-layer CPU/GPU scheduling for real-time DNN tasks
- ODMDEF: On-Device Multi-DNN Execution Framework Utilizing Adaptive Layer-Allocation on General Purpose Cores and Accelerator
- CoDL: efficient CPU-GPU co-execution for deep learning inference on mobile devices
- CODA: Improving resource utilization by slimming and co-locating DNN and CPU jobs
- OPTiC: Optimizing collaborative CPU–GPU computing on mobile devices with thermal constraints
Latency Predictor
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices
- MAPLE: Microprocessor a priori for latency estimation
- Predicting and reining in application-level slowdown on spatial multitasking GPUs
- Habitat: A Runtime-Based Computational Performance Predictor for Deep Neural Network Training
TinyML
- MCUNet: Tiny deep learning on IoT devices
- TinyML: Current Progress, Research Challenges, and Future Roadmap
- Benchmarking TinyML systems: Challenges and direction
- Memory-efficient Patch-based Inference for Tiny Deep Learning
- Deep Learning on Microcontrollers: A Study on Deployment Costs and Challenges - Fernandez-Marques, Edgar Liberis, and Nicholas D. Lane, EuroMLSys 2022
- YONO: Modeling multiple heterogeneous neural networks on microcontrollers
Multi-modality Inference
Sparse Inference
- SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute
- A high-performance sparse tensor algebra compiler in Multi-Level IR
- Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction
- COEXE: An Efficient Co-execution Architecture for Real-Time Neural Network Services
- TorchSparse: Efficient Point Cloud Inference Engine
- ESCALATE: Boosting the Efficiency of Sparse CNN Accelerator with Kernel Decomposition
Privacy-aware Inference
- PolyMPCNet: Towards ReLU-free Neural Architecture Search in Two-party Computation Based Private Inference
- Cheetah: Lean and Fast Secure Two-Party Deep Neural Network Inference
- SecureTVM: A TVM-Based Compiler Framework for Selective Privacy-Preserving Neural Inference - Hsuan, et al., TODAES 2023
Distributed Inference
- Exploring Collaborative Distributed Diffusion-Based AI-Generated Content (AIGC) in Wireless Networks
- Distributed inference with deep learning models across heterogeneous edge devices
- ARK: GPU-driven Code Execution for Distributed Deep Learning
- On Modular Learning of Distributed Systems for Predicting End-to-End Latency - Chieh-Jan Mike Liang, et al., NSDI 2023
Benchmark and Dataset
Open Source Projects