Projects in Awesome Lists by feifeibear
A curated list of projects in awesome lists by feifeibear.
https://github.com/feifeibear/llmspeculativesampling
Fast inference from large language models via speculative decoding
Last synced: 13 Apr 2025
https://github.com/feifeibear/long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
attention-is-all-you-need deepspeed-ulysses llm-inference llm-training pytorch ring-attention
Last synced: 14 May 2025
https://github.com/feifeibear/odysseus-transformer
Odysseus: Playground of LLM Sequence Parallelism
Last synced: 22 Nov 2024
https://github.com/feifeibear/swcaffe
A Deep Learning Framework customized for Sunway TaihuLight
caffe deep-learning mpi sunway-taihulight
Last synced: 22 Nov 2024
https://github.com/feifeibear/distributed-resnet-tensorflow
A distributed ResNet across multiple machines, each with one GPU card.
distributed imagenet-dataset resnet tensorflow
Last synced: 22 Nov 2024
https://github.com/feifeibear/swdnn
A highly efficient library for deep neural networks on the Sunway TaihuLight supercomputer.
Last synced: 16 Mar 2025
https://github.com/feifeibear/swgemm
A highly efficient library for GEMM operations on Sunway TaihuLight
Last synced: 22 Nov 2024
https://github.com/feifeibear/pstensor
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
cuda deeplearning machinelearning pytorch tensorflow2
Last synced: 16 Mar 2025
https://github.com/feifeibear/pytorchmemtracer
Depict the GPU memory footprint during DNN training in PyTorch
Last synced: 22 Nov 2024
https://github.com/feifeibear/llmroofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
Last synced: 22 Nov 2024
https://github.com/feifeibear/deepspeedzero3benchmark
Fine-tuned benchmark scripts for DeepSpeed ZeRO stage 3
Last synced: 15 Apr 2025
https://github.com/feifeibear/swdnnv1.0
A Deep Learning Library for Sunway TaihuLight
Last synced: 22 Nov 2024
https://github.com/feifeibear/ssh-passwd-free
A method to set up password-free SSH login for a set of IPs
Last synced: 15 Apr 2025
https://github.com/feifeibear/tensorrtbenchmark
Benchmark BERT using TensorRT
Last synced: 22 Nov 2024
https://github.com/feifeibear/large-scale-tensorflow-benchmark
Benchmark TensorFlow for supercomputers
Last synced: 16 Mar 2025
https://github.com/feifeibear/commtest
Test for PyTorch Async Collective Communication
Last synced: 16 Mar 2025
https://github.com/feifeibear/admm-neuralnetwork
ADMM-NeuralNetwork was implemented by a potato
Last synced: 16 Mar 2025
https://github.com/feifeibear/distributed-compression-dnn
A repo cloned from TernGrad
Last synced: 16 Mar 2025
https://github.com/feifeibear/dist-tensorflow
TensorFlow test for supercomputers
Last synced: 16 Mar 2025
https://github.com/feifeibear/spark-smo-svm-libsvm
A Spark-based SVM training program using the SMO method
Last synced: 16 Mar 2025
https://github.com/feifeibear/spark-smo-svm-ws1
A Spark-based SVM training program using the SMO method, with first-order working set selection.
Last synced: 16 Mar 2025
https://github.com/feifeibear/megablocks
A self-maintained version of megablocks (https://github.com/stanford-futuredata/megablocks)
Last synced: 16 Mar 2025