https://github.com/johnpertoft/nlp-inference-benchmark

Trying some different inference frameworks for text generation tasks
https://github.com/johnpertoft/nlp-inference-benchmark

Last synced: 10 months ago
JSON representation

Trying some different inference frameworks for text generation tasks

Host: GitHub
URL: https://github.com/johnpertoft/nlp-inference-benchmark
Owner: johnPertoft
License: mit
Created: 2022-10-07T14:07:19.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-12-01T15:55:32.000Z (over 3 years ago)
Last Synced: 2025-03-22T08:48:42.744Z (over 1 year ago)
Language: Python
Size: 43 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Inference benchmarks/playground
This repo holds benchmarks for some deep learning inference frameworks for our specific workload.

## Results
TODO

## TODO
- Some of these require higher CUDA compute capabilities, should denote this somewhere
- Define some delimitations, e.g. only running on 3080, only using T5 etc
- Define some common output report format
- Don't use Bert here
- Plain pytorch/transformer model, also compare different implementations?
- Try with FlashAttention module?
- Try with TorchScript
- Try with torch.jit.trace
- Try lower precisions etc?
- See https://ppwwyyxx.com/blog/2022/TorchScript-Tracing-vs-Scripting/
- Pin dependency versions
- Share huggingface cache somewhere?
- What's the best way to time things?
- Run model() or model.forward()?
- Check both latency and throughput?
- Is it even possible to trace the .generate(..) method as well? Or do we need
to manually call the forward pass multiple times?
- For the onnx models, is it a problem not having access to gpu when exporting?
(since we run the export during docker build)
- Be consistent with cuda/cudnn versions etc
- Should verify correctness in output?
- For onnx, we need to use a tool to perform optimizations of the graph I think.
- https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb seems relevant

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/johnpertoft/nlp-inference-benchmark

Awesome Lists containing this project

README