Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/google/jetstream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
https://github.com/google/jetstream
gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer
Last synced: about 2 months ago
JSON representation
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
- Host: GitHub
- URL: https://github.com/google/jetstream
- Owner: google
- License: apache-2.0
- Created: 2024-03-01T00:24:07.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-09-06T21:51:55.000Z (2 months ago)
- Last Synced: 2024-09-22T11:05:18.245Z (about 2 months ago)
- Topics: gemma, gpt, gpu, inference, jax, large-language-models, llama, llama2, llm, llm-inference, llmops, mlops, model-serving, pytorch, tpu, transformer
- Language: Python
- Homepage:
- Size: 2.7 MB
- Stars: 195
- Watchers: 14
- Forks: 26
- Open Issues: 11
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Authors: AUTHORS
Awesome Lists containing this project
README
[![Unit Tests](https://github.com/google/JetStream/actions/workflows/unit_tests.yaml/badge.svg?branch=main)](https://github.com/google/JetStream/actions/workflows/unit_tests.yaml?query=branch:main)
[![PyPI version](https://badge.fury.io/py/google-jetstream.svg)](https://badge.fury.io/py/google-jetstream)
[![PyPi downloads](https://img.shields.io/pypi/dm/google-jetstream?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/google-jetstream/)
[![Contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)# JetStream is a throughput and memory optimized engine for LLM inference on XLA devices.
## About
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
## JetStream Engine Implementation
Currently, there are two reference engine implementations available -- one for Jax models and another for Pytorch models.
### Jax
- Git: https://github.com/google/maxtext
- README: https://github.com/google/JetStream/blob/main/docs/online-inference-with-maxtext-engine.md### Pytorch
- Git: https://github.com/google/jetstream-pytorch
- README: https://github.com/google/jetstream-pytorch/blob/main/README.md## Documentation
- [Online Inference with MaxText on v5e Cloud TPU VM](https://cloud.google.com/tpu/docs/tutorials/LLM/jetstream) [[README](https://github.com/google/JetStream/blob/main/docs/online-inference-with-maxtext-engine.md)]
- [Online Inference with Pytorch on v5e Cloud TPU VM](https://cloud.google.com/tpu/docs/tutorials/LLM/jetstream-pytorch) [[README](https://github.com/google/jetstream-pytorch/tree/main?tab=readme-ov-file#jetstream-pytorch)]
- [Serve Gemma using TPUs on GKE with JetStream](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-tpu-jetstream)
- [Benchmark JetStream Server](https://github.com/google/JetStream/blob/main/benchmarks/README.md)
- [Observability in JetStream Server](https://github.com/google/JetStream/blob/main/docs/observability-prometheus-metrics-in-jetstream-server.md)
- [Profiling in JetStream Server](https://github.com/google/JetStream/blob/main/docs/profiling-with-jax-profiler-and-tensorboard.md)
- [JetStream Standalone Local Setup](#jetstream-standalone-local-setup)# JetStream Standalone Local Setup
## Getting Started
### Setup
```
make install-deps
```### Run local server & Testing
Use the following commands to run a server locally:
```
# Start a server
python -m jetstream.core.implementations.mock.server# Test local mock server
python -m jetstream.tools.requester# Load test local mock server
python -m jetstream.tools.load_tester```
### Test core modules
```
# Test JetStream core orchestrator
python -m unittest -v jetstream.tests.core.test_orchestrator# Test JetStream core server library
python -m unittest -v jetstream.tests.core.test_server# Test mock JetStream engine implementation
python -m unittest -v jetstream.tests.engine.test_mock_engine# Test mock JetStream token utils
python -m unittest -v jetstream.tests.engine.test_token_utils
python -m unittest -v jetstream.tests.engine.test_utils```