https://github.com/hurricane1988/check-gpu-device
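✨ This project is an API service built on Flask, Gunicorn, and NVIDIA CUDA. It exposes CUDA device information and health check endpoints, runs on GPUs, and can be deployed in deep learning inference environments.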
- Host: GitHub
- URL: https://github.com/hurricane1988/check-gpu-device
- Owner: hurricane1988
- Created: 2025-03-06T07:04:51.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-05-13T20:38:38.000Z (about 1 year ago)
- Last Synced: 2025-06-06T03:41:10.233Z (about 1 year ago)
- Topics: cuda, docker, makefile, nvidia, python3, pytorch
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 3
- Watchers: 1
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
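## 📌 Project Overview
This project is an API service built on Flask, Gunicorn, and NVIDIA CUDA. It exposes CUDA device information and health check endpoints, runs on GPUs, and can be deployed in deep learning inference environments.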
---
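## ✨ Features
- ✅ Health check (`/healthz`): confirms that the service is running
- ✅ CUDA device information (`/device`): reports the status of the NVIDIA GPU devices
- ✅ Gunicorn production-grade WSGI server: serves the API with high performance
- ✅ Runs as a non-root user: improves security
- ✅ Docker/Kubernetes deployment support: suited to containerized environments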
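
As a rough illustration of what the `/device` endpoint reports, the sketch below gathers the same fields that appear in the sample response later in this README, using PyTorch's `torch.cuda` API. It is not the project's actual `main.py`: the function name `collect_device_info`, the `"unavailable"` status value, and the `reason` key are assumptions made for this example, and the function is left as a plain helper that a Flask route could return as JSON.

```python
import importlib

# Assumption: memory is reported in binary gigabytes (bytes / 1024**3).
BYTES_PER_GB = 1024 ** 3


def collect_device_info(torch_mod=None):
    """Return a dict shaped like the sample /device response.

    torch_mod is injectable so the function can be exercised without a GPU;
    when omitted, the real torch package is imported if it is installed.
    """
    if torch_mod is None:
        try:
            torch_mod = importlib.import_module("torch")
        except ImportError:
            # Hypothetical fallback payload, not taken from the project.
            return {"status": "unavailable", "reason": "PyTorch is not installed"}

    cuda = torch_mod.cuda
    if not cuda.is_available():
        return {"status": "unavailable", "pytorch_version": torch_mod.__version__}

    gpus = []
    for idx in range(cuda.device_count()):
        props = cuda.get_device_properties(idx)
        gpus.append({
            "id": idx,
            "name": cuda.get_device_name(idx),
            "total_memory_gb": props.total_memory / BYTES_PER_GB,
            "reserved_memory_gb": cuda.memory_reserved(idx) / BYTES_PER_GB,
            "allocated_memory_gb": cuda.memory_allocated(idx) / BYTES_PER_GB,
            "max_allocated_memory_gb": cuda.max_memory_allocated(idx) / BYTES_PER_GB,
        })

    return {
        "status": "available",
        "pytorch_version": torch_mod.__version__,
        "cuda_version": torch_mod.version.cuda,
        "gpu_count": len(gpus),
        "gpus": gpus,
    }
```

Calling `collect_device_info()` on a machine without PyTorch or without a CUDA device returns the fallback payload instead of raising, which keeps a health-style endpoint responsive even when the GPU is missing.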
---
## 🚀 Quick Start
### 1️⃣ Run locally (development only)
Show the available make targets
```shell
make help
```
```text
Usage:
  make

General
  help             Display this help.

Development
  freeze           Run pip freeze export the python library.
  run              Run a main.py script from your host.

Build
  docker-build     Build docker image with the check-nvidia-cuda.
  docker-push      Push docker image with the check-nvidia-cuda.
  docker-buildx    Build and push docker image for the check-gpu-check for cross-platform support.
```
Install the dependencies
```shell
pip install -r requirements.txt
```
Start the service
```shell
gunicorn -b 0.0.0.0:8000 --access-logfile - main:app
```
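
The same flags can also be kept in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the command above; it assumes a file named `gunicorn.conf.py` in the working directory (which Gunicorn loads by default) and a Gunicorn release recent enough to support the `wsgi_app` setting. The repository may not ship such a file.

```python
# gunicorn.conf.py -- hypothetical config file mirroring the CLI flags above.

# Same as: -b 0.0.0.0:8000
bind = "0.0.0.0:8000"

# Same as: --access-logfile -   ("-" sends the access log to stdout)
accesslog = "-"

# Same as the positional "main:app" argument (module main, object app).
wsgi_app = "main:app"

# Assumption: Gunicorn's default worker count, stated explicitly here.
workers = 1
```

With this file in place, running `gunicorn` with no arguments should start the same service.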
Call the API
```shell
curl http://127.0.0.1:8000/healthz
curl http://127.0.0.1:8000/device
```
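
For scripted checks, the two `curl` calls above can be reproduced with the Python standard library. This is a minimal sketch, not part of the project: the names `http_get` and `smoke_test` are made up for this example, and it only assumes that both endpoints answer with HTTP 200 when the service is healthy.

```python
import urllib.request


def http_get(url, timeout=5.0):
    """Perform a GET request and return (status_code, body_text)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


def smoke_test(base_url="http://127.0.0.1:8000", get_fn=http_get):
    """Return {path: True/False} for the two endpoints listed in this README."""
    results = {}
    for path in ("/healthz", "/device"):
        try:
            status, _body = get_fn(base_url.rstrip("/") + path)
            results[path] = status == 200
        except OSError:
            # URLError, HTTPError, and socket timeouts all derive from OSError.
            results[path] = False
    return results
```

Passing `get_fn` explicitly makes the check easy to exercise without a running server; with the service up, `smoke_test()` contacts `127.0.0.1:8000` directly.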
---
### 2️⃣ Run with Docker (recommended)
Build the Docker image
```shell
make docker-build
```
Run the container
```shell
docker run --gpus all -p 8000:8000 --rm check-gpu-device
```
```text
checking nvidia-cuda environment...
✅ NVIDIA CUDA is available!
+------------------+-------------+
| Property | Value |
+==================+=============+
| PyTorch Version | 2.6.0+cu124 |
+------------------+-------------+
| CUDA Version | 12.4 |
+------------------+-------------+
| GPU Device Count | 2 |
+------------------+-------------+
+----------+----------+----------------+-------------------+--------------------+-----------------+
| Device | Name | Total Memory | Reserved Memory | Allocated Memory | Max Allocated |
+==========+==========+================+===================+====================+=================+
| 0 | Tesla T4 | 14.58 GB | 0.00 GB | 0.00 GB | 0.00 GB |
+----------+----------+----------------+-------------------+--------------------+-----------------+
| 1 | Tesla T4 | 14.58 GB | 0.00 GB | 0.00 GB | 0.00 GB |
+----------+----------+----------------+-------------------+--------------------+-----------------+
```
Query the device information
```shell
curl http://127.0.0.1:8000/device
```
```json
{
"cuda_version": "12.4",
"gpu_count": 2,
"gpus": [
{
"allocated_memory_gb": 0,
"id": 0,
"max_allocated_memory_gb": 0,
"name": "Tesla T4",
"reserved_memory_gb": 0,
"total_memory_gb": 14.5775146484375
},
{
"allocated_memory_gb": 0,
"id": 1,
"max_allocated_memory_gb": 0,
"name": "Tesla T4",
"reserved_memory_gb": 0,
"total_memory_gb": 14.5775146484375
}
],
"pytorch_version": "2.6.0+cu124",
"status": "available"
}
```
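
A client can turn this response into a short human-readable report. The sketch below is illustrative only: `describe_devices` is a hypothetical helper, and the embedded `SAMPLE` is a trimmed copy of the response above. It checks that `gpu_count` matches the number of `gpus` entries and formats the total memory with two decimals, as in the startup table.

```python
import json

# Trimmed copy of the sample /device response shown above.
SAMPLE = """
{
  "cuda_version": "12.4",
  "gpu_count": 2,
  "gpus": [
    {"id": 0, "name": "Tesla T4", "total_memory_gb": 14.5775146484375},
    {"id": 1, "name": "Tesla T4", "total_memory_gb": 14.5775146484375}
  ],
  "pytorch_version": "2.6.0+cu124",
  "status": "available"
}
"""


def describe_devices(payload):
    """Return one summary line per GPU found in a /device payload."""
    if payload.get("status") != "available":
        return ["CUDA is not available"]

    gpus = payload.get("gpus", [])
    if payload.get("gpu_count") != len(gpus):
        raise ValueError("gpu_count does not match the number of gpus entries")

    return [
        f"GPU {gpu['id']}: {gpu['name']}, {gpu['total_memory_gb']:.2f} GB total"
        for gpu in gpus
    ]


if __name__ == "__main__":
    for line in describe_devices(json.loads(SAMPLE)):
        print(line)
```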