https://github.com/oracle/hiq
HiQ - Observability And Optimization In Modern AI Era
logging monitoring observability python tracing
- Host: GitHub
- URL: https://github.com/oracle/hiq
- Owner: oracle
- License: other
- Created: 2022-03-04T08:59:02.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2025-01-08T20:48:00.000Z (13 days ago)
- Last Synced: 2025-01-09T12:18:07.736Z (13 days ago)
- Topics: logging, monitoring, observability, python, tracing
- Language: Python
- Homepage:
- Size: 18.6 MB
- Stars: 71
- Watchers: 5
- Forks: 8
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Security: SECURITY.md
![](docs/../hiq/docs/source/_static/hiq.png) 🦉 Observability And Optimization In Modern AI Era
----
[![Documentation Status](https://readthedocs.org/projects/hiq/badge/?version=latest)](https://hiq.readthedocs.io/en/latest/?badge=latest)
[![CodeCov][cov-img]][cov]
[![Github release][release-img]][release]
[![lic][license-img]][license]
[![arXiv](https://img.shields.io/badge/arXiv-2304.13302-red.svg)](https://arxiv.org/abs/2304.13302)

> 🔥 HiQ now supports GPU profiling, DNN model visualization, and tracing for DNN libraries like PyTorch, `transformers`, and LAVIS, and LLMs like LLaMA, OPT, Bloom, T5, and GPT2, in addition to Onnxruntime, FastAPI, and Flask.

HiQ is a `declarative`, `non-intrusive`, `dynamic`, and `transparent` tracking system for both **monolithic** applications and **distributed** systems. It brings runtime information tracking and optimization to a new level without compromising speed or system performance, and without hiding any tracking overhead. HiQ applies to both I/O-bound and CPU-bound applications. In addition to latency tracking, HiQ provides memory, disk I/O, and network I/O tracking out of the box. The output can be saved as a normal line-by-line log file, a HiQ tree, or a span graph.

HiQ's philosophy is to **decouple `observability logic` from `business logic`**. We don't have to enter the black hole to observe it. Do you like the idea? Leave a ⭐ if you enjoy the project, and feel free to say hi to us on [Slack 👋](https://join.slack.com/t/hiq-myo2317/shared_invite/zt-17ejh6ybo-51IX6G1lHMXgLbq2HKIO_Q).

[📜HiQ Paper: A Declarative, Non-intrusive, Dynamic and Transparent Observability and Optimization System](https://arxiv.org/abs/2304.13302)
![Observability of DNN Model](https://raw.githubusercontent.com/henrywoo/hiq/main/hiq/docs/medium/all.png)
## Installation
- Basic Installation
```bash
pip install hiq-python
```

- HiQ also supports extra installation
```bash
pip install hiq-python[fastapi] # To support fastapi web server online tracing
pip install hiq-python[gpu] # To support GPU tracing, which will install pynvml
pip install hiq-python[lavis] # To support Salesforce LAVIS Vision Language models
pip install hiq-python[transformers] # To support tracing Hugging Face's transformers library
pip install hiq-python[full] # To support all the cases, and this will install all the dependency libraries
```

## Get Started
Let's start with the simplest example by running HiQ against a simple monolithic Python program, [📄 `main.py`](hiq/examples/quick_start/main.py):
```python
# this is the main.py python source code
import time

def func1():
    time.sleep(1.5)
    print("func1")
    func2()

def func2():
    time.sleep(2.5)
    print("func2")

def main():
    func1()

if __name__ == "__main__":
    main()
```

In this code, there is a simple chain of function calls: `main()` -> `func1()` -> `func2()`.
Now we want to trace the functions without modifying their code. Let's run the following:
```bash
git clone https://github.com/oracle-samples/hiq.git
cd hiq/examples/quick_start
python main_driver.py
```

If everything is fine, you should see output like this:
![HiQ Simplest Example](https://github.com/oracle/hiq/raw/main/hiq/docs/source/img/main_driver.jpg)
From the screenshot we can see the timestamp and the latency of each function:
| | main | func1 | func2 | tracing overhead |
|---|---|---|---|---|
| latency (seconds) | 4.0045 | 4.0044 | 2.5026 | 0.0000163 |

HiQ just traced the running `main.py` file without touching a single line of its code.
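
To make the idea of tracing code without editing it more concrete, here is a minimal standard-library sketch. It is not HiQ's implementation and does not use HiQ's API. The helper names `trace_latency` and `_make_wrapper`, the in-memory module `target_main`, and the shortened sleep times are all assumptions made for illustration. The sketch wraps a module's functions at runtime and records how long each call takes, so the target source stays unchanged.

```python
import functools
import time
import types

# Stand-in for main.py (hypothetical, with shorter sleeps so the demo runs fast).
TARGET_SOURCE = """
import time

def func1():
    time.sleep(0.02)
    func2()

def func2():
    time.sleep(0.03)

def main():
    func1()
"""


def _make_wrapper(name, original, records):
    """Return a wrapper that times `original` and appends (name, seconds) to `records`."""
    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            records.append((name, time.perf_counter() - start))
    return wrapper


def trace_latency(module, names):
    """Replace the named functions on `module` with timing wrappers; return the record list."""
    records = []
    for name in names:
        setattr(module, name, _make_wrapper(name, getattr(module, name), records))
    return records


# Load the untouched target source into a fresh module object.
target = types.ModuleType("target_main")
exec(TARGET_SOURCE, target.__dict__)

# Attach the tracing from the outside, then run the target's entry point.
records = trace_latency(target, ["main", "func1", "func2"])
target.main()

latency = dict(records)
for name in ("main", "func1", "func2"):
    print(f"{name}: {latency[name]:.4f}s")
```

Because the functions look each other up through the module's globals at call time, replacing the module attributes is enough for nested calls such as `func1()` calling `func2()` to go through the wrappers as well. As in the table above, the latency of `main` includes that of `func1`, which in turn includes that of `func2`.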
## Documentation
**HTML**: [🔗 HiQ Online Documents](https://hiq.readthedocs.io/en/latest/index.html) | **PDF**: Please check [🔗 HiQ User Guide](https://github.com/oracle/hiq/blob/main/hiq/docs/hiq.pdf).
----
Logging: https://hiq.readthedocs.io/en/latest/4_o_advanced.html#log-monkey-king

Tracing: https://hiq.readthedocs.io/en/latest/5_distributed.html
- Zipkin: https://hiq.readthedocs.io/en/latest/5_distributed.html#zipkin
- Jaeger: https://hiq.readthedocs.io/en/latest/5_distributed.html#jaeger

Metrics:
- Prometheus: https://hiq.readthedocs.io/en/latest/7_integration.html#prometheus

Streaming:
- Kafka: https://hiq.readthedocs.io/en/latest/7_integration.html#oci-streaming

## DNN Model Observability & Visualization
HiQ can visualize DNN models. To get the structure of the following BERT model, you can just run:
```
python -m hiq.vis
```

![BERT](https://raw.githubusercontent.com/henrywoo/hiq/main/hiq/docs/medium/vis_bert.png)
The graph is self-explanatory. There are several conventions:
- ❄️ means a frozen layer, where `requires_grad` is false.
- 📈 means a gradient exists for that model parameter, which usually happens after backpropagation.
- `+`, bold font, and a dotted underline mean the displayed layer is a folded version of multiple layers with the same structure.

All you need to do is call `print_model(model)` in your code. See [here](https://github.com/henrywoo/hiq/tree/main/hiq/examples/vis) for how to use it.
## HiQ Web UI
- Main Page
![HiQ UI Main Page](https://github.com/oracle/hiq/raw/main/hiq/docs/source/img/hiq-ui-1.png)
- Latency Details
![HiQ UI Latency Details](https://github.com/oracle/hiq/raw/main/hiq/docs/source/img/hiq-ui-2.png)
## Jupyter Notebook
HiQ was originally developed to find Onnxruntime performance bottlenecks in DNN inference, and it works well for other computation-intensive applications too. The following are two examples.
### Add Observability to PaddlePaddle (PaddleOCR)
- [Latency](https://github.com/oracle-samples/hiq/blob/main/hiq/examples/paddle/demo.ipynb)
- [Memory](https://github.com/oracle-samples/hiq/blob/main/hiq/examples/paddle/demo_memory.ipynb)

- Latency Gantt Chart
![Latency Gantt Chart](https://raw.githubusercontent.com/oracle/hiq/main/hiq/docs/medium/hiq-gantt.png)
- HiQ Call Graph
![HiQ Call Graph](https://raw.githubusercontent.com/oracle/hiq/main/hiq/docs/medium/hiq-call-graph.png)
### Add Observability to Onnxruntime (AlexNet)
- [Latency](https://github.com/oracle-samples/hiq/blob/main/hiq/examples/onnxruntime/demo.ipynb)
- [Intrusive](https://github.com/oracle-samples/hiq/blob/main/hiq/examples/onnxruntime/demo_intrusive.ipynb)

## Examples
Please check [🔗 examples](https://github.com/oracle/hiq/blob/main/hiq/examples) for usage examples.
## Contributing
HiQ welcomes contributions from the community. Before submitting a pull request, please review our [contribution guide](./CONTRIBUTING.md).
## Security
Please consult the [🔗 security guide](https://github.com/oracle/hiq/blob/main/SECURITY.md) for our responsible security vulnerability disclosure process.
## License
Copyright (c) 2022, 2023 Oracle and/or its affiliates. Released under the Universal Permissive License v1.0 as shown at .
## Presentation and Demos
- [Introduction to Observability with HiQ](https://github.com/oracle-samples/hiq/blob/main/hiq/docs/Introduction-To-Observability-With-HiQ.pdf)

[cov-img]: https://codecov.io/gh/uber/athenadriver/branch/master/graph/badge.svg
[cov]: https://hiq.readthedocs.io/en/latest/index.html
[release-img]: https://img.shields.io/badge/release-v1.1.13-red
[release]: https://github.com/oracle-samples/hiq
[license-img]: https://img.shields.io/badge/License-UPL--1.0-red
[license]: https://github.com/oracle-samples/hiq/blob/main/LICENSE.txt
[release-policy]: https://golang.org/doc/devel/release.html#policy