https://github.com/laelhalawani/gguf_llama
Wrapper for simplified use of Llama2 GGUF quantized models.
- Host: GitHub
- URL: https://github.com/laelhalawani/gguf_llama
- Owner: laelhalawani
- License: other
- Created: 2024-01-04T00:52:17.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-14T21:29:33.000Z (over 1 year ago)
- Last Synced: 2024-10-30T01:01:49.384Z (7 months ago)
- Topics: cpu-inference, gguf, llama, llama2, llamacpp, quantization
- Language: Python
- Homepage: https://pypi.org/project/gguf_llama
- Size: 48.8 KB
- Stars: 5
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: license.txt
# gguf_llama
Provides a `LlamaAI` class with a Python interface for generating text using Llama models.
## Features
- Load Llama models and tokenizers automatically from a GGUF file
- Generate text completions for prompts
- Automatically adjust the model's context size to fit longer prompts, up to a configurable limit
- Convenient methods for tokenizing and untokenizing text
- Fix text formatting issues before generating

## Usage
Load a model by instantiating `LlamaAI`:
```python
from llama_ai import LlamaAI

ai = LlamaAI("my_model.gguf", max_tokens=500, max_input_tokens=100)
```
Generate text by calling infer():
```python
text = ai.infer("Once upon a time")
print(text)
```
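The README does not show what "fix text formatting issues before generating" involves. Below is a minimal self-contained sketch of the kind of prompt cleanup such a step might perform; the function name `fix_formatting` and its exact behavior are assumptions for illustration, not gguf_llama's actual API.

```python
import re

def fix_formatting(text: str) -> str:
    """Illustrative prompt cleanup (assumed behavior, not gguf_llama's API)."""
    text = text.strip()                     # drop leading/trailing whitespace
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # cap consecutive blank lines at one
    return text

print(fix_formatting("Once  upon\t a time...\n\n\n\nThe end."))
```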
## Installation

```bash
pip install gguf_llama
```

## Documentation
See the [API documentation](https://laelhalawani.github.io/gguf_llama) for full details on classes and methods.
## Contributing
Contributions are welcome! Open an issue or PR to improve gguf_llama.