{"id":30805541,"url":"https://github.com/aaronsb/shallama","last_synced_at":"2025-09-06T00:58:51.403Z","repository":{"id":309245001,"uuid":"1035298572","full_name":"aaronsb/shallama","owner":"aaronsb","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-10T19:10:51.000Z","size":9388,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-10T20:33:58.932Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aaronsb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-10T04:59:48.000Z","updated_at":"2025-08-10T19:35:50.000Z","dependencies_parsed_at":"2025-08-10T20:34:00.700Z","dependency_job_id":"851a28ad-356d-4d77-920a-93a197d9c1fa","html_url":"https://github.com/aaronsb/shallama","commit_stats":null,"previous_names":["aaronsb/shallama"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/aaronsb/shallama","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fshallama","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fshallama/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fshallama/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fshallama/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aaronsb","download_url":"https://codeload.github.com/a
aronsb/shallama/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fshallama/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273842826,"owners_count":25177921,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-05T02:00:09.113Z","response_time":402,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-06T00:58:48.867Z","updated_at":"2025-09-06T00:58:51.372Z","avatar_url":"https://github.com/aaronsb.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ✨ Shallama 🦙 ✨\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"magical-llama-genie.png\" alt=\"Magical Llama emerging from a genie lamp\" width=\"400\"/\u003e\n  \u003cbr\u003e\n  \u003cem\u003e✨ Where llama.cpp meets magical wishes! ✨\u003c/em\u003e\n\u003c/div\u003e\n\n## 🪄 The Magic Shell for LLMs\n\nA powerful shell tool for running and managing llama.cpp models with a modern terminal interface, featuring LCP (Language model Command Processor). Rub the lamp, summon the llama, and watch your AI wishes come true! 
🧞‍♂️\n\n## ✨ Magical Features ✨\n\n- **🚀 Easy Model Management**: Automatic discovery and smart matching of GGUF models\n- **💬 Rich Chat Interface**: Beautiful markdown rendering with syntax highlighting and ANSI color support\n- **🎨 Visual Hardware Profiling**: Real-time GPU/CPU resource monitoring with visual bars\n- **🐳 Docker Integration**: Seamless llama.cpp server management via Docker Compose\n- **📦 Multiple Backends**: Support for llama.cpp server, HuggingFace transformers (coming soon)\n- **🔧 Smart Configuration**: XDG-compliant settings with sensible defaults\n- **🎯 Intelligent Model Selection**: Hardware-aware model recommendations based on available resources\n- **🚀 Ollama API Compatibility**: Drop-in replacement for existing Ollama clients\n- **⚡ Optimized Performance**: Auto-tuned for your hardware\n\n## 🎭 Quick Start (Say the Magic Words!)\n\n### Prerequisites\n\n- Python 3.11+\n- Docker and Docker Compose\n- pipx (for clean Python tool installation)\n- NVIDIA GPU with CUDA support (optional, CPU mode available)\n\n### Installation\n\n```bash\n# Clone the repository (with submodules!)\ngit clone --recursive https://github.com/aaronsb/shallama.git\ncd shallama\n\n# If you forgot --recursive, summon the submodules:\ngit submodule update --init --recursive\n\n# Install pipx if you don't have it (choose one):\npython3 -m pip install --user pipx      # Install pipx\n# OR on Ubuntu/Debian:\nsudo apt install pipx\n# OR on macOS with Homebrew:\nbrew install pipx\n\n# Ensure pipx is in your PATH\npipx ensurepath\n\n# Install LCP using the magic installer (RECOMMENDED)\ncd lcp-py\n./install.sh    # Installs to ~/.local/bin using pipx\ncd ..\n\n# Alternative: Development install (for contributors)\n# cd lcp-py\n# pip install -e .\n# cd ..\n\n# Start the llama.cpp server\n./start-llamacpp.sh\n```\n\n✨ **Why pipx?** It creates isolated environments for Python CLI tools, preventing dependency conflicts and keeping your system Python clean!\n\n### Basic 
Usage\n\n```bash\n# List available models\nlcp list\n\n# Start a chat with automatic model selection\nlcp chat\n\n# Chat with a specific model\nlcp chat --model \"llama-3.2\"\n\n# View hardware capabilities\nlcp profile\n\n# Configure settings\nlcp config\n```\n\n## Migration from Ollama\n\nIf you're coming from Ollama, use the migration script:\n\n```bash\n./migrate-from-ollama.sh\n```\n\nThis will help you:\n- Export model configurations\n- Set up model directory structure\n- Migrate environment settings\n- Provide download instructions for GGUF models\n\n## 🎪 Components (Inside the Magic Box)\n\n### LCP (Language model Command Processor)\n\nThe main Python CLI tool providing:\n- Interactive chat with streaming responses\n- Model discovery and management\n- Hardware profiling and optimization\n- Rich terminal UI with markdown and ANSI color support\n\n### Llama.cpp Server\n\nDocker-based llama.cpp server with:\n- GPU acceleration support\n- Automatic model loading\n- OpenAI-compatible API\n- Configurable context sizes\n\n## Project Structure\n\n```\nshallama/\n├── lcp-py/                     # Python CLI package\n│   └── lcp/\n│       ├── ui/                 # Terminal UI components\n│       ├── backends/           # Model backend implementations\n│       └── config/             # Configuration management\n├── models/                     # GGUF model storage\n├── config/\n│   └── models.yaml            # Model configuration\n├── docker-compose.nvidia.yml   # NVIDIA GPU configuration\n├── docker-compose.cpu.yml      # CPU-only configuration\n├── docker-compose.yml          # Symlink to active config\n├── start-llamacpp.sh           # Server startup script\n├── llamacpp                    # Helper script\n└── migrate-from-ollama.sh      # Migration tool from Ollama\n```\n\n## 🔮 Configuration\n\nShallama follows XDG Base Directory specification:\n- Config: `~/.config/lcp/config.yaml`\n- Cache: `~/.cache/lcp/`\n- Data: `~/.local/share/lcp/`\n\n### Example 
Configuration\n\n```yaml\nbackend:\n  default: llamacpp\n  llamacpp:\n    host: localhost\n    port: 8080\n    \nui:\n  theme: monokai\n  markdown:\n    code_theme: monokai\n    show_locals: true\n    \nmodels:\n  directory: ./models\n  auto_download: false\n```\n\n## Hardware Optimization\n\nThis setup is optimized for:\n- **CPU**: Intel i9-14900K (24 cores, 32 threads)\n- **GPU**: RTX 4060 Ti (16GB VRAM)\n- **RAM**: 125GB system memory\n\n### Performance Settings\n\n**GPU Mode (NVIDIA)**:\n- GPU layers: 999 (offloads all model layers to the GPU)\n- Context length: 8192 tokens\n- Parallel requests: 4\n- Memory limit: 32GB\n\n**CPU Mode**:\n- Threads: 24 (optimized for i9-14900K)\n- Context length: 16384 tokens\n- Parallel requests: 2\n- Memory limit: 64GB\n\n## Usage\n\n### Container Management\n\n```bash\n# Start with auto-detection\n./start-llamacpp.sh\n\n# Check status\n./llamacpp status\n\n# View logs\n./llamacpp logs\n\n# Restart container\n./llamacpp restart\n\n# Stop container\n./llamacpp stop\n```\n\n### Model Management\n\n```bash\n# List available models\n./llamacpp list\n\n# Test API connection\n./llamacpp test\n\n# Get help\n./llamacpp help\n```\n\n### API Usage\n\nThe API is compatible with Ollama endpoints:\n\n```bash\n# List models\ncurl http://localhost:11434/api/tags\n\n# Generate text\ncurl -X POST http://localhost:11434/api/generate \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n    \"model\": \"llama3-8b\",\n    \"prompt\": \"Why is the sky blue?\",\n    \"stream\": false\n  }'\n\n# Chat completion (OpenAI-compatible)\ncurl -X POST http://localhost:11434/v1/chat/completions \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n    \"model\": \"llama3-8b\",\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"Hello!\"}\n    ]\n  }'\n```\n\n## Model Configuration\n\nEdit `./config/models.yaml` to configure your models:\n\n```yaml\nmodels:\n  llama3-8b:\n    path: \"/models/llama-3-8b-instruct.Q4_K_M.gguf\"\n    n_gpu_layers: 35      # GPU 
layers (adjust for your model)\n    n_ctx: 8192           # Context length\n    temperature: 0.7      # Sampling temperature\n    \n  phi4-14b:\n    path: \"/models/phi-4.Q4_K_M.gguf\"\n    n_gpu_layers: 40\n    n_ctx: 16384\n    temperature: 0.8\n\ndefault_model: \"llama3-8b\"\n```\n\n## Adding Models\n\n1. **Download GGUF models** to the `./models/` directory:\n   - From [Hugging Face](https://huggingface.co/models?library=gguf)\n   - Using `huggingface-hub` CLI tool\n   - Convert existing models with llama.cpp tools\n\n2. **Update configuration** in `./config/models.yaml`\n\n3. **Restart container** to load new models:\n   ```bash\n   ./llamacpp restart\n   ```\n\n## Troubleshooting\n\n### GPU Issues\n\n1. **NVIDIA GPU not detected**:\n   ```bash\n   # Check NVIDIA drivers\n   nvidia-smi\n   \n   # Check Docker GPU support (the NVIDIA runtime injects nvidia-smi into the container)\n   docker run --rm --gpus all ubuntu nvidia-smi\n   ```\n\n2. **Container using CPU instead of GPU**:\n   - Verify NVIDIA Container Toolkit installation\n   - Check Docker daemon configuration\n   - Restart Docker service\n\n### Performance Issues\n\n1. **Slow inference**:\n   - Increase `n_gpu_layers` in model config\n   - Check GPU memory usage with `nvidia-smi`\n   - Reduce `n_ctx` if running out of memory\n\n2. **Out of memory errors**:\n   - Reduce `n_gpu_layers` or `n_ctx`\n   - Use quantized models (Q4_K_M, Q5_K_M)\n   - Switch to CPU mode for large models\n\n### Container Issues\n\n1. **Container won't start**:\n   ```bash\n   # Check logs\n   docker compose logs llamacpp\n   \n   # Check Docker resources\n   docker system df\n   ```\n\n2. 
**API not responding**:\n   ```bash\n   # Test container health\n   docker compose ps\n   \n   # Check port binding\n   ss -tlnp | grep 11434\n   ```\n\n## Environment Variables\n\nKey environment variables (set in docker-compose files):\n\n- `CUDA_VISIBLE_DEVICES`: GPU selection\n- `LLAMA_CPP_N_THREADS`: CPU thread count\n- `LLAMA_CPP_N_GPU_LAYERS`: GPU layer count\n- `LLAMA_CPP_N_CTX`: Context length\n- `LLAMA_CPP_HOST`: Bind address\n- `LLAMA_CPP_PORT`: Internal port\n\n## Comparison with Ollama\n\n| Feature | Shallama | Ollama |\n|---------|---------|---------|\n| Base Engine | llama.cpp | llama.cpp |\n| API Compatibility | Ollama + OpenAI | Ollama |\n| Model Format | GGUF | Ollama format |\n| GPU Support | NVIDIA, CPU | NVIDIA, AMD, CPU |\n| Performance | Direct llama.cpp | Optimized wrapper |\n| Model Management | Manual + Config | Built-in |\n| Memory Usage | Lower overhead | Higher overhead |\n\n## 🪄 Development\n\n### Installation Methods\n\n#### For Users (Recommended)\n```bash\ncd lcp-py\n./install.sh      # Uses pipx to install to ~/.local/bin\n```\n\n#### For Developers\n```bash\ncd lcp-py\npip install -e .  # Editable install for development\n```\n\n#### For Contributors\n```bash\ncd lcp-py\n./dev-install.sh  # Sets up full development environment with venv\n```\n\n### Running Tests\n\n```bash\ncd lcp-py\npytest tests/\n```\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n### Areas for Contribution\n\n1. **Additional backends**: Ollama, vLLM, TGI integration\n2. **UI enhancements**: Themes, layouts, visual effects\n3. **Model management**: Auto-download, conversion tools\n4. **Performance**: Optimization for different hardware\n\n## License\n\nMIT License - see LICENSE file for details\n\n## 🎭 Meet Our Magical Inspiration\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"jambi.jpg\" alt=\"Jambi the Genie\" width=\"250\"/\u003e\n  \u003cbr\u003e\n  \u003cem\u003e\"Meka-leka-hi-meka-hiney-ho! 
Your wish is granted!\" - Jambi\u003c/em\u003e\n\u003c/div\u003e\n\n### The Spirit of Jambi Lives On! 🧞\n\nShallama is inspired by Jambi the Genie from Pee-wee's Playhouse, who taught us that with the right magic words, anything is possible! Just as Jambi granted wishes from his box, our magical llama grants your AI wishes from the command line.\n\nEvery time you run `lcp chat`, remember you're summoning a genie - but instead of \"Meka-leka-hi-meka-hiney-ho\", you're typing commands that bring AI magic to life! ✨\n\n### 🔬 The Science Behind the Magic\n\nOf course, we must admit that all magic is grounded in science, and ours is no different! While it may *feel* like magic when the llama genie responds to your wishes, there's fascinating mathematics and engineering underneath.\n\n**Curious about how the magic really works?** 🤔 Dive into our [comprehensive guide to the science behind LLMs](docs/how-the-magic-works.md) where we reveal the mathematical spells, the attention mechanisms that power understanding, and the clever optimizations that make it all possible on your hardware!\n\n## 🌟 Acknowledgments\n\n### Special Thanks\n- **Jambi the Genie** 🧞 - For teaching us the power of magic words\n- **The magical llama** 🦙 - Emerging from the lamp to grant AI wishes\n\n- [llama.cpp](https://github.com/ggerganov/llama.cpp) - High-performance C++ inference\n- [Rich](https://github.com/Textualize/rich) - Beautiful terminal formatting\n- [Typer](https://github.com/tiangolo/typer) - Modern CLI framework\n- [GGUF Models on Hugging Face](https://huggingface.co/models?library=gguf)\n- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronsb%2Fshallama","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faaronsb%2Fshallama","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronsb%2Fshallama/lists"}