{"id":33374368,"url":"https://github.com/procoder1199x/nanoaccel","last_synced_at":"2025-11-22T23:01:20.313Z","repository":{"id":317717183,"uuid":"1068466951","full_name":"ProCoder1199X/NanoAccel","owner":"ProCoder1199X","description":"Python Library for inference of LLMs on low end hardware and CPU optimizations ","archived":false,"fork":false,"pushed_at":"2025-11-01T18:20:39.000Z","size":50,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-01T20:24:26.803Z","etag":null,"topics":["ai","inference","large-language-models","python","pythonlibrarires"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ProCoder1199X.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-02T12:34:16.000Z","updated_at":"2025-11-01T18:20:42.000Z","dependencies_parsed_at":"2025-10-02T17:23:20.448Z","dependency_job_id":"9c95b2a4-866f-47e8-a7b6-9a251857eb7f","html_url":"https://github.com/ProCoder1199X/NanoAccel","commit_stats":null,"previous_names":["procoder1199x/nanoaccel"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ProCoder1199X/NanoAccel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProCoder1199X%2FNanoAccel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProCoder1199X%2FNanoAccel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProCoder1199X%2FNanoAccel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProCoder1199X%2FNanoAccel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ProCoder1199X","download_url":"https://codeload.github.com/ProCoder1199X/NanoAccel/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ProCoder1199X%2FNanoAccel/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":285873538,"owners_count":27246054,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-22T02:00:05.934Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","inference","large-language-models","python","pythonlibrarires"],"created_at":"2025-11-22T23:01:00.672Z","updated_at":"2025-11-22T23:01:20.307Z","avatar_url":"https://github.com/ProCoder1199X.png","language":"Python","readme":"# NanoAccel: CPU-Optimized LLM Accelerator for Low-End Hardware\n\n[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![PyPI version](https://badge.fury.io/py/nanoaccel.svg)](https://badge.fury.io/py/nanoaccel)\n\nNanoAccel is a lightweight Python library designed to accelerate inference and fine-tuning of 1B-8B parameter LLMs on low-end CPUs (e.g., i3/i5 with 8-16GB RAM), without GPUs or specialized hardware. Inspired by recent research in quantization and speculative decoding, it aims for 2-3x speedups and reduced memory footprints, making LLMs accessible on budget setups.\n\n## 🚀 Features\n\n- **Ultra-low-bit quantization** (1-4 bit) for memory efficiency\n- **Advanced speculative decoding** with adaptive gamma adjustment for optimal performance\n- **CPU scheduling optimizations** with performance/efficiency core pinning\n- **Memory management** with KV cache quantization and offloading\n- **Mixed precision inference** for improved performance\n- **Comprehensive CLI** with system requirement checking\n- **Configuration management** via YAML/JSON files and environment variables\n- **Compatible models**: TinyLlama, Gemma 2B, Llama 3.2 1B/3B, Pythia, and more.\n\n## 📦 Installation\n\n### From PyPI (Recommended)\n```bash\npip install nanoaccel\n```\n\n### From Source\n```bash\ngit clone https://github.com/ProCoder1199X/NanoAccel.git\ncd NanoAccel\npip install -e .\n```\n\n### Development Installation\n```bash\ngit clone https://github.com/ProCoder1199X/NanoAccel.git\ncd NanoAccel\npip install -e \".[dev]\"\n```\n\n## 🎯 Quick Start\n\n### Basic Usage\n```bash\n# Simple inference\nnanoaccel --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt \"Hello, world!\"\n\n# With quantization for memory efficiency\nnanoaccel --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --quant int4 --prompt \"Tell me a story\"\n\n# With speculative decoding for speed\nnanoaccel --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --speculative --draft-model EleutherAI/pythia-70m\n```\n\n### Python API\n```python\nfrom nanoaccel import NanoAccel, QuantizationConfig\n\n# Initialize with quantization\nquant_config = QuantizationConfig(enabled=True, quant_type=\"int4\")\nnanoaccel = NanoAccel(\n    model_name=\"TinyLlama/TinyLlama-1.1B-Chat-v1.0\",\n    quant_config=quant_config,\n    mixed_precision=True\n)\n\n# Load model\nnanoaccel.load_model()\n\n# Generate text\nresult = nanoaccel.generate(\n    prompt=\"Write a short story about a robot\",\n    max_new_tokens=100,\n    temperature=0.8\n)\n\nprint(result[\"text\"])\n```\n\n### System Requirements Check\n```bash\n# Check if your system can run a specific model\nnanoaccel --check-requirements --model TinyLlama/TinyLlama-1.1B-Chat-v1.0\n\n# Display CPU information\nnanoaccel --cpu-info\n```\n\n## ⚙️ Configuration\n\n### Configuration File\nCreate a `nanoaccel.yaml` file:\n\n```yaml\nmodel:\n  default_model: \"TinyLlama/TinyLlama-1.1B-Chat-v1.0\"\n  default_draft_model: \"EleutherAI/pythia-70m\"\n\nquantization:\n  enabled: true\n  quant_type: \"int4\"\n  compute_dtype: \"float32\"\n\ngeneration:\n  max_tokens: 100\n  temperature: 0.8\n  top_p: 0.9\n\nspeculative_decoding:\n  enabled: true\n  gamma: 4\n  early_exit_threshold: 0.9\n  adaptive_gamma: true\n  gamma_min: 1\n  gamma_max: 8\n  adaptation_window: 10\n\nsystem:\n  cpu_optimization: true\n  mixed_precision: true\n```\n\n### Environment Variables\n```bash\nexport NANOACCEL_MODEL=\"TinyLlama/TinyLlama-1.1B-Chat-v1.0\"\nexport NANOACCEL_QUANT_ENABLED=\"true\"\nexport NANOACCEL_QUANT_TYPE=\"int4\"\nexport NANOACCEL_SPECULATIVE=\"true\"\n```\n\n## 🔧 Advanced Usage\n\n### Custom Quantization\n```python\nfrom nanoaccel import QuantizationConfig\n\n# Custom quantization configuration\nquant_config = QuantizationConfig(\n    enabled=True,\n    quant_type=\"int8\",\n    compute_dtype=torch.bfloat16,\n    chunk_size=2048\n)\n```\n\n### Speculative Decoding\n```python\n# Load draft model for speculative decoding\nnanoaccel.load_draft_model(\"EleutherAI/pythia-70m\")\n\n# Generate with speculative decoding\nresult = nanoaccel.generate(\n    prompt=\"Your prompt here\",\n    use_speculative=True,\n    gamma=6,  # Number of speculative tokens\n    early_exit_threshold=0.95\n)\n\n# Advanced: Use adaptive gamma adjustment\nfrom nanoaccel.speculative import SpeculativeDecoding\n\nspec_decoder = SpeculativeDecoding(\n    draft_model_name=\"EleutherAI/pythia-70m\",\n    gamma=4,\n    adaptive_gamma=True,  # Enable adaptive adjustment\n    gamma_min=1,\n    gamma_max=8,\n    adaptation_window=10\n)\n\n# Monitor adaptive statistics\nadaptive_stats = spec_decoder.get_adaptive_stats()\nprint(f\"Current gamma: {adaptive_stats['current_gamma']}\")\nprint(f\"Recent acceptance rates: {adaptive_stats['recent_acceptance_rates']}\")\n```\n\n### Performance Monitoring\n```python\n# Get inference statistics\nstats = nanoaccel.get_stats()\nprint(f\"Average tokens/sec: {stats['average_tokens_per_second']:.2f}\")\nprint(f\"Total tokens generated: {stats['total_tokens']}\")\n\n# Reset statistics\nnanoaccel.reset_stats()\n```\n\n## 📊 Performance\n\nNanoAccel is optimized for low-end hardware with the following target performance:\n\n- **Memory Usage**: 2-4x reduction with quantization\n- **Speed**: 2-3x improvement with speculative decoding\n- **Compatibility**: Works on CPUs without AVX2 (with reduced performance)\n- **Minimum Requirements**: 8GB RAM, 2 CPU cores\n\n### Benchmark Results (TinyLlama-1.1B)\n| Configuration | Memory Usage | Tokens/sec | Speedup |\n|---------------|--------------|------------|---------|\n| Baseline (FP32) | 4.4 GB | 12.3 | 1.0x |\n| INT8 Quantized | 2.2 GB | 15.7 | 1.3x |\n| INT4 Quantized | 1.1 GB | 18.2 | 1.5x |\n| Speculative (INT4) | 1.1 GB | 28.5 | 2.3x |\n\n*Results on Intel i5-8400 (6 cores, 16GB RAM)*\n\n## 🧪 Testing\n\n```bash\n# Run all tests\npytest\n\n# Run with coverage\npytest --cov=nanoaccel --cov-report=html\n\n# Run specific test categories\npytest -m \"not slow\"  # Skip slow tests\npytest -m integration  # Only integration tests\n```\n\n## 🤝 Contributing\n\nContributions are welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) for details.\n\n### Development Setup\n```bash\n# Clone the repository\ngit clone https://github.com/ProCoder1199X/NanoAccel.git\ncd NanoAccel\n\n# Install development dependencies\npip install -e \".[dev]\"\n\n# Install pre-commit hooks\npre-commit install\n\n# Run tests\npytest\n```\n\n## 📝 License\n\nThis project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.\n\n## 🙏 Acknowledgments\n\n- Hugging Face for the Transformers library\n- BitsAndBytes for quantization support\n- The open-source AI community for research and inspiration\n\n## 📚 Documentation\n\nFor detailed documentation, examples, and API reference, visit our [documentation](https://github.com/ProCoder1199X/NanoAccel#readme).\n\n## 🐛 Issues and Support\n\n- [Report bugs](https://github.com/ProCoder1199X/NanoAccel/issues)\n- [Request features](https://github.com/ProCoder1199X/NanoAccel/issues)\n- [Join discussions](https://github.com/ProCoder1199X/NanoAccel/discussions)\n\n---\n\n**Note**: This is an early-stage project. Performance may vary based on hardware configuration and model choice. Contributions and feedback are highly appreciated!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprocoder1199x%2Fnanoaccel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprocoder1199x%2Fnanoaccel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprocoder1199x%2Fnanoaccel/lists"}