https://github.com/agora-lab-ai/neocore
NeoCore™ - Next Generation CPU-Native Transformer.
- Host: GitHub
- URL: https://github.com/agora-lab-ai/neocore
- Owner: Agora-Lab-AI
- License: MIT
- Created: 2024-11-13T13:40:47.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-11-13T13:44:52.000Z (7 months ago)
- Last Synced: 2024-11-13T14:34:49.332Z (7 months ago)
- Topics: ai, cpu-native, gpt1000, gpt5, gpts, ml, noam-gpt, transformer, transformers
- Language: Python
- Homepage: https://swarms.xyz/
- Size: 0 Bytes
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# NeoCore™ - Next Generation CPU-Native Transformer
[Discord](https://discord.gg/agora-999382051935506503) [YouTube](https://www.youtube.com/@kyegomez3242) [LinkedIn](https://www.linkedin.com/in/kye-g-38759a207/) [X](https://x.com/kyegomezb)

[Legal-Swarm-Template](https://github.com/The-Swarm-Corporation/Legal-Swarm-Template)

[swarms](https://github.com/kyegomez/swarms) [PyPI: neocore](https://badge.fury.io/py/neocore)

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) [Python](https://www.python.org/downloads/)

## Overview
NeoCore is a state-of-the-art, CPU-optimized transformer architecture designed for edge computing and enterprise deployment. By leveraging advanced CPU-specific optimizations and modern architectural improvements, NeoCore achieves exceptional performance without requiring GPU acceleration.
### Key Features
- **CPU-Native Design**: Optimized from the ground up for modern CPU architectures
- **Memory Efficient**: Advanced caching and chunking strategies for optimal memory usage
- **Enterprise Ready**: Production-grade implementation with comprehensive logging and monitoring
- **Modern Architecture**: Incorporates Multi-Query Attention, RMSNorm, and Rotary Embeddings
- **Extensive Benchmarking**: Built-in performance profiling and optimization tools

## Installation
```bash
pip install neocore
```

## Architecture
NeoCore introduces several architectural innovations:
### Core Components
1. **Multi-Query Attention (MQA)**

```python
Q:    [Batch, Seq, Heads, Head_Dim]  # One query projection per head
K, V: [Batch, Seq, 1, Head_Dim]      # A single key/value head shared by all query heads
```

2. **RMSNorm for Stabilization**

```python
RMSNorm(x) = x * scale / sqrt(mean(x²) + ε)
```

3. **Block-wise Computation**

```
Input -> Chunked Processing -> Cache-Friendly Operations -> Output
```

### Performance Optimizations
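For reference, the first two components above can be sketched in plain NumPy. This is an illustrative implementation for clarity, not NeoCore's actual code; all function names, weight shapes, and defaults here are assumptions:

```python
import numpy as np

def rms_norm(x, scale, eps=1e-6):
    """RMSNorm: x * scale / sqrt(mean(x^2) + eps), over the last axis."""
    return x * scale / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def multi_query_attention(x, wq, wk, wv, n_heads):
    """MQA: n_heads query heads attend over a single shared K/V head."""
    batch, seq, d_model = x.shape
    head_dim = d_model // n_heads
    q = (x @ wq).reshape(batch, seq, n_heads, head_dim)   # [B, S, H, Dh]
    k = x @ wk                                            # [B, S, Dh], one shared head
    v = x @ wv                                            # [B, S, Dh]
    scores = np.einsum("bshd,btd->bhst", q, k) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over key positions
    out = np.einsum("bhst,btd->bshd", weights, v)         # [B, S, H, Dh]
    return out.reshape(batch, seq, d_model)
```

With `x` of shape `[2, 16, 64]` and eight heads, the output keeps the input shape while K and V remain a single `head_dim`-wide head.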
#### Memory Access Pattern
```
┌─────────────────┐
│ Input Embedding │
└────────┬────────┘
         │
    ┌────▼────┐
    │ Chunk 1 ├──┐
    └─────────┘  │
    ┌─────────┐  │
    │ Chunk 2 ├──┼──► Parallel Processing
    └─────────┘  │
    ┌─────────┐  │
    │ Chunk N ├──┘
    └─────────┘
```

## Key Innovations
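The chunked access pattern above can be illustrated with a minimal sketch (hypothetical code, not NeoCore's implementation) that fans chunks out to a thread pool and reassembles them in their original order:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def process_chunk(chunk, weight):
    # Stand-in for any cache-friendly per-chunk operation
    return chunk @ weight

def chunked_forward(x, weight, chunk_size=32):
    """Split the rows of x into chunks and process them in parallel."""
    chunks = [x[i:i + chunk_size] for i in range(0, len(x), chunk_size)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda c: process_chunk(c, weight), chunks))
    return np.concatenate(results)  # map preserves chunk order
```

Because `ThreadPoolExecutor.map` yields results in submission order, the concatenated output is identical to processing `x` in one pass.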
### 1. Cache-Optimized Linear Operations
- Custom blocked matrix multiplication
- Adaptive chunk sizing
- Operation result caching

### 2. Efficient Attention Mechanism
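This memory saving is easy to quantify: standard multi-head attention caches one key/value set per head, while MQA caches a single shared set. A back-of-the-envelope check with illustrative sizes:

```python
# KV-cache entries for N tokens, H attention heads, head dimension D
N, H, D = 2048, 8, 64  # illustrative sizes, not NeoCore defaults

standard_kv = 2 * N * H * D  # keys + values, one set per head: O(N * H * D)
mqa_kv = 2 * N * D           # keys + values, one shared head:  O(N * D)

print(standard_kv // mqa_kv)  # prints 8: MQA stores H times fewer KV entries
```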
```python
# Traditional vs NeoCore MQA
Traditional: O(N * H * D) memory
NeoCore: O(N * D) memory
```

### 3. Advanced Position Encoding
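Rotary embeddings encode position by rotating pairs of query/key features through position-dependent angles. A minimal sketch of the standard RoPE formulation (illustrative only, not NeoCore's actual code):

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embeddings to x of shape [seq, dim] (dim even)."""
    seq, dim = x.shape
    freqs = base ** (-np.arange(0, dim, 2) / dim)   # one frequency per feature pair
    angles = np.outer(np.arange(seq), freqs)        # [seq, dim/2]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                 # even/odd feature pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin              # rotate each pair by its angle
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Position 0 is left unchanged and vector norms are preserved, which is what makes the rotation safe to apply to queries and keys inside attention.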
- Rotary embeddings for enhanced position awareness
- Cache-friendly implementation
- Optimized for CPU SIMD operations

## Performance Metrics
| Batch Size | Sequence Length | Processing Time (ms) | Tokens/Second |
|------------|----------------|---------------------|---------------|
| 1 | 32 | 31.17 | 1,026 |
| 4 | 64 | 43.51 | 5,883 |
| 16 | 128 | 161.28 | 12,700 |

## Quick Start
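As a sanity check, the Tokens/Second column in the table above follows directly from batch size × sequence length ÷ processing time:

```python
# Tokens/Second = batch_size * seq_len / (processing_time_ms / 1000)
rows = [(1, 32, 31.17), (4, 64, 43.51), (16, 128, 161.28)]
throughput = [int(b * s / (ms / 1000)) for b, s, ms in rows]
print(throughput)  # [1026, 5883, 12698] -- the last row is reported rounded as 12,700
```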
```python
from neocore import NoamConfig, CPUOptimizedNoamTransformer# Initialize configuration
config = NoamConfig(
d_model=512,
n_heads=8,
n_layers=6,
warmup_steps=4000,
chunk_size=32
)# Create model
model = CPUOptimizedNoamTransformer(config)# Process input
output = model(input_ids)
```## π― Use Cases
- **Edge Computing**: Optimal for deployment on CPU-only edge devices
- **Enterprise Systems**: Reliable performance on standard server hardware
- **CI/CD Pipelines**: Efficient inference in production pipelines
- **Privacy-First Applications**: On-device processing without GPU requirements

## Technical Details
### Memory Management
- Intelligent cache management system
- Adaptive chunk sizing based on input
- Memory-efficient attention patterns

### Threading Model
```python
import os

# Illustrative policy: cap the pool where extra threads stop paying off
MAX_EFFICIENT_THREADS = 8  # assumed value; tune per machine
num_threads = min(os.cpu_count() or 1, MAX_EFFICIENT_THREADS)
# The thread-pool size is then chosen adaptively based on the workload
```

### Optimization Levels
1. **Level 1**: Basic CPU optimizations
2. **Level 2**: Cache-aware operations
3. **Level 3**: Advanced parallelization
4. **Level 4**: Full SIMD utilization

## Benchmarking
Run comprehensive benchmarks:
```bash
python -m neocore.benchmark --config benchmark_config.yaml
```

## Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
## License
Apache License 2.0. See [LICENSE](LICENSE) for details.
## Acknowledgments
Built on modern transformer innovations with specific optimizations for CPU architectures. Special thanks to the research community for their groundbreaking work in efficient transformer designs.
---
## Citation
```bibtex
@software{neocore2024,
  title     = {NeoCore: CPU-Optimized Transformer Architecture},
  author    = {Kye Gomez},
  year      = {2024},
  publisher = {GitHub},
  url       = {https://github.com/agora-lab-ai/neocore}
}
```