https://github.com/takara-ai/go-attention
A full attention mechanism and transformer in pure go.
https://github.com/takara-ai/go-attention
artificial-intelligence golang machine-learning serverless
Last synced: 3 months ago
JSON representation
A full attention mechanism and transformer in pure go.
- Host: GitHub
- URL: https://github.com/takara-ai/go-attention
- Owner: takara-ai
- License: mit
- Created: 2025-02-10T15:46:35.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-18T09:43:19.000Z (11 months ago)
- Last Synced: 2025-12-17T17:48:30.054Z (6 months ago)
- Topics: artificial-intelligence, golang, machine-learning, serverless
- Language: Go
- Homepage:
- Size: 561 KB
- Stars: 451
- Watchers: 5
- Forks: 15
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# go-attention

From the Frontier Research Team at takara.ai we present the first pure Go implementation of attention mechanisms and transformer layers, designed for high performance, consistency, and production reliability.
## Why Go for Attention Mechanisms?
### **Performance Without Compromise**
This implementation proves that Go can deliver **production-grade performance** for AI workloads:
- **Consistent, Predictable Performance**: Single optimized code path ensures repeatable results across all input sizes
- **Edge-Optimized**: Zero external dependencies and minimal memory footprint perfect for edge devices
- **Production-Ready**: Comprehensive error handling, type safety, and deterministic behavior
- **Scalable**: Efficient batched operations support high-throughput cloud deployments
### **Addressing Go's AI Limitations**
We've solved the common concerns about Go in AI/ML:
- **SIMD Support**: Assembly-optimized critical paths with automatic fallbacks
- **Memory Efficiency**: Object pools and optimized allocations reduce GC pressure
- **Parallel Performance**: Goroutine-based parallelization for multi-core systems
- **Numerical Stability**: Robust floating-point operations with proper error handling
### **Real-World Benefits**
- **Zero Cold Starts**: Pure Go implementation eliminates dependency resolution delays
- **Predictable Latency**: Consistent performance characteristics across all hardware
- **Easy Deployment**: Single binary with no external dependencies
- **Cost Effective**: Efficient resource usage reduces cloud costs
## Quick Start
Run our comprehensive examples:
```bash
# Get the module
go get github.com/takara-ai/go-attention
# Run the examples
go run api_examples.go
```
## Performance Characteristics
### **Consistent Performance Across All Sizes**
Our implementation uses a single, highly optimized code path that delivers predictable performance:
```go
// Always fast, always consistent - no unpredictable performance cliffs
result, err := attention.DotProduct(v1, v2)
```
**Performance Results:**
- **Small vectors (64-256)**: ~30-100ns per operation
- **Medium vectors (512-1024)**: ~400-1700ns per operation
- **Large vectors (2048+)**: ~1600ns+ per operation
- **Consistent across all hardware**: Same performance characteristics on any Go-compatible platform
### **Production-Grade Reliability**
- **Deterministic Results**: Same input always produces same output
- **Memory Safe**: No buffer overflows or memory corruption
- **Error Handling**: Comprehensive validation and error reporting
- **Type Safe**: Compile-time guarantees prevent runtime errors
## API Documentation
For complete API documentation, see [API.md](API.md).
### Core Types
```go
type Vector []float64 // Represents a 1D vector of float64 values
type Matrix []Vector // Represents a 2D matrix of float64 values
```
### Quick Examples
#### 1. Basic Dot-Product Attention
The simplest form of attention mechanism. Useful for basic sequence processing tasks.
```go
import "github.com/takara-ai/go-attention/attention"
// Create query-key-value setup
query := attention.Vector{1.0, 0.0, 1.0, 0.0} // Pattern to search for
keys := attention.Matrix{
{1.0, 0.0, 1.0, 0.0}, // Similar to query
{0.0, 1.0, 0.0, 1.0}, // Different from query
{0.5, 0.5, 0.5, 0.5}, // Neutral pattern
}
values := attention.Matrix{
{1.0, 2.0}, // Value for similar key
{3.0, 4.0}, // Value for different key
{5.0, 6.0}, // Value for neutral key
}
// Compute attention
output, weights, err := attention.DotProductAttention(query, keys, values)
if err != nil {
log.Fatal(err)
}
// Output will be a weighted combination of values based on query-key similarity
// Weights will show how much attention each key received
```
#### 2. Multi-Head Attention
More sophisticated attention mechanism that can capture different types of relationships in parallel.
```go
import "github.com/takara-ai/go-attention/attention"
// Configure multi-head attention
config := attention.MultiHeadConfig{
NumHeads: 4, // Number of parallel attention heads
DModel: 64, // Size of input/output embeddings
DKey: 16, // Size per head (DModel/NumHeads)
DValue: 16, // Size per head (DModel/NumHeads)
DropoutRate: 0.1, // For regularization
}
// Create the attention module
mha, err := attention.NewMultiHeadAttention(config)
if err != nil {
log.Fatal(err)
}
// Process sequences (batched input)
batchSize, seqLen := 2, 3 // Process 2 sequences, each with 3 tokens
// Create input matrices [batchSize × seqLen × DModel]
queries := make(attention.Matrix, batchSize*seqLen)
keys := make(attention.Matrix, batchSize*seqLen)
values := make(attention.Matrix, batchSize*seqLen)
// Initialize your matrices with actual data...
// Process through multi-head attention
output, err := mha.Forward(queries, keys, values)
if err != nil {
log.Fatal(err)
}
```
#### 3. Full Transformer Layer
Complete transformer layer with self-attention and feed-forward network.
```go
import (
"github.com/takara-ai/go-attention/transformer"
"github.com/takara-ai/go-attention/attention"
)
// Configure transformer layer
config := transformer.TransformerConfig{
DModel: 64, // Size of token embeddings
NumHeads: 4, // Number of attention heads
DHidden: 256, // Size of feed-forward hidden layer
DropoutRate: 0.1, // For regularization
}
// Create transformer layer
layer, err := transformer.NewTransformerLayer(config)
if err != nil {
log.Fatal(err)
}
// Create input sequence [seq_len × d_model]
seqLen := 3
input := make(attention.Matrix, seqLen)
for i := range input {
input[i] = make(attention.Vector, config.DModel)
// Fill with your embedding data...
}
// Process through transformer
output, err := layer.Forward(input)
if err != nil {
log.Fatal(err)
}
```
## Example Output
When running the examples, you'll see:
1. **Dot-Product Attention**:
```
Query: [1 0 1 0]
Attention Weights: [0.523 0.174 0.302] // Shows focus on similar patterns
Output: [2.558 3.558] // Weighted combination of values
```
2. **Multi-Head Attention**:
```
Input dimensions: [2 batches × 3 tokens × 64 features]
Output shape: [6×64]
```
3. **Transformer Layer**:
```
Input shape: [3×64]
Output shape: [3×64]
```
## Common Use Cases
1. **Text Processing**:
- Sequence-to-sequence translation
- Document summarization
- Sentiment analysis
2. **Time Series**:
- Financial forecasting
- Sensor data analysis
- Anomaly detection
3. **Structured Data**:
- Graph node embedding
- Feature interaction modeling
- Recommendation systems
## Performance Features
### **Built-in Optimizations**
- **Loop Unrolling**: 8x unrolled dot product for maximum throughput
- **Memory Pooling**: Object pools reduce allocation overhead in hot paths
- **Cache-Friendly**: Optimized memory access patterns
- **Zero Dependencies**: Pure Go implementation with no external requirements
### **Production Monitoring**
```go
// Enable performance monitoring and auto-tuning
config := attention.DefaultPerformanceConfig()
config.EnableMonitoring = true
config.EnableAutoTuning = true
attention.SetPerformanceConfig(config)
// Use memory pools to reduce allocations
v1 := attention.GetVectorFromPool(size)
defer attention.PutVectorToPool(v1)
// Get performance statistics
stats := attention.GetAllPerformanceStats()
```
### **Parallel Operations**
```go
// Configure parallel processing
config := attention.DefaultParallelConfig()
config.NumWorkers = runtime.NumCPU()
```
## Why This Go Implementation?
### **Edge Computing Excellence**
- **Zero Dependencies**: Perfect for edge devices where dependency management is crucial
- **Predictable Performance**: Consistent latency regardless of input size
- **Memory Efficient**: Minimal allocations and GC pressure
- **Easy Deployment**: Single binary deployment
### **Production System Benefits**
- **Type Safety**: Compile-time guarantees prevent runtime errors
- **Error Handling**: Comprehensive validation and error reporting
- **Deterministic**: Same input always produces same output
- **Scalable**: Efficient batched operations for high throughput
### **Real-time Processing**
- **Consistent Latency**: No unpredictable performance cliffs
- **Low Memory Footprint**: Efficient resource usage
- **Fast Startup**: No dependency resolution delays
- **Reliable**: Robust error handling and recovery
## Features
- **Efficient Dot-Product Attention**: Upgraded with Scalable-Softmax (SSMax, s=1) for improved long-context performance
- **Multi-Head Attention**: Parallel attention heads for capturing different relationships
- **Full Transformer Layer**: Complete implementation with:
- Layer normalization
- Position-wise feed-forward networks
- Residual connections
- **Batched Operations**: Efficient processing of multiple sequences
- **Production Monitoring**: Built-in performance tracking and optimization
## Performance Benchmarks
Run comprehensive performance benchmarks:
```bash
go test -bench=. ./attention
```
This will benchmark all operations and show performance characteristics across different input sizes and hardware configurations.
**Sample Results:**
```
BenchmarkDotProduct-8 154059886 7.747 ns/op
BenchmarkDotProductAttention-8 6989079 170.7 ns/op
BenchmarkMultiHeadAttentionForward-8 6178 192052 ns/op
```
## Roadmap
Future improvements may include:
- **Real SIMD Assembly**: Actual AVX2/NEON implementations for critical paths
- **Positional Encoding**: RoPE and other positional encoding methods
- **Advanced Optimizations**: Flash Attention, Sparse Attention variants
- **Training Support**: Gradient computation and optimization utilities
- **Model Export**: ONNX and other format support
- **GPU Acceleration**: CUDA/OpenCL backends for GPU computation
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see LICENSE file for details
---
For research inquiries and press, please reach out to research@takara.ai
> 人類を変革する