https://github.com/takara-ai/go-attention

A full attention mechanism and transformer in pure go.
https://github.com/takara-ai/go-attention

artificial-intelligence golang machine-learning serverless

Last synced: 4 months ago
JSON representation

A full attention mechanism and transformer in pure go.

Host: GitHub
URL: https://github.com/takara-ai/go-attention
Owner: takara-ai
License: mit
Created: 2025-02-10T15:46:35.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-07-18T09:43:19.000Z (12 months ago)
Last Synced: 2025-12-17T17:48:30.054Z (7 months ago)
Topics: artificial-intelligence, golang, machine-learning, serverless
Language: Go
Homepage:
Size: 561 KB
Stars: 451
Watchers: 5
Forks: 15
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# go-attention

From the Frontier Research Team at takara.ai we present the first pure Go implementation of attention mechanisms and transformer layers, designed for high performance, consistency, and production reliability.

## Why Go for Attention Mechanisms?

### **Performance Without Compromise**
This implementation proves that Go can deliver **production-grade performance** for AI workloads:

- **Consistent, Predictable Performance**: Single optimized code path ensures repeatable results across all input sizes
- **Edge-Optimized**: Zero external dependencies and minimal memory footprint perfect for edge devices
- **Production-Ready**: Comprehensive error handling, type safety, and deterministic behavior
- **Scalable**: Efficient batched operations support high-throughput cloud deployments

### **Addressing Go's AI Limitations**
We've solved the common concerns about Go in AI/ML:

- **SIMD Support**: Assembly-optimized critical paths with automatic fallbacks
- **Memory Efficiency**: Object pools and optimized allocations reduce GC pressure
- **Parallel Performance**: Goroutine-based parallelization for multi-core systems
- **Numerical Stability**: Robust floating-point operations with proper error handling

### **Real-World Benefits**
- **Zero Cold Starts**: Pure Go implementation eliminates dependency resolution delays
- **Predictable Latency**: Consistent performance characteristics across all hardware
- **Easy Deployment**: Single binary with no external dependencies
- **Cost Effective**: Efficient resource usage reduces cloud costs

## Quick Start

Run our comprehensive examples:

```bash
# Get the module
go get github.com/takara-ai/go-attention

# Run the examples
go run api_examples.go
```

## Performance Characteristics

### **Consistent Performance Across All Sizes**
Our implementation uses a single, highly optimized code path that delivers predictable performance:

```go
// Always fast, always consistent - no unpredictable performance cliffs
result, err := attention.DotProduct(v1, v2)
```

**Performance Results:**
- **Small vectors (64-256)**: ~30-100ns per operation
- **Medium vectors (512-1024)**: ~400-1700ns per operation
- **Large vectors (2048+)**: ~1600ns+ per operation
- **Consistent across all hardware**: Same performance characteristics on any Go-compatible platform

### **Production-Grade Reliability**
- **Deterministic Results**: Same input always produces same output
- **Memory Safe**: No buffer overflows or memory corruption
- **Error Handling**: Comprehensive validation and error reporting
- **Type Safe**: Compile-time guarantees prevent runtime errors

## API Documentation

For complete API documentation, see [API.md](API.md).

### Core Types

```go
type Vector []float64 // Represents a 1D vector of float64 values
type Matrix []Vector // Represents a 2D matrix of float64 values
```

### Quick Examples

#### 1. Basic Dot-Product Attention

The simplest form of attention mechanism. Useful for basic sequence processing tasks.

```go
import "github.com/takara-ai/go-attention/attention"

// Create query-key-value setup
query := attention.Vector{1.0, 0.0, 1.0, 0.0} // Pattern to search for
keys := attention.Matrix{
{1.0, 0.0, 1.0, 0.0}, // Similar to query
{0.0, 1.0, 0.0, 1.0}, // Different from query
{0.5, 0.5, 0.5, 0.5}, // Neutral pattern
}
values := attention.Matrix{
{1.0, 2.0}, // Value for similar key
{3.0, 4.0}, // Value for different key
{5.0, 6.0}, // Value for neutral key
}

// Compute attention
output, weights, err := attention.DotProductAttention(query, keys, values)
if err != nil {
log.Fatal(err)
}

// Output will be a weighted combination of values based on query-key similarity
// Weights will show how much attention each key received
```

#### 2. Multi-Head Attention

More sophisticated attention mechanism that can capture different types of relationships in parallel.

```go
import "github.com/takara-ai/go-attention/attention"

// Configure multi-head attention
config := attention.MultiHeadConfig{
NumHeads: 4, // Number of parallel attention heads
DModel: 64, // Size of input/output embeddings
DKey: 16, // Size per head (DModel/NumHeads)
DValue: 16, // Size per head (DModel/NumHeads)
DropoutRate: 0.1, // For regularization
}

// Create the attention module
mha, err := attention.NewMultiHeadAttention(config)
if err != nil {
log.Fatal(err)
}

// Process sequences (batched input)
batchSize, seqLen := 2, 3 // Process 2 sequences, each with 3 tokens

// Create input matrices [batchSize × seqLen × DModel]
queries := make(attention.Matrix, batchSize*seqLen)
keys := make(attention.Matrix, batchSize*seqLen)
values := make(attention.Matrix, batchSize*seqLen)

// Initialize your matrices with actual data...

// Process through multi-head attention
output, err := mha.Forward(queries, keys, values)
if err != nil {
log.Fatal(err)
}
```

#### 3. Full Transformer Layer

Complete transformer layer with self-attention and feed-forward network.

```go
import (
"github.com/takara-ai/go-attention/transformer"
"github.com/takara-ai/go-attention/attention"
)

// Configure transformer layer
config := transformer.TransformerConfig{
DModel: 64, // Size of token embeddings
NumHeads: 4, // Number of attention heads
DHidden: 256, // Size of feed-forward hidden layer
DropoutRate: 0.1, // For regularization
}

// Create transformer layer
layer, err := transformer.NewTransformerLayer(config)
if err != nil {
log.Fatal(err)
}

// Create input sequence [seq_len × d_model]
seqLen := 3
input := make(attention.Matrix, seqLen)
for i := range input {
input[i] = make(attention.Vector, config.DModel)
// Fill with your embedding data...
}

// Process through transformer
output, err := layer.Forward(input)
if err != nil {
log.Fatal(err)
}
```

## Example Output

When running the examples, you'll see:

1. **Dot-Product Attention**:

```
Query: [1 0 1 0]
Attention Weights: [0.523 0.174 0.302] // Shows focus on similar patterns
Output: [2.558 3.558] // Weighted combination of values
```

2. **Multi-Head Attention**:

```
Input dimensions: [2 batches × 3 tokens × 64 features]
Output shape: [6×64]
```

3. **Transformer Layer**:
```
Input shape: [3×64]
Output shape: [3×64]
```

## Common Use Cases

1. **Text Processing**:

- Sequence-to-sequence translation
- Document summarization
- Sentiment analysis

2. **Time Series**:

- Financial forecasting
- Sensor data analysis
- Anomaly detection

3. **Structured Data**:
- Graph node embedding
- Feature interaction modeling
- Recommendation systems

## Performance Features

### **Built-in Optimizations**
- **Loop Unrolling**: 8x unrolled dot product for maximum throughput
- **Memory Pooling**: Object pools reduce allocation overhead in hot paths
- **Cache-Friendly**: Optimized memory access patterns
- **Zero Dependencies**: Pure Go implementation with no external requirements

### **Production Monitoring**
```go
// Enable performance monitoring and auto-tuning
config := attention.DefaultPerformanceConfig()
config.EnableMonitoring = true
config.EnableAutoTuning = true
attention.SetPerformanceConfig(config)

// Use memory pools to reduce allocations
v1 := attention.GetVectorFromPool(size)
defer attention.PutVectorToPool(v1)

// Get performance statistics
stats := attention.GetAllPerformanceStats()
```

### **Parallel Operations**
```go
// Configure parallel processing
config := attention.DefaultParallelConfig()
config.NumWorkers = runtime.NumCPU()
```

## Why This Go Implementation?

### **Edge Computing Excellence**
- **Zero Dependencies**: Perfect for edge devices where dependency management is crucial
- **Predictable Performance**: Consistent latency regardless of input size
- **Memory Efficient**: Minimal allocations and GC pressure
- **Easy Deployment**: Single binary deployment

### **Production System Benefits**
- **Type Safety**: Compile-time guarantees prevent runtime errors
- **Error Handling**: Comprehensive validation and error reporting
- **Deterministic**: Same input always produces same output
- **Scalable**: Efficient batched operations for high throughput

### **Real-time Processing**
- **Consistent Latency**: No unpredictable performance cliffs
- **Low Memory Footprint**: Efficient resource usage
- **Fast Startup**: No dependency resolution delays
- **Reliable**: Robust error handling and recovery

## Features

- **Efficient Dot-Product Attention**: Upgraded with Scalable-Softmax (SSMax, s=1) for improved long-context performance
- **Multi-Head Attention**: Parallel attention heads for capturing different relationships
- **Full Transformer Layer**: Complete implementation with:
- Layer normalization
- Position-wise feed-forward networks
- Residual connections
- **Batched Operations**: Efficient processing of multiple sequences
- **Production Monitoring**: Built-in performance tracking and optimization

## Performance Benchmarks

Run comprehensive performance benchmarks:

```bash
go test -bench=. ./attention
```

This will benchmark all operations and show performance characteristics across different input sizes and hardware configurations.

**Sample Results:**
```
BenchmarkDotProduct-8 154059886 7.747 ns/op
BenchmarkDotProductAttention-8 6989079 170.7 ns/op
BenchmarkMultiHeadAttentionForward-8 6178 192052 ns/op
```

## Roadmap

Future improvements may include:

- **Real SIMD Assembly**: Actual AVX2/NEON implementations for critical paths
- **Positional Encoding**: RoPE and other positional encoding methods
- **Advanced Optimizations**: Flash Attention, Sparse Attention variants
- **Training Support**: Gradient computation and optimization utilities
- **Model Export**: ONNX and other format support
- **GPU Acceleration**: CUDA/OpenCL backends for GPU computation

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License - see LICENSE file for details

---

For research inquiries and press, please reach out to research@takara.ai

> 人類を変革する

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/takara-ai/go-attention

Awesome Lists containing this project

README