https://github.com/codeyousef/seenlang
- Host: GitHub
- URL: https://github.com/codeyousef/seenlang
- Owner: codeyousef
- License: other
- Created: 2025-05-09T21:18:10.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-03-06T00:40:50.000Z (4 months ago)
- Last Synced: 2026-03-06T04:59:54.254Z (4 months ago)
- Language: C
- Size: 122 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Seen (س)
A self-hosted systems programming language with multi-language keywords
---
Seen is a compiled systems programming language whose compiler is written entirely in Seen itself. It targets LLVM, ships with a built-in language server (LSP), and lets you write code using keywords in English, Arabic, Spanish, Russian, Chinese, or Japanese.
```seen
fun main() {
    let names = ["Alice", "Bob", "Charlie"]
    for name in names {
        println("Hello, {name}!")
    }
}
```
The same program in Arabic:
```seen
دالة رئيسية() {
    اجعل أسماء = ["أحمد", "سارة", "خالد"]
    لكل اسم في أسماء {
        اطبع("مرحبا، {اسم}!")
    }
}
```
And in Chinese:
```seen
函数 主函数() {
    让 名字列表 = ["小明", "小红", "小华"]
    对于 名字 在 名字列表 {
        打印("你好,{名字}!")
    }
}
```
## Why Seen?
**LLVM code generation** -- Seen compiles through LLVM with native codegen,
SIMD controls, target selection, and package artifact linking.
**Self-hosted** -- The compiler is written in Seen and verifies itself through
the Stage 1 -> Stage 2 -> Stage 3 bootstrap flow.
**Incremental compilation** -- Source and IR caches keep rebuilds focused on
changed modules where possible.
**Multi-language keywords** -- Keywords are defined in TOML files under `languages/`. Adding a new language means adding a directory of TOML files -- no compiler changes required.
**Region-based memory** -- No garbage collector. Memory is managed through regions and arenas with compile-time lifetime tracking.
## Quick Start
### Source Build Prerequisites
- **LLVM 18+** (`clang`, `opt`, `llc`, `llvm-as`, `lld`)
- **GCC** (for compiling the C runtime)
- **Git**
### Build from Source
```bash
git clone https://github.com/codeyousef/SeenLang.git
cd SeenLang
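# Cap virtual memory at 70% of MemAvailable, never above 14 GiB (14680064 KB).
# /proc/meminfo is Linux-specific.
AVAIL_KB=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
MAIN_KB=$(( AVAIL_KB * 70 / 100 ))
if [ "$MAIN_KB" -gt 14680064 ]; then MAIN_KB=14680064; fi
# ulimit -v applies to this shell and every command it starts afterwards.
ulimit -v "$MAIN_KB"
SEEN_LOW_MEMORY=1 SEEN_SKIP_LOW_MEMORY_SHORTCUT=1 \
  SEEN_MAIN_VMEM_KB="$MAIN_KB" SEEN_OPT_VMEM_KB=2097152 \
  ./scripts/safe_rebuild.sh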
```
The production compiler lands at `compiler_seen/target/seen`.
### Install
```bash
sudo cp compiler_seen/target/seen /usr/local/bin/seen
```
Or add to your shell profile:
```bash
export PATH="$PATH:/path/to/SeenLang/compiler_seen/target"
```
### Hello World
```bash
echo 'fun main() { println("Hello, Seen!") }' > hello.seen
seen compile hello.seen hello
./hello
```
## Usage
```bash
seen compile source.seen output          # Compile to a native binary
seen compile source.seen output --fast   # Compile with lighter (O1) optimization
seen run source.seen                     # JIT execution
seen check source.seen                   # Type check only
seen pkg fetch                           # Install package dependencies from Seen.toml
seen lsp                                 # Start the language server
```
### Compiler Flags
| Flag or variable | Description |
|------------------|-------------|
| `--fast` | Skip heavy optimizations and compile at O1 |
| `--emit-llvm` | Dump the generated LLVM IR |
| `--backend llvm` | Use the shipped LLVM backend |
| `--null-safety` | Enable null-safety checks |
| `--warn-uninit` | Warn on uninitialized variables |
| `--stack-check` | Enable stack-overflow checks |
| `SEEN_TRACE_LLVM=all` | Environment variable: trace LLVM IR generation |
| `SEEN_TRACE_LLVM=gep` | Environment variable: trace selected struct/GEP layout paths |
## Examples
### Variables and Control Flow
```seen
fun main() {
    let name = "Seen"
    var count = 0
    while count < 5 {
        count = count + 1
        if count == 3 {
            println("Three!")
        }
    }
    println("{name}: counted to {count}")
}
```
### Classes and Methods
```seen
class Vec2 {
    var x: Float
    var y: Float

    static fun new(x: Float, y: Float) r: Vec2 {
        return Vec2 { x: x, y: y }
    }

    fun length() r: Float {
        return sqrt(this.x * this.x + this.y * this.y)
    }

    fun add(other: Vec2) r: Vec2 {
        return Vec2.new(this.x + other.x, this.y + other.y)
    }
}

fun main() {
    let a = Vec2.new(3.0, 4.0)
    let b = Vec2.new(1.0, 2.0)
    let c = a.add(b)
    println("Length: {c.length()}")
}
```
### Enums and Pattern Matching
```seen
enum Shape {
    Circle(radius: Float)
    Rectangle(width: Float, height: Float)
}

fun area(shape: Shape) r: Float {
    return when shape {
        is Circle(r) => 3.14159 * r * r
        is Rectangle(w, h) => w * h
    }
}
```
### Traits
```seen
trait Printable {
    fun display() r: String
}

impl Printable for Vec2 {
    fun display() r: String {
        return "({this.x}, {this.y})"
    }
}
```
### Generics
```seen
fun max<T>(a: T, b: T) r: T {
    if a > b { return a }
    return b
}

class Stack<T> {
    var items: Array<T>

    fun push(item: T) {
        this.items.push(item)
    }

    fun pop() r: T {
        return this.items.pop()
    }
}
```
### Async/Await
```seen
@async
fun fetchData(url: String) r: String {
    let response = await httpGet(url)
    return response.body
}
```
### Closures
```seen
fun apply(arr: Array, f: Fun) r: Array {
    var result = Array()
    for item in arr {
        result.push(f(item))
    }
    return result
}

fun main() {
    let nums = [1, 2, 3, 4, 5]
    let doubled = apply(nums, |x| x * 2)
}
```
### SIMD
```seen
fun dot_product(a: Array, b: Array, n: Int) r: Float {
    var sum = f32x4(0.0, 0.0, 0.0, 0.0)
    var i = 0
    while i + 4 <= n {
        let va = simd_load_f32x4(a, i)
        let vb = simd_load_f32x4(b, i)
        sum = sum + va * vb
        i = i + 4
    }
    return reduce_add(sum)
}
```
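As written, `dot_product` only processes complete groups of four, so when `n` is not a multiple of 4 the last `n % 4` elements are skipped. The C sketch below shows the same strip-mining pattern with the usual scalar tail loop added. It uses the GCC/Clang `vector_size` extension as a stand-in for Seen's `f32x4`. It is an illustration of the technique, not Seen's SIMD API or codegen output.

```c
#include <stddef.h>
#include <string.h>

/* Four packed floats via the GCC/Clang vector extension. */
typedef float f32x4 __attribute__((vector_size(16)));

/* Dot product: four lanes per iteration, then a scalar tail. */
float dot_product(const float *a, const float *b, size_t n) {
    f32x4 sum = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        f32x4 va, vb;
        /* memcpy avoids assuming a and b are 16-byte aligned. */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        sum += va * vb;               /* lane-wise multiply, then add */
    }
    /* Horizontal reduction of the four lanes. */
    float total = sum[0] + sum[1] + sum[2] + sum[3];
    /* Scalar loop for the n % 4 leftover elements. */
    for (; i < n; i++) {
        total += a[i] * b[i];
    }
    return total;
}
```

With five elements `{1, 2, 3, 4, 5}` dotted with themselves, the vector loop covers the first four values and the tail loop adds the final `5 * 5`.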
### GPU Compute (Vulkan)
```seen
@compute(workgroup_size = 64)
fun vector_add(a: Buffer, b: Buffer, out: Buffer) {
    let idx = global_invocation_id.x
    out[idx] = a[idx] + b[idx]
}
```
### Parallel For
```seen
fun main() {
    var results = Array.withLength(1000)
    parallel_for i in 0..1000 {
        results[i] = i * i
    }
}
```
### Compile-Time Evaluation
```seen
comptime fun factorial(n: Int) r: Int {
    if n <= 1 { return 1 }
    return n * factorial(n - 1)
}
let TABLE_SIZE = comptime { factorial(10) }
```
### Defer and Error Handling
```seen
fun readFile(path: String) r: String {
    let file = File.open(path)
    defer { file.close() }
    try {
        return file.readAll()
    } catch e {
        println("Error: {e}")
        return ""
    }
}
```
## Language Features
### Type System
- Immutable by default (`let`), opt-in mutability (`var`)
- Nullable types (`T?`) with safe access (`?.`) and null coalescing (`??`)
- Generics with constraints
- Type aliases and distinct types
- `Result` and `Option` types
### Data Structures
- Classes with methods, inheritance, and traits
- Enums (simple and data-carrying)
- Structs (value types)
- `Array`, `Vec`, `HashMap`, `BTreeMap`, `LinkedList`, `SmallVec`
### Memory Management
- Region-based memory (no GC)
- `move`, `borrow`, `ref` semantics
- `defer` for cleanup
- `arena` allocators
- `@packed`, `@cache_line` layout control
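Region and arena allocation both rely on the same core idea: allocations are carved sequentially out of one block and released together, so individual frees are unnecessary. Below is a minimal bump-pointer arena in C that illustrates the technique. The names `Arena`, `arena_init`, `arena_alloc`, and `arena_destroy` are hypothetical; this is not the allocator in `seen_runtime/`, and it does not model Seen's compile-time lifetime tracking.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One contiguous block; allocations bump `used` forward. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Returns 1 on success, 0 if the backing block could not be allocated. */
int arena_init(Arena *arena, size_t capacity) {
    arena->base = malloc(capacity);
    arena->capacity = capacity;
    arena->used = 0;
    return arena->base != NULL;
}

/* Returns `size` bytes aligned to `align` (a nonzero power of two),
   or NULL when the arena has no room left. */
void *arena_alloc(Arena *arena, size_t size, size_t align) {
    uintptr_t start = (uintptr_t)(arena->base + arena->used);
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)arena->base);
    if (offset > arena->capacity || size > arena->capacity - offset) {
        return NULL;
    }
    arena->used = offset + size;
    return arena->base + offset;
}

/* Releases every allocation made from the arena in a single call. */
void arena_destroy(Arena *arena) {
    free(arena->base);
    arena->base = NULL;
    arena->capacity = 0;
    arena->used = 0;
}
```

The trade-off is the usual one for regions: allocation is a pointer bump and teardown is a single `free`, but no individual object can be released early.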
### Concurrency
- `async`/`await` with LLVM coroutines
- `parallel_for` with fork-based parallelism
- `Mutex`, `RwLock`, `Barrier`, `Channel`, `AtomicInt`
- `@send`/`@sync` markers for thread safety
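"Fork-based parallelism" implies that each worker is a separate process. As a result, results must be written to memory that the parent can see; ordinary heap writes made by a child stay in that child. The C sketch below illustrates one way to structure `parallel_for i in 0..n { results[i] = i * i }` using POSIX `fork` and an anonymous `MAP_SHARED` mapping. The chunking scheme and function names are assumptions for illustration, not how the Seen runtime actually implements `parallel_for`.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict C modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* n longs in a shared anonymous mapping: children's writes reach the parent. */
long *alloc_shared(size_t n) {
    void *p = mmap(NULL, n * sizeof(long), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (long *)p;
}

/* out[i] = i * i for i in [0, n), split into `workers` contiguous chunks,
   one forked child per chunk. Returns 0 on success, -1 on any failure. */
int parallel_squares(long *out, size_t n, int workers) {
    if (workers < 1) return -1;
    size_t chunk = (n + (size_t)workers - 1) / (size_t)workers;
    int ok = 0;
    for (int w = 0; w < workers; w++) {
        size_t lo = (size_t)w * chunk;
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        pid_t pid = fork();
        if (pid < 0) { ok = -1; break; }
        if (pid == 0) {                  /* child: fill only its own chunk */
            for (size_t i = lo; i < hi; i++) out[i] = (long)(i * i);
            _exit(0);                    /* skip the parent's stdio/atexit cleanup */
        }
    }
    /* Parent: reap all children (wait() covers every child of this process). */
    int status;
    while (wait(&status) > 0) {
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) ok = -1;
    }
    return ok;
}
```

Because the chunks do not overlap, the children need no locking. The parent only has to wait for every child before reading `out`.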
### Metaprogramming
- `comptime` evaluation
- Decorators: `@derive(Clone, Hash, Eq, Debug, Serialize, Deserialize, Json)`
- `@reflect` for runtime type information
- `@intrinsic` for LLVM intrinsic mapping
### GPU
- `@compute`, `@vertex`, `@fragment` shader annotations
- `Buffer`, `Uniform`, `Image` types
- GLSL codegen with Vulkan runtime
- `--emit-glsl` to inspect generated shaders
### SIMD
- Vector types: `i8x16`, `i16x8`, `i32x4`, `i64x2`, `f32x4`, `f64x2`
- Arithmetic, comparison, shuffle, swizzle
- Horizontal reductions (`reduce_add`, `reduce_min`, `reduce_max`)
- Aligned load/store, gather/scatter
### Interop
- `extern fun` for C FFI
- `@cImport` for C header inclusion
- `@repr(C)` for C-compatible struct layout
### Operators
- Word operators: `and`, `or`, `not` (alongside `&&`, `||`, `!`)
- String interpolation: `"Hello, {name}!"`
- Range: `0..n`, `0..=n`
- Pipe-style chaining
## Benchmarks
`benchmarks/production/` contains 17 production benchmarks, including:
| Benchmark | Description |
|-----------|-------------|
| `01_matrix_mult` | Dense matrix multiplication |
| `02_sieve` | Sieve of Eratosthenes |
| `03_binary_trees` | GC-stress binary tree allocation |
| `04_fasta` | FASTA sequence generation |
| `05_nbody` | N-body planetary simulation |
| `06_revcomp` | Reverse-complement DNA |
| `07_mandelbrot` | Mandelbrot set rendering |
| `08_lru_cache` | LRU cache with hash map |
| `09_json_serialize` | JSON serialization |
| `11_spectral_norm` | Spectral norm computation |
| `12_fannkuch` | Fannkuch-redux permutations |
| `13_great_circle` | Great-circle distance |
| `14_hyperbolic_pde` | Hyperbolic PDE solver |
| `15_dft_spectrum` | Discrete Fourier transform |
| `16_euler_totient` | Euler's totient function |
| `17_fibonacci` | Recursive Fibonacci |
Run benchmarks:
```bash
./scripts/run_production_benchmarks.sh
```
Comparison benchmarks against C, C++, Rust, and Zig are in `benchmarks/comparison/`.
## Multi-Language Support
Seen's keywords are defined externally in TOML files. Six languages ship with the compiler:
| Language | Directory | Example keyword for `fun` |
|----------|-----------|---------------------------|
| English | `languages/en/` | `fun` |
| Arabic | `languages/ar/` | `دالة` |
| Spanish | `languages/es/` | `fun` |
| Russian | `languages/ru/` | `функция` |
| Chinese | `languages/zh/` | `函数` |
| Japanese | `languages/ja/` | `関数` |
Each language has 17 TOML files covering keywords, operators, and standard library names.
### Adding a New Language
1. Create `languages/xx/` (where `xx` is the language code)
2. Copy the English TOML files as templates
3. Translate keyword values
4. The compiler auto-detects available languages
No compiler rebuild required.
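Because keyword spellings are data instead of code, the lexer can map each localized spelling onto one language-independent token. As a result, the parser sees the same token stream regardless of which language the source uses. The C sketch below illustrates that mapping step. Seen reads its spellings from TOML files; here, a few spellings taken from the examples in this README are hard-coded, and TOML parsing is omitted. `TokenKind`, `KeywordEntry`, and `classify_word` are hypothetical names, and this is not the actual lexer in `compiler_seen/src/lexer/`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical canonical token kinds (a real lexer has many more). */
typedef enum { TOK_IDENT, TOK_FUN, TOK_LET, TOK_FOR } TokenKind;

typedef struct {
    const char *lang;  /* directory name under languages/, e.g. "ar" */
    const char *word;  /* UTF-8 spelling of the keyword in that language */
    TokenKind kind;    /* language-independent token handed to the parser */
} KeywordEntry;

/* Spellings copied from this README's examples and language table. */
static const KeywordEntry KEYWORDS[] = {
    {"en", "fun", TOK_FUN},     {"en", "let", TOK_LET},  {"en", "for", TOK_FOR},
    {"ar", "دالة", TOK_FUN},    {"ar", "اجعل", TOK_LET}, {"ar", "لكل", TOK_FOR},
    {"zh", "函数", TOK_FUN},    {"zh", "让", TOK_LET},   {"zh", "对于", TOK_FOR},
    {"ru", "функция", TOK_FUN}, {"ja", "関数", TOK_FUN},
};

/* Classify a scanned word for the selected language; anything that is
   not a keyword in that language is treated as an identifier. */
TokenKind classify_word(const char *lang, const char *word) {
    size_t count = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(KEYWORDS[i].lang, lang) == 0 &&
            strcmp(KEYWORDS[i].word, word) == 0) {
            return KEYWORDS[i].kind;
        }
    }
    return TOK_IDENT;
}
```

Under this design, adding a language only adds rows to the table, which mirrors the claim that no compiler rebuild is needed.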
## IDE Support
### Visual Studio Code
The `vscode-seen/` directory contains a full-featured extension:
- Syntax highlighting with TextMate grammar
- IntelliSense via built-in LSP
- Real-time error diagnostics
- Code formatting, debugging, REPL
- Snippets for common patterns
- Multi-language keyword support
```bash
cd vscode-seen
npm install
npm run package
code --install-extension seen-*.vsix
```
### Any Editor (LSP)
Seen includes a built-in language server:
```bash
seen lsp
```
**Neovim:**
```lua
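vim.filetype.add({ extension = { seen = "seen" } })
local configs = require('lspconfig.configs')
-- nvim-lspconfig has no built-in "seen" entry, so register one first
configs.seen = configs.seen or { default_config = {
  cmd = { "seen", "lsp" },
  filetypes = { "seen" },
  root_dir = require('lspconfig.util').root_pattern("Seen.toml", ".git"),
} }
require('lspconfig').seen.setup({})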
```
**Emacs:**
```elisp
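;; Assumes a `seen-mode' major mode is already defined for .seen files.
(with-eval-after-load 'lsp-mode
  (add-to-list 'lsp-language-id-configuration '(seen-mode . "seen"))
  (lsp-register-client
   (make-lsp-client :new-connection (lsp-stdio-connection '("seen" "lsp"))
                    :major-modes '(seen-mode)
                    :server-id 'seen-lsp)))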
```
## Project Structure
```
SeenLang/
├── compiler_seen/              # Self-hosted compiler
│   └── src/
│       ├── main_compiler.seen  # Shipped compiler CLI/bootstrap driver
│       ├── main.seen           # Higher-level CLI wrapper source
│       ├── lexer/              # Tokenizer with multi-language support
│       ├── parser/             # Recursive descent parser
│       ├── typechecker/        # Type inference and checking
│       ├── codegen/            # LLVM IR generation and backend helpers
│       ├── bootstrap/          # Frontend orchestration
│       └── lsp/                # Language server implementation
├── bootstrap/                  # Frozen bootstrap compiler
│   └── stage1_frozen           # Verified binary (SHA-256 checked)
├── seen_std/                   # Standard library (Seen)
├── seen_runtime/               # C runtime (memory, I/O, collections)
├── languages/                  # Keyword definitions (6 languages, 102 TOML files)
├── vscode-seen/                # VS Code extension
├── tests/                      # Test suites
│   └── e2e_multilang/          # Multilingual end-to-end tests
├── benchmarks/                 # 17 production benchmarks + comparison suite
├── scripts/                    # Build, test, and IR validation tools
├── installer/                  # Platform installers (Linux, macOS, Windows)
└── docs/                       # Design documents and specifications
```
## Compiler Architecture
The compiler follows a staged pipeline:
```
Source (.seen)
  → Lexer (tokenize with language-specific keywords)
  → Parser (recursive descent → AST)
  → Type Checker (inference, validation, smart casts)
  → IR Generator (AST → LLVM IR, three-pass: signatures → types → bodies)
  → LLVM/toolchain backend
  → Native Binary
```
Key architectural decisions:
- **Multi-module codegen**: LLVM IR generation is split across focused state and driver modules.
- **Content-addressed caches**: source and IR caches avoid recompiling unchanged work where possible.
- **Three-pass IR generation**: declarations, type/layout preparation, then bodies.
- **IR validation gates**: `scripts/seen_prebuild_gates.sh`, `scripts/seen_ir_verify.sh`, and `seen_ir_lint` catch known late-stage failures early.
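A content-addressed cache keys each artifact by a hash of its inputs rather than by a file name or timestamp. As a result, an unchanged module always maps to the same key and can be reused. This README doesn't say which hash Seen uses. The C sketch below uses 64-bit FNV-1a purely for brevity. A production cache would more typically use a cryptographic hash and would also include the compiler version and flags in the key. `fnv1a64` and `cache_key` are illustrative names, not Seen functions.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a: a small, non-cryptographic hash over a byte buffer. */
uint64_t fnv1a64(const void *data, size_t len) {
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;             /* FNV prime */
    }
    return h;
}

/* Derive a 16-hex-digit cache key from a module's source text:
   identical text always yields the same key, so it hits the cache. */
void cache_key(const char *source, char out[17]) {
    snprintf(out, 17, "%016llx",
             (unsigned long long)fnv1a64(source, strlen(source)));
}
```

A rebuild would recompute each module's key and skip any module whose key already has a cached artifact on disk.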
## Development
### Bootstrap-Verified Builds
The compiler compiles itself. After any change to `compiler_seen/src/`, verify bootstrap:
```bash
AVAIL_KB=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
MAIN_KB=$(( AVAIL_KB * 70 / 100 ))
if [ "$MAIN_KB" -gt 14680064 ]; then MAIN_KB=14680064; fi
ulimit -v "$MAIN_KB"
SEEN_LOW_MEMORY=1 SEEN_SKIP_LOW_MEMORY_SHORTCUT=1 \
SEEN_MAIN_VMEM_KB="$MAIN_KB" SEEN_OPT_VMEM_KB=2097152 \
./scripts/safe_rebuild.sh
```
This builds stage 2 from the frozen stage 1 compiler in `bootstrap/`, then stage 3
from stage 2. If the stage 2 and stage 3 checks match, the compiler has reached
a fixed point: rebuilding itself no longer changes the result.
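This README doesn't spell out what the staged checks compare. One common form of fixed-point check is a byte-for-byte comparison of the artifacts produced by two consecutive stages. The C sketch below illustrates that comparison under that assumption. `files_identical` is a hypothetical helper, and the stage paths in the usage note are placeholders, not the actual layout used by `scripts/safe_rebuild.sh`.

```c
#include <stdio.h>

/* Returns 1 if both files contain identical bytes, 0 if they differ,
   and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b) {
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int result = -1;
    if (fa && fb) {
        int ca, cb;
        /* Advance both files in lockstep until a mismatch or a shared EOF. */
        do {
            ca = fgetc(fa);
            cb = fgetc(fb);
        } while (ca == cb && ca != EOF);
        result = (ca == cb);   /* both hit EOF together: identical */
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return result;
}
```

For example, `files_identical("stage2/seen", "stage3/seen") == 1` (with placeholder paths) would indicate that the stage 2 compiler reproduced itself exactly.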
### Running Tests
```bash
# End-to-end language tests
bash tests/e2e_multilang/run_all_e2e.sh
# IR validation on generated modules
./scripts/seen_ir_verify.sh /tmp/seen_module_*.ll
```
### Debugging the Compiler
```bash
# Type checker tracing
SEEN_DEBUG_TYPES=1 seen compile program.seen program
# LLVM IR generation tracing
SEEN_TRACE_LLVM=all seen compile program.seen program
# Struct layout debugging
SEEN_TRACE_LLVM=gep seen compile program.seen program
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make changes
4. Run tests: `bash tests/e2e_multilang/run_all_e2e.sh`
5. Verify bootstrap with explicit memory caps as described above.
6. Submit a pull request
## License
MIT License. See [LICENSE](LICENSE) for details.
---