https://github.com/laphilosophia/strime
Streaming projection engine — extract fields at multi-gigabit speeds with O(1) memory
https://github.com/laphilosophia/strime
data-extraction data-processing fsm high-performance json json-parser json-streaming ndjson parser projection streaming typescript zero-copy
Last synced: 4 months ago
JSON representation
Streaming projection engine — extract fields at multi-gigabit speeds with O(1) memory
- Host: GitHub
- URL: https://github.com/laphilosophia/strime
- Owner: laphilosophia
- License: apache-2.0
- Created: 2026-01-05T10:44:49.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2026-02-11T14:44:34.000Z (4 months ago)
- Last Synced: 2026-02-11T20:49:35.608Z (4 months ago)
- Topics: data-extraction, data-processing, fsm, high-performance, json, json-parser, json-streaming, ndjson, parser, projection, streaming, typescript, zero-copy
- Language: TypeScript
- Homepage:
- Size: 265 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project
README
# Strime
A streaming JSON projection engine. Selects and extracts fields from JSON without parsing the entire structure.
[](https://www.npmjs.com/package/@laphilosophia/strime)
[](LICENSE)
[](src/__tests__)
---
## Why Strime
Most JSON parsers convert the entire input to objects before you can query it. Strime inverts this: it filters during traversal, touching only the bytes you need. The result is constant memory usage regardless of file size—a 10GB file uses the same baseline memory as a 10KB file.
This isn't a marginal improvement. On high-volume workloads, Strime processes JSON at throughputs typically reserved for native implementations.
---
## Performance
| Metric | Result | Guarantee |
| ------------------------ | -------------- | -------------- |
| Throughput (1GB) | 809-939 Mbps | O(N) time |
| Skip-heavy (chunked) | **4,408 Mbps** | 6.5x faster |
| Memory | Constant | O(D) space |
| 1M NDJSON rows | ~4.2s | 220k+ rows/s |
| Deep nesting (1k levels) | < 1ms | Stack-safe |
| Large strings (50MB) | 1.6 Gbps | No degradation |
_Validated on 1GB+ files. See [Performance Contract](docs/performance.md) for methodology._
---
## Install
```bash
npm install @laphilosophia/strime
```
---
## Use
### Query
```typescript
import { query } from '@laphilosophia/strime'
const result = await query(data, '{ id, user { name, email } }')
```
### Stream NDJSON
```typescript
import { ndjsonStream } from '@laphilosophia/strime'
for await (const row of ndjsonStream(stream, '{ id, name }')) {
console.log(row)
}
```
### Fault-Tolerant
```typescript
for await (const row of ndjsonStream(stream, '{ id, name }', {
skipErrors: true,
onError: (info) => console.error(`Line ${info.lineNumber}: ${info.error.message}`),
})) {
console.log(row)
}
```
### Real-Time Subscribe
```typescript
import { subscribe } from '@laphilosophia/strime'
subscribe(telemetryStream, '{ lat, lon }', {
onMatch: (data) => console.log(data),
onComplete: () => console.log('Done'),
})
```
### CLI
```bash
strime data.json "{ name, meta { type } }"
cat massive.json | strime "{ actor { login } }"
tail -f telemetry.log | strime --ndjson "{ lat, lon }"
```
---
## Features
**Core**
- Zero-allocation tokenizer with object recycling
- String interning for repeated keys
- Integer fast-path parsing
- Binary line splitting for NDJSON
**API**
- Streaming and pull modes
- Parallel NDJSON processing (`ndjsonParallel`)
- Raw byte emission for zero-copy pipelines
- Compression sink (gzip/brotli)
- Budget enforcement for DoS protection
**Error Handling**
- Typed errors with position tracking
- Fault-tolerant NDJSON streaming
- Controlled termination via AbortSignal
---
## Advanced
### Chunked Execution (Large Files)
For maximum throughput on large single-buffer inputs:
```typescript
import { Engine } from '@laphilosophia/strime'
import { StrimeParser } from '@laphilosophia/strime'
import { readFileSync } from 'fs'
const buffer = readFileSync('large-file.json')
const schema = new StrimeParser('{ id, name }').parse()
const engine = new Engine(schema)
// 6.5x faster on skip-heavy workloads
const result = engine.executeChunked(buffer) // 64KB chunks (default)
const result = engine.executeChunked(buffer, 32768) // Custom 32KB chunks
```
### Low-Level Tokenizer
```typescript
import { Tokenizer } from '@laphilosophia/strime'
const tokenizer = new Tokenizer()
const buffer = new TextEncoder().encode('{"key": "value"}')
// Zero-allocation callback mode
tokenizer.processChunk(buffer, (token) => {
console.log(token.type, token.value)
})
// Iterator mode
for (const token of tokenizer.tokenize(buffer)) {
console.log(token.type, token.value)
}
```
### Telemetry
```typescript
await query(stream, '{ id, name }', {
sink: {
onStats: (stats) => {
console.log(`Throughput: ${stats.throughputMbps.toFixed(2)} Mbps`)
console.log(`Skip ratio: ${(stats.skipRatio * 100).toFixed(1)}%`)
},
},
})
```
### Raw Emission
```typescript
await query(stream, '{ items { id } }', {
emitMode: 'raw',
sink: {
onRawMatch: (bytes) => outputStream.write(bytes),
},
})
```
---
## Documentation
- [Quick Start](docs/quick-start.md) — Getting started
- [Deployment Matrix](docs/howto-deployment-matrix.md) — Decision guide for API entry points
- [Real World Use Cases](docs/howto-use-cases.md) — Practical projection scenarios
- [Query Language](docs/query-language.md) — Syntax reference
- [API Reference](docs/api-reference.md) — Complete API
- [CLI Guide](docs/cli-guide.md) — Command-line usage
- [Performance](docs/performance.md) — Benchmarks and guarantees
- [Internals](docs/internals.md) — How it works
---
## Licensing
Strime is released under the **Apache 2.0**.
---
## Contact
Erdem Arslan