https://github.com/puneethkumarck/prism
Prism: High-performance real-time Solana transaction indexer built with Java 25, Helidon 4 SE, Virtual Threads, pgjdbc COPY protocol, and hexagonal architecture. Streams via Yellowstone gRPC or free WebSocket. No Spring Boot.
blockchain grpc helidon hexagonal-architecture indexer java postgresql real-time solana virtual-threads
- Host: GitHub
- URL: https://github.com/puneethkumarck/prism
- Owner: Puneethkumarck
- Created: 2026-04-09T23:12:34.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-08T12:37:27.000Z (25 days ago)
- Last Synced: 2026-05-08T14:34:23.464Z (25 days ago)
- Topics: blockchain, grpc, helidon, hexagonal-architecture, indexer, java, postgresql, real-time, solana, virtual-threads
- Language: Java
- Size: 492 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 67
Metadata Files:
- Readme: README.md
# Prism
### Refract the Solana firehose into a queryable data stream.
**A zero-Spring, zero-JPA, real-time Solana transaction indexer built on Java 25 Virtual Threads and Helidon 4 SE.**
Streams confirmed transactions via Yellowstone gRPC (paid) or WebSocket `blockSubscribe` (free), persists them with the PostgreSQL `COPY` protocol, and serves them over a paginated REST API, all with sub-100ms startup and a <50 MB resident footprint.
[Why Prism?](#why-does-this-exist) · [Architecture](#architecture) · [The Hot Path](#the-hot-path-how-a-transaction-becomes-a-row) · [Quick Start](#quick-start) · [API Reference](#api-reference) · [Tech Stack](#tech-stack)
---
## The Problem
Solana produces a new block roughly every **400 milliseconds**. At any given moment, mainnet can push **50,000+ transactions per second** through the firehose. If you want to know what happened on chain (payments, memos, failed swaps, large transfers), you have exactly three options:
| Option | What Happens | Verdict |
|---|---|---|
| **Poll `getBlock`** | Call `getBlock(slot)`, then repeat, repeat, repeat | Falls behind in minutes. Burns RPC credits. Dies at 50K TPS. |
| **Use a hosted indexer** | Pay per query. Sit behind someone else's cache. Hope their schema fits | $$$, schema lock-in, no custom parsing |
| **Stream + batch yourself** | Subscribe to Geyser/WebSocket, parse in-process, batch write to Postgres | Full control. Sub-second freshness. Your schema, your queries. |
Prism is a sharp, opinionated take on **option 3**, written in Java 25 with Virtual Threads, Helidon 4 SE, and raw pgjdbc. No Spring Boot. No JPA. No reflection. Just a tight hot path from socket to row.
## The Solution
```text
 Solana                    Prism                                 PostgreSQL
 ------                    -----                                 ----------

 Yellowstone gRPC --> stream adapter                       +--> transactions (COPY)
 (or WebSocket)             |                              +--> failed_tx
                            v                              +--> memos
                   LinkedTransferQueue                     +--> large_transfers
                  (unbounded, lock-free)                   +--> accounts
                            |                              ^
                            v                              |
                  TransactionBatchService           parallel writes
               (200 tx / 100 ms dual-trigger)      on virtual threads
                            |                              ^
                            v                              |
                   TransactionProcessor -------------------+
          split -> [success | failed | memo | transfer]
```
## The Result
A real-time pipeline that stays caught up with mainnet, survives RPC flaps, flushes batches in ~100ms, and exposes 8 read endpoints (paginated where lists can grow), all inside a JVM that starts in under a second and sips heap.
| Metric | Value |
|--------|-------|
| **Throughput target** | ~99.5% indexing efficiency at mainnet velocity |
| **Write strategy** | PostgreSQL `COPY FROM STDIN` + staging merge, **5-10× faster** than `INSERT VALUES` |
| **Startup time** | < 100 ms (Helidon 4 SE, no classpath scanning) |
| **Thread model** | Virtual Threads; no reactive `.flatMap().subscribeOn()` gymnastics |
| **Backpressure** | Unbounded tx queue, bounded account queue; zero disconnects from Yellowstone |
| **Streaming modes** | WebSocket `blockSubscribe` (free) · Yellowstone gRPC (paid) |
| **Stack size** | Helidon 4 SE + pgjdbc + Avaje Inject + Micrometer; **no Spring Boot** |
---
## Table of Contents
- [Why Does This Exist?](#why-does-this-exist)
- [Why "Prism"?](#why-prism)
- [A Day in the Life of a Solana Transaction](#a-day-in-the-life-of-a-solana-transaction)
- [The Hot Path: How a Transaction Becomes a Row](#the-hot-path-how-a-transaction-becomes-a-row)
- [Architecture](#architecture)
- [The COPY Protocol: Why We Bypass `INSERT VALUES`](#the-copy-protocol-why-we-bypass-insert-values)
- [Virtual Threads: Why No Reactor, No WebFlux, No Spring Boot](#virtual-threads-why-no-reactor-no-webflux-no-spring-boot)
- [Dual Streaming Modes: Free or Fast](#dual-streaming-modes-free-or-fast)
- [Dual-Trigger Batching: Size OR Time](#dual-trigger-batching-size-or-time)
- [The Reconnect Dance](#the-reconnect-dance)
- [Tech Stack](#tech-stack)
- [Module Structure](#module-structure)
- [Quick Start](#quick-start)
- [Make Targets](#make-targets)
- [API Reference](#api-reference)
- [Configuration Reference](#configuration-reference)
- [Observability](#observability)
- [Testing Strategy](#testing-strategy)
- [Database Schema](#database-schema)
- [Design Decisions, Quick Reference](#design-decisions-quick-reference)
- [License](#license)
---
## Why Does This Exist?
Because every on-chain product eventually asks the same questions:
- *"Did our transaction confirm?"*
- *"Which wallets just moved more than 1 SOL?"*
- *"Did the customer include a memo with that payment?"*
- *"How many of our swaps failed in the last hour?"*
- *"What's the current balance of every fee payer we've seen?"*
These questions need **fresh, queryable, relational** data, not JSON-RPC round-trips to a Solana RPC node and not someone else's hosted indexer. You need *your* Postgres with *your* indexes and *your* schema, populated a few hundred milliseconds after finality.
Prism answers all five questions out of the box.
> **Design principle:** Hot path stays **synchronous and boring**. No reactive, no actors, no fancy schedulers. One virtual thread per job. One queue per concurrency boundary. The JVM does the rest.
---
## Why "Prism"?
Because a prism takes one stream of white light and **splits it into the colors that were always there**: it doesn't create information, it reveals structure.
That's exactly what the indexer does to Solana's stream:
```text
 ✨ Solana stream -----> Prism
 (undifferentiated)        |
                           +--> 🟢 successful transactions
                           +--> 🔴 failed transactions
                           +--> 🟡 large transfers (>1 SOL)
                           +--> 🟣 memos
                           +--> 🔵 fee payer accounts

                 five queryable tables, refracted
                 out of one protobuf soup
```
Every incoming transaction is refracted into the projections you actually want to query. No more scanning JSON blobs. No more `getBlock` loops. Just `SELECT`.
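A minimal sketch of that refraction step, assuming a simplified `SolanaTx` record. The field names, the `SplitBatch` holder, and the `TransactionSplitter` class are hypothetical, not Prism's actual types. The sketch also assumes every transaction lands in the main table, failed ones are additionally copied to the failed bucket, and only successful transfers strictly above the threshold count as large:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified transaction record (not Prism's actual domain model).
record SolanaTx(String signature, long slot, boolean success,
                long lamportsTransferred, String memo) {}

// Hypothetical holder for the projections one batch is split into.
record SplitBatch(List<SolanaTx> transactions, List<SolanaTx> failed,
                  List<SolanaTx> memos, List<SolanaTx> largeTransfers) {}

final class TransactionSplitter {
    static final long LAMPORTS_PER_SOL = 1_000_000_000L;

    private final long largeTransferThresholdLamports;

    TransactionSplitter(long largeTransferThresholdLamports) {
        this.largeTransferThresholdLamports = largeTransferThresholdLamports;
    }

    SplitBatch split(List<SolanaTx> batch) {
        var failed = new ArrayList<SolanaTx>();
        var memos = new ArrayList<SolanaTx>();
        var largeTransfers = new ArrayList<SolanaTx>();
        for (var tx : batch) {
            if (!tx.success()) {
                failed.add(tx);
            }
            if (tx.memo() != null && !tx.memo().isEmpty()) {
                memos.add(tx);
            }
            // Assumption: only successful transfers strictly above the threshold count.
            if (tx.success() && tx.lamportsTransferred() > largeTransferThresholdLamports) {
                largeTransfers.add(tx);
            }
        }
        // Assumption: every transaction also goes to the main table, which has a success column.
        return new SplitBatch(List.copyOf(batch), List.copyOf(failed),
                List.copyOf(memos), List.copyOf(largeTransfers));
    }
}
```

With a 1 SOL threshold, a successful 4.2 SOL payment that carries a memo lands in three buckets at once: the main table, memos, and large transfers.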
---
## A Day in the Life of a Solana Transaction
> **Scene 1: Somewhere in a Solana validator, 400 ms ago**
```text
 validator              Geyser plugin          Prism
 ---------              -------------          -----
 builds slot 312_701
 includes tx 5Kx7aLm...
 finalizes block
                        "new tx!" ---------->  arrives on gRPC
                                                 |
                                               TransactionParser
                                                 |  signature: 5Kx7a...
                                                 |  slot:      312_701
                                                 |  amount:    4.2 SOL
                                                 |  from:      7xKX...h9Fz
                                                 |  to:        9vBM...n3Tr
                                                 |  memo:      "invoice #7341"
                                                 |  failed:    false
                                                 v
                                               LinkedTransferQueue (unbounded)
                                                 |
                                                 v
                                               TransactionBatchService
                                                 |  (accumulating...)
                                                 |  199 tx + this one = 200 -> FLUSH
                                                 v
                                               TransactionProcessor
                                                 |
              +----------------------------------+----------------------+
              v                                  v                      v
        transactions                           memos             large_transfers
     (COPY FROM STDIN)                    (batch INSERT)          (batch INSERT)
      staging -> merge                  reWriteBatched=true    reWriteBatched=true
                                                 |
                                                 v
                                           done in ~8 ms
                                          total, parallel
```
**Timeline for one transaction:**
| Stage | Latency | What happens |
|---|---|---|
| Geyser publish | ~5 ms | Validator plugin flushes to wire |
| Network hop | ~10-30 ms | HTTP/2 frame to Prism |
| Parse | < 1 ms | Protobuf → domain record |
| Queue | < 1 μs | `LinkedTransferQueue.offer()` |
| Batch wait | 0-100 ms | Dual-trigger: 200 tx OR 100 ms |
| COPY + merge | ~5-10 ms | 200 rows in one write |
| **End-to-end** | **< 200 ms** | From finality to queryable row |
> **Scene 2: A developer runs `curl localhost:3000/api/transfers?min_amount=4.0` and sees the transaction from Scene 1 in the results.** That's the whole movie.
---
## The Hot Path: How a Transaction Becomes a Row
The write side is the most interesting part of the system. It's designed to be boring, fast, and **impossible to back-pressure**.
```mermaid
flowchart LR
    subgraph Source["Solana Source"]
        direction TB
        YS["Yellowstone gRPC<br/>(paid)"]
        WS["WebSocket<br/>blockSubscribe<br/>(free)"]
    end
    subgraph Parse["Parsing"]
        P1["TransactionParser<br/>protobuf / json"]
        P2["BlockNotificationParser<br/>shared logic"]
    end
    subgraph Queue["Concurrency Boundary"]
        direction TB
        TQ["LinkedTransferQueue<br/>unbounded<br/>tx stream"]
        AQ["ArrayBlockingQueue(10K)<br/>bounded, drop-if-full<br/>account stream"]
    end
    subgraph Batch["Batching"]
        direction TB
        TBS["TransactionBatchService<br/>200 tx / 100 ms"]
        ABS["AccountBatchService<br/>200 acct / 2 s<br/>+ dedup by pubkey"]
    end
    subgraph Processor["Processor"]
        TP["TransactionProcessor<br/>split into 4 buckets"]
    end
    subgraph DB["PostgreSQL"]
        direction TB
        T1["transactions<br/>COPY FROM STDIN"]
        T2["failed_transactions<br/>batch INSERT"]
        T3["memos<br/>batch INSERT"]
        T4["large_transfers<br/>batch INSERT"]
        T5["accounts<br/>UPSERT ON CONFLICT"]
    end
    YS --> P1 --> TQ
    WS --> P2 --> TQ
    P1 --> AQ
    P2 --> AQ
    TQ --> TBS --> TP
    AQ --> ABS --> T5
    TP --> T1
    TP --> T2
    TP --> T3
    TP --> T4
    style T1 fill:#4caf50,color:#fff
    style YS fill:#9945FF,color:#fff
    style WS fill:#00D18C,color:#fff
```
**Two concurrency boundaries, two queue strategies:**
| Queue | Type | Capacity | Policy | Why |
|---|---|---|---|---|
| **Transaction** | `LinkedTransferQueue` | **Unbounded** | Never blocks the producer | If this queue blocks, Yellowstone hangs up with a `lagged` error. Losing a transaction is worse than using heap. |
| **Account** | `ArrayBlockingQueue` | **10,000** | Non-blocking `offer()`; drop if full | Accounts are less critical and dedup-friendly. Dropping the occasional fee payer snapshot is fine. |
This asymmetry is the whole trick. Transactions get backpressure protection; accounts get memory protection. Nobody wins both fights at once.
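The asymmetry comes down to the `offer()` semantics of each queue type: on an unbounded `LinkedTransferQueue` it always succeeds, while on an `ArrayBlockingQueue` it returns `false` when full instead of blocking. A sketch, assuming a hypothetical `IngestQueues` holder and a drop counter that the README does not mention:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for the two concurrency boundaries.
final class IngestQueues<T, A> {
    // Unbounded: offer() never fails, so the stream reader never blocks.
    final BlockingQueue<T> transactions = new LinkedTransferQueue<>();
    // Bounded: offer() returns false when full instead of blocking.
    final BlockingQueue<A> accounts;
    // Assumption: count drops so they show up in metrics.
    final AtomicLong droppedAccounts = new AtomicLong();

    IngestQueues(int accountCapacity) {
        this.accounts = new ArrayBlockingQueue<>(accountCapacity);
    }

    void publishTransaction(T tx) {
        transactions.offer(tx);
    }

    void publishAccount(A account) {
        if (!accounts.offer(account)) {
            droppedAccounts.incrementAndGet(); // drop-if-full
        }
    }
}
```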
---
## Architecture
Prism follows strict **hexagonal architecture (ports & adapters)** with DDD tactical patterns. Dependencies always point inward.
```text
 application/
 |  Helidon 4 SE functional routes
 |  IndexerApplication (main)
 |  IndexerConfig (env parsing)
 |  GlobalErrorHandler
 |  MapStruct mappers
 |
 |  delegates to
 v
 domain/                  ZERO framework imports. Only Lombok + java.*
 |  model/     Signature, Pubkey, Slot, SolanaTx, Account
 |  service/   BatchService, Processor, LargeTransfer filter
 |  port/      TxStream, TxRepo, MemoRepo, ...   <-- implemented by infrastructure/
 ^
 |  implements ports
 |
 infrastructure/
    grpc/          Yellowstone
    websocket/     blockSubscribe
    persistence/   pgjdbc + COPY
    metrics/       Micrometer
    console/       ANSI formatter
    solana/        Base58, balance

 ArchUnit enforces these rules at build time
```
**The rules (enforced by ArchUnit):**
| Rule | What It Stops |
|---|---|
| `domain` must not depend on `infrastructure` | Prevents domain from leaking JDBC/gRPC types |
| `domain` must not depend on `application` | Prevents domain from reaching up into routes |
| `domain` has **zero** Helidon/Jakarta imports | Keeps domain framework-free (Lombok + `java.*` only) |
| `domain` has **zero** `java.sql.*` imports | No DB types in business logic |
| `infrastructure` must not depend on `application.routing` | Infra adapters can't call routes directly |
Break any rule and the build fails. No social contracts, only compile errors.
---
## The COPY Protocol: Why We Bypass `INSERT VALUES`
PostgreSQL has two fundamentally different write paths. Most ORMs use the slower one. Prism uses the faster one.
> **The 5× speedup you get by ignoring your instincts**
```text
INSERT VALUES (what JPA / Hibernate / Spring Data give you)
-----------------------------------------------------------
INSERT INTO transactions VALUES ($1, $2, $3);
INSERT INTO transactions VALUES ($4, $5, $6);
INSERT INTO transactions VALUES ($7, $8, $9);
... × 200

Each row:
  - parse SQL
  - plan query
  - acquire lock
  - write WAL
  - update index
  - commit row

200 rows × overhead = slow


COPY FROM STDIN (what pgjdbc's CopyManager gives you)
-----------------------------------------------------
COPY staging_transactions (signature, slot, success) FROM STDIN (FORMAT TEXT);
5Kx7a...    312701    t
6Lm8b...    312701    t
7Nv9c...    312701    t
... × 200
\.
INSERT INTO transactions SELECT * FROM staging_transactions
  ON CONFLICT (signature) DO NOTHING;
TRUNCATE staging_transactions;

One batch:
  - parse SQL once
  - plan query once
  - stream 200 rows over STDIN
  - one WAL flush
  - index update once
  - commit batch

5-10× faster on the hottest table
```
**Why a staging table?** `COPY` doesn't support `ON CONFLICT`. So we:
1. `COPY` into `staging_transactions` (no constraints, no indexes, pure speed)
2. `INSERT ... SELECT ... ON CONFLICT (signature) DO NOTHING` from staging into main
3. `TRUNCATE staging_transactions` and repeat
The staging merge costs an extra statement, but `COPY` + merge is still ~5× faster than individual `INSERT`s because the expensive parts (parsing, planning, locking, WAL) amortize across 200 rows.
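A sketch of the client side of this dance: encoding rows into the `COPY` TEXT format (tab-separated columns, newline-terminated rows, `\N` for NULL, backslash escapes) plus the three statements above. `StagingRow` and `CopyTextEncoder` are hypothetical names. With pgjdbc, the payload would typically be streamed through `CopyManager.copyIn(sql, reader)`, obtained via `connection.unwrap(PGConnection.class).getCopyAPI()`; that part is left out so the example runs without the driver:

```java
import java.util.List;

// Hypothetical staging row matching the three columns in the COPY example.
record StagingRow(String signature, long slot, boolean success) {}

final class CopyTextEncoder {
    static final String COPY_SQL =
            "COPY staging_transactions (signature, slot, success) FROM STDIN (FORMAT TEXT)";
    static final String MERGE_SQL =
            "INSERT INTO transactions SELECT * FROM staging_transactions "
            + "ON CONFLICT (signature) DO NOTHING";
    static final String TRUNCATE_SQL = "TRUNCATE staging_transactions";

    // One line per row, tab-separated columns; booleans as t/f.
    static String encode(List<StagingRow> rows) {
        var out = new StringBuilder(rows.size() * 100);
        for (var row : rows) {
            out.append(escape(row.signature())).append('\t')
               .append(row.slot()).append('\t')
               .append(row.success() ? 't' : 'f').append('\n');
        }
        return out.toString();
    }

    // TEXT format rules: NULL becomes \N; backslash, tab, newline and carriage return are escaped.
    static String escape(String value) {
        if (value == null) {
            return "\\N";
        }
        var out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\' -> out.append("\\\\");
                case '\t' -> out.append("\\t");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                default -> out.append(c);
            }
        }
        return out.toString();
    }
}
```

Escaping matters less for base58 signatures than for free-text columns, but a single stray tab in an unescaped value would shift every following column in the row.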
> **Secondary tables** (`failed_transactions`, `memos`, `large_transfers`) are low volume, so they use plain `PreparedStatement.addBatch()` with `reWriteBatchedInserts=true` on the pgjdbc URL. The driver rewrites batched `INSERT ... VALUES ($1, $2)` statements into multi-row `INSERT ... VALUES ($1, $2), ($3, $4), ...` statements, which gets close to `COPY`-level throughput without the staging dance.
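Because `reWriteBatchedInserts` is an ordinary pgjdbc connection property, it can ride on the JDBC URL. A sketch of turning the documented `DATABASE_URL` form (`postgresql://user:pass@host:port/db`) into a pgjdbc URL with the flag set; the `JdbcSettings` record and the default-port fallback are assumptions, not Prism's code:

```java
import java.net.URI;

// Hypothetical helper: DATABASE_URL -> pgjdbc URL plus credentials.
record JdbcSettings(String jdbcUrl, String user, String password) {

    static JdbcSettings fromDatabaseUrl(String databaseUrl) {
        URI uri = URI.create(databaseUrl);
        String scheme = uri.getScheme();
        if (!"postgresql".equals(scheme) && !"postgres".equals(scheme)) {
            throw new IllegalArgumentException("unsupported DATABASE_URL scheme: " + scheme);
        }
        String user = null;
        String password = null;
        if (uri.getUserInfo() != null) {
            String[] parts = uri.getUserInfo().split(":", 2);
            user = parts[0];
            password = parts.length > 1 ? parts[1] : null;
        }
        // Assumption: fall back to PostgreSQL's default port when none is given.
        int port = uri.getPort() == -1 ? 5432 : uri.getPort();
        // reWriteBatchedInserts=true lets pgjdbc turn batched INSERTs into multi-row INSERTs.
        String jdbcUrl = "jdbc:postgresql://" + uri.getHost() + ":" + port + uri.getPath()
                + "?reWriteBatchedInserts=true";
        return new JdbcSettings(jdbcUrl, user, password);
    }
}
```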
---
## Virtual Threads: Why No Reactor, No WebFlux, No Spring Boot
Traditional Java servers tried to solve the C10K problem with reactive streams:
```java
// Reactive way: every I/O op is a callback in a chain
return webClient.get()
.uri("/slot")
.retrieve()
.bodyToMono(Slot.class)
.flatMap(slot -> repo.findBySlot(slot))
.flatMap(txs -> Flux.fromIterable(txs)
.parallel()
.runOn(Schedulers.boundedElastic())
.map(this::process)
.sequential()
.collectList())
.onErrorResume(e -> Mono.error(new IndexerException(e)));
```
That's fine code. It's also impossible to debug, step through, or reason about at 3 AM during an incident.
**Virtual Threads (Project Loom, finalized in JDK 21) change the rules.** You can write plain blocking code, and the JVM parks the virtual thread on any I/O wait. No OS thread is held, no carrier is pinned, and there are no Schedulers or operators:
```java
// Loom way: boring, blocking, testable
var slot = httpClient.get("/slot", Slot.class);
var txs = repo.findBySlot(slot);
for (var tx : txs) {
process(tx);
}
```
One virtual thread per job. The JVM multiplexes millions of them onto a handful of carrier threads. Helidon 4 SE was built from the ground up on this model: it has no Netty event loop, no Servlet container, and no CDI graph to warm up. Startup is under 100 ms and p99.999 latency is under 7 ms.
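A runnable sketch of "one virtual thread per job" using only the JDK (21+): `Executors.newVirtualThreadPerTaskExecutor()` starts a fresh virtual thread for every submitted task, and the blocking `Thread.sleep` stands in for a JDBC or HTTP call. `VirtualThreadFanOut` and `blockingLookup` are illustrative names, not Prism code:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class VirtualThreadFanOut {

    // Stand-in for a blocking call; sleeping parks the virtual thread
    // instead of holding an OS thread.
    static int blockingLookup(int slot) {
        try {
            Thread.sleep(Duration.ofMillis(50));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return slot * 2;
    }

    // Plain blocking code: submit one task per slot, then wait for each result in order.
    static List<Integer> lookupAll(List<Integer> slots) {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int slot : slots) {
                futures.add(executor.submit(() -> blockingLookup(slot)));
            }
            List<Integer> results = new ArrayList<>();
            for (var future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

A thousand tasks that each sleep 50 ms finish in roughly the time of a few sleeps rather than 50 seconds, because a parked virtual thread gives its carrier back to the scheduler.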
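**Prism's one rule:** never use `synchronized`; always use `ReentrantLock`. On JDK 21-23, `synchronized` pins a virtual thread to its carrier and kills throughput. JEP 491 removed most of that pinning in JDK 24, so on Java 25 the rule is mainly defensive. ArchUnit enforces it at build time.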
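The replacement is the standard `lock()` / `try` / `finally unlock()` idiom. A minimal sketch of a shared buffer guarded this way (`LockedBuffer` is a hypothetical name, not a Prism class):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// A shared buffer guarded by ReentrantLock instead of synchronized.
final class LockedBuffer<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<T> items = new ArrayList<>();

    void add(T item) {
        lock.lock();
        try {
            items.add(item);
        } finally {
            lock.unlock(); // always released, even if add() throws
        }
    }

    // Atomically hands back everything buffered so far and starts a fresh buffer.
    List<T> drain() {
        lock.lock();
        try {
            var copy = List.copyOf(items);
            items.clear();
            return copy;
        } finally {
            lock.unlock();
        }
    }
}
```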
---
## Dual Streaming Modes: Free or Fast
Prism can consume Solana's transaction stream in two ways (**same domain, same batching, same persistence**) through a pluggable `TransactionStream` port.
```text
                  TransactionStream port
                   (domain interface)
                           |
              +------------+------------+
              |                         |
   WebSocket blockSubscribe     Yellowstone gRPC (Geyser)
```
| | WebSocket mode | gRPC mode |
|---|---|---|
| **Endpoint** | `wss://api.mainnet-beta.solana.com` | Paid Yellowstone provider |
| **Protocol** | JSON-RPC `blockSubscribe` over WS | Protobuf over HTTP/2 |
| **Cost** | **$0** (public RPC) | **$300-500/mo** typical |
| **Latency** | Higher (JSON parse + `confirmed` commitment) | Lower (native protobuf + direct Geyser) |
| **Throughput** | Lower (JSON overhead) | Higher (8 MiB HTTP/2 stream window) |
| **Stability** | Public RPC can be flaky | Dedicated, SLA-backed |
| **Vote filtering** | Client-side (check Vote program) | Server-side (`vote: false` filter) |
| **Tx data** | `encoding: "jsonParsed"`, full | Raw protobuf (richer) |
| **When to use** | Dev, testnet, hobby projects, low TPS | Production, mainnet, real workloads |
Switch with a single env var: `STREAM_MODE=websocket` (default) or `STREAM_MODE=grpc`. The domain layer doesn't know or care.
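A sketch of how one env var can pick the adapter behind the port. The `TransactionStream` method signatures, the stub adapter classes, and `StreamFactory` are assumptions for illustration; real adapters would open a WebSocket `blockSubscribe` or a Yellowstone gRPC subscription inside `start`:

```java
import java.util.Locale;
import java.util.function.Consumer;

// Hypothetical shape of the TransactionStream port; the real signature may differ.
interface TransactionStream {
    void start(Consumer<String> onTransactionSignature);

    String describe();
}

// Stub adapter: blockSubscribe + JSON parsing would live in start().
final class WebSocketTransactionStream implements TransactionStream {
    private final String endpoint;

    WebSocketTransactionStream(String endpoint) {
        this.endpoint = endpoint;
    }

    @Override
    public void start(Consumer<String> onTransactionSignature) {
    }

    @Override
    public String describe() {
        return "websocket " + endpoint;
    }
}

// Stub adapter: a Yellowstone subscription + protobuf parsing would live in start().
final class GrpcTransactionStream implements TransactionStream {
    private final String endpoint;

    GrpcTransactionStream(String endpoint) {
        this.endpoint = endpoint;
    }

    @Override
    public void start(Consumer<String> onTransactionSignature) {
    }

    @Override
    public String describe() {
        return "grpc " + endpoint;
    }
}

final class StreamFactory {
    // The domain only ever sees TransactionStream; STREAM_MODE picks the adapter.
    static TransactionStream create(String streamMode, String wsEndpoint, String grpcEndpoint) {
        return switch (streamMode.toLowerCase(Locale.ROOT)) {
            case "websocket" -> new WebSocketTransactionStream(wsEndpoint);
            case "grpc" -> new GrpcTransactionStream(grpcEndpoint);
            default -> throw new IllegalArgumentException(
                    "STREAM_MODE must be websocket or grpc, got: " + streamMode);
        };
    }
}
```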
> **Known HTTP/2 limitation (Helidon 4.4):** The stream-level window is a configurable 8 MiB, but Helidon doesn't yet expose the connection-level window, which defaults to 65,535 bytes per RFC 7540. In practice, Helidon emits `WINDOW_UPDATE` frames as data is consumed, so throughput is gated by consumption speed, not a static cap. Tracked in `docs/implementation-plan.md` for a future revisit.
---
## Dual-Trigger Batching: Size OR Time
The worst thing you can do to a write-heavy Postgres workload is flush one row at a time. The second-worst thing is to wait forever for a batch that never fills up.
Prism uses **dual-trigger batching**: flush when *either* threshold fires.
```text
 TransactionBatchService
 |
 |  Buffer:    [ tx tx tx tx tx ... ]
 |
 |  Trigger A: size    >= 200 txs
 |  Trigger B: elapsed >= 100 ms
 |
 |  Whichever fires first -> FLUSH

 High TPS (40K/s)           Low TPS (100/s)             Idle (0/s)
 ----------------           ---------------             ----------
 200 txs in 5 ms            200 txs in 2 sec            0 txs
 -> size triggers           -> time triggers            -> no flush
 -> flush every 5 ms        -> flush every 100 ms       -> buffer stays empty
 ~200x fewer round-trips    bounded max latency         no wasted writes
```
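One way to implement both triggers on top of the transaction queue is a deadline-bounded drain: take whatever is already queued, then `poll` with the remaining time until either the batch is full or the window closes. A sketch under that assumption; `DualTriggerBatcher` is a hypothetical name, and Prism may measure the 100 ms window differently:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class DualTriggerBatcher<T> {
    private final BlockingQueue<T> queue;
    private final int maxSize;       // trigger A
    private final long maxWaitNanos; // trigger B

    DualTriggerBatcher(BlockingQueue<T> queue, int maxSize, Duration maxWait) {
        this.queue = queue;
        this.maxSize = maxSize;
        this.maxWaitNanos = maxWait.toNanos();
    }

    // Returns when maxSize items are collected or maxWait has elapsed, whichever comes first.
    // An empty result means the window closed with nothing to write.
    List<T> nextBatch() {
        var batch = new ArrayList<T>(maxSize);
        long deadline = System.nanoTime() + maxWaitNanos;
        try {
            while (batch.size() < maxSize) {
                // Take everything already queued without waiting.
                queue.drainTo(batch, maxSize - batch.size());
                if (batch.size() >= maxSize) {
                    break;
                }
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break;
                }
                // Block for the next item, but never past the deadline.
                T next = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (next == null) {
                    break;
                }
                batch.add(next);
            }
        } catch (InterruptedException e) {
            // Keep the interrupt visible to the caller and hand back what was collected.
            Thread.currentThread().interrupt();
        }
        return batch;
    }
}
```

A writer loop would call `nextBatch()` repeatedly and skip the database entirely when the result is empty, which is the "no wasted writes" idle case.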
**The numbers:**
| Scenario | TPS in | Batches/s | DB round-trips/s | Max write latency |
|---|---|---|---|---|
| Mainnet burst | 40,000 | ~200 | ~200 | 5 ms |
| Typical mainnet | 4,000 | ~20 | ~20 | 50 ms |
| Quiet dev chain | 100 | 10 | 10 | 100 ms |
Compare that to naive per-row writes at 40K TPS: **40,000 round-trips per second**. The database would melt.
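These figures follow from a simple model: at a steady arrival rate, the size trigger fires every `200 / TPS` seconds unless the 100 ms window closes first. A sketch of that arithmetic; `BatchMath` is illustrative, and real traffic is burstier than a constant rate:

```java
// Steady-rate model of dual-trigger batching.
final class BatchMath {
    // Interval between flushes: time to fill maxBatch at this rate, capped by the window.
    static double flushIntervalMs(double tps, int maxBatch, double windowMs) {
        double fillMs = maxBatch * 1000.0 / tps;
        return Math.min(fillMs, windowMs);
    }

    static double batchesPerSecond(double tps, int maxBatch, double windowMs) {
        if (tps <= 0) {
            return 0; // idle: the buffer stays empty, nothing is flushed
        }
        return 1000.0 / flushIntervalMs(tps, maxBatch, windowMs);
    }

    // Roughly the longest a transaction waits in the buffer before its batch is written.
    static double maxWaitMs(double tps, int maxBatch, double windowMs) {
        return tps <= 0 ? 0 : flushIntervalMs(tps, maxBatch, windowMs);
    }
}
```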
**Account batching uses the same pattern with different thresholds** (200 / 2,000 ms) because account upserts are less latency-sensitive and dedup well: the same `pubkey` often appears multiple times in a 2-second window, and we keep the one with the highest slot in memory before sending a single `UPSERT`.
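A sketch of the in-memory dedup plus the single upsert, assuming a simplified `AccountSnapshot` record and hypothetical `accounts` column names. The `WHERE` guard on the upsert is an extra safety net this sketch adds so that an older snapshot can never overwrite a newer one:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical account snapshot; Prism's Account model may carry more fields.
record AccountSnapshot(String pubkey, long slot, long lamports) {}

final class AccountDeduper {
    // Keeps one snapshot per pubkey: the one observed at the highest slot.
    static Collection<AccountSnapshot> latestPerPubkey(Iterable<AccountSnapshot> window) {
        Map<String, AccountSnapshot> latest = new HashMap<>();
        for (var snapshot : window) {
            latest.merge(snapshot.pubkey(), snapshot,
                    (current, incoming) -> incoming.slot() > current.slot() ? incoming : current);
        }
        return latest.values();
    }

    // Hypothetical columns; the slot guard rejects stale rows at the database as well.
    static final String UPSERT_SQL = """
            INSERT INTO accounts (pubkey, slot, lamports) VALUES (?, ?, ?)
            ON CONFLICT (pubkey) DO UPDATE
            SET slot = EXCLUDED.slot, lamports = EXCLUDED.lamports
            WHERE accounts.slot < EXCLUDED.slot
            """;
}
```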
---
## The Reconnect Dance
Solana RPC endpoints, whether free public or paid Yellowstone, will drop you. Count on it. Here's what happens when they do:
```text
T+0s     stream is flowing... transactions pouring in
T+120s   stream ends unexpectedly (TCP reset / GOAWAY / network blip)
           |
           |  attempt 1: delay = 2 × 2¹ = 4 s
           v
T+124s   retry -> connected! streaming resumes
T+184s   60 s of stable flow -> attempt counter resets to 0
           |
           |  next disconnect will start at 4 s again
           |
           v
         ...
T+900s   another disconnect
           |  attempt 1: 4 s            -> fails
           |  attempt 2: 8 s            -> fails
           |  attempt 3: 16 s           -> fails
           |  attempt 4: 30 s (capped)  -> connects
           v
T+958s   back online, counter resets after 60 s stable
```
**Formula:** `delay = base × 2^min(attempt, 4)` where `base = 2 s`, capped at `30 s`.
| Attempt | Computed | Actual Delay |
|---|---|---|
| 1 | 4 s | **4 s** |
| 2 | 8 s | **8 s** |
| 3 | 16 s | **16 s** |
| 4 | 32 s | **30 s** (capped) |
| 5+ | 32 s | **30 s** (capped) |
**Reset rule:** after **60 seconds** of stable connection, the attempt counter resets to 0, so transient blips don't accumulate into slow restarts.
The same `ReconnectHandler` is shared by both the gRPC and WebSocket adapters. One strategy, two transports.
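A sketch of the formula and the reset rule together. `ReconnectBackoff` is a hypothetical class, not the `ReconnectHandler` itself, and it applies the 60-second reset lazily at disconnect time instead of with a timer, which yields the same delays:

```java
import java.time.Duration;
import java.time.Instant;

// delay = min(base * 2^min(attempt, 4), cap), with a reset after a stable period.
final class ReconnectBackoff {
    private static final Duration BASE = Duration.ofSeconds(2);
    private static final Duration CAP = Duration.ofSeconds(30);
    private static final Duration STABLE_RESET = Duration.ofSeconds(60);

    private int attempt = 0;
    private Instant connectedAt;

    // Call after a disconnect or a failed attempt; returns how long to wait before retrying.
    Duration nextDelay() {
        attempt++;
        Duration computed = BASE.multipliedBy(1L << Math.min(attempt, 4));
        return computed.compareTo(CAP) > 0 ? CAP : computed;
    }

    void onConnected(Instant now) {
        connectedAt = now;
    }

    // A connection that stayed up for at least 60 s forgets past failures.
    void onDisconnected(Instant now) {
        if (connectedAt != null && Duration.between(connectedAt, now).compareTo(STABLE_RESET) >= 0) {
            attempt = 0;
        }
        connectedAt = null;
    }

    int attempt() {
        return attempt;
    }
}
```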
---
## Tech Stack
| Component | Choice | Version | Why |
|-----------|--------|---------|-----|
| **Runtime** | Java + Virtual Threads | 25 LTS | Scoped Values finalized, +291% VT throughput vs JDK 21 |
| **HTTP server** | Helidon 4 SE | 4.4.0 | Built on VTs from the ground up, <7 ms p99.999, <50 MB RSS, <100 ms startup, no CDI/reflection |
| **gRPC client** | Helidon 4 SE gRPC | 4.4.0 | Built-in HTTP/2 engine, VT-native, no grpc-java |
| **DI** | Avaje Inject | latest | Compile-time codegen, JSR-330 (`@Singleton`), zero reflection |
| **DB driver** | pgjdbc | 42.7+ | `CopyManager` + `reWriteBatchedInserts=true` |
| **Connection pool** | HikariCP × 2 | 7.x | Dual pools: write (20) + read (20) |
| **JSON** | Jackson | 2.18+ | Helidon native media support |
| **Migrations** | Flyway (standalone) | 12.x | No Spring integration, runs in `main()` |
| **Resilience** | Resilience4j | 2.3+ | Reconnect backoff strategy |
| **Metrics** | Micrometer + Prometheus | 1.14+ | Native Helidon integration |
| **Mapping** | MapStruct | 1.6.3 | Compile-time, `componentModel = "jsr330"` |
| **Architecture tests** | ArchUnit | 1.4.1 | Hexagonal rules enforced at build time |
| **Logging** | SLF4J + Logback | - | Structured via `@Slf4j` |
| **Testing** | JUnit 5 + Mockito BDD + AssertJ + Testcontainers + Awaitility | - | Three source sets: unit, integration, fixtures |
| **Build** | Gradle (Kotlin DSL) + convention plugins | 9.0 | `prism.service` + `prism.library` in `buildSrc/` |
### What Prism Explicitly Does Not Use
| Avoided | Replacement | Why |
|---|---|---|
| **Spring Boot** | Helidon 4 SE + `public static void main` | No classpath scanning, no reflection, <100 ms startup |
| **Spring Data JPA** | Raw pgjdbc + `CopyManager` | `COPY FROM STDIN` is 5-10× faster than `saveAll()` |
| **`@Autowired`** | Avaje `@Singleton` + Lombok `@RequiredArgsConstructor` | Constructor injection only |
| **`@ConfigurationProperties`** | `IndexerConfig` record + `System.getenv()` | Fail-fast parsing, no magic binding |
| **`@RestController`** | Helidon SE `HttpService` functional routing | No annotations, pure function composition |
| **`synchronized`** | `java.util.concurrent.locks.ReentrantLock` | `synchronized` pins virtual threads to carrier threads on JDK 21-23 (JEP 491 lifts most of this in JDK 24+) |
| **`System.out`/`println`** | `@Slf4j` everywhere | Structured logs only |
| **Comments/Javadoc** | Self-documenting code | If a method needs a comment, rename it |
---
## Module Structure
```text
prism/                                       <- root project
|
|-- buildSrc/                                <- Gradle convention plugins
|   `-- src/main/kotlin/
|       |-- prism.service.gradle.kts         <- applied to main service
|       `-- prism.library.gradle.kts         <- applied to shared libs
|
|-- prism/                                   <- main service (Helidon 4 SE)
|   `-- src/
|       |-- main/java/com/stablebridge/prism/
|       |   |-- application/                 <- inbound adapters
|       |   |   |-- IndexerApplication.java  <- main(), wiring
|       |   |   |-- IndexerLifecycle.java    <- shutdown hook
|       |   |   |-- config/IndexerConfig.java  <- env -> record
|       |   |   |-- route/                   <- Helidon SE routes
|       |   |   |   |-- HealthRoutes.java
|       |   |   |   |-- StatsRoutes.java
|       |   |   |   |-- TransactionRoutes.java
|       |   |   |   |-- TransferRoutes.java
|       |   |   |   |-- MemoRoutes.java
|       |   |   |   |-- AccountRoutes.java
|       |   |   |   |-- CorsConfiguration.java
|       |   |   |   `-- PaginationLimits.java
|       |   |   |-- mapper/                  <- MapStruct
|       |   |   `-- error/GlobalErrorHandler.java
|       |   |
|       |   |-- domain/                      <- core, zero framework imports
|       |   |   |-- model/                   <- SolanaTransaction, Account, ...
|       |   |   |-- port/                    <- TransactionStream, TransactionRepository, ...
|       |   |   |-- service/                 <- BatchService, Processor, filters
|       |   |   |-- solana/                  <- Base58, balance math, programs
|       |   |   `-- exception/
|       |   |
|       |   `-- infrastructure/              <- outbound adapters
|       |       |-- grpc/                    <- Yellowstone stream + parser
|       |       |-- websocket/               <- blockSubscribe stream + parser
|       |       |-- persistence/             <- pgjdbc repositories
|       |       |   |-- DataSourceFactory.java   <- HikariCP × 2
|       |       |   |-- FlywayMigrator.java
|       |       |   |-- CopyTransactionRepository.java
|       |       |   |-- JdbcFailedTransactionRepository.java
|       |       |   |-- JdbcMemoRepository.java
|       |       |   |-- JdbcTransferRepository.java
|       |       |   |-- JdbcAccountRepository.java
|       |       |   `-- JdbcStatsRepository.java
|       |       |-- solana/Base58.java
|       |       |-- metrics/
|       |       |   |-- MicrometerMetricsRecorder.java
|       |       |   `-- BenchmarkLogReporter.java
|       |       `-- console/ConsoleOutputFormatter.java
|       |
|       `-- main/resources/
|           `-- db/migration/                <- Flyway V1, V2, V3 ...
|
|-- prism-api/                               <- shared DTOs (java-library)
|   `-- src/main/java/.../api/model/         <- Page, TransactionResponse, ...
|
|-- docs/
|   |-- functional-spec.md                   <- single source of truth
|   |-- implementation-plan.md               <- phases 0-7
|   |-- CODING_STANDARDS.md
|   `-- TESTING_STANDARDS.md
|
|-- infra/
|   `-- prometheus.yml                       <- scrape config
|
|-- docker-compose.yml                       <- Postgres + Prometheus + Grafana + app
|-- Makefile                                 <- developer workflow automation
|-- build.gradle.kts / settings.gradle.kts
`-- CLAUDE.md                                <- agent instructions
```
---
## Quick Start
### Prerequisites
- **Docker & Docker Compose** (for PostgreSQL, Prometheus, Grafana)
- **Java 25** (for local builds)
- **Make** (optional, just thin wrappers around `./gradlew` and `docker compose`)
### 60-Second Onboarding
```bash
# 1. Clone
git clone https://github.com/Puneethkumarck/prism.git
cd prism
# 2. Boot the infrastructure (Postgres + Prometheus + Grafana)
make infra-up
# 3. Run Prism against the free public Solana WebSocket endpoint
# Defaults: STREAM_MODE=websocket, RPC_WS_ENDPOINT=wss://api.mainnet-beta.solana.com
DATABASE_URL=postgresql://indexer:indexer@localhost:5432/indexer \
make run
# 4. In another terminal, watch it work
curl -s http://localhost:3000/health | jq
curl -s http://localhost:3000/api/stats | jq
curl -s "http://localhost:3000/api/transactions?limit=5" | jq
curl -s "http://localhost:3000/api/transfers?min_amount=1.0&limit=10" | jq
```
Within a few seconds you'll see `[SLOT]`, `[TX]`, `[MEMO]`, and `[TRANSFER]` events streaming to stdout, and rows accumulating in Postgres.
### Switching to Paid gRPC Mode
```bash
STREAM_MODE=grpc \
GRPC_ENDPOINT=https://.com \
X_TOKEN= \
DATABASE_URL=postgresql://indexer:indexer@localhost:5432/indexer \
make run
```
### Run Everything in Docker
```bash
make up # builds image via Jib + starts Postgres + Prometheus + Grafana + Prism
make down # stops it all
```
---
## Make Targets
| Target | Description |
|---|---|
| `make build` | Compile + Spotless + unit + integration + ArchUnit |
| `make test` | Unit tests only |
| `make integration-test` | Integration tests (requires Docker for Testcontainers) |
| `make clean` | Remove all build artifacts |
| `make format` | Auto-format with Spotless |
| `make lint` | Spotless check + ArchUnit (matches pre-commit hook) |
| `make run` | Run Prism locally via Gradle |
| `make infra-up` | Start Postgres + Prometheus + Grafana |
| `make infra-down` | Stop infrastructure |
| `make infra-clean` | Stop + delete volumes |
| `make infra-status` | Show infra container status |
| `make infra-logs` | Tail infrastructure logs |
| `make docker-build` | Build Docker image via Jib (no Dockerfile) |
| `make up` | Start infra + app container |
| `make down` | Stop everything |
| `make setup-hooks` | Point git at `.githooks/` |
| `make help` | List all targets |
---
## API Reference
Base URL: `http://localhost:3000`; no authentication (v1).
Metrics on `http://localhost:9090/metrics` (Prometheus format).
### Health
```http
GET /health
```
```json
{ "status": "ok", "uptime_secs": 3600 }
```
### Stats
```http
GET /api/stats
```
Uses `pg_stat_user_tables.n_live_tup` for **O(1) approximate counts**, vastly faster than `COUNT(*)` on million-row tables.
```json
{
  "total_transactions": 4812344,
  "total_failed": 1203111,
  "total_transfers": 38201,
  "total_memos": 914102,
  "total_accounts": 87433
}
```
### Transactions
| Method | Path | Query Params | Notes |
|---|---|---|---|
| `GET` | `/api/transactions` | `limit` (default 50, max 500), `offset`, `success` (optional bool) | Paginated, `created_at DESC` |
| `GET` | `/api/transactions/{signature}` | - | Returns `TxRow` or `404` |
| `GET` | `/api/slots/{slot}` | - | Array, not paginated, `created_at ASC` |
```bash
# List the latest 10 successful transactions
curl -s "http://localhost:3000/api/transactions?limit=10&success=true" | jq
# Look up one by signature
curl -s "http://localhost:3000/api/transactions/5Kx7aLm..." | jq
# All transactions in slot 312_701_542
curl -s "http://localhost:3000/api/slots/312701542" | jq
```
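A sketch of the pagination rules as a small parser. `PageRequest` is a hypothetical name, and it assumes `limit` must be at least 1 and that out-of-range values are rejected (mapped to `400` by the error handler) rather than silently clamped:

```java
import java.util.Map;

// Hypothetical pagination parser mirroring the documented limits.
record PageRequest(int limit, int offset) {
    static final int DEFAULT_LIMIT = 50;
    static final int MAX_LIMIT = 500;

    // Any IllegalArgumentException thrown here is meant to surface as HTTP 400.
    static PageRequest from(Map<String, String> query) {
        int limit = parseInt(query.get("limit"), DEFAULT_LIMIT, "limit");
        int offset = parseInt(query.get("offset"), 0, "offset");
        // Assumption: limit must be at least 1; values above 500 are rejected, not clamped.
        if (limit < 1 || limit > MAX_LIMIT) {
            throw new IllegalArgumentException("limit must be between 1 and " + MAX_LIMIT);
        }
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        return new PageRequest(limit, offset);
    }

    private static int parseInt(String raw, int fallback, String name) {
        if (raw == null || raw.isBlank()) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(name + " must be an integer", e);
        }
    }
}
```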
### Large Transfers
```http
GET /api/transfers?limit=50&offset=0&min_amount=10.0
```
Paginated, ordered by `amount DESC`. Threshold is configurable; default `1.0 SOL`.
### Memos
```http
GET /api/memos?limit=50&offset=0
```
Paginated, ordered by `created_at DESC`.
### Accounts
```http
GET /api/accounts/{pubkey}
```
Returns the most recent balance snapshot for a fee payer, or `404`.
### Error Response Format
```json
{
"error": "transaction not found",
"status": 404
}
```
| Status | Meaning |
|---|---|
| `400` | Validation error (invalid base58, out-of-range pagination) |
| `404` | Resource not found (signature / pubkey) |
| `500` | Internal error (DB unreachable, etc.) |
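The `400` for invalid base58 implies a decode step before any query runs. A minimal Base58 decoder sketch (Bitcoin/Solana alphabet) with length checks: Solana public keys decode to 32 bytes and transaction signatures to 64. `Base58Check` is a hypothetical name; Prism's own `Base58` class may work differently:

```java
import java.math.BigInteger;

final class Base58Check {
    private static final String ALPHABET =
            "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
    private static final BigInteger FIFTY_EIGHT = BigInteger.valueOf(58);

    static byte[] decode(String input) {
        BigInteger value = BigInteger.ZERO;
        for (int i = 0; i < input.length(); i++) {
            int digit = ALPHABET.indexOf(input.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid base58 character: " + input.charAt(i));
            }
            value = value.multiply(FIFTY_EIGHT).add(BigInteger.valueOf(digit));
        }
        // Each leading '1' encodes one leading zero byte.
        int leadingZeros = 0;
        while (leadingZeros < input.length() && input.charAt(leadingZeros) == '1') {
            leadingZeros++;
        }
        byte[] magnitude = value.signum() == 0 ? new byte[0] : value.toByteArray();
        // toByteArray() may prepend a 0x00 sign byte; drop it.
        int signByte = (magnitude.length > 1 && magnitude[0] == 0) ? 1 : 0;
        byte[] decoded = new byte[leadingZeros + magnitude.length - signByte];
        System.arraycopy(magnitude, signByte, decoded, leadingZeros, magnitude.length - signByte);
        return decoded;
    }

    static boolean isPubkey(String s) {
        return decodedLength(s) == 32;
    }

    static boolean isSignature(String s) {
        return decodedLength(s) == 64;
    }

    private static int decodedLength(String s) {
        try {
            return s.isEmpty() ? -1 : decode(s).length;
        } catch (IllegalArgumentException e) {
            return -1;
        }
    }
}
```

For example, the System Program address (thirty-two `1` characters) decodes to 32 zero bytes and passes the pubkey check, while any string containing `0`, `O`, `I`, or `l` is rejected outright.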
---
## Configuration Reference
Every setting is an environment variable. `IndexerConfig` is a Java record parsed via `System.getenv()`: fail-fast on missing required vars, no binding magic, no `application.yml`.
| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | Yes | - | `postgresql://user:pass@host:port/db` |
| `STREAM_MODE` | No | `websocket` | `websocket` (free) or `grpc` (paid) |
| `RPC_WS_ENDPOINT` | if `STREAM_MODE=websocket` | `wss://api.mainnet-beta.solana.com` | Solana WebSocket RPC URL |
| `GRPC_ENDPOINT` | if `STREAM_MODE=grpc` | - | Yellowstone gRPC endpoint (https required except localhost) |
| `X_TOKEN` | No | - | Auth token for Yellowstone, injected as `x-token` metadata |
| `API_PORT` | No | `3000` | Helidon HTTP port |
| `CONSOLE_LOG` | No | `true` | `false` or `0` suppresses `[TX]`/`[MEMO]`/`[TRANSFER]` output |
| `BENCH_LOG` | No | `benchmark.log` | Path for 5-minute benchmark summary file |
**Fail-fast validation** (in `IndexerConfig.fromEnv()`):
- `DATABASE_URL` is always required.
- `STREAM_MODE` must be `websocket` or `grpc` (case-insensitive).
- `GRPC_ENDPOINT` must be a valid URI with `https` scheme (or `http` for localhost).
- `API_PORT` must be a non-negative integer ≤ 65535.
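A sketch of those four checks, taking the environment as a `Map` so it can be fed `System.getenv()` in production or a literal map in tests. `IndexerConfigSketch` and `StreamMode` are hypothetical names, and treating `127.0.0.1` as localhost is an assumption:

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

enum StreamMode { WEBSOCKET, GRPC }

record IndexerConfigSketch(String databaseUrl, StreamMode streamMode, String wsEndpoint,
                           URI grpcEndpoint, int apiPort) {

    static IndexerConfigSketch fromEnv(Map<String, String> env) {
        String databaseUrl = require(env, "DATABASE_URL");
        StreamMode mode = switch (env.getOrDefault("STREAM_MODE", "websocket").toLowerCase(Locale.ROOT)) {
            case "websocket" -> StreamMode.WEBSOCKET;
            case "grpc" -> StreamMode.GRPC;
            default -> throw new IllegalArgumentException("STREAM_MODE must be websocket or grpc");
        };
        String ws = env.getOrDefault("RPC_WS_ENDPOINT", "wss://api.mainnet-beta.solana.com");
        URI grpc = null;
        if (mode == StreamMode.GRPC) {
            grpc = URI.create(require(env, "GRPC_ENDPOINT")); // throws on a malformed URI
            boolean local = "localhost".equals(grpc.getHost()) || "127.0.0.1".equals(grpc.getHost());
            boolean allowed = "https".equals(grpc.getScheme()) || ("http".equals(grpc.getScheme()) && local);
            if (!allowed) {
                throw new IllegalArgumentException("GRPC_ENDPOINT must use https (http only for localhost)");
            }
        }
        int port;
        try {
            port = Integer.parseInt(env.getOrDefault("API_PORT", "3000"));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("API_PORT must be an integer", e);
        }
        if (port < 0 || port > 65_535) {
            throw new IllegalArgumentException("API_PORT must be between 0 and 65535");
        }
        return new IndexerConfigSketch(databaseUrl, mode, ws, grpc, port);
    }

    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(key + " is required");
        }
        return value;
    }
}
```

Calling `IndexerConfigSketch.fromEnv(System.getenv())` at the top of `main` makes a misconfigured deployment fail before any socket or pool is opened.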
---
## Observability
| Layer | Technology | Details |
|---|---|---|
| **Metrics** | Micrometer + Prometheus | 8 counters (`indexer_tx_received`, `..._written`, `..._failed`, `..._memo`, `..._transfer`, `..._accounts_written`, `..._slots`, `..._batches`) scraped at `:9090/metrics` |
| **Dashboards** | Grafana | Runs at `http://localhost:3001` (admin/admin), Prometheus at `http://localhost:9091` |
| **Benchmark log** | File appender | Every 5 minutes: `timestamp \| tps \| recv \| written \| failed \| failed% \| memos \| xfers \| accts \| batches \| slots` |
| **Console output** | ANSI color-coded | `[SLOT]` cyan · `[TX]` white/red · `[MEMO]` magenta · `[TRANSFER]` yellow |
| **Health** | Helidon | `GET /health`: no DB call; uptime measured from process start |
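The console color coding maps onto standard ANSI SGR foreground codes (36 cyan, 37 white, 31 red, 35 magenta, 33 yellow). A small sketch with a hypothetical `ConsoleLine` helper, assuming that red `[TX]` lines are the failed transactions and that `CONSOLE_LOG` is read as a raw string:

```java
// Hypothetical sketch of the color-coded console lines; not Prism's actual logger.
final class ConsoleLine {
    // Standard ANSI SGR sequences.
    static final String RESET = "\u001B[0m";
    static final String CYAN = "\u001B[36m";
    static final String WHITE = "\u001B[37m";
    static final String RED = "\u001B[31m";
    static final String MAGENTA = "\u001B[35m";
    static final String YELLOW = "\u001B[33m";

    private ConsoleLine() {}

    // [TX] is white for successful transactions and red for failed ones (assumed).
    static String color(String tag, boolean success) {
        return switch (tag) {
            case "[SLOT]" -> CYAN;
            case "[TX]" -> success ? WHITE : RED;
            case "[MEMO]" -> MAGENTA;
            case "[TRANSFER]" -> YELLOW;
            default -> RESET;
        };
    }

    // Wraps the tagged message in its color and resets the terminal afterwards.
    static String format(String tag, boolean success, String message) {
        return color(tag, success) + tag + " " + message + RESET;
    }

    // Gate for [TX]/[MEMO]/[TRANSFER] output: unset means on; "false" or "0" means off.
    static boolean enabled(String consoleLogEnv) {
        return consoleLogEnv == null
                || !(consoleLogEnv.equalsIgnoreCase("false") || consoleLogEnv.equals("0"));
    }
}
```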
Sample benchmark log line:
```text
2026-04-11T09:17:42Z | 412 | 4120 | 2581 | 1539 | 37% | 18 | 104 | 31200 | 42 | 12
```
Mainnet is a *chaotic* environment: a 37% failure rate is completely normal (bots, MEV, failed swaps). The indexer records all of it.
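A sketch of how one benchmark line could be assembled, assuming a hypothetical `BenchmarkSample` record and assuming `failed%` is `failed / recv` truncated to a whole percent. That assumption reproduces the 37% in the sample (1539 / 4120 ≈ 37.4%). The `tps` figure is taken as an input because its exact computation is not documented here.

```java
import java.time.Instant;

// Hypothetical sketch of one benchmark.log line; field order follows the documented format.
record BenchmarkSample(Instant timestamp, long tps, long received, long written,
                       long failed, long memos, long transfers, long accounts,
                       long batches, long slots) {

    // failed% = failed / received, truncated to a whole percent; 0 when nothing arrived.
    long failedPercent() {
        return received == 0 ? 0 : failed * 100 / received;
    }

    // Joins the fields with " | " exactly as in the sample line.
    String toLogLine() {
        return String.join(" | ",
                timestamp.toString(),
                Long.toString(tps),
                Long.toString(received),
                Long.toString(written),
                Long.toString(failed),
                failedPercent() + "%",
                Long.toString(memos),
                Long.toString(transfers),
                Long.toString(accounts),
                Long.toString(batches),
                Long.toString(slots));
    }
}
```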
---
## 🧪 Testing Strategy
Three-tier pyramid, with conventions adapted from `stablebridge-tx-recovery`:
```text
        ┌───────────────┐
        │  Integration  │       Testcontainers PostgreSQL, real JDBC,
        │   (Docker)    │       end-to-end against both stream adapters
        ├───────────────┤
        │ Architecture  │       ArchUnit: hexagonal layer rules,
        │   (no deps)   │       no @Autowired, no System.out, no synchronized
    ┌───┴───────────────┴───┐
    │      Unit Tests       │   BDD Mockito + AssertJ, no Spring context,
    │  (single assertion)   │   fixture builders, no generic matchers
    └───────────────────────┘
```
| Tier | Source Set | Frameworks | Docker? |
|---|---|---|---|
| Unit | `src/test/` | JUnit 5, BDD Mockito (`given`/`then`), AssertJ, Awaitility | No |
| Architecture | `src/test/` | ArchUnit 1.4 | No |
| Integration | `src/integration-test/` | JUnit 5, Testcontainers PostgreSQL, direct JDBC | ✅ Yes |
**Non-negotiable testing rules:**
- 🎯 **Single-assert pattern**: build the expected object, then `assertThat(actual).usingRecursiveComparison().isEqualTo(expected)`.
- 🗣️ **BDD Mockito only**: `given().willReturn()` / `then().should()`, never `when()/verify()`.
- 🚫 **No generic matchers**: no `any()`, `anyString()`, `eq()`. Pass actual values.
- 💬 **`// given` / `// when` / `// then` comments in every test**.
- 🏗️ **Fixture builders**: `SOME_*` constants and `Builder()` in `src/testFixtures/`.
- ⏱️ **Awaitility over `Thread.sleep`**: polling with a timeout, not arbitrary waits.
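The Awaitility rule is about polling a condition until it holds or a deadline passes, instead of sleeping for a guessed duration. In the real tests that is Awaitility's `await().atMost(...).until(...)`; the JDK-only `Poll` helper below is a hypothetical stand-in that shows the same mechanics without the dependency.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// JDK-only stand-in for the Awaitility pattern; this is not Awaitility's API.
final class Poll {
    private Poll() {}

    // Re-checks the condition every `interval` until it is true, or fails once `timeout` elapses.
    static void until(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("condition not met within " + timeout);
            }
            Thread.sleep(interval.toMillis()); // short poll interval, not a guessed total wait
        }
    }
}
```

The test finishes as soon as the condition holds, so a fast machine is not penalized by a fixed sleep and a slow one does not fail spuriously before the timeout.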
Run them:
```bash
./gradlew test # unit + architecture (~5 s)
./gradlew integrationTest # integration tests (requires Docker, ~30 s)
./gradlew build # everything + Spotless + ArchUnit
```
---
## Database Schema
Five tables, seven indexes, one staging table. All migrations live in `prism/src/main/resources/db/migration/`.
```text
transactions                        primary key: signature (varchar 88)
  signature     varchar(88)   PK
  slot          bigint        idx_transactions_slot
  success       boolean       idx_transactions_success
  created_at    timestamptz   idx_transactions_created_at DESC

❌ failed_transactions
  id            serial        PK
  signature     varchar(88)
  slot          bigint
  error         text
  created_at    timestamptz   idx_failed_tx_created_at DESC

💰 large_transfers
  id            serial        PK
  signature     varchar(88)
  slot          bigint
  amount        numeric       idx_large_transfers_amount DESC
  created_at    timestamptz   idx_large_transfers_created_at DESC

memos
  id            serial        PK
  signature     varchar(88)
  memo          text
  created_at    timestamptz   idx_memos_created_at DESC

accounts                            unique: pubkey
  id            serial        PK
  pubkey        varchar(88)   UNIQUE
  lamports      bigint
  slot          bigint
  executable    boolean
  rent_epoch    bigint
  created_at    timestamptz

🛠️ staging_transactions             (no constraints, no indexes)
  signature     varchar(88)
  slot          bigint
  success       boolean       ← COPY target, truncated per flush
```
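`staging_transactions` is the target of `COPY ... FROM STDIN`, which in PostgreSQL's default text format expects one row per line, tab-separated columns, `\N` for NULL, and backslash escapes for backslash, tab, newline and carriage return. A sketch of the encoding step, where `StagedTx`, `CopyText`, and both SQL strings are assumptions (in particular the `ON CONFLICT` merge is a guess at the staging-to-`transactions` step, not Prism's confirmed SQL):

```java
import java.util.List;

// Hypothetical row type mirroring the staging_transactions columns.
record StagedTx(String signature, long slot, boolean success) {}

// Hypothetical sketch: encodes a batch in PostgreSQL COPY text format.
final class CopyText {
    private CopyText() {}

    // Assumed statements; Prism's real flush SQL may differ.
    static final String COPY_SQL =
            "COPY staging_transactions (signature, slot, success) FROM STDIN";
    static final String MERGE_SQL =
            "INSERT INTO transactions (signature, slot, success) "
            + "SELECT signature, slot, success FROM staging_transactions "
            + "ON CONFLICT (signature) DO NOTHING";

    // One line per row: signature<TAB>slot<TAB>t|f<NEWLINE>.
    static String encode(List<StagedTx> batch) {
        StringBuilder out = new StringBuilder();
        for (StagedTx tx : batch) {
            out.append(escape(tx.signature())).append('\t')
               .append(tx.slot()).append('\t')
               .append(tx.success() ? 't' : 'f').append('\n');
        }
        return out.toString();
    }

    // NULL becomes \N; backslash, tab, newline and carriage return are backslash-escaped.
    static String escape(String value) {
        if (value == null) {
            return "\\N";
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\' -> sb.append("\\\\");
                case '\t' -> sb.append("\\t");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The encoded text would be wrapped in a `java.io.StringReader` and handed to pgjdbc's `CopyManager.copyIn(String sql, Reader from)`; a production version would more likely stream rows than build one large `String`.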
### Dual Connection Pools
```text
HikariCP Write Pool          HikariCP Read Pool
───────────────────          ──────────────────
max: 20                      max: 20
min: 5                       min: 5
usage:                       usage:
  COPY staging_tx              GET /api/transactions
  INSERT failed_tx             GET /api/transfers
  INSERT memos                 GET /api/memos
  INSERT large_transfers       GET /api/accounts/:pubkey
  UPSERT accounts              GET /api/stats
```
**Why two pools?** During a mainnet burst the write pool can saturate all 20 connections for 50-100 ms. If the API shared that pool, every `GET` would queue behind the writes. With dedicated pools, API queries no longer wait behind ingest writes for a connection, which removes the most common complaint about "indexer plus API on one DB" setups.
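The isolation argument can be made concrete with a toy model in which each pool is reduced to a `java.util.concurrent.Semaphore` with 20 permits. This is not HikariCP, `PoolModel` is a hypothetical name, and the model ignores the fact that both pools still share one PostgreSQL server. It shows only the queueing effect: saturate the write side, then check whether a read can still get a connection.

```java
import java.util.concurrent.Semaphore;

// Toy model, not HikariCP: permits stand in for pooled connections.
final class PoolModel {
    static final int MAX_CONNECTIONS = 20;

    final Semaphore writePool;
    final Semaphore readPool;

    private PoolModel(Semaphore writePool, Semaphore readPool) {
        this.writePool = writePool;
        this.readPool = readPool;
    }

    // One shared pool: reads and writes compete for the same permits.
    static PoolModel shared() {
        Semaphore pool = new Semaphore(MAX_CONNECTIONS);
        return new PoolModel(pool, pool);
    }

    // Dedicated pools: a write burst cannot consume read capacity.
    static PoolModel dual() {
        return new PoolModel(new Semaphore(MAX_CONNECTIONS), new Semaphore(MAX_CONNECTIONS));
    }

    // Simulates an ingest burst that checks out every write connection.
    void saturateWrites() {
        writePool.acquireUninterruptibly(MAX_CONNECTIONS);
    }

    // A GET handler that proceeds only if a read connection is free right now.
    boolean tryServeRead() {
        if (!readPool.tryAcquire()) {
            return false; // would have to queue behind the writes
        }
        readPool.release();
        return true;
    }
}
```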
---
## ๐ง Design Decisions, Quick Reference
| # | Decision | Problem It Solves | Impact |
|---|---|---|---|
| 1 | **Unbounded tx queue** (`LinkedTransferQueue`) | Bounded queues block the producer → Yellowstone `lagged` disconnect | Zero dropped transactions from backpressure |
| 2 | **COPY FROM STDIN + staging merge** | `INSERT VALUES` is 5-10× slower for high-volume tables | 5-10× write throughput on the hottest path |
| 3 | **200 tx / 100 ms dual-trigger batch** | Per-row writes create ~200× more DB round-trips | ~200× fewer round-trips, bounded max latency |
| 4 | **200 acct / 2 s batch with dedup** | Per-tx account upserts spawn thousands of tasks/sec | Eliminates task churn, reduces DB pressure |
| 5 | **Exponential reconnect** (4s→8s→16s→30s cap, 60s reset) | Thundering herd against a flapping endpoint | Progressive delay, fast recovery after stability |
| 6 | **Dual read/write HikariCP pools** | Write bursts starve API read queries | API latency independent of ingest load |
| 7 | **`pg_stat_user_tables` for `/api/stats`** | `COUNT(*)` on million-row tables is O(N) | O(1) approximate counts for dashboard |
| 8 | **4 parallel writes per flush** | Sequential writes to 4 tables multiply flush latency | All 4 writes (COPY + 3 INSERTs) run concurrently on virtual threads |
| 9 | **Helidon 4 SE (not Boot)** | Spring classpath scan, reflection, CDI → slow startup | <100 ms startup, <50 MB RSS, <7 ms p99.999 |
| 10 | **`ReentrantLock` (not `synchronized`)** | `synchronized` pins virtual threads to carrier | No carrier pinning, full VT throughput |
| 11 | **MapStruct at layer boundaries** | Hand-written copy loops are bug-prone and ugly | Compile-time, type-safe, zero runtime cost |
| 12 | **ArchUnit at build time** | Architectural rules degrade without enforcement | Hexagonal rules fail the build if violated |
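Decisions 1 and 3 combine into a single consumer loop. The sketch below is one way to implement a dual-trigger batch over an unbounded `LinkedTransferQueue`: block for the first item, then keep draining until the batch holds 200 items or 100 ms have passed since that first item. `DualTriggerBatcher` and its methods are hypothetical; Prism's writer may structure the loop differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: size trigger OR time trigger, whichever fires first.
final class DualTriggerBatcher<T> {
    private final LinkedTransferQueue<T> queue = new LinkedTransferQueue<>();
    private final int maxSize;
    private final long maxWaitNanos;

    DualTriggerBatcher(int maxSize, long maxWait, TimeUnit unit) {
        this.maxSize = maxSize;
        this.maxWaitNanos = unit.toNanos(maxWait);
    }

    // Producer side (stream adapter): offer() on an unbounded queue never blocks.
    void submit(T item) {
        queue.offer(item);
    }

    // Consumer side: returns an empty list only if nothing arrived within idleTimeout.
    List<T> nextBatch(long idleTimeout, TimeUnit unit) throws InterruptedException {
        List<T> batch = new ArrayList<>(maxSize);
        T first = queue.poll(idleTimeout, unit);
        if (first == null) {
            return batch;
        }
        batch.add(first);
        long deadline = System.nanoTime() + maxWaitNanos; // time trigger starts at first item
        while (batch.size() < maxSize) {
            queue.drainTo(batch, maxSize - batch.size()); // take what is already queued
            if (batch.size() >= maxSize) {
                break; // size trigger
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                break; // time trigger
            }
            T next = queue.poll(remaining, TimeUnit.NANOSECONDS);
            if (next == null) {
                break; // time trigger fired while waiting
            }
            batch.add(next);
        }
        return batch;
    }
}
```

In this sketch the 100 ms window is measured from the first item of each batch, which bounds how long any single transaction sits in memory, while a full batch under load still flushes immediately.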
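Decision 5 can be read as: wait 4 s, 8 s, 16 s, then 30 s between attempts, and start over at 4 s once a connection has stayed up for 60 s. A minimal sketch under that interpretation (the reset rule in particular is an assumption), using a hypothetical `ReconnectBackoff` class:

```java
import java.time.Duration;

// Hypothetical sketch of exponential reconnect with a cap and a stability reset.
final class ReconnectBackoff {
    static final Duration INITIAL = Duration.ofSeconds(4);
    static final Duration CAP = Duration.ofSeconds(30);
    static final Duration STABLE_RESET = Duration.ofSeconds(60);

    private int failures;

    // Delay before the next reconnect attempt: 4 s doubled per failure, capped at 30 s.
    Duration nextDelay() {
        // Clamp the shift so it cannot overflow; the cap applies long before that.
        long seconds = INITIAL.getSeconds() << Math.min(failures, 10);
        failures++;
        return seconds >= CAP.getSeconds() ? CAP : Duration.ofSeconds(seconds);
    }

    // Called when a connection drops; uptime is how long it stayed connected.
    void onDisconnect(Duration uptime) {
        if (uptime.compareTo(STABLE_RESET) >= 0) {
            failures = 0; // the endpoint was healthy, so restart from the short delay
        }
    }
}
```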
---
## License
Released under the **MIT License**. See [`LICENSE`](LICENSE) for full text.
---
### Prism: Refract the Solana firehose into a queryable data stream.
Built on **Java 25 · Virtual Threads · Helidon 4 SE · pgjdbc · PostgreSQL 16**
No Spring Boot · No JPA · No reflection · No apologies.
*Every block. Every signature. Every memo. Every time.*