https://github.com/mlorentedev/pollex

Text polishing API (Go) + Chrome extension + llama.cpp GPU inference on Jetson Nano. Self-hosted, private, fast.
chrome-extension go gpu-inference jetson-nano llama-cpp llm self-hosted text-processing

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/mlorentedev/pollex
Owner: mlorentedev
License: mit
Created: 2026-02-11T04:15:11.000Z (3 months ago)
Default Branch: master
Last Pushed: 2026-02-18T03:03:38.000Z (3 months ago)
Last Synced: 2026-02-18T07:39:39.312Z (3 months ago)
Topics: chrome-extension, go, gpu-inference, jetson-nano, llama-cpp, llm, self-hosted, text-processing
Language: Go
Homepage: https://github.com/mlorentedev/pollex
Size: 953 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# Pollex

**Polish your English text** — fixes grammar, improves coherence, and tightens wording. The output sounds like a fluent non-native speaker: professional and clear, not AI-generated.

Self-hosted, private, and fast. Runs entirely on a Jetson Nano 4GB with GPU inference via llama.cpp.

![Pollex demo](docs/assets/demo.gif)

## Architecture

```mermaid
graph LR
    subgraph Your Machine
        EXT["Browser Extension<br/>(Manifest V3)"]
    end

    subgraph Internet
        CF["Cloudflare Tunnel<br/>(pollex.mlorente.dev)"]
    end

    subgraph Jetson Nano 4GB
        API["Pollex API<br/>(Go · :8090)"]
        LLAMA["llama-server<br/>(CUDA 10.2 · GPU)"]
        MODEL["Qwen 2.5 1.5B<br/>(Q4_0 · ~1GB)"]
    end

    subgraph Monitoring
        PROM["Prometheus + Grafana"]
    end

    EXT -- "HTTPS + API Key" --> CF
    CF -- "localhost:8090" --> API
    API -- "/v1/chat/completions" --> LLAMA
    LLAMA --> MODEL
    PROM -. "/metrics" .-> CF

    style EXT fill:#4a90d9,stroke:#3a7bc8,color:#fff
    style CF fill:#f48120,stroke:#d35400,color:#fff
    style API fill:#2ecc71,stroke:#27ae60,color:#fff
    style LLAMA fill:#e67e22,stroke:#d35400,color:#fff
    style MODEL fill:#f39c12,stroke:#e67e22,color:#fff
    style PROM fill:#9b59b6,stroke:#8e44ad,color:#fff
```

| Layer | Tech | Role |
| --- | --- | --- |
| Extension | Chrome Manifest V3 | UI: paste text, select model, copy result |
| Tunnel | Cloudflare Tunnel | Zero-config ingress (Jetson behind double NAT) |
| API | Go 1.26, stdlib `net/http` | Routes text to LLM backends, returns polished result |
| LLM | llama.cpp + Qwen 2.5 1.5B Q4_0 | Local GPU inference on Jetson Nano (~3s short, ~8s medium, ~16s long) |
| Monitoring | Prometheus + Alertmanager + Grafana | SLO tracking, alerting, dashboards |

## How It Works

```mermaid
sequenceDiagram
    participant U as User
    participant E as Extension
    participant T as Cloudflare Tunnel
    participant A as Pollex API
    participant L as llama-server

    U->>E: Paste text + click Polish
    E->>E: Show spinner (0s...)
    E->>T: POST /api/polish (X-API-Key)
    T->>A: Forward to localhost:8090
    A->>L: POST /v1/chat/completions
    L->>L: GPU inference (~3-8s)
    L-->>A: Polished text
    A-->>T: {"polished":"...", "elapsed_ms":...}
    T-->>E: Response
    E->>E: Hide spinner, show result
    U->>E: Click Copy
```

## Quick Start

### Development (no GPU needed)

```sh
make dev    # Start API with mock adapter on :8090
make test   # Run all tests (80+ with subtests, race detector)
```

Load the extension: `chrome://extensions` → Developer mode → Load unpacked → select `extension/`.

Run `make help` for the full list of targets (35 total: dev, build, bench, docker, monitoring, deploy, loadtest, jetson remote ops).

## Performance (Jetson Nano 4GB)

Measured on Qwen 2.5 1.5B Q4_0, full GPU offload (`-ngl 999`), 128 Maxwell cores.

| Text Length | Chars | Inference Time | Throughput |
| ----------- | ----- | -------------- | ---------- |
| Short | ~50 | ~3s | ~4 tok/s |
| Medium | ~500 | ~8s | ~4 tok/s |
| Long | ~1000 | ~16s | ~4 tok/s |

**SLO targets (7-day rolling window):**

| SLI | Target |
| --- | --- |
| Availability (API + llama-server) | 99% (≤100.8 min downtime) |
| Latency p50 | < 20s |
| Latency p95 | < 60s |
| Error rate (5xx on `/api/polish`) | < 1% |

Load tested with k6 (burst: 5 VUs / 25 iterations, sustained: 12 req/min for 2 min). Alerting via Prometheus rules + Alertmanager.
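
The 100.8-minute figure in the availability row follows from the window length: 1% of 7 days is 100.8 minutes. A minimal sketch of that arithmetic, using hypothetical helper names (`errorBudget`, `weeklyBudgetMinutes`) that are not part of the Pollex codebase:

```go
package main

import (
	"fmt"
	"time"
)

// errorBudget returns the downtime permitted by an availability target
// (e.g. 0.99) over a rolling window. Illustrative helper only.
func errorBudget(window time.Duration, target float64) time.Duration {
	return time.Duration(float64(window) * (1 - target))
}

// weeklyBudgetMinutes applies errorBudget to the 7-day window used above.
func weeklyBudgetMinutes(target float64) float64 {
	return errorBudget(7*24*time.Hour, target).Minutes()
}

func main() {
	// 7 days = 10,080 minutes; 1% of that is the allowed downtime.
	fmt.Printf("allowed downtime: %.1f min\n", weeklyBudgetMinutes(0.99))
}
```

Changing the target to 0.999 would shrink the budget to roughly 10 minutes per week, which is one way to judge whether a tighter SLO is realistic for single-board hardware.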

## API

| Method | Path | Auth | Description |
| --- | --- | --- | --- |
| `POST` | `/api/polish` | `X-API-Key` | Polish text via selected model |
| `GET` | `/api/models` | `X-API-Key` | List available models |
| `GET` | `/api/health` | None | Health check (per-adapter status) |
| `GET` | `/metrics` | None | Prometheus metrics |

### `POST /api/polish`

```sh
curl -X POST https://pollex.mlorente.dev/api/polish \
  -H 'Content-Type: application/json' \
  -H 'X-API-Key: YOUR_KEY' \
  -d '{"text":"i goes to store yesterday","model_id":"qwen2.5-1.5b-gpu"}'

# {"polished":"I went to the store yesterday.","model":"qwen2.5-1.5b-gpu","elapsed_ms":3200}
```
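
For callers that prefer Go over curl, the request and reply above could be modeled roughly as follows. This is a sketch, not code from the Pollex repository: the type and function names (`PolishRequest`, `PolishResponse`, `newPolishRequest`, `decodePolishResponse`) are hypothetical, and only the path, headers, and JSON field names are taken from the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// PolishRequest mirrors the JSON body sent in the curl example.
type PolishRequest struct {
	Text    string `json:"text"`
	ModelID string `json:"model_id"`
}

// PolishResponse mirrors the JSON reply shown in the curl example.
type PolishResponse struct {
	Polished  string `json:"polished"`
	Model     string `json:"model"`
	ElapsedMS int64  `json:"elapsed_ms"`
}

// newPolishRequest builds (but does not send) a POST /api/polish request.
func newPolishRequest(baseURL, apiKey string, in PolishRequest) (*http.Request, error) {
	body, err := json.Marshal(in)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/polish", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-API-Key", apiKey)
	return req, nil
}

// decodePolishResponse parses a response body into PolishResponse.
func decodePolishResponse(data []byte) (PolishResponse, error) {
	var out PolishResponse
	err := json.Unmarshal(data, &out)
	return out, err
}

func main() {
	req, err := newPolishRequest("http://localhost:8090", "YOUR_KEY",
		PolishRequest{Text: "i goes to store yesterday", ModelID: "qwen2.5-1.5b-gpu"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())

	// Decode the sample reply from the curl example instead of hitting a live server.
	sample := []byte(`{"polished":"I went to the store yesterday.","model":"qwen2.5-1.5b-gpu","elapsed_ms":3200}`)
	out, err := decodePolishResponse(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Polished, out.ElapsedMS)
}
```

To send the request for real, pass `req` to an `http.Client` whose timeout is comfortably above the inference times listed in the performance table.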

### `GET /api/health`

```json
{
  "status": "ok",
  "version": "1.4.0",
  "adapters": {
    "qwen2.5-1.5b-gpu": {"available": true},
    "claude-sonnet": {"available": false, "reason": "no API key"}
  }
}
```
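
A client can use this payload to decide which models to offer. The sketch below decodes the sample response and lists the adapters reported as available; the Go type names (`HealthResponse`, `AdapterStatus`) and the `availableAdapters` helper are assumptions made for illustration, inferred from the JSON shape rather than from the Pollex source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// AdapterStatus is one entry under "adapters" in the health payload.
type AdapterStatus struct {
	Available bool   `json:"available"`
	Reason    string `json:"reason,omitempty"`
}

// HealthResponse models the top-level health payload shown above.
type HealthResponse struct {
	Status   string                   `json:"status"`
	Version  string                   `json:"version"`
	Adapters map[string]AdapterStatus `json:"adapters"`
}

// availableAdapters returns the sorted IDs of adapters marked available.
func availableAdapters(data []byte) ([]string, error) {
	var h HealthResponse
	if err := json.Unmarshal(data, &h); err != nil {
		return nil, err
	}
	var ids []string
	for id, st := range h.Adapters {
		if st.Available {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids) // map iteration order is random, so sort for stable output
	return ids, nil
}

func main() {
	sample := []byte(`{
	  "status": "ok",
	  "version": "1.4.0",
	  "adapters": {
	    "qwen2.5-1.5b-gpu": {"available": true},
	    "claude-sonnet": {"available": false, "reason": "no API key"}
	  }
	}`)
	ids, err := availableAdapters(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```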

### `GET /api/models`

```json
[
  {"id": "qwen2.5-1.5b-gpu", "name": "Qwen 2.5 1.5B (GPU)", "provider": "llama.cpp"},
  {"id": "mock", "name": "Mock", "provider": "mock"}
]
```

## Project Structure

```text
pollex/
├── cmd/
│   ├── pollex/              # Entry point (flags, config, wiring, shutdown)
│   └── benchmark/           # Benchmark CLI tool
├── internal/
│   ├── adapter/             # LLMAdapter interface + implementations
│   │   ├── adapter.go       #   Interface: Name(), Polish(), Available()
│   │   ├── mock.go          #   Mock (dev/testing)
│   │   ├── ollama.go        #   Ollama (legacy, optional)
│   │   ├── claude.go        #   Claude API (optional)
│   │   └── llamacpp.go      #   llama.cpp (primary, GPU)
│   ├── config/              # YAML + env overrides (POLLEX_*)
│   ├── handler/             # HTTP handlers + response helpers
│   ├── metrics/             # Prometheus metric declarations (promauto)
│   ├── middleware/          # CORS, RequestID, Logging, Metrics, APIKey, RateLimit, MaxBytes
│   └── server/              # SetupMux + integration tests
├── extension/               # Chrome extension (Manifest V3)
├── prompts/polish.txt       # System prompt
├── deploy/
│   ├── loadtest/            # k6 load test scripts (normal, burst, jetson, soak)
│   ├── systemd/             # pollex-api, llama-server, cloudflared, jetson-clocks services
│   ├── scripts/             # init, build-llamacpp, setup-cloudflared
│   ├── prometheus/          # Alert rules, scrape config, alertmanager
│   ├── grafana/             # Dashboard JSON + provisioning
│   └── config.yaml          # Production config (deployed to Jetson)
├── Dockerfile               # Multi-stage: Go builder → alpine (24.7MB)
├── docker-compose.yml       # Local dev (mock mode)
├── docker-compose.monitoring.yml  # Prometheus + Alertmanager + Grafana
├── .github/workflows/       # CI (lint+test+build) + Release (goreleaser)
└── Makefile
```

## Contributing

### Prerequisites

- Go 1.26+

- Chrome (for extension testing)

### Development Workflow

1. **Run tests first** to ensure a clean baseline:

   ```sh
   make test
   make lint
   ```

2. **Start the dev server** with the mock adapter (no LLM needed):

   ```sh
   make dev
   ```

3. **Load the extension** in Chrome (`chrome://extensions` → Load unpacked → `extension/`).

4. **Make changes.** The adapter pattern makes it easy to add new LLM backends:

   - Implement the `LLMAdapter` interface in `internal/adapter/`
   - Register it in `cmd/pollex/main.go:buildAdapters()`
   - The rest (routing, health checks, model listing) is automatic

5. **Run tests** before pushing:

   ```sh
   make test   # All tests with race detector
   make lint   # go vet + gofmt
   ```
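
To make step 4 more concrete, here is a rough sketch of what a new backend might look like. The method names `Name()`, `Polish()`, and `Available()` come from the project structure listing, but their signatures here are assumptions, and `trimAdapter`, `polishWith`, and this simplified `buildAdapters` are hypothetical stand-ins; check `internal/adapter/adapter.go` and `cmd/pollex/main.go` for the real definitions.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// LLMAdapter uses the method names listed in the README; the parameter and
// return types are assumed for this sketch.
type LLMAdapter interface {
	Name() string
	Polish(ctx context.Context, text string) (string, error)
	Available() bool
}

// trimAdapter is a toy backend that only collapses whitespace.
type trimAdapter struct{}

func (trimAdapter) Name() string { return "trim" }

func (trimAdapter) Polish(ctx context.Context, text string) (string, error) {
	if err := ctx.Err(); err != nil { // respect cancellation and timeouts
		return "", err
	}
	return strings.Join(strings.Fields(text), " "), nil
}

func (trimAdapter) Available() bool { return true }

// buildAdapters stands in for the registration step: each adapter is
// indexed by its Name() so routing can look it up by model ID.
func buildAdapters() map[string]LLMAdapter {
	adapters := map[string]LLMAdapter{}
	for _, a := range []LLMAdapter{trimAdapter{}} {
		adapters[a.Name()] = a
	}
	return adapters
}

// polishWith routes text to the adapter registered under id.
func polishWith(id, text string) (string, error) {
	a, ok := buildAdapters()[id]
	if !ok || !a.Available() {
		return "", fmt.Errorf("adapter %q not available", id)
	}
	return a.Polish(context.Background(), text)
}

func main() {
	out, err := polishWith("trim", "  hello    world ")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

Keeping the interface this small is what lets routing, health checks, and model listing work without per-backend code: each only needs the adapter's name and availability.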

### Middleware Chain

Request processing order (defined in `internal/middleware/chain.go`):

```text
CORS → RequestID → Logging → Metrics → APIKey → RateLimit → MaxBytes(64KB) → Timeout(120s) → Router
```
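
With stdlib `net/http`, an ordered chain like this is commonly built by wrapping handlers from the inside out. The sketch below shows that general pattern with placeholder middlewares that only record their names; `Middleware`, `Chain`, `trace`, and `runChain` are illustrative names and may not match what `internal/middleware/chain.go` actually defines.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Middleware wraps a handler with extra behavior.
type Middleware func(http.Handler) http.Handler

// Chain wraps h so that the first middleware listed sees the request first.
func Chain(h http.Handler, mws ...Middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// trace returns a placeholder middleware that records its name on each request.
func trace(name string, order *[]string) Middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*order = append(*order, name)
			next.ServeHTTP(w, r)
		})
	}
}

// runChain sends one in-memory request through the named placeholders
// and returns the order in which they ran.
func runChain(names ...string) string {
	var order []string
	mws := make([]Middleware, 0, len(names))
	for _, n := range names {
		mws = append(mws, trace(n, &order))
	}
	router := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		order = append(order, "Router")
	})
	req := httptest.NewRequest(http.MethodGet, "/api/health", nil)
	Chain(router, mws...).ServeHTTP(httptest.NewRecorder(), req)
	return strings.Join(order, " -> ")
}

func main() {
	fmt.Println(runChain("CORS", "RequestID", "Logging", "Metrics",
		"APIKey", "RateLimit", "MaxBytes", "Timeout"))
}
```

Order matters in practice: placing authentication before rate limiting, as in the chain above, means unauthenticated requests are rejected before they consume any rate-limit budget.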

### Hardening

| Protection | Limit | Response |
| --- | --- | --- |
| API key | `X-API-Key` header, constant-time compare | 401 |
| Request body | 64KB max | 413 |
| Text length | 10,000 chars | 400 |
| Rate limit | 10 req/min/IP (sliding window) | 429 |
| Request timeout | 120s | 504 |
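
As an illustration of the rate-limit row, a sliding-window limiter can be sketched by keeping recent request timestamps per client and counting those inside the window. This is one common way to implement the idea, not necessarily how Pollex does it; `SlidingWindow`, `Allow`, and `countAllowed` are hypothetical names, and the IP is a documentation address.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// SlidingWindow allows at most limit requests per key within window.
type SlidingWindow struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	hits   map[string][]time.Time
}

func NewSlidingWindow(limit int, window time.Duration) *SlidingWindow {
	return &SlidingWindow{limit: limit, window: window, hits: make(map[string][]time.Time)}
}

// Allow reports whether a request from key at time now fits within the limit.
func (s *SlidingWindow) Allow(key string, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	// Drop timestamps that have slid out of the window.
	cutoff := now.Add(-s.window)
	kept := s.hits[key][:0]
	for _, t := range s.hits[key] {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	if len(kept) >= s.limit {
		s.hits[key] = kept
		return false // a handler would answer 429 here
	}
	s.hits[key] = append(kept, now)
	return true
}

// countAllowed simulates n requests from one IP, gapSeconds apart,
// against the 10 req/min limit from the table.
func countAllowed(n, gapSeconds int) int {
	rl := NewSlidingWindow(10, time.Minute)
	start := time.Unix(0, 0)
	allowed := 0
	for i := 0; i < n; i++ {
		at := start.Add(time.Duration(i*gapSeconds) * time.Second)
		if rl.Allow("203.0.113.7", at) {
			allowed++
		}
	}
	return allowed
}

func main() {
	fmt.Println(countAllowed(15, 1))  // burst: 15 requests within 15 seconds
	fmt.Println(countAllowed(15, 10)) // steady: one request every 10 seconds
}
```

In the burst case only the first 10 requests pass; in the steady case all 15 do, because no 60-second span ever holds more than 10 of them. A production limiter would also need to evict idle keys so the map does not grow without bound.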

### CI/CD

- **Push to `master`** or **PR** → lint + test + build (amd64 + arm64)

- **Tag `v*`** → goreleaser creates GitHub release with binaries + extension zip

Commit messages follow [Conventional Commits](https://www.conventionalcommits.org/).

## Docker

```sh
make docker-build   # Build image (alpine:3.21, 24.7MB, non-root)
make docker-dev     # Run pollex in Docker (mock mode, :8090)
make docker-down    # Stop container
```

### Monitoring Stack

```sh
make dev              # Start pollex natively (mock mode)
make monitoring-up    # Start Prometheus + Alertmanager + Grafana
```

- Prometheus: [localhost:9090](http://localhost:9090) — 6 alerting rules based on SLOs

- Grafana: [localhost:3000](http://localhost:3000) — Pollex SRE Overview dashboard (auto-provisioned)

- Alertmanager: [localhost:9093](http://localhost:9093) — Slack webhook routing

```sh
make monitoring-down      # Stop monitoring stack
make monitoring-validate  # Validate Prometheus rules syntax
```

## Deploy to Jetson

### First-time setup

```sh
make deploy-init      # Packages, CUDA PATH, /etc/pollex, systemd services
make deploy-llamacpp  # Build llama.cpp with CUDA on Jetson (~85 min)
make deploy           # Binary + config + prompt
make deploy-secrets   # API key
make deploy-tunnel    # Cloudflare Tunnel
```

### Subsequent deploys

```sh
make deploy           # Build ARM64 + SCP + restart service
```

### Remote operations

```sh
make jetson-ssh             # SSH into Jetson
make jetson-status          # Health check via SSH
make jetson-test            # End-to-end polish test
make jetson-logs            # Tail API logs
make jetson-tunnel-start    # Start Cloudflare Tunnel
make jetson-tunnel-status   # Tunnel health
make jetson-tunnel-logs     # Tail tunnel logs
```

## Hardware

**Jetson Nano 4GB** — ARM64, CUDA 10.2, 128 Maxwell cores.

| Component | RAM |
| --- | --- |
| JetPack OS (headless) | ~500MB |
| llama-server (GPU) | ~200MB |
| Qwen 2.5 1.5B (Q4) | ~1.0GB |
| Pollex API | ~15MB |
| **Free** | **~2.3GB** |

## License

[MIT](LICENSE)
