https://github.com/kritoke/catseye
All-around code quality and security scanner that finds problems in code, particularly AI-generated code.
- Host: GitHub
- URL: https://github.com/kritoke/catseye
- Owner: kritoke
- Created: 2026-05-07T01:32:53.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-25T14:26:26.000Z (26 days ago)
- Last Synced: 2026-05-25T14:26:32.233Z (26 days ago)
- Topics: code-review
- Language: OCaml
- Homepage:
- Size: 5.05 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Catseye
**Multi-language static security analysis with taint tracking, code smell detection, and AI antipattern linting.**
Supports **Crystal, Gleam, JavaScript, TypeScript, Svelte, OCaml, and Rust** — with language-specific security rules and antipattern databases for each.
> **v0.4.4** - OCaml idiomatic rules, updated Crystal/Gleam/Svelte detectors, OCaml verbose-option detection
## Installation
### Binary Releases (Linux & macOS)
Download pre-built binaries from the [Releases](https://github.com/kritoke/catseye/releases) page:
```bash
# Linux x86_64
curl -L https://github.com/kritoke/catseye/releases/download/v0.4.3/catseye-linux-x86_64.tar.gz | tar xz
# Linux ARM64 (aarch64)
curl -L https://github.com/kritoke/catseye/releases/download/v0.4.3/catseye-linux-aarch64.tar.gz | tar xz
# macOS Apple Silicon (ARM64)
curl -L https://github.com/kritoke/catseye/releases/download/v0.4.3/catseye-macos-aarch64.tar.gz | tar xz
```
> **Note:** macOS Intel (x86_64) builds have been discontinued. Use macOS ARM64 for Apple Silicon Macs.
After extraction, run `./install-grammars.sh` to install the tree-sitter grammars.
### Nix (All Platforms)
```bash
# In your project
cat > flake.nix << 'EOF'
{
  inputs.catseye.url = "github:kritoke/catseye";
  outputs = { self, nixpkgs, catseye }: {
    devShells.x86_64-linux.default = nixpkgs.legacyPackages.x86_64-linux.mkShell {
      buildInputs = [ catseye.packages.x86_64-linux.default ];
    };
  };
}
EOF
```
### Build from Source
**Requirements:**
- **OCaml** 5.x + **Dune** 3.x
- **tree-sitter** CLI + language grammars (JS, TS, Svelte, OCaml, Gleam, Rust)
- **Crystal** 1.x (optional — needed only for native Crystal extractor)
- OCaml libs: yojson, cmdliner, bos, rresult, logs, fmt, toml, kdl, ocamlgraph
For detailed instructions on installing dependencies without Nix, see [install.md](install.md).
```bash
# Clone the repo
git clone https://github.com/kritoke/catseye.git
cd catseye
# Build (uses tree-sitter grammars from nix by default)
just build
# Run tests
just test
```
## Quick Start
```bash
# Scan a project (auto-detects all languages)
just scan path/to/project/src
# Scan specific languages only
catseye-ocaml --lang javascript,typescript path/to/project/
# Scan with all checks
just scan-full path/to/project/src
# JSON output
just scan-json path/to/project/src
```
## Language Support
| Language | Extensions | Security Rules | AI Lint | Code Smells | Extractor |
| ---------- | -------------------------- | :------------: | :-------------------: | :-------------: | ------------------------------ |
| Crystal | `.cr` | ✅ 12 rules | ✅ 45 detectors | ✅ 16 detectors | Crystal extractor + AST bridge |
| Gleam | `.gleam` | ✅ 12 rules | ✅ 36 detectors | ✅ 16 detectors | tree-sitter |
| JavaScript | `.js` `.jsx` `.mjs` `.cjs` | ✅ 10 rules | ✅ 60+ hallucinations | ✅ 16 detectors | tree-sitter |
| TypeScript | `.ts` `.tsx` | ✅ 10 rules | ✅ (shares JS rules) | ✅ 16 detectors | tree-sitter |
| Svelte | `.svelte` | ✅ XSS/SSRF | ✅ 12 rules | ✅ 16 detectors | tree-sitter (two-pass) |
| OCaml | `.ml` `.mli` | ✅ Basic | ✅ 18 rules | ✅ 16 detectors | tree-sitter |
| Rust | `.rs` | ✅ Basic | ✅ 3 detectors | ✅ 16 detectors | tree-sitter (native) |
## CLI Reference
```
catseye [options]
  -f, --format            terminal (default), json, sarif, markdown, dot
  -o, --output            write results to file
  -r, --rules             rules directory (default: ~/.local/lib/catseye/rules/)
      --config            config file path (default: .catseye.toml in target or parents)
      --lang              all (default), or comma-separated: crystal,gleam,javascript,typescript,svelte,ocaml,rust
      --no-color          disable colored output
      --no-cache          disable extraction cache
      --clear-cache       clear cache and run full scan
      --cache-dir         cache directory (default: .catseye)
      --cfg               use IL/CFG-based taint engine (more sensitive)
      --no-cfg            use flat taint engine (default, fewer findings)
      --analysis-timeout  timeout for analysis phase (0 = disabled)
      --cfg-max-blocks    max blocks per function CFG (default: 500)
      --cfg-timeout-ms    timeout per function CFG build (default: 5000)
      --predator-vision   enable reachability analysis (live/dormant/safe)
      --crows-nest        enable supply chain audit (Crystal shard.yml + Gleam gleam.toml only; very limited CVE data)
      --claws             enable code smell detection
      --ai-lint           enable AI antipattern detection (Crystal, Gleam, Svelte, OCaml, Rust)
      --suppress          comma-separated rule IDs to suppress (e.g., unused-let,InsecureRandom)
      --include-deps      include shard dependencies in scan (Crystal only)
      --no-recurse        don't recurse into subdirectories (applies to all languages)
  -p, --parallelism       parallel workers (0 = auto)
  -v, --version           show version
  -h, --help              show help
```
## What It Detects
> **Full rule reference:** See [RULES.md](RULES.md) for complete tables of all security rules, code smells, and AI antipatterns.
### Security Rules (taint-based)
Rules are KDL files — different rule sets per language, all using the same taint engine.
| Rule | Severity | Crystal/Gleam | JS/TS | Svelte | Rust |
| ---------------------- | -------- | :-------------------------------: | :------------------------------: | :--------------------: | :---------------------: |
| **SSRF** | Critical | `HTTP::Client.get`, `hackney.get` | `$fetch`, `$get` | `$fetch` | — |
| **CommandInjection** | Critical | `system`, `Process.run` | `child_process.$exec` | — | `std::process::Command` |
| **PathTraversal** | High | `File.read`, `File.write` | `$readFile`, `$writeFile` | — | — |
| **SQLInjection** | Critical | `db.exec`, `db.query` | — | — | — |
| **XSS** | Critical | — | `innerHTML`, `document.write` | `{@html}`, `innerHTML` | — |
| **UnsafeBlock** | High | — | — | — | `unsafe {}` |
| **OpenRedirect** | Medium | `redirect_to` | `$redirect`, `location.assign` | — | — |
| **PrototypePollution** | High | — | `$merge`, `Object.assign` | — | — |
| **EvalInjection** | Critical | — | `eval`, `Function`, `setTimeout` | — | — |
| **EnvInjection** | High | `ENV[]=` | — | — | — |
| **LDAPInjection** | High | `LDAP.query` | — | — | — |
| **ScentLeakage** | High | `puts`, `Log.info` | `console.log` | — | — |
| **ReDoS** | Medium | `Regex.new` | `new RegExp` | — | — |
| **WeakCryptography** | Medium | `Digest::MD5` | `createHash('md5')` | — | — |
| **HardcodedSecrets** | Medium | `password=` | `api_key=` | — | — |
Rules are KDL files in `src/ocaml/rules/` — add your own by creating a `.kdl` file.
### AI Antipattern Detection (`--ai-lint`)
Catches patterns common in AI-generated code: hallucinated method calls, framework confusion, security antipatterns, and best practice violations.
#### JavaScript / TypeScript (60+ rules)
| Category | Examples |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- |
| **Hallucinated methods** | `strip()` → `.trim()`, `len()` → `.length`, `append()` → `.push()`, `print()` → `console.log()` |
| **Framework confusion** | Python (`dict`, `range`, `enumerate`), Ruby (`puts`, `select`, `compact`), Java (`System.out.println`), PHP (`var_dump`, `strlen`) |
| **Security** | `eval()`, `new Function()`, `child_process.exec()`, prototype pollution (`__proto__`), `Math.random()` for security |
| **Best practices** | `alert()`, `debugger`, `console.log` left in code, `document.write()` deprecated |
| **Code quality** | `==` instead of `===`, deep `.then()` chains (4+), `escape()`/`unescape()` deprecated, incomplete `.replace()` sanitization |
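
To illustrate the idea behind the hallucinated-method category, here is a minimal Python sketch of a lookup-table check. It is not Catseye's implementation: Catseye works on its unified AST, while this sketch uses a plain regex, and both `HALLUCINATED_JS_METHODS` (a four-entry subset taken from the table above) and `find_hallucinated_calls` are hypothetical names.

```python
import re

# Illustrative subset of the mappings in the table above (assumption: the
# real database is larger and keyed differently).
HALLUCINATED_JS_METHODS = {
    "strip": ".trim()",
    "len": ".length",
    "append": ".push()",
    "print": "console.log()",
}

# An identifier immediately followed by "(" is treated as a call.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def find_hallucinated_calls(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, called name, suggested replacement) triples."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            name = match.group(1)
            if name in HALLUCINATED_JS_METHODS:
                findings.append((lineno, name, HALLUCINATED_JS_METHODS[name]))
    return findings
```

A regex cannot tell a call inside a string or comment from real code, which is one reason an AST-based check is the more robust design.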
#### Svelte (12 rules)
| Category | Examples |
| ---------------------------- | --------------------------------------------------------------------------------------------------------------- |
| **Svelte 4→5 migration** | `createEventDispatcher` → callback props, `beforeUpdate`/`afterUpdate` → `$effect()`, Svelte 4 stores → runes |
| **Svelte 5 Rune Validation** | `$state()` without init, `$effect` without cleanup (setInterval), `$derived` reassignment |
| **Framework confusion** | React hooks (`useState`, `useEffect`), Vue directives (`v-if`, `v-for`, `v-model`), Angular (`ngModel`, `ngIf`) |
| **XSS** | `{@html}` with dynamic content, `innerHTML`, `document.write` |
#### OCaml (18 rules)
| Category | Rule ID | What it catches |
| -------------------------- | -------------------------- | -------------------------------------------------------- |
| **Hallucinated functions** | `hallucinated-method` | Haskell/Scala/Python APIs (`foldl`, `putStrLn`, `range`) |
| **Unsafe operations** | `unsafe-obj-magic` | `Obj.magic` — unsafe type coercion |
| | `unsafe-deserialization` | `Marshal.from_channel`, `Marshal.from_string` |
| | `command-injection` | `Sys.command`, `Unix.exec*` with untrusted input |
| **Partial functions** | `partial-function` | `List.hd`, `List.tl`, `List.assoc`, `Option.get` |
| **Best practices** | `ocaml-verbose-option` | Nested `match` on options → use `let*` |
| | `ocaml-non-tail-recursive` | Recursive functions without tail optimization |
| | `ocaml-redundant-if-bool` | `if x then true else false` → just `x` |
| | `unused-binding` | `let` bindings that are never used |
| | `hardcoded-secrets` | API key patterns in source code |
#### Crystal & Gleam
| Rule | Languages | What it catches |
| ------------------------- | --------- | ----------------------------------------------------- |
| `hallucinated-stdlib` | Crystal | Calls to methods that don't exist (45-entry database) |
| `hardcoded-secrets` | Both | API key patterns (Stripe, GitHub, AWS, JWT, Slack) |
| `hardcoded-urls` | Crystal | Hardcoded http:// and IP addresses |
| `deprecated-syntax` | Crystal | `puts`, `p`, `pp` in production code |
| `sequential-blocking` | Crystal | 3+ sequential HTTP/DB/File blocking calls |
| `string-concat-loop` | Crystal | String concatenation inside iterators |
| `nilable-ivar-access` | Crystal | Instance variable accesses that may need nil checks |
| `panic-call` | Gleam | `panic` used instead of `Result` |
| `list-wrap-unnecessary` | Gleam | `List.wrap` on collections |
| `debug-in-library` | Gleam | `io.debug` in non-example/test code |
| `result-in-map` | Gleam | `list.map` on Result values |
| `pipeline-steps-overload` | Gleam | 5+ step pipelines |
| `use-candidate` | Gleam | 3+ nested anonymous functions — suggest `use` |
#### Rust (3 detectors)
| Rule | What it catches |
| ------------------- | -------------------------------------------------------------- |
| `RustHallucination` | Python/Ruby/Go APIs in Rust (`len()`, `range()`, `dict.get()`) |
| `UnsafePanic` | `unwrap()`, `expect()`, `panic!()` without error handling |
| `RustInefficiency` | Unnecessary clones, `String::from(&var)` |
### Code Smells (`--claws`)
All 16 code smell detectors use **AST-native analysis** via `CatseyeAST.t` — they work across all supported languages.
| Detector | Rule ID | Threshold |
| --------------------- | --------------------- | ---------------------------------- |
| Cyclomatic complexity | `HighComplexity` | M ≥ 10 |
| Long parameter list | `LongParameterList` | ≥ 5 params |
| Deep nesting | `DeepNesting` | ≥ 4 levels |
| God objects | `GodObject` | ≥ 20 defs/file |
| DRY violations | `DRYViolation` | 4+ duplicates |
| Long method | `LongMethod` | ≥ 30 nodes |
| Message chain | `MessageChain` | ≥ 5 links |
| Data class | `DataClass` | 2+ props, no behavior |
| Data clump | `DataClump` | 3+ params always together |
| Flag argument | `FlagArgument` | bool params |
| Complex match | `ComplexMatch` | ≥ 5 branches |
| Dead code | `DeadCode` | unreachable code |
| Feature envy | `FeatureEnvy` | excessive cross-class calls |
| Orphaned spawn | `OrphanedSpawn` | `spawn`/`go` without rescue/ensure |
| Muted pack | `MutedPack` | `Channel.send` without receive |
| Dead letter | `DeadLetter` | `Channel.close` before receive |
| Spaghetti code | `SpaghettiCode` | ≥ 60 body nodes |
| Large class | `LargeClass` | > 500 LOC |
| Blob | `Blob` | large + data clumps |
| Lazy class | `LazyClass` | < 3 methods |
| Hub-like module | `HubLikeModule` | > 12 dependencies |
| Shotgun surgery | `ShotgunSurgery` | 5+ calls to same module |
| Parallel inheritance | `ParallelInheritance` | same-prefix class hierarchies |
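
As a rough illustration of how a threshold detector such as `HighComplexity` can work, the sketch below computes cyclomatic complexity (one plus the number of decision points) and flags functions at or above 10. It uses Python's own `ast` module as a stand-in for `CatseyeAST.t`; the choice of decision nodes and the function names are assumptions, not Catseye's actual counting rules.

```python
import ast

# Node types counted as one decision point each (assumption for this sketch).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(func: ast.AST) -> int:
    """M = 1 + number of decision points found inside the function."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` contributes two extra paths.
            complexity += len(node.values) - 1
    return complexity


def high_complexity(source: str, threshold: int = 10) -> list[tuple[str, int]]:
    """Return (function name, M) for every function with M >= threshold."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            m = cyclomatic_complexity(node)
            if m >= threshold:
                flagged.append((node.name, m))
    return flagged
```

Nested functions are counted toward their enclosing function here; a real detector may choose to score them separately.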
### Supply Chain Audit (`--crows-nest`)
> ⚠️ **Very limited.** Only supports Crystal `shard.yml` and Gleam `gleam.toml`. No JavaScript/TypeScript (npm/pnpm/yarn), Python, Ruby, Rust, Go, or other ecosystems. CVE data via [OSV.dev](https://osv.dev) has **very limited coverage** — most packages return no vulnerabilities even when known issues exist. Use dedicated tools like `npm audit`, `cargo audit`, or `safety` for real supply chain auditing.
What it does:
- Parses `shard.yml` → Crystal Shards dependencies (with versions from GitHub)
- Parses `gleam.toml` → Gleam Hex dependencies
- Queries OSV.dev for known CVEs (limited data coverage)
- Checks GitHub repo activity for staleness (Crystal shards with `github:` fields)
- Results cached in SQLite (24h TTL)
What it doesn't do:
- Parse `package.json`, `Cargo.toml`, `requirements.txt`, `Gemfile`, etc.
- Run ecosystem-native audit tools (`pnpm audit`, `cargo audit`, etc.)
- Provide comprehensive vulnerability coverage
- Check lockfiles for exact installed versions
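
The query-and-cache flow can be sketched in Python as follows. The request body follows the public OSV `POST /v1/query` format; everything else (the `osv_cache` table layout, the function names, and the example package and version in the usage note) is a hypothetical illustration rather than Catseye's schema, and no network call is made.

```python
import json
import sqlite3
import time

TTL_SECONDS = 24 * 60 * 60  # the 24h TTL mentioned above


def osv_query_payload(name: str, version: str, ecosystem: str) -> str:
    """JSON body for a POST to https://api.osv.dev/v1/query."""
    return json.dumps(
        {"version": version, "package": {"name": name, "ecosystem": ecosystem}}
    )


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache; the table layout is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS osv_cache ("
        "key TEXT PRIMARY KEY, body TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return conn


def cache_get(conn, key, now=None):
    """Return the cached body, or None if it is missing or older than the TTL."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT body, fetched_at FROM osv_cache WHERE key = ?", (key,)
    ).fetchone()
    if row is None or now - row[1] > TTL_SECONDS:
        return None
    return row[0]


def cache_put(conn, key, body, now=None):
    """Store a response body together with the time it was fetched."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO osv_cache (key, body, fetched_at) VALUES (?, ?, ?)",
        (key, body, now),
    )
    conn.commit()
```

For a Gleam dependency, a caller might build the body with `osv_query_payload("gleam_stdlib", "0.34.0", "Hex")`, check `cache_get` first, and only send the request on a miss.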
## Example Output
```
Catseye v0.4.3
Target: ./src
Files: 72 Crystal, 8 JavaScript, 5 TypeScript, 4 Svelte
→ Running analysis engine (7367 nodes)...
🔴 Error SSRF src/controllers/proxy_controller.cr:32
Potential SSRF via HTTP::Client.get with tainted argument(s): url.
← Source: params (proxy_controller.cr:28)
🔴 Error XSS frontend/src/routes/+page.svelte:15
{@html} with dynamic content is an XSS risk — ensure input is sanitized
[ai:hallucinated-method] scripts/utils.js:42 - 'strip()' doesn't exist in JS — use .trim()
⚠️ Warning PathTraversal src/file_handler.cr:45
Path traversal via File.read — but path.starts_with?() validation detected, suppressing.
Found 6 Error(s), 0 Warning(s) across 89 files.
Review the findings above.
```
## How It Works
```
Source files
    │
    ├─ Crystal (.cr) ────→ Crystal extractor (AST → JSON) ──┐
    ├─ Gleam (.gleam) ───→ tree-sitter (CST → XML → AST) ───┤
    ├─ JS/TS (.js .ts) ──→ tree-sitter (CST → XML → AST) ───┤
    ├─ Svelte (.svelte) ─→ tree-sitter two-pass ────────────┤
    └─ OCaml (.ml) ──────→ tree-sitter (CST → XML → AST) ───┤
                                                            │
                             CatseyeAST.t (unified) ◄───────┘
                                        │
                       ┌────────────────┼────────────────┐
                       ▼                ▼                ▼
                Security Nodes      AI Linter       Code Smells
                (taint engine)     (AST rules)        (Claws)
                       │                │                │
                       └────────────────┼────────────────┘
                                        ▼
                              KDL Rule Interpreter
                                        │
                    Terminal / JSON / SARIF / Markdown / DOT
```
**Taint pipeline:** seed → propagate → returns → interproc → propagate → cross-file → guards → rules
1. **Seed** — Params named like taint sources (`url`, `request`, `params`) are marked tainted
2. **Propagate** — Fixed-point; taint flows through assignments, call chains, and **property access** (e.g., `uri.request_target` inherits taint from `uri`)
3. **Returns** — Functions with tainted bodies return tainted data
4. **Inter-procedural** — Taint crosses function boundaries
5. **Guards** — `unless path.starts_with?("/safe/")` suppresses taint (**path sensitivity**)
6. **Rules** — KDL rules match sinks against tainted variables, with `arg=N` position matching
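
The seed and propagate steps can be sketched in Python as below. This is a simplified model, not the engine's code: assignments are reduced to `(target, [source expressions])` pairs, property access is handled by looking only at the base name before the first dot, and all names are hypothetical.

```python
# Parameter names treated as taint sources (taken from step 1 above).
TAINT_SOURCE_NAMES = {"url", "request", "params"}


def seed(params):
    """Step 1: mark parameters whose names look like taint sources."""
    return {p for p in params if p in TAINT_SOURCE_NAMES}


def base(expr: str) -> str:
    """`uri.request_target` inherits taint from `uri`."""
    return expr.split(".", 1)[0]


def propagate(assignments, tainted):
    """Step 2: repeat until no new variable becomes tainted (fixed point)."""
    tainted = set(tainted)
    changed = True
    while changed:
        changed = False
        for target, sources in assignments:
            if target not in tainted and any(base(s) in tainted for s in sources):
                tainted.add(target)
                changed = True
    return tainted
```

The loop is needed because assignments are not guaranteed to be visited in dependency order: a variable tainted late in one pass can taint an earlier target on the next pass.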
**Path sensitivity** reduces false positives by tracking validation guards:
- `starts_with?`, `end_with?` → suppress path traversal
- `valid_url?`, `check_*`, `sanitize_*` → suppress SSRF
- Validation scope: 50 lines or to next function boundary
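
One way to picture the validation scope is a backward search from the sink: look at most 50 lines up, stop at the enclosing function boundary, and suppress the finding if a matching guard mentions the tainted variable. The Python sketch below works on raw Crystal-style source lines and is only an approximation under those assumptions; `GUARDS`, `FUNC_START`, and `is_guarded` are hypothetical names.

```python
import re

# Guard patterns per rule, taken from the list above.
GUARDS = {
    "PathTraversal": re.compile(r"\b(starts_with\?|end_with\?)"),
    "SSRF": re.compile(r"\b(valid_url\?|check_\w+|sanitize_\w+)"),
}
SCOPE_LINES = 50
FUNC_START = re.compile(r"^\s*def\s")  # assumption: Crystal-style `def`


def is_guarded(lines, sink_index, rule, var):
    """True if a guard on `var` precedes the sink inside the validation scope.

    `lines` is the file as a list of strings; `sink_index` is 0-based.
    """
    pattern = GUARDS[rule]
    first = max(0, sink_index - SCOPE_LINES)
    for i in range(sink_index - 1, first - 1, -1):
        line = lines[i]
        if var in line and pattern.search(line):
            return True
        if FUNC_START.match(line):
            break  # reached the start of the enclosing function
    return False
```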
**CFG engine** (`--cfg`) lowers `CatseyeAST.t` to an IL, builds a basic-block CFG from it, and runs a forward dataflow taint analysis over that graph. The analysis is branch-aware (taint does not flow across dead branches) and applies dominator-based sanitizer suppression.
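
A forward dataflow pass over a block graph can be sketched with a standard worklist algorithm. The Python below is a toy model under stated assumptions: blocks hold `(op, target, sources)` tuples with `op` being `"assign"` or `"sanitize"`, taint sets are joined by union at merge points, and only blocks reachable from the entry are ever visited. It does not compute dominators and does not reflect Catseye's actual IL.

```python
from collections import deque


def forward_taint(blocks, succs, entry, entry_taint):
    """Return the taint set at the exit of every block reachable from `entry`."""
    in_sets = {entry: set(entry_taint)}
    out_sets = {}
    work = deque([entry])
    while work:
        block = work.popleft()
        state = set(in_sets[block])
        for op, target, sources in blocks[block]:
            if op == "sanitize":
                state.discard(target)
            elif any(src in state for src in sources):
                state.add(target)
            else:
                state.discard(target)  # overwritten with clean data
        if out_sets.get(block) == state:
            continue  # nothing new to push to successors
        out_sets[block] = state
        for nxt in succs.get(block, []):
            merged = in_sets.get(nxt, set()) | state
            if nxt not in in_sets or merged != in_sets[nxt]:
                in_sets[nxt] = merged
                work.append(nxt)
    return out_sets
```

In this model a sanitizer on only one branch does not clear taint at the join, because the other branch still carries it; a sanitizer placed before the branch point would.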
### Adding a Security Rule
Create `src/ocaml/rules/my_rule.kdl`:
```kdl
rule "MyRule" severity="Medium" {
    sinks {
        sink "Dangerous.call" arg=0 {
            sanitizer "Safe.wrapper"
        }
    }
    sources {
        source "params"
        source "url"
    }
    message "My rule: {sink} with tainted argument(s): {tainted_vars}."
}
```
`arg=0` means only flag when tainted data is in the first argument. Omit for any-arg matching.
`$var` metavariables match any receiver prefix: `sink "$client.get"` matches `http.get`, `conn.get`, `my_client.get`.
Rebuild with `just build` and test.
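
To make the `arg=N` and `$var` semantics concrete, here is a small Python model of sink matching. It assumes a `$name` metavariable stands for exactly one identifier segment and that call arguments are plain variable names; the rule interpreter may be more permissive, and `sink_regex` and `sink_matches` are hypothetical helpers.

```python
import re


def sink_regex(pattern: str) -> re.Pattern:
    """Compile a sink pattern; `$name` segments match any single identifier."""
    parts = []
    for segment in pattern.split("."):
        if segment.startswith("$"):
            parts.append(r"[A-Za-z_]\w*")
        else:
            parts.append(re.escape(segment))
    return re.compile(r"\.".join(parts))


def sink_matches(pattern, arg, call_name, call_args, tainted):
    """Flag a call if it hits the sink and the relevant argument is tainted.

    `arg` is the 0-based argument position, or None for any-arg matching.
    """
    if sink_regex(pattern).fullmatch(call_name) is None:
        return False
    if arg is None:
        return any(a in tainted for a in call_args)
    return arg < len(call_args) and call_args[arg] in tainted
```

With `arg=0`, `http.get(url, opts)` is flagged when `url` is tainted, while `http.get(opts, url)` is not; omitting the position flags both.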
### Extraction Strategy
**Crystal** uses a dedicated Crystal extractor (compiled at build time). All other languages use **tree-sitter** with language-specific CST → CatseyeAST mappers.
For Crystal projects with `shard.yml`, the `lib/` directory is automatically excluded to skip shard dependencies and avoid symlink loops.
**Svelte** uses a two-pass strategy: first parse with tree-sitter-svelte to extract `<script>` blocks, then parse the script content with the JS/TS grammar.
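
The shape of the two-pass idea can be shown with a short Python sketch. A regex stands in for the tree-sitter-svelte pass here, which is an assumption made purely to keep the example self-contained; `extract_scripts` is a hypothetical helper, and a real second pass would hand each block to a JS or TS parser.

```python
import re

SCRIPT_RE = re.compile(
    r"<script\b([^>]*)>(.*?)</script>", re.DOTALL | re.IGNORECASE
)
TS_LANG_RE = re.compile(r"""lang\s*=\s*["'](ts|typescript)["']""")


def extract_scripts(svelte_source: str):
    """Pass 1 stand-in: return (language, start line, code) per script block.

    The start line is the 1-based line on which the block's content begins,
    so findings from the second pass can be mapped back to the .svelte file.
    """
    blocks = []
    for match in SCRIPT_RE.finditer(svelte_source):
        attrs, code = match.group(1), match.group(2)
        lang = "typescript" if TS_LANG_RE.search(attrs) else "javascript"
        start_line = svelte_source.count("\n", 0, match.start(2)) + 1
        blocks.append((lang, start_line, code))
    return blocks
```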
## Configuration
Place an optional `.catseye.toml` in your project root; Catseye finds it by walking up from the target directory:
```toml
[scan]
exclude = ["node_modules", ".git", "vendor", "spec"]
[analysis]
extra_sources = ["user_input", "raw_params"]
extra_sanitizers = ["sanitize_path", "escape_shell"]
parallelism = 4
[claws]
complexity_warning = 10
max_params = 5
# Suppress code smell rules by file glob
[claws.suppress]
DataClump = ["**"]
LongParameterList = ["**/repositories/**"]
# Suppress security/taint findings by file glob
[taint.suppress]
SSRF = ["**/validated_http_client.cr"]
PathTraversal = ["**/safe_io.cr"]
# Suppress specific rules by ID (CLI --suppress flag)
[suppress]
# unused-let: Gleam OTP bindings appear unused but are used by runtime
unused-let = true
guard-after-wildcard = true
```
### CLI Suppress Flag
Use `--suppress` to disable specific rules without a config file:
```bash
catseye ./src --suppress unused-let,guard-after-wildcard
# Suppress security rules
catseye ./src --suppress InsecureRandom,WeakCryptography
```
This suppresses rules in both the taint/security engine and AI lint detectors.
### Glob Patterns
- `*` matches any characters except `/`
- `**` matches any characters including `/` (cross-directory)
- `?` matches a single character
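
These three rules translate directly into a regular expression. The Python sketch below is one possible reading of them, not the matcher Catseye ships: it assumes `?` does not match `/` (the list above does not say either way) and that a pattern must match the whole path.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob using the three rules above into a regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")      # any characters, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")   # any characters except "/"
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")    # one character (assumption: not "/")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def glob_match(pattern: str, path: str) -> bool:
    """True if the whole path matches the glob."""
    return glob_to_regex(pattern).fullmatch(path) is not None
```

Under this reading, `**/repositories/**` matches `src/repositories/user.cr`, while `*.cr` matches `a.cr` but not `src/a.cr`.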
## Justfile Recipes
```
just build               Build the engine
just test                Unit tests + E2E
just scan <dir>          Scan with terminal output
just scan-full <dir>     Scan with all checks enabled
just scan-json <dir>     Scan with JSON output
just scan-ai <dir>       AI antipattern detection only
just scan-reports <dir>  Generate JSON + SARIF + Markdown reports
just fmt                 Format OCaml code
just lint                Check formatting
just clean               Clean build artifacts
just extract <file>      Run Crystal extractor on a single file (debug)
```
## Project Structure
```
catseye/
├── src/
│ ├── ocaml/
│ │ ├── bin/main.ml # CLI entry point
│ │ ├── lib/
│ │ │ ├── catseye_engine/ # Flat taint analysis + propagation, extractor registry
│ │ │ ├── catseye_il/ # IL types, CFG builder (ocamlgraph), dominator analysis
│ │ │ ├── catseye_ast/ # Unified AST + language mappers + plugin registry
│ │ │ │ ├── crystal_mapper.ml # Crystal JSON → AST
│ │ │ │ ├── gleam_mapper.ml # Gleam tree-sitter → AST
│ │ │ │ ├── javascript_mapper.ml # JS tree-sitter → AST
│ │ │ │ ├── typescript_mapper.ml # TS (extends JS mapper)
│ │ │ │ ├── svelte_mapper.ml # Svelte two-pass → AST
│ │ │ │ ├── ocaml_mapper.ml # OCaml tree-sitter → AST
│ │ │ │ ├── language_plugin.ml # Plugin interface
│ │ │ │ └── plugin_registry.ml # Plugin discovery
│ │ │ ├── ai_linter/ # AI antipattern rules
│ │ │ │ ├── crystal_rules.ml # Crystal hallucination DB (37 entries)
│ │ │ │ ├── gleam_rules.ml # Gleam antipatterns
│ │ │ │ ├── javascript_rules.ml # JS/TS hallucinations + antipatterns (60+)
│ │ │ │ ├── svelte_rules.ml # Svelte 4→5 + framework confusion (40+)
│ │ │ │ └── ocaml_rules.ml # OCaml hallucinations + unsafe ops (55+)
│ │ │ ├── catseye_claws/ # Code smell detection (AST-native, 16 detectors)
│ │ │ ├── catseye_crowsnest/ # Supply chain audit
│ │ │ ├── catseye_rules/ # KDL rule interpreter (arg, $var, fix templates)
│ │ │ ├── catseye_cli/ # CLI, orchestrator, output formats
│ │ │ └── catseye_types/ # Shared types
│ │ └── rules/ # KDL rule files
│ │ ├── crystal/*.kdl # Crystal security rules
│ │ ├── javascript.kdl # JS/TS security rules
│ │ └── gleam/*.kdl # Gleam security rules
│ └── extractor/extractor.cr # Crystal AST extractor
├── test/samples/ # Test corpus (Crystal, JS, Svelte)
├── flake.nix # Nix dev shell (all grammars)
└── justfile # Build tasks
```
## Performance
| Scan | Files | Extraction | Analysis |
| ------------------------- | ---------------------------- | ---------- | -------- |
| Crystal only (72 files) | 72 | ~0.12s | ~0.06s |
| Multi-language (89 files) | 72 Crystal + 17 JS/TS/Svelte | ~0.25s | ~6s |
| OCaml self-scan | 84 | ~0.19s | ~0.15s |
| Gleam project (144 files) | 115 Gleam + 29 TS/JS | ~0.73s | ~0.14s |
**CFG engine** scales linearly: 500 sequential branches in 0.09ms, 10,000 nodes in 2.4ms, 500-block taint analysis in 0.75ms.
## License
MIT