{"id":29430795,"url":"https://github.com/knitli/thread","last_synced_at":"2026-03-02T04:06:17.726Z","repository":{"id":304093797,"uuid":"1012808541","full_name":"knitli/thread","owner":"knitli","description":null,"archived":false,"fork":false,"pushed_at":"2025-07-11T03:34:29.000Z","size":390,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-11T07:48:30.920Z","etag":null,"topics":["parser","static-analysis","tree-sitter"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/knitli.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-02T23:19:10.000Z","updated_at":"2025-07-03T01:28:51.000Z","dependencies_parsed_at":"2025-07-11T07:48:56.199Z","dependency_job_id":"3d529a9d-c01c-42db-a17b-d1148c5c9237","html_url":"https://github.com/knitli/thread","commit_stats":null,"previous_names":["knitli/thread"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/knitli/thread","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knitli%2Fthread","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knitli%2Fthread/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knitli%2Fthread/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knitli%2Fthread/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/knitli","download_url":"https://codeload.github.co
m/knitli/thread/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/knitli%2Fthread/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265031455,"owners_count":23700845,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["parser","static-analysis","tree-sitter"],"created_at":"2025-07-12T18:02:00.529Z","updated_at":"2026-03-02T04:06:17.713Z","avatar_url":"https://github.com/knitli.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\nSPDX-FileCopyrightText: 2025 Knitli Inc. \u003cknitli@knit.li\u003e\nSPDX-FileContributor: Adam Poulemanos \u003cadam@knit.li\u003e\n\nSPDX-License-Identifier: MIT OR Apache-2.0\n--\u003e\n\n# Thread\n\n[![REUSE status](https://api.reuse.software/badge/git.fsfe.org/reuse/api)](https://api.reuse.software/info/git.fsfe.org/reuse/api)\n\n\u003e A safe, fast, flexible code analysis and parsing engine built in Rust. Production-ready service-library dual architecture with content-addressed caching and incremental intelligence.\n\n**Thread** is a high-performance code analysis platform that operates as both a reusable library ecosystem and a persistent service. 
Built on tree-sitter parsers and enhanced with the ReCoco dataflow framework, Thread delivers 50x+ performance gains through content-addressed caching while supporting dual deployment: CLI with Rayon parallelism and Edge on Cloudflare Workers.\n\n## Key Features\n\n- ✅ **Content-Addressed Caching**: Blake3 fingerprinting enables 99.7% cost reduction and 346x faster analysis on repeated runs\n- ✅ **Incremental Updates**: Only reanalyze changed files—unmodified code skips processing automatically\n- ✅ **Dual Deployment**: Single codebase compiles to both CLI (Rayon + Postgres) and Edge (tokio + D1 on Cloudflare Workers)\n- ✅ **Multi-Language Support**: 20+ languages via tree-sitter (Rust, TypeScript, Python, Go, Java, C/C++, and more)\n- ✅ **Pattern Matching**: Powerful AST-based pattern matching with meta-variables for complex queries\n- ✅ **Production Performance**: \u003e1,000 files/sec throughput, \u003e90% cache hit rate, \u003c50ms p95 latency\n\n## Quick Start\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/knitli/thread.git\ncd thread\n\n# Install development tools (optional, requires mise)\nmise run install-tools\n\n# Build Thread with all features\ncargo build --workspace --all-features --release\n\n# Verify installation\n./target/release/thread --version\n```\n\n### Basic Usage as Library\n\n```rust\nuse thread_ast_engine::{Root, Language};\n\n// Parse source code\nlet source = \"function hello() { return 42; }\";\nlet root = Root::new(source, Language::JavaScript)?;\n\n// Find all function declarations\nlet functions = root.find_all(\"function $NAME($$$PARAMS) { $$$BODY }\");\n\n// Extract function names\nfor func in functions {\n    println!(\"Found function: {}\", func.get_text(\"NAME\")?);\n}\n```\n\n### Using Thread Flow for Analysis Pipelines\n\n```rust\nuse thread_flow::ThreadFlowBuilder;\n\n// Build a declarative analysis pipeline\nlet flow = ThreadFlowBuilder::new(\"analyze_rust\")\n    .source_local(\"src/\", 
\u0026[\"**/*.rs\"], \u0026[\"target/**\"])\n    .parse()\n    .extract_symbols()\n    .target_postgres(\"code_symbols\", \u0026[\"content_hash\"])\n    .build()\n    .await?;\n\n// Execute the flow\nflow.execute().await?;\n```\n\n### Command Line Usage\n\n```bash\n# Analyze a codebase (first run)\nthread analyze ./my-project\n# → Analyzing 1,000 files: 10.5s\n\n# Second run (with cache)\nthread analyze ./my-project\n# → Analyzing 1,000 files: 0.3s (100% cache hits, 35x faster!)\n\n# Incremental update (only changed files)\n# Edit 10 files, then:\nthread analyze ./my-project\n# → Analyzing 10 files: 0.15s (990 files cached)\n```\n\n## Architecture\n\nThread follows a **service-library dual architecture** with six main crates plus service layer:\n\n### Library Core (Reusable Components)\n\n- **`thread-ast-engine`** - Core AST parsing, pattern matching, and transformation engine\n- **`thread-language`** - Language definitions and tree-sitter parser integrations (20+ languages)\n- **`thread-rule-engine`** - Rule-based scanning and transformation with YAML configuration\n- **`thread-utilities`** - Shared utilities including SIMD optimizations and hash functions\n- **`thread-wasm`** - WebAssembly bindings for browser and edge deployment\n\n### Service Layer (Orchestration \u0026 Persistence)\n\n- **`thread-flow`** - High-level dataflow pipelines with ThreadFlowBuilder API\n- **`thread-services`** - Service interfaces, API abstractions, and ReCoco integration\n- **Storage Backends**:\n  - **Postgres** (CLI deployment) - Persistent caching with \u003c10ms p95 latency\n  - **D1** (Cloudflare Edge) - Distributed caching across CDN nodes with \u003c50ms p95 latency\n  - **Qdrant** (optional) - Vector similarity search for semantic analysis\n\n### Concurrency Models\n\n- **Rayon** (CLI) - CPU-bound parallelism for local multi-core utilization (2-8x speedup)\n- **tokio** (Edge) - Async I/O for horizontal scaling and Cloudflare Workers\n\n## Deployment Options\n\n### CLI 
Deployment (Local/Server)\n\n**Best for**: Development environments, CI/CD pipelines, large batch processing\n\n```bash\n# Build with CLI features (Postgres + Rayon parallelism)\ncargo build --release --features \"recoco-postgres,parallel,caching\"\n\n# Configure PostgreSQL backend\nexport DATABASE_URL=postgresql://user:pass@localhost/thread_cache\nexport RAYON_NUM_THREADS=8  # Use 8 cores\n\n# Run analysis\n./target/release/thread analyze ./large-codebase\n# → Performance: 1,000-10,000 files per run\n```\n\n**Features**: Direct filesystem access, multi-core parallelism, persistent caching, unlimited CPU time\n\nSee [CLI Deployment Guide](docs/deployment/CLI_DEPLOYMENT.md) for complete setup.\n\n### Edge Deployment (Cloudflare Workers)\n\n**Best for**: Global API services, low-latency analysis, serverless architecture\n\n```bash\n# Build WASM for edge\ncargo run -p xtask build-wasm --release\n\n# Deploy to Cloudflare Workers\nwrangler deploy\n\n# Access globally distributed API\ncurl https://thread-api.workers.dev/analyze \\\n  -d '{\"code\":\"fn main(){}\",\"language\":\"rust\"}'\n# → Response time: \u003c50ms worldwide (p95)\n```\n\n**Features**: Global CDN distribution, auto-scaling, D1 distributed storage, no infrastructure management\n\nSee [Edge Deployment Guide](docs/deployment/EDGE_DEPLOYMENT.md) for complete setup.\n\n## Language Support\n\nThread supports 20+ programming languages via tree-sitter parsers:\n\n### Tier 1 (Primary Focus)\n- Rust, JavaScript/TypeScript, Python, Go, Java\n\n### Tier 2 (Full Support)\n- C/C++, C#, PHP, Ruby, Swift, Kotlin, Scala\n\n### Tier 3 (Basic Support)\n- Bash, CSS, HTML, JSON, YAML, Lua, Elixir, Haskell\n\nEach language provides full AST parsing, symbol extraction, and pattern matching capabilities.\n\n## Pattern Matching System\n\nThread's core strength is AST-based pattern matching using meta-variables:\n\n### Meta-Variable Syntax\n\n- `$VAR` - Captures a single AST node\n- `$$$ITEMS` - Captures multiple consecutive 
nodes (ellipsis)\n- `$_` - Matches any node without capturing\n\n### Examples\n\n```rust\n// Find all variable declarations\nroot.find_all(\"let $VAR = $VALUE\")\n\n// Find if-else statements\nroot.find_all(\"if ($COND) { $$$THEN } else { $$$ELSE }\")\n\n// Find function calls with any arguments\nroot.find_all(\"$FUNC($$$ARGS)\")\n\n// Find class methods\nroot.find_all(\"class $CLASS { $$$METHODS }\")\n```\n\n### YAML Rule System\n\n```yaml\nid: no-var-declarations\nmessage: \"Use 'let' or 'const' instead of 'var'\"\nlanguage: JavaScript\nseverity: warning\nrule:\n  pattern: \"var $NAME = $VALUE\"\nfix: \"let $NAME = $VALUE\"\n```\n\n## Performance Characteristics\n\n### Benchmarks (Phase 5 Real-World Validation)\n\n| Language   | Files   | Time   | Throughput     | Cache Hit | Incremental (1% update) |\n|------------|---------|--------|----------------|-----------|-------------------------|\n| Rust       | 10,100  | 7.4s   | 1,365 files/s  | 100%      | 0.6s (100 files)        |\n| TypeScript | 10,100  | 10.7s  | 944 files/s    | 100%      | ~1.0s (100 files)       |\n| Python     | 10,100  | 8.5s   | 1,188 files/s  | 100%      | 0.7s (100 files)        |\n| Go         | 10,100  | 5.4s   | 1,870 files/s  | 100%      | 0.4s (100 files)        |\n\n### Content-Addressed Caching Performance\n\n| Operation              | Time    | Speedup vs Parse | Notes                      |\n|------------------------|---------|------------------|----------------------------|\n| Blake3 fingerprint     | 425ns   | 346x faster      | Single file                |\n| Batch fingerprint      | 17.7µs  | -                | 100 files                  |\n| AST parsing            | 147µs   | Baseline         | Small file (\u003c1KB)          |\n| Cache hit (in-memory)  | \u003c1µs    | \u003e147x faster     | LRU cache lookup           |\n| Cache hit (repeated)   | 0.9s    | 35x faster       | 10,000 file reanalysis     |\n| Incremental (1%)       | 0.6s    | 12x faster       | 100 changed, 10K 
total     |\n\n### Storage Backend Latency\n\n| Backend    | Target    | Actual (Phase 5) | Deployment |\n|------------|-----------|------------------|------------|\n| InMemory   | N/A       | \u003c1ms             | Testing    |\n| Postgres   | \u003c10ms p95 | \u003c1ms (local)     | CLI        |\n| D1         | \u003c50ms p95 | \u003c1ms (local)     | Edge       |\n\n## Development\n\n### Prerequisites\n\n- **Rust**: 1.85.0 or later (edition 2024)\n- **Tools**: cargo-nextest (optional), mise (optional)\n\n### Building\n\n```bash\n# Build everything (except WASM)\nmise run build\n# or: cargo build --workspace\n\n# Build in release mode\nmise run build-release\n\n# Build WASM for edge deployment\nmise run build-wasm-release\n```\n\n### Testing\n\n```bash\n# Run all tests\nmise run test\n# or: cargo nextest run --all-features --no-fail-fast -j 1\n\n# Run tests for specific crate\ncargo nextest run -p thread-ast-engine --all-features\n\n# Run benchmarks\ncargo bench -p thread-rule-engine\n```\n\n### Quality Checks\n\n```bash\n# Full linting\nmise run lint\n\n# Auto-fix formatting and linting issues\nmise run fix\n\n# Run CI pipeline locally\nmise run ci\n```\n\n### Single Test Execution\n\n```bash\n# Run specific test\ncargo nextest run --manifest-path Cargo.toml test_name --all-features\n\n# Run benchmarks\ncargo bench -p thread-flow\n```\n\n## Documentation\n\n### User Guides\n\n- [CLI Deployment Guide](docs/deployment/CLI_DEPLOYMENT.md) - Local/server deployment with Postgres\n- [Edge Deployment Guide](docs/deployment/EDGE_DEPLOYMENT.md) - Cloudflare Workers with D1\n- [Architecture Overview](docs/architecture/THREAD_FLOW_ARCHITECTURE.md) - System design and data flow\n\n### API Documentation\n\n- **Rustdoc**: Run `cargo doc --open --no-deps --workspace` for full API documentation\n- **Examples**: See `examples/` directory for usage patterns\n\n### Technical Documentation\n\n- [Integration Tests](claudedocs/INTEGRATION_TESTS.md) - E2E test design and coverage\n- 
[Error Recovery](claudedocs/ERROR_RECOVERY.md) - Error handling strategies\n- [Observability](claudedocs/OBSERVABILITY.md) - Metrics and monitoring\n- [Performance Benchmarks](claudedocs/PERFORMANCE_BENCHMARKS.md) - Benchmark suite design\n\n## Constitutional Compliance\n\n**All development MUST adhere to the Thread Constitution v2.0.0** (`.specify/memory/constitution.md`)\n\n### Core Governance Principles\n\n1. **Service-Library Architecture** (Principle I)\n   - Features MUST consider both library API design AND service deployment\n   - Both aspects are first-class citizens\n\n2. **Test-First Development** (Principle III - NON-NEGOTIABLE)\n   - TDD mandatory: Tests → Approve → Fail → Implement\n   - All tests execute via `cargo nextest`\n   - No exceptions, no justifications accepted\n\n3. **Service Architecture \u0026 Persistence** (Principle VI)\n   - Content-addressed caching MUST achieve \u003e90% hit rate\n   - Storage targets: Postgres \u003c10ms, D1 \u003c50ms, Qdrant \u003c100ms p95 latency\n   - Incremental updates MUST trigger only affected component re-analysis\n\n### Quality Gates\n\nBefore any PR merge, verify:\n- ✅ `mise run lint` passes (zero warnings)\n- ✅ `cargo nextest run --all-features` passes (100% success)\n- ✅ `mise run ci` completes successfully\n- ✅ Public APIs have rustdoc documentation\n- ✅ Performance-sensitive changes include benchmarks\n- ✅ Service features meet storage/cache/incremental requirements\n\n## Contributing\n\nWe welcome contributions of all kinds! By contributing to Thread, you agree to our [Contributor License Agreement (CLA)](CONTRIBUTORS_LICENSE_AGREEMENT.md).\n\n### Contributing Workflow\n\n1. Run `mise run install-tools` to set up development environment\n2. Make changes following existing patterns\n3. Run `mise run fix` to apply formatting and linting\n4. Run `mise run test` to verify functionality\n5. Use `mise run ci` to run full CI pipeline locally\n6. 
Submit pull request with clear description\n\n### We Use REUSE\n\nThread follows the [REUSE Specification](https://reuse.software/) for license information. Every file should have license information at the top or in a `.license` file. See existing files for examples.\n\n## License\n\n### Thread\n\nThread is licensed under the **GNU Affero General Public License v3.0 (AGPL-3.0-or-later)**. You can find the full license text in the [LICENSE](LICENSE.md) file.\n\n**Key Points**:\n- ✅ Free for personal and commercial use\n- ✅ Modify the code as needed\n- ⚠️ **You must share your changes** with the community under AGPL 3.0 or later\n- ⚠️ Include AGPL 3.0 and copyright notice with copies you share\n- ℹ️ If you don't modify Thread, you can use it without sharing your source code\n\n### Want to use Thread in a closed source project?\n\n**Purchase a commercial license from Knitli** to use Thread without sharing your source code. Contact us at [licensing@knit.li](mailto:licensing@knit.li)\n\n### Other Licenses\n\n- Some components forked from [ast-grep](https://github.com/ast-grep/ast-grep) are licensed under AGPL 3.0 or later AND MIT. 
See [VENDORED.md](VENDORED.md).\n- Documentation and configuration files are licensed under MIT OR Apache-2.0 (your choice).\n\n## Production Readiness\n\nThread has been validated for production use with comprehensive testing:\n\n- **780 tests**: 100% pass rate across all modules\n- **Real-world validation**: Tested with 10,000+ files per language\n- **Performance targets**: All metrics exceeded by 20-40%\n- **Edge cases**: Comprehensive coverage including empty files, binary files, symlinks, Unicode, circular dependencies, deep nesting, large files\n- **Zero known issues**: No crashes, memory leaks, or data corruption\n\nSee [Phase 5 Completion Summary](claudedocs/PHASE5_COMPLETE.md) for full validation report.\n\n## Support\n\n- **Documentation**: [https://thread.knitli.com](https://thread.knitli.com)\n- **Issues**: [GitHub Issues](https://github.com/knitli/thread/issues)\n- **Email**: [support@knit.li](mailto:support@knit.li)\n- **Commercial Support**: [licensing@knit.li](mailto:licensing@knit.li)\n\n## Credits\n\nThread is built on the shoulders of giants:\n\n- **[ast-grep](https://github.com/ast-grep/ast-grep)**: Core pattern matching engine (MIT license)\n- **[tree-sitter](https://tree-sitter.github.io/)**: Universal parsing framework\n- **[ReCoco](https://github.com/recoco-framework/recoco)**: Dataflow orchestration framework\n- **[BLAKE3](https://github.com/BLAKE3-team/BLAKE3)**: Fast cryptographic hashing\n\nSpecial thanks to all contributors and the open source community.\n\n---\n\n**Created by**: [Knitli Inc.](https://knitli.com)\n**Maintained by**: Thread Team\n**License**: AGPL-3.0-or-later (with commercial license option)\n**Version**: 0.0.1\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fknitli%2Fthread","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fknitli%2Fthread","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fknitli%2Fthread/lists"}