{"id":25864902,"url":"https://github.com/stackloklabs/secret-scanning-api","last_synced_at":"2026-05-09T12:34:47.134Z","repository":{"id":261312001,"uuid":"883735150","full_name":"StacklokLabs/secret-scanning-api","owner":"StacklokLabs","description":"Simple high performance pattern / entropy based secret discovery ","archived":false,"fork":false,"pushed_at":"2024-11-13T22:31:28.000Z","size":34,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-02T01:33:07.683Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StacklokLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-05T13:37:31.000Z","updated_at":"2024-11-22T21:52:53.000Z","dependencies_parsed_at":"2025-03-02T01:31:09.337Z","dependency_job_id":"b5d648e2-4c6a-4d17-a05d-b49f4d19b7ad","html_url":"https://github.com/StacklokLabs/secret-scanning-api","commit_stats":null,"previous_names":["stackloklabs/secret-scanning-api"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/StacklokLabs/secret-scanning-api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StacklokLabs%2Fsecret-scanning-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StacklokLabs%2Fsecret-scanning-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StacklokLabs%2Fsecret-scanning-api/releases","manifests_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/StacklokLabs%2Fsecret-scanning-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StacklokLabs","download_url":"https://codeload.github.com/StacklokLabs/secret-scanning-api/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StacklokLabs%2Fsecret-scanning-api/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32819552,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T08:22:46.396Z","status":"online","status_checked_at":"2026-05-09T02:00:06.633Z","response_time":123,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-02T01:31:05.553Z","updated_at":"2026-05-09T12:34:47.116Z","avatar_url":"https://github.com/StacklokLabs.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Secret Scanning AI\n\nA high-performance Go library for detecting secrets, passwords, and API tokens in text content. Uses both pattern matching and entropy-based detection for accurate results.\n\n## Worker System\n\nThe scanner uses a worker pool pattern for parallel processing of large texts. This system provides:\n- Concurrent pattern matching\n- Controlled resource usage\n- Optimal CPU utilization\n- Configurable parallelism\n\n### How Workers Function\n\n1. 
**Text Chunking**:\n   - Large texts are automatically split into chunks (default 10KB each)\n   - Each chunk maintains its original position information\n   ```go\n   // Internal chunking mechanism\n   type chunk struct {\n       text   string\n       offset int\n   }\n   ```\n\n2. **Worker Pool**:\n   - Workers are implemented using goroutines and a semaphore pattern\n   - Each worker processes one chunk at a time\n   - Results are collected through a channel\n   ```go\n   // Example of worker pool configuration\n   scanner := scanner.New(scanner.WithWorkers(runtime.NumCPU()))\n   ```\n\n3. **Load Balancing**:\n   - Chunks are distributed automatically among workers\n   - Semaphore prevents worker overflow\n   - Workers process chunks concurrently until all are complete\n\n### Configuring Workers\n\n1. **Default Configuration**:\n   ```go\n   // Creates scanner with default 4 workers\n   scanner := scanner.New()\n   ```\n\n2. **Custom Worker Count**:\n   ```go\n   // Creates scanner with 8 workers\n   scanner := scanner.New(scanner.WithWorkers(8))\n   ```\n\n3. **CPU-Based Configuration**:\n   ```go\n   // Creates scanner with worker count matching CPU cores\n   scanner := scanner.New(scanner.WithWorkers(runtime.NumCPU()))\n   ```\n\n### Worker Performance Guidelines\n\n1. **Small Files** (\u003c 10KB):\n   - Single worker is sufficient\n   - Overhead of multiple workers not beneficial\n   ```go\n   scanner := scanner.New(scanner.WithWorkers(1))\n   ```\n\n2. **Medium Files** (10KB - 1MB):\n   - 4-8 workers typically optimal\n   - Balance between parallelism and overhead\n   ```go\n   scanner := scanner.New(scanner.WithWorkers(4))\n   ```\n\n3. 
**Large Files** (\u003e 1MB):\n   - Worker count can match or exceed CPU cores\n   - Benefits from increased parallelism\n   ```go\n   // For large file processing\n   scanner := scanner.New(scanner.WithWorkers(runtime.NumCPU() * 2))\n   ```\n\n### Example: Worker Configuration\n\n```go\npackage main\n\nimport (\n    \"context\"\n    \"runtime\"\n    \"time\"\n    \"github.com/stackloklabs/secret-scanning-api/scanner\"\n)\n\nfunc main() {\n    // Create scanner with CPU-optimized workers\n    s := scanner.New(scanner.WithWorkers(runtime.NumCPU()))\n\n    // Create context with timeout\n    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)\n    defer cancel()\n\n    // Process large file\n    results, err := s.Scan(ctx, largeText)\n    if err != nil {\n        panic(err)\n    }\n\n    // Results are automatically merged from all workers\n    for _, result := range results {\n        // Process results...\n    }\n}\n```\n\n### Benchmark Results with Different Worker Counts\n\n```\nBenchmarkScanner/small/1_workers    ~54ns/op    0 B/op    0 allocs/op\nBenchmarkScanner/small/4_workers    ~54ns/op    0 B/op    0 allocs/op\nBenchmarkScanner/small/8_workers    ~56ns/op    0 B/op    0 allocs/op\nBenchmarkScanner/small/16_workers   ~54ns/op    0 B/op    0 allocs/op\n\nBenchmarkScanner/medium/1_workers   ~3.3µs/op   2 B/op    0 allocs/op\nBenchmarkScanner/medium/4_workers   ~3.3µs/op   2 B/op    0 allocs/op\nBenchmarkScanner/medium/8_workers   ~3.3µs/op   3 B/op    0 allocs/op\nBenchmarkScanner/medium/16_workers  ~3.4µs/op   3 B/op    0 allocs/op\n\nBenchmarkScanner/large/1_workers    ~35µs/op    366 B/op  0 allocs/op\nBenchmarkScanner/large/4_workers    ~34µs/op    371 B/op  0 allocs/op\nBenchmarkScanner/large/8_workers    ~35µs/op    377 B/op  0 allocs/op\nBenchmarkScanner/large/16_workers   ~34µs/op    423 B/op  0 allocs/op\n```\n\n4. 
**Streaming Operations**:\n   ```go\n   func (s *Scanner) StreamScan(ctx context.Context, reader io.Reader) (\u003c-chan Result, error) {\n       resultsChan := make(chan Result)\n       go func() {\n           defer close(resultsChan)\n           scanner := bufio.NewScanner(reader)\n           for scanner.Scan() {\n               select {\n               case \u003c-ctx.Done():\n                   return\n               default:\n                   // Process line and send results\n               }\n           }\n       }()\n       return resultsChan, nil\n   }\n   ```\n\nThis multi-level context awareness ensures:\n- Immediate response to cancellation requests\n- Proper resource cleanup\n- Prevention of goroutine leaks\n- Coordinated cancellation of related operations\n\n## Performance\n\nThe scanner is optimized for high performance across different workloads:\n\n### Parallel Processing\n- Small files (~1KB): ~54ns per operation\n- Medium files (~100KB): ~3.3µs per operation\n- Large files (~1MB): ~34µs per operation\n- Configurable worker pool (default: 4 workers)\n- Near-zero memory allocations for cached results\n\n### Memory Efficiency\n- Parallel processing: 0-423 bytes/op\n- Streaming mode for handling large files\n- Efficient memory usage through chunked processing\n\n### Caching\n- Cached lookups: ~3.3µs/op with zero allocations\n- Thread-safe cache implementation\n- Automatic caching of frequently scanned content\n\n## Installation\n\n```bash\ngo get github.com/stackloklabs/secret-scanning-api\n```\n\n## Usage\n\n### As a Library\n\n```go\npackage main\n\nimport (\n    \"context\"\n    \"fmt\"\n    \"time\"\n    \"github.com/stackloklabs/secret-scanning-api/scanner\"\n    \"github.com/stackloklabs/secret-scanning-api/patterns\"\n)\n\nfunc main() {\n    // Initialize scanner with custom worker count\n    s := scanner.New(scanner.WithWorkers(8))\n\n    // Add default patterns\n    for name, pattern := range patterns.GetAllPatterns() {\n        s.AddPattern(name, 
pattern)\n    }\n\n    // Create context with timeout\n    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)\n    defer cancel()\n\n    // Scan text\n    text := \"config.password = 'MySecretPass123!'\"\n    results, err := s.Scan(ctx, text)\n    if err != nil {\n        panic(err)\n    }\n\n    // Process results\n    for _, result := range results {\n        fmt.Printf(\"Found %s: %s (Confidence: %.2f)\\n\",\n            result.Type,\n            result.Value,\n            result.Confidence)\n    }\n}\n```\n\n### Command Line Usage\n\n```bash\n# Scan a file\nsecret-scanner -file config.json\n\n# Scan text directly\nsecret-scanner -text \"api_key=1234567890abcdef\"\n\n# Scan from stdin\ncat config.json | secret-scanner\n\n# Use only entropy-based detection\nsecret-scanner -entropy-only -file config.json\n```\n\n## Contributing\n\nContributions are welcome! Areas for improvement:\n\n1. Additional secret patterns\n2. Performance optimizations\n3. Integration examples\n4. Documentation improvements\n\n## License\n\nApache 2.0\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstackloklabs%2Fsecret-scanning-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstackloklabs%2Fsecret-scanning-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstackloklabs%2Fsecret-scanning-api/lists"}