{"id":46937950,"url":"https://github.com/closed-systems/strangerstrings","last_synced_at":"2026-03-11T06:04:30.146Z","repository":{"id":312672767,"uuid":"1048273804","full_name":"closed-systems/strangerstrings","owner":"closed-systems","description":"A little tool to filter the stranger strings from a binary so you can analyze the good ones","archived":false,"fork":false,"pushed_at":"2025-09-11T08:41:30.000Z","size":362,"stargazers_count":45,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-12T11:57:20.834Z","etag":null,"topics":["malware-analysis","reverse-engineering","security-tools","strings","stringsearch"],"latest_commit_sha":null,"homepage":"https://closed.systems","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/closed-systems.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-01T07:39:10.000Z","updated_at":"2025-10-04T13:59:35.000Z","dependencies_parsed_at":"2025-09-05T23:48:32.899Z","dependency_job_id":null,"html_url":"https://github.com/closed-systems/strangerstrings","commit_stats":null,"previous_names":["closed-systems/strangerstrings","closed-systems/stranger-strings"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/closed-systems/strangerstrings","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/closed-systems%2Fstrangerstrings","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/closed-systems%2Fstrangerstrings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/closed-systems%2Fstrangerstrings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/closed-systems%2Fstrangerstrings/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/closed-systems","download_url":"https://codeload.github.com/closed-systems/strangerstrings/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/closed-systems%2Fstrangerstrings/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30372567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T21:41:54.280Z","status":"online","status_checked_at":"2026-03-11T02:00:07.027Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["malware-analysis","reverse-engineering","security-tools","strings","stringsearch"],"created_at":"2026-03-11T06:04:10.014Z","updated_at":"2026-03-11T06:04:30.139Z","avatar_url":"https://github.com/closed-systems.png","language":"TypeScript","readme":"# Stranger Strings\n![Stranger Strings](strangerstrings.png)\n\n\nA TypeScript module for extracting human-readable strings from binary files and determining which are most likely to be useful for human analysts. 
This implementation is compatible with the Ghidra string analysis algorithm and uses trigram-based scoring to filter out random character sequences while preserving meaningful strings.\n\n## *Update*\nSee https://github.com/closed-systems/stranger-strings-rs for a Rust re-write, featuring enhanced speed, non-English detection and all the common encodings you'd likely see.\n\n## Features\n\n- **Trigram-based scoring**: Uses character trigram probabilities to score string quality\n- **Ghidra compatibility**: Compatible with Ghidra's .sng model files and scoring algorithm.  \n- **Binary analysis**: Extract and analyze strings directly from binary files\n- **Adaptive thresholds**: Length-based scoring thresholds (shorter strings need higher scores)\n- **ASCII normalization**: Handles non-ASCII characters and space normalization\n- **TypeScript**: Full type safety and modern ES features, but I'll likely cut it over to Rust\n- **No Runtime Dependencies**: We don't drag in the kitchen sink, just the Typescript basics and a testing library.\n\n## Effectiveness\nI'm not going over 9600 failed strings to find FN, but spot checking was all strangely perfect.\n\n```\n$ strings ./tasmota-UK.bin|wc -l\n   12695\n\n$ strangerstrings -v ./tasmota-UK.bin |head -n 20\nLoading model: ./StringModel.sng\nModel type: lowercase, Lowercase: true\nAnalyzing file: ./tasmota-UK.bin\nExtracted 11011 candidate strings (min length: 4)\nString              Score       Threshold   Offset    Valid\n----------------------------------------------------------------------\n\" tER\"              -2.026      10.000      0x69DD5   ✗\n\"AND \"              -2.274      10.000      0x96982   ✗\n\"tele\"              -2.386      -2.710      0x97BD2   ✓\n\"none\"              -2.395      -2.710      0x9A02B   ✓\n\"none\"              -2.395      -2.710      0xA4CF8   ✓\n\" GET\"              -2.398      10.000      0x98116   ✗\n\"STATE\"             -2.415      -3.260      0x97B80   ✓\n\"  tRa\"             
-2.420      10.000      0x49A16   ✗\n\"Action\"            -2.423      -3.520      0x96DCF   ✓\n\"user\"              -2.443      -2.710      0x9830B   ✓\n\"scan\"              -2.453      -2.710      0x99310   ✓\n\"center\"            -2.472      -3.520      0x97A76   ✓\n\"POST\"              -2.498      -2.710      0xA5186   ✓\n\"       aRA\"              -2.505      10.000      0x69849   ✗\n\"Done\"              -2.506      -2.710      0x9B8AB   ✓\n\"stat\"              -2.518      -2.710      0x97BD7   ✓\n\"BASE\"              -2.524      -2.710      0x9BC91   ✓\n\"Mode\"              -2.542      -2.710      0x95F91   ✓\n\nSummary:\n  Accepted: 1375 strings\n  Rejected: 9636 strings\n  Total: 11011 strings\n  Acceptance rate: 12.5%\n```\n\n## Just use it as a CLI app\n\n```bash\nnpx closed-systems/strangerstrings --help\n```\n\n```bash\nnpm install -g closed-systems/strangerstrings\nstrangerstrings --help\n```\n\n## Installation in a project\n\n```bash\nnpm install closed-systems/strangerstrings\n```\n\nor with pnpm:\n\n```bash\npnpm add closed-systems/strangerstrings\n```\n\n## Quick Start\n\n```typescript\nimport { StrangerStrings } from 'strangerstrings';\nimport * as fs from 'fs';\n\n// Initialize analyzer with model\nconst analyzer = new StrangerStrings();\nawait analyzer.loadModel({ modelPath: './StringModel.sng' });\n\n// Analyze individual strings\nconst result = analyzer.analyzeString('hello world');\nconsole.log(`Valid: ${result.isValid}, Score: ${result.score}`);\n\n// Analyze binary file\nconst binaryData = fs.readFileSync('./program.exe');\nconst validStrings = analyzer.analyzeBinaryFile(binaryData);\nconsole.log(`Found ${validStrings.length} valid strings`);\n```\n\n## API Reference\n\n### StrangerStrings Class\n\n#### Constructor\n```typescript\nconst analyzer = new StrangerStrings();\n```\n\n#### loadModel(options)\nLoad a trigram model from file or string content.\n\n```typescript\n// From file\nawait analyzer.loadModel({ modelPath: 
'./StringModel.sng' });\n\n// From string content  \nawait analyzer.loadModel({ modelContent: modelFileContent });\n```\n\n#### analyzeString(candidateString)\nAnalyze a single string and return detailed scoring information.\n\n```typescript\nconst result = analyzer.analyzeString('hello world');\n// Returns: StringAnalysisResult\n// {\n//   originalString: 'hello world',\n//   score: -4.123,\n//   threshold: -5.42,\n//   isValid: true,\n//   normalizedString: 'hello world'\n// }\n```\n\n#### analyzeStrings(candidateStrings)\nAnalyze multiple strings at once.\n\n```typescript\nconst results = analyzer.analyzeStrings(['hello', 'world', 'xZ#@$%']);\n```\n\n#### extractValidStrings(candidateStrings)\nGet only the valid strings from a list of candidates.\n\n```typescript\nconst validOnly = analyzer.extractValidStrings(['hello', 'world', 'xZ#@$%']);\n// Returns only strings that pass the scoring threshold\n```\n\n#### analyzeBinaryFile(buffer, options?)\nExtract and analyze strings from binary data.\n\n```typescript\nconst binaryData = fs.readFileSync('./program.exe');\nconst validStrings = analyzer.analyzeBinaryFile(binaryData, { \n  minLength: 6  // minimum string length to extract\n});\n```\n\n#### extractStringsFromBinary(buffer, minLength?)\nExtract raw strings from binary data without scoring.\n\n```typescript\nconst strings = analyzer.extractStringsFromBinary(binaryData, 4);\n```\n\n### Convenience Functions\n\n```typescript\nimport { analyzeStringsWithModel, analyzeBinaryWithModel } from 'strangerstrings';\n\n// Quick analysis without creating class instance\nconst results = await analyzeStringsWithModel(\n  ['string1', 'string2'], \n  './StringModel.sng'\n);\n\nconst validStrings = await analyzeBinaryWithModel(\n  binaryBuffer, \n  './StringModel.sng'\n);\n```\n\n## Model Files\n\nThe module uses `.sng` model files containing trigram frequency data. 
These are tab-delimited text files with the format:\n\n```\n# Model Type: lowercase\n# Training file: words.txt\n# [^] denotes beginning of string  \n# [$] denotes end of string\n# [SP] denotes space\n\nchar1\tchar2\tchar3\tcount\n[^]\th\te\t1234\nh\te\tl\t5678\nl\tl\to\t9012\no\t[$]\t[$]\t3456\n```\n\n## Algorithm Details\n\nThe scoring algorithm works as follows:\n\n1. **Normalization**: Convert to lowercase (if using lowercase model), trim spaces, replace non-ASCII with spaces\n2. **Trigram Analysis**: Calculate probability for each 3-character sequence\n3. **Scoring**: Sum log probabilities and divide by string length  \n4. **Thresholding**: Compare against length-based thresholds\n5. **Smoothing**: Apply Laplace smoothing for unseen trigrams\n\n### Scoring Formula\n```\nscore = (Σ log10(P(trigram))) / string_length\n```\n\n### Length-based Thresholds\n- Length 4: -2.71\n- Length 5: -3.26  \n- Length 10: -4.55\n- Length 50+: -6.13\n- Length 100+: -6.3\n\nStrings shorter than 4 characters receive a default score of -20 and thresholds impossible to pass.\n\n## Examples\n\nSee the `examples/` directory for complete usage examples:\n\n- `basic-usage.ts` - Basic string analysis\n- `binary-analysis.ts` - Binary file processing\n\n## Testing\n\n```bash\n# Run tests\npnpm test\n\n# Run tests with UI\npnpm test:ui\n\n# Run tests once\npnpm test:run\n```\n\n## Compatibility\n\nThis implementation produces scoring results compatible with the Java implementation from Ghidra. The algorithm uses:\n\n- Base-10 logarithms (`Math.log10`)\n- Identical smoothing and probability calculations  \n- Same threshold values and length-based scoring\n- Compatible .sng model file format\n- Needs some proper kicking but spot checking seems right\n\n\n## Future Ideas\n\n\n- I could nitpick the corpus that the Ghidra team used to make the dataset but I can't think of anything better with any baseline of truth so I stick with theirs. 
As noted in https://github.com/NationalSecurityAgency/ghidra/issues/2106, it's a bit tricky to adapt or extend their model, but with a good way of separating the wheat from the chaff for a training set, a good multilingual approach could likely be found (I'm looking at you, stats nerds).\n- It could benefit from a rewrite in Rust\n- Multi-language support is definitely a gap, as is base64.\n- If it were going to be really fancy, it could examine the binary to determine the base and look for pointers as string offsets (for languages without C strings)\n\n## License\n\nLet's say Apache License 2.0, as that's what StringModel.sng is under, most likely.\n\n## Contributing\n\nContributions welcome! Please ensure tests pass and new features include appropriate test coverage.\n\nClaude helped with this implementation, but if you're going to submit LLM-aided code, you'd better understand it, lest you be mocked.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclosed-systems%2Fstrangerstrings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclosed-systems%2Fstrangerstrings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclosed-systems%2Fstrangerstrings/lists"}