{"id":17978496,"url":"https://github.com/ad-si/textalyzer","last_synced_at":"2025-04-05T21:05:53.090Z","repository":{"id":27673478,"uuid":"31159518","full_name":"ad-si/Textalyzer","owner":"ad-si","description":"Analyze key metrics like number of words, readability, complexity, code duplication, … of any kind of text","archived":false,"fork":false,"pushed_at":"2025-03-11T20:44:11.000Z","size":860,"stargazers_count":57,"open_issues_count":9,"forks_count":4,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-29T20:08:06.241Z","etag":null,"topics":["analysis","cli","code-duplication","complexity","readability","text"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ad-si.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["ad-si"],"custom":["https://www.paypal.me/adriansieber"]}},"created_at":"2015-02-22T10:25:14.000Z","updated_at":"2025-03-11T20:38:29.000Z","dependencies_parsed_at":"2024-03-12T19:52:02.660Z","dependency_job_id":null,"html_url":"https://github.com/ad-si/Textalyzer","commit_stats":{"total_commits":30,"total_committers":2,"mean_commits":15.0,"dds":"0.033333333333333326","last_synced_commit":"758ccca936473c35c791ecd862a8b61763320c3b"},"previous_names":["adius/textalyzer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ad-si%2FTextalyzer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ad-si%2FTextalyzer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ad-si%2FTextalyzer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ad-si%2FTextalyzer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ad-si","download_url":"https://codeload.github.com/ad-si/Textalyzer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247399871,"owners_count":20932876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","cli","code-duplication","complexity","readability","text"],"created_at":"2024-10-29T17:34:01.449Z","updated_at":"2025-04-05T21:05:53.051Z","avatar_url":"https://github.com/ad-si.png","language":"Rust","readme":"# Textalyzer\n\nAnalyze key metrics like number of words, readability, complexity, etc.\nof any kind of text.\n\nCLI | Web\n--- | ---\n![CLI Screenshot][cli_ss] | ![Web Screenshot][web_ss]\n\n[cli_ss]: ./images/2024-03-08t1219_cli_screenshot.png\n[web_ss]: ./images/2024-03-08t1213_web_screenshot.png\n\n\n## Usage\n\n```sh\n# Word frequency histogram\ntextalyzer histogram \u003cfilepath\u003e\n\n# Find duplicated code blocks (default: minimum 3 non-empty lines)\ntextalyzer duplication \u003cpath\u003e [\u003cadditional paths...\u003e]\n\n# Find duplications with at least 5 non-empty lines\ntextalyzer duplication --min-lines=5 \u003cpath\u003e [\u003cadditional paths...\u003e]\n\n# Include single-line duplications\ntextalyzer duplication --min-lines=1 \u003cpath\u003e [\u003cadditional paths...\u003e]\n```\n\nThe duplication command analyzes files for duplicated text blocks. It can:\n- Analyze multiple files or recursively scan directories\n- Filter duplications based on minimum number of non-empty lines with `--min-lines=N` (default: 2)\n- Detect single-line duplications when using `--min-lines=1`\n- Rank duplications by number of consecutive lines\n- Show all occurrences with file and line references\n- Utilize multithreaded processing for optimal performance on all available CPU cores\n- Use memory mapping for efficient processing of large files with minimal memory overhead\n\n\n## Related\n\n- [jscpd] - Copy/paste detector for programming source code.\n- [megalinter] - Code quality and linter tool.\n- [pmd] - Source code analysis tool.\n- [qlty] - Code quality and security analysis tool.\n- [superdiff] - Find duplicate code blocks in files.\n- [wf] - Command line utility for counting word frequency.\n\n[jscpd]: https://github.com/kucherenko/jscpd\n[megalinter]: https://megalinter.io\n[pmd]: https://github.com/pmd/pmd\n[qlty]: https://github.com/qltysh/qlty\n[superdiff]: https://github.com/chuck-sys/superdiff\n[wf]: https://github.com/jarcane/wf\n\n\n## Rewrite in Rust\n\nThis CLI tool was originally written in JavaScript and was later\nrewritten in Rust to improve the performance.\n\nBefore:\n\n```txt\nhyperfine --warmup 3 'time ./cli/index.js examples/1984.txt'\nBenchmark #1: time ./cli/index.js examples/1984.txt\n  Time (mean ± σ):     390.3 ms ±  15.6 ms    [User: 402.6 ms, System: 63.5 ms]\n  Range (min … max):   366.7 ms … 425.7 ms\n```\n\nAfter:\n\n```txt\nhyperfine --warmup 3 'textalyzer histogram examples/1984.txt'\nBenchmark #1: textalyzer histogram examples/1984.txt\n  Time (mean ± σ):      40.4 ms ±   2.5 ms    [User: 36.0 ms, System: 2.7 ms]\n  Range (min … max):    36.9 ms …  48.7 ms\n```\n\nPretty impressive 10x performance improvement! 😁\n","funding_links":["https://github.com/sponsors/ad-si","https://www.paypal.me/adriansieber"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fad-si%2Ftextalyzer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fad-si%2Ftextalyzer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fad-si%2Ftextalyzer/lists"}