{"id":49809828,"url":"https://github.com/chrstphdm/extstat","last_synced_at":"2026-05-13T00:02:38.492Z","repository":{"id":325665985,"uuid":"1101985914","full_name":"chrstphdm/extstat","owner":"chrstphdm","description":"Fast disk usage analyzer that groups files by extension. Parallel scanning, beautiful terminal output with visual bars, and detailed statistics. Perfect for finding what's taking up space on your drives.","archived":false,"fork":false,"pushed_at":"2025-11-22T16:12:01.000Z","size":20,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-22T18:06:50.611Z","etag":null,"topics":["bioinformatics","cli","disk-usage","disk-usage-analyzer","file-management","parallel","performance","rust","terminal"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chrstphdm.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-22T15:56:31.000Z","updated_at":"2025-11-22T16:12:05.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/chrstphdm/extstat","commit_stats":null,"previous_names":["chrstphdm/extstat"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/chrstphdm/extstat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrstphdm%2Fextstat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrstphdm%2Fextstat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrstphdm%2Fextstat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrstphdm%2Fextstat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chrstphdm","download_url":"https://codeload.github.com/chrstphdm/extstat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrstphdm%2Fextstat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32961785,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-12T23:30:32.555Z","status":"ssl_error","status_checked_at":"2026-05-12T23:30:18.191Z","response_time":102,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","cli","disk-usage","disk-usage-analyzer","file-management","parallel","performance","rust","terminal"],"created_at":"2026-05-13T00:02:36.303Z","updated_at":"2026-05-13T00:02:38.483Z","avatar_url":"https://github.com/chrstphdm.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# extstat\n\n[![CI](https://github.com/chrstphdm/extstat/workflows/CI/badge.svg)](https://github.com/chrstphdm/extstat/actions)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Rust](https://img.shields.io/badge/rust-1.70%2B-orange.svg)](https://www.rust-lang.org/)\n\n⚡ Fast disk usage analyzer that groups files by extension. Perfect for finding what's taking up space on your drives.\n\nA parallel, high-performance CLI tool that scans directories and displays disk usage statistics grouped by file extension, with beautiful terminal output.\n\n## Features\n\n✅ **Parallel scanning** - Uses all CPU cores for maximum speed\n✅ **Beautiful table output** - Color-coded results with visual bars\n✅ **Flexible filtering** - Minimum file size, top N extensions\n✅ **File count tracking** - See how many files per extension\n✅ **No dependencies** - Single binary, works everywhere\n\n## Installation\n\n### From source (recommended for now)\n\n```bash\n# Clone or copy the project\ncd extstat\n\n# Build release version (optimized)\ncargo build --release\n\n# Binary will be in target/release/extstat\n# Copy to your PATH\nsudo cp target/release/extstat /usr/local/bin/\n```\n\n## Usage\n\n### Basic usage\n\n```bash\n# Analyze current directory\nextstat\n\n# Analyze specific directory\nextstat /path/to/directory\n\n# Show file counts\nextstat -c\n\n# Filter small files (e.g., min 1MB)\nextstat -s 1048576\n\n# Show only top 20 extensions\nextstat -n 20\n\n# Combine options\nextstat /data -c -s 1000000 -n 10\n```\n\n### Examples\n\n```bash\n# Analyze your home directory\nextstat ~\n\n# Find what's taking space in /var\nextstat /var -n 15\n\n# Show detailed stats for current project\nextstat . -c\n```\n\n## Command Line Options\n\n```\nOptions:\n  \u003cPATH\u003e              Directory to analyze [default: .]\n  -s, --min-size      Minimum file size to include (in bytes) [default: 0]\n  -n, --top           Maximum number of extensions to display [default: 50]\n  -c, --show-count    Show file count\n  -h, --help          Print help\n  -V, --version       Print version\n```\n\n## Output Explanation\n\n```\n╭────────────┬──────────┬─────────┬──────────────────────────────────╮\n│ Extension  │ Size     │ % Total │ Visual                           │\n├────────────┼──────────┼─────────┼──────────────────────────────────┤\n│ .fastq     │ 2.5 GiB  │ 45.23%  │ ██████████████░░░░░░░░░░░░░░░░░░ │\n│ .bam       │ 1.2 GiB  │ 21.67%  │ ███████░░░░░░░░░░░░░░░░░░░░░░░░░ │\n│ .fasta     │ 567 MiB  │ 10.11%  │ ███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │\n╰────────────┴──────────┴─────────┴──────────────────────────────────╯\n```\n\n- **Extension**: File extension (or [no extension] for files without one)\n- **Size**: Total size for all files with this extension (human-readable)\n- **% Total**: Percentage of total scanned space\n- **Visual**: Bar chart representation\n\n## Performance\n\n- Parallel scanning using Rayon (uses all CPU cores)\n- Typical performance: ~500k files/second on modern SSD\n- Memory efficient: doesn't load file contents, only metadata\n\n## Development\n\n### Project Structure\n\n```\nextstat/\n├── Cargo.toml       # Rust dependencies\n├── src/\n│   └── main.rs      # Main application code\n└── README.md        # This file\n```\n\n### Building for development\n\n```bash\n# Build debug version (faster compilation)\ncargo build\n\n# Run directly\ncargo run -- /path/to/scan\n\n# Run with options\ncargo run -- . -c -n 10\n```\n\n### Understanding the code\n\n**Key Rust concepts used:**\n\n1. **Parallel iteration with Rayon**: \n   ```rust\n   files.par_iter()  // Process files in parallel\n   ```\n\n2. **Result handling with `?`**:\n   ```rust\n   let metadata = entry.metadata().ok()?;  // Return None if error\n   ```\n\n3. **Pattern matching**:\n   ```rust\n   path.extension()\n       .and_then(|s| s.to_str())  // Chain operations safely\n   ```\n\n4. **HashMap aggregation**:\n   ```rust\n   let entry = acc.entry(ext).or_insert((0, 0));  // Get or create\n   entry.0 += size;  // Update tuple\n   ```\n\n### Adding features\n\nWant to add more features? Common additions:\n\n1. **JSON export**: Add `serde` and `serde_json` dependencies\n2. **Interactive TUI**: Add `ratatui` and `crossterm`\n3. **Progress bar**: Add `indicatif` dependency\n4. **Date filtering**: Use file metadata `modified()` time\n\n## Troubleshooting\n\n**Permission denied errors**: \n- Use `sudo` for system directories\n- Or skip inaccessible files (feature coming soon)\n\n**Slow on network drives**:\n- Network I/O is the bottleneck, not the tool\n- Consider scanning locally first\n\n**Out of memory**:\n- Only happens with millions of different extensions\n- Try filtering with `-s` to reduce file count\n\n## Why Rust?\n\n- **Speed**: As fast as C/C++, often faster than Go/Python\n- **Safety**: No segfaults, data races prevented at compile time\n- **Modern**: Great tooling (cargo), helpful compiler errors\n- **Dependencies**: Easy to manage, reproducible builds\n\n## Next Steps (Version 2)\n\nPlanned features:\n- [ ] Interactive TUI mode (like ncdu)\n- [ ] Drill-down: click extension → see directories\n- [ ] Export to JSON/CSV\n- [ ] Progress bar during scan\n- [ ] Filter by date modified\n- [ ] Compare two scans (before/after cleanup)\n\n## Contributing\n\nContributions are welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.\n\n## Changelog\n\nSee [CHANGELOG.md](CHANGELOG.md) for a list of changes in each release.\n\n## License\n\nMIT License - Feel free to use, modify, distribute\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrstphdm%2Fextstat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchrstphdm%2Fextstat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrstphdm%2Fextstat/lists"}