Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mre/freq
🗼 A CLI term frequency analyzer. Counts the number of occurrences of each word in an input and creates formatted output or a histogram.
frequency histogram occurences words
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/mre/freq
- Owner: mre
- License: apache-2.0
- Created: 2021-03-04T13:48:28.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-03-15T14:52:16.000Z (almost 4 years ago)
- Last Synced: 2024-10-05T09:41:16.691Z (3 months ago)
- Topics: frequency, histogram, occurences, words
- Language: Rust
- Homepage:
- Size: 185 KB
- Stars: 3
- Watchers: 5
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE-APACHE
README
# freq
A commandline tool that counts the number of word occurrences in an input.
[![James Munns on Twitter](assets/tweet.png)](https://twitter.com/bitshiftmask/status/1367451210987544580)
This is just a placeholder repository for now.
Please create issues for feature requests and collaboration.

## Usage
### Commandline
```sh
echo "b a n a n a" | freq
0.16666667 - 1 - b
0.33333334 - 2 - n
0.5 - 3 - a
```

### Library
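The command-line output above lists each word with its relative frequency and its count. As a rough illustration of that computation, and not freq's actual implementation, the sketch below uses a hypothetical `term_frequencies` helper built only on the standard library. The whitespace tokenization and the `f32` arithmetic are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical helper, not part of freq's API: returns
/// (relative frequency, count, word) rows sorted by ascending count.
fn term_frequencies(input: &str) -> Vec<(f32, usize, String)> {
    // Tally every whitespace-separated token.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for word in input.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }

    // Relative frequency is the word's count divided by the total token count.
    let total: usize = counts.values().sum();
    let mut rows: Vec<(f32, usize, String)> = counts
        .into_iter()
        .map(|(word, n)| (n as f32 / total as f32, n, word.to_string()))
        .collect();

    // Sort by count, then alphabetically so that ties are deterministic.
    rows.sort_by(|a, b| a.1.cmp(&b.1).then_with(|| a.2.cmp(&b.2)));
    rows
}

fn main() {
    for (share, count, word) in term_frequencies("b a n a n a") {
        println!("{} - {} - {}", share, count, word);
    }
}
```

For the `b a n a n a` input this yields one row per distinct word, with `a` counted three times out of six tokens. The crate's own library entry point, as shown in this README, is `freq::count`: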
```rust
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Count the words in the sample file and print the result.
    let frequencies = freq::count("fixtures/sample.txt")?;
    println!("{:?}", frequencies);
    Ok(())
}
```

## Features
- [x] Ignore words ([regex pattern](https://docs.rs/regex/latest/regex/struct.RegexSet.html)) [[issue 5](https://github.com/mre/freq/issues/5)]
- [x] Different output formats (plaintext, JSON)
- [x] freq.toml configuration file
- [x] Filter stopwords (similar to NLTK's stopwords)
- [ ] Performance (SIMD support, async execution)
- [ ] Recursion support
- [ ] Allow skipping files
- [ ] Allow specifying ignored words in a separate file
- [ ] Generate "heat bars" for words like shell-hist does
- [ ] Split report by file/folder (sort of like `sloc` does for code)
- [ ] Choose language for stopwords (`--lang fr`)
- [ ] Format output (e.g. justify counts a la `uniq -c`)
- [ ] Interactive mode (shows stats while running) (`--interactive`)
- [ ] Calculate TF-IDF score in a multi-file scenario
- [ ] Limit the output to the top N words (e.g. `--top 3`)
- [ ] Ignore hidden files (begins with `.`)
- [ ] Minimize number of allocations
- [ ] No-std support?
- [ ] Ignore "words" only consisting of special characters, e.g. `///`
- [ ] Multiple files as inputs
- [ ] Glob input patterns
- [ ] If directory is given, walk contents of folder recursively (walker)
- [ ] Verbose output (show currently analyzed file etc)
- [ ] Library usage
- [ ] https://github.com/jonhoo/evmap
- [ ] Automated abstract generation with Luhn's algorithm [Issue #1](https://github.com/mre/freq/issues/1)

Idea contributors:
- [@jamesmunns](https://github.com/jamesmunns)
- [@M3t0r](https://github.com/M3t0r)
- [@themihel](https://github.com/themihel)
- [@AlexanderThaller](https://github.com/AlexanderThaller)
- [@pizzamig](https://github.com/pizzamig)
- Want to see your name here? Create an issue!

## Similar tools
**tot-up**
Similar tool written in Rust with nice graphical output
https://github.com/payload/tot-up

**uniq**
A basic version would be
```sh,ignore
curl -L 'https://github.com/mre/freq/raw/main/README.md' | tr -cs '[:alnum:]' "\n" | grep -vEx 'and|or|for|a|of|to|an|in' | sort | uniq -c | sort
```

This works, but it's not very extensible by normal users.
It would also lack most of the features listed above.

**Lucene**
Has all the bells and whistles, but there is no official CLI and it requires a full Java installation.
**wordcount**
`freqword freq`
Nice and simple. Doesn't exclude stopwords and no regex support, though.
https://github.com/juditacs/wordcount

**word-frequency**
Haskell-based approach: Includes features like min length for words, or min occurrences of words in a text.
https://github.com/cbzehner/word-frequency

**What else?**
There must be more tools out there. Can you help me find them?
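
As a point of comparison for such tools, the `uniq` pipeline shown earlier can be approximated in plain Rust. This is only a sketch under stated assumptions: `count_without_stopwords` is a hypothetical helper, not part of freq; it splits on non-alphanumeric characters (Unicode-aware, unlike `tr` in the C locale), drops exact-match stopwords as `grep -vEx` does, and sorts by ascending count.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper mirroring the shell pipeline: tokenize, drop
/// stopwords, count, and return (count, word) rows in ascending order.
fn count_without_stopwords(text: &str, stopwords: &[&str]) -> Vec<(usize, String)> {
    let stop: HashSet<&str> = stopwords.iter().copied().collect();

    let mut counts: HashMap<&str, usize> = HashMap::new();
    for word in text
        .split(|c: char| !c.is_alphanumeric()) // like `tr -cs '[:alnum:]' "\n"`
        .filter(|w| !w.is_empty() && !stop.contains(w)) // like `grep -vEx '...'`
    {
        *counts.entry(word).or_insert(0) += 1;
    }

    let mut rows: Vec<(usize, String)> = counts
        .into_iter()
        .map(|(word, n)| (n, word.to_string()))
        .collect();
    rows.sort(); // by count first, then by word
    rows
}

fn main() {
    // The same stopword list as in the shell pipeline.
    let stopwords = ["and", "or", "for", "a", "of", "to", "an", "in"];
    let text = "a tool to count words and more words";
    for (count, word) in count_without_stopwords(text, &stopwords) {
        println!("{:>7} {}", count, word);
    }
}
```

Keeping the stopword list as a parameter is one way to leave room for features such as a per-language list or a separate ignore file.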