https://github.com/vsivsi/wordcounter
Tool that probabilistically estimates the number of unique words in its input with constrained memory
- Host: GitHub
- URL: https://github.com/vsivsi/wordcounter
- Owner: vsivsi
- License: mit
- Created: 2024-05-22T03:42:14.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-10-03T18:57:42.000Z (about 1 month ago)
- Last Synced: 2024-10-08T01:40:12.972Z (30 days ago)
- Language: Go
- Size: 2.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Wordcounter
This project implements the Monte Carlo-based unique word counting method described in the paper [*Distinct Elements in Streams: An Algorithm for the (Text) Book*](https://arxiv.org/abs/2301.10191). It probabilistically estimates the number of unique words in its input (read from stdin or from a filename argument) while storing at most a fixed number of words.
### To build/run
This is a [Go language](https://go.dev/) project. Once you have Go installed locally:
To run directly (input is read from stdin when no filename is given):
`go run main.go`
To build an executable:
`go build -o wordcounter`
### Examples
To run wordcounter with the default memory size (1000 words) on the input file warandpeace.txt:
`go run main.go warandpeace.txt`
To run wordcounter with a memory size of 2000 words on the input file warandpeace.txt:
`go run main.go -m 2000 warandpeace.txt`
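The examples imply a command line made of an optional `-m` flag followed by an optional filename, with stdin as the fallback. The sketch below shows one way to handle that shape with Go's standard `flag` package. It is an assumption about how such parsing could work, not the project's `main.go`; `parseArgs` and `openInput` are hypothetical helpers, and the loop only counts tokens where a real tool would pass each word to its estimator.

```go
package main

import (
	"bufio"
	"flag"
	"fmt"
	"io"
	"os"
)

// parseArgs is a hypothetical helper mirroring the documented usage:
// an optional -m flag (default 1000 words) followed by an optional filename.
// An empty path means "read from stdin".
func parseArgs(args []string) (mem int, path string, err error) {
	fs := flag.NewFlagSet("wordcounter", flag.ContinueOnError)
	m := fs.Int("m", 1000, "maximum number of words to keep in memory")
	if err := fs.Parse(args); err != nil {
		return 0, "", err
	}
	if fs.NArg() > 0 {
		path = fs.Arg(0)
	}
	return *m, path, nil
}

// openInput returns the named file, or stdin when path is empty.
func openInput(path string) (io.ReadCloser, error) {
	if path == "" {
		return io.NopCloser(os.Stdin), nil
	}
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	return f, nil
}

func main() {
	mem, path, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2) // the FlagSet has already printed the error and usage
	}
	in, err := openInput(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer in.Close()

	// Split the input on whitespace; each token is one word.
	sc := bufio.NewScanner(in)
	sc.Split(bufio.ScanWords)
	words := 0
	for sc.Scan() {
		words++ // a real tool would feed sc.Text() to its estimator here
	}
	fmt.Printf("memory size %d, %d words read\n", mem, words)
}
```

One consequence of using `flag`: parsing stops at the first non-flag argument, so `-m 2000` has to come before the filename, which is the order the examples use. With the arguments reversed (`warandpeace.txt -m 2000`), this sketch would leave the memory size at 1000.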