Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/timdream/wordfreq
Text corpus calculation in Javascript. Supports Chinese, English.
- Host: GitHub
- URL: https://github.com/timdream/wordfreq
- Owner: timdream
- License: mit
- Created: 2012-10-18T13:46:48.000Z (about 12 years ago)
- Default Branch: gh-pages
- Last Pushed: 2021-01-31T03:38:05.000Z (almost 4 years ago)
- Last Synced: 2024-10-06T09:21:45.478Z (about 1 month ago)
- Language: HTML
- Homepage: https://wordfreq.timdream.org/
- Size: 122 KB
- Stars: 80
- Watchers: 8
- Forks: 22
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# wordfreq [![Build Status](https://secure.travis-ci.org/timdream/wordfreq.png)](http://travis-ci.org/timdream/wordfreq)

[Text corpus](https://en.wikipedia.org/wiki/Text_corpus) calculation in JavaScript.
Supports Chinese, English.
See [demo](http://timdream.org/wordfreq/).

This library is a spin-off project from [HTML5 Word Cloud](https://github.com/timdream/wordcloud).

## Simple usage

Load the `wordfreq.js` script into the web page, then run:

    // Create an options object for initialization
    var options = {
      workerUrl: 'path/to/wordfreq.worker.js'
    };
    // Initialize and run the process() function
    var wordfreq = WordFreq(options).process(text, function (list) {
      // Log the list returned to this callback.
      console.log(list);
    });

`WordFreq()` methods are chainable. For example:

    // Process 3 strings and get the corpus of all the texts.
    WordFreq(options)
      .process(text).process(text2).process(text3)
      .getList(function (list) {
        console.log(list);
      });

To use this library synchronously, load `wordfreq.worker.js` and use the `WordFreqSync` interface. Check `API.md` for available options and methods.
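
Once the callback receives the list, it often needs trimming before display. The sketch below is not part of wordfreq: `topWords` is a hypothetical helper, and it assumes `list` is an array of `[word, count]` pairs (check `API.md` for the actual shape).

```javascript
// Hypothetical helper, not part of wordfreq.
// Assumes `list` is an array of [word, count] pairs.
function topWords(list, limit, minCount) {
  return list
    // Drop entries that occur fewer than minCount times.
    .filter(function (pair) { return pair[1] >= minCount; })
    // Most frequent first; filter() returned a copy, so `list` is untouched.
    .sort(function (a, b) { return b[1] - a[1]; })
    .slice(0, limit);
}

// Possible usage inside the callback shown above:
// WordFreq(options).process(text, function (list) {
//   console.log(topWords(list, 10, 2));
// });
```

Filtering before sorting keeps the original list unmodified, which matters if the same list is passed on to other consumers such as a word cloud renderer.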
### Command-line interface

A command-line interface is available, powered by [Node.js](http://nodejs.org/). To install it globally, run:

    npm install -g wordfreq

Example usage:

    wordfreq ~/mypost.txt | less
    cat ~/mypost.txt | wordfreq - | less

## Algorithm

The corpus is calculated with a simple N-gram algorithm and a sub-string filter.
Here is [an article](http://www.openfoundry.org/tw/foss-forum/8339--open-web-html5-) in Traditional Chinese on how HTML5 Word Cloud works.

The [Porter Stemming Algorithm](http://tartarus.org/~martin/PorterStemmer/) is included for processing English.
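
To make the N-gram and sub-string filter idea concrete, here is a minimal sketch for Chinese text. It is an illustration under stated assumptions, not wordfreq's actual implementation: the function names `countNgrams` and `filterSubstrings`, the CJK character range, the choice to start at 2-grams, and the exact filter rule (drop a gram when a longer gram containing it occurs at least as often) are all hypothetical.

```javascript
// Hypothetical sketch, not wordfreq's actual code.
// Count every character n-gram of length 2..maxLength.
function countNgrams(text, maxLength) {
  const counts = new Map();
  // Split on anything outside the basic CJK ideograph range (an assumption)
  // so that n-grams never span punctuation or whitespace.
  const segments = text.split(/[^\u4e00-\u9fff]+/).filter(Boolean);
  for (const segment of segments) {
    for (let n = 2; n <= maxLength; n++) {
      for (let i = 0; i + n <= segment.length; i++) {
        const gram = segment.slice(i, i + n);
        counts.set(gram, (counts.get(gram) || 0) + 1);
      }
    }
  }
  return counts;
}

// Sub-string filter: keep grams seen at least minCount times, then drop any
// gram that is contained in a longer gram occurring at least as often.
function filterSubstrings(counts, minCount) {
  const entries = [...counts].filter(([, count]) => count >= minCount);
  return entries
    .filter(([gram, count]) =>
      !entries.some(([other, otherCount]) =>
        other.length > gram.length &&
        other.includes(gram) &&
        otherCount >= count))
    // Most frequent first; ties broken by the gram itself for stable output.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

const counts = countNgrams('文字雲好,文字雲棒,文字好', 3);
console.log(filterSubstrings(counts, 2));
```

In this input, `字雲` appears only inside `文字雲`, so the filter removes it, while `文字` survives because it occurs more often than any longer gram containing it.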
## Testing

To run the tests, first pull the required QUnit library by running:

    git submodule init
    git submodule update

Then start a localhost HTTP server (with Python 3, the equivalent command is `python3 -m http.server 8009`), for example:

    python -m SimpleHTTPServer 8009

Point your browser to [http://localhost:8009/test/](http://localhost:8009/test/) to start testing.
You may also run the tests with PhantomJS:

    phantomjs test/qunit/addons/phantomjs/runner.js http://localhost:8009/test/

You will find all the information you need to write test cases on the [QUnit](http://qunitjs.com) website.
All non-trivial code submissions are expected to be accompanied by test cases.

**Known IE10 issue**: IE10 appears to suffer from [the same issue](https://bugzilla.mozilla.org/show_bug.cgi?id=785248) as Firefox <= 17, where the Web Worker chokes and cannot finish the entire test suite.