https://github.com/words/subtlex-word-frequencies
A list of words from the SUBTLEX movie subtitles corpus, sorted by frequency.
- Host: GitHub
- URL: https://github.com/words/subtlex-word-frequencies
- Owner: words
- License: isc
- Created: 2015-07-29T05:35:32.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2020-02-13T08:21:35.000Z (over 4 years ago)
- Last Synced: 2024-07-25T05:03:04.097Z (2 months ago)
- Topics: american, count, en, en-us, english, frequence, subtlex, subtlexus, word
- Language: JavaScript
- Homepage:
- Size: 2.08 MB
- Stars: 30
- Watchers: 4
- Forks: 9
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: license
# `subtlex-word-frequencies`
[![Build][build-badge]][build]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]

List of 74,286 words sorted by frequency of use in spoken English.
The word counts are derived from [SUBTLEXus][], a corpus of American English
subtitles of movies.

## Install

[npm][]:
```sh
npm install subtlex-word-frequencies
```

## Use

```js
var words = require('subtlex-word-frequencies')

console.log(words.length)
console.log(words.slice(0, 3))
console.log(words.filter(d => d.word.match(/chick/)).slice(0, 5))
```

Yields:

```js
74286
[
{word: 'you', count: 2134713},
{word: 'I', count: 2038529},
{word: 'the', count: 1501908}
]
[
{word: 'chicken', count: 3148},
{word: 'chick', count: 1334},
{word: 'chicks', count: 742},
{word: 'chickens', count: 520},
{word: 'chickenshit', count: 85}
]
```

## API

### `subtlexWordFrequencies`
`Array.<Object>`: List of all entries in SUBTLEXus.

Each entry has the following properties:

* `word` (`string`): Unique word
  (example: `git`)
* `count` (`number`): Number of times the word appears in the corpus
  (example: `101`)

`word` starts with a capital when the word more often starts with an uppercase
letter than with a lowercase letter (example: `I`).

The entire original corpus consists of 51 million words.
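
Because the export is a plain array, looking up a single word means scanning the list. The sketch below shows one possible way to index entries by word and to turn a raw count into an approximate per-million rate. `buildIndex` and `perMillion` are hypothetical helpers, not part of this package, and the rate assumes the rounded corpus size of 51 million words given above. The sample entries are copied from the example output so that the snippet runs without the package installed.

```js
// Sample entries copied from the example output above. With the package
// installed, the full list would come from:
//   var words = require('subtlex-word-frequencies')
var sample = [
  {word: 'you', count: 2134713},
  {word: 'I', count: 2038529},
  {word: 'the', count: 1501908},
  {word: 'chicken', count: 3148},
  {word: 'chick', count: 1334}
]

// Hypothetical helper: index entries by lowercased word for fast lookups.
// If two entries collide after lowercasing, the one seen first is kept.
function buildIndex(entries) {
  var index = new Map()
  entries.forEach(function (entry) {
    var key = entry.word.toLowerCase()
    if (!index.has(key)) index.set(key, entry)
  })
  return index
}

// Hypothetical helper: occurrences per million words. The default corpus
// size is an assumption based on the rounded "51 million words" figure.
function perMillion(count, corpusSize) {
  var size = corpusSize === undefined ? 51e6 : corpusSize
  return (count / size) * 1e6
}

var index = buildIndex(sample)
var entry = index.get('chicken')

console.log(entry.count)
console.log(perMillion(entry.count).toFixed(2))
```

Lowercasing the keys lets a query such as `index.get('i')` find the capitalized entry `I`. Whether the full list contains words that differ only by case is not stated here, so treat the collision rule as a guess.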
## License
[ISC][license] © [Zeke Sikelianos][author]
[build-badge]: https://img.shields.io/travis/words/subtlex-word-frequencies.svg
[build]: https://travis-ci.org/words/subtlex-word-frequencies
[downloads-badge]: https://img.shields.io/npm/dm/subtlex-word-frequencies.svg
[downloads]: https://www.npmjs.com/package/subtlex-word-frequencies
[size-badge]: https://img.shields.io/bundlephobia/minzip/subtlex-word-frequencies.svg
[size]: https://bundlephobia.com/result?p=subtlex-word-frequencies
[npm]: https://docs.npmjs.com/cli/install
[license]: license
[author]: http://zeke.sikelianos.com
[subtlexus]: https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus