https://github.com/yoeight/ngrams-loader
Ngrams loader based on http://www.ngrams.info format
- Host: GitHub
- URL: https://github.com/yoeight/ngrams-loader
- Owner: YoEight
- License: MIT
- Created: 2014-03-23T15:57:33.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2014-03-25T09:45:04.000Z (about 12 years ago)
- Last Synced: 2025-03-15T00:33:52.313Z (over 1 year ago)
- Language: Haskell
- Size: 199 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
ngrams-loader
=============
Ngrams loader based on http://www.ngrams.info format
Installation
------------
Assuming you have at least `cabal` 1.18 installed:
```
$ cabal sandbox init
$ cabal install --only-dependencies
$ cabal configure
$ cabal install
# the ngrams-loader executable ends up in ./.cabal-sandbox/bin
```
Usage
-----
```
usage: ngrams-loader [options]
[-2,--bigram] Parses bigrams
[-3,--trigram] Parses trigrams
[-4,--quadgram] Parses 4-grams
[-5,--pentagram] Parses 5-grams
[-c,--create] Creates table before inserts
N-grams file
SQlite db file
```
Example
-------
```
ngrams-loader --bigram --create w2.txt bigram.db
```
This parses each line of `w2.txt` as a bigram, creates the `bigrams` table before performing the inserts, and saves everything in `bigram.db`.
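As a rough illustration of the per-line step, here is a minimal Haskell sketch. It assumes each line holds a frequency followed by exactly n words, separated by whitespace (the ngrams.info files are assumed to be tab-separated). `NGram` and `parseNGram` are hypothetical names, not the project's own code.

```haskell
-- Hypothetical per-line parser (not ngrams-loader's actual code).
-- Assumption: a line is "<frequency> <word1> ... <wordN>", whitespace-separated.
import Data.Char (isDigit)

-- One parsed n-gram: its frequency and its words, in order.
data NGram = NGram
  { ngFrequency :: Int
  , ngWords     :: [String]
  } deriving (Eq, Show)

-- Parse a line as an n-gram of order n (2 for --bigram, 3 for --trigram, ...).
-- Returns Nothing when the frequency is not a number or the word count differs from n.
parseNGram :: Int -> String -> Maybe NGram
parseNGram n line =
  case words line of -- `words` splits on any whitespace, tabs included
    (freq : ws)
      | all isDigit freq, length ws == n -> Just (NGram (read freq) ws)
    _ -> Nothing

main :: IO ()
main = mapM_ (print . parseNGram 2) ["275\ta\ta", "x\ta\ta", "275\ta\ta\ta"]
```

Returning `Nothing` for malformed lines lets a caller choose whether to skip or report them. That choice belongs to this sketch, not necessarily to ngrams-loader.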
Figures
-------
Specs
- Intel Core i7-3770 @ 3.4 GHz
- Gentoo Linux with kernel 3.12.13 (64-bit)
- Bigram file of 1,055,386 lines
Timing `ngrams-loader --bigram --create w2.txt bigram.db` with `time` gives:
```
real 0m16.244s
user 0m15.597s
sys 0m0.143s
```
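As a rough reading of the single run above (wall-clock `real` time only), the throughput and per-line cost work out as:

$$
\frac{1\,055\,386\ \text{lines}}{16.244\ \text{s}} \approx 64\,971\ \text{lines/s},
\qquad
\frac{16.244\ \text{s}}{1\,055\,386\ \text{lines}} \approx 15.4\ \mu\text{s/line}
$$

Since `user` time (15.597 s) is about 96% of `real` time and `sys` time is small, the run appears to be dominated by user-space work rather than by waiting on I/O.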
SQL Schemas
-----------
Bigram
```sql
create table bigrams(
  frequence int,
  word1 varchar(100),
  word2 varchar(100)
);
```
Trigram
```sql
create table tridgrams(
  frequence int,
  word1 varchar(100),
  word2 varchar(100),
  word3 varchar(100)
);
```
4-gram
```sql
create table quadgrams(
frequence int,
word1 varchar(100),
word2 varchar(100),
word3 varchar(100),
word4 varchar(100)
);
```
5-gram
```sql
create table pentagrams(
frequence int,
word1 varchar(100),
word2 varchar(100),
word3 varchar(100),
word4 varchar(100),
word5 varchar(100)
);
```
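The four schemas differ only in the table name and the number of `wordN` columns. Here is a minimal Haskell sketch that derives each statement from the n-gram order. It uses hypothetical helpers `tableName`, `createTableSql` and `insertSql`, and also builds a matching parameterised insert with SQLite `?` placeholders. Table and column names are copied verbatim from the schemas above. The sketch does not claim this is how ngrams-loader builds its SQL.

```haskell
-- Hypothetical helpers (not the project's code) that rebuild the schemas
-- above from the n-gram order. Names are copied verbatim from the README.
import Data.List (intercalate)

-- Table name for each supported n-gram order (2 to 5).
tableName :: Int -> Maybe String
tableName 2 = Just "bigrams"
tableName 3 = Just "tridgrams"
tableName 4 = Just "quadgrams"
tableName 5 = Just "pentagrams"
tableName _ = Nothing

-- Column names in table order: frequence, word1, ..., wordN.
columnNames :: Int -> [String]
columnNames n = "frequence" : ["word" ++ show i | i <- [1 .. n]]

-- CREATE TABLE statement for order n, or Nothing for unsupported orders.
createTableSql :: Int -> Maybe String
createTableSql n = do
  name <- tableName n
  let typed = zipWith (++) (columnNames n) (" int" : repeat " varchar(100)")
  pure ("create table " ++ name ++ "(" ++ intercalate ", " typed ++ ");")

-- Parameterised INSERT with one ? placeholder per column.
insertSql :: Int -> Maybe String
insertSql n = do
  name <- tableName n
  let cols = columnNames n
      params = intercalate ", " (map (const "?") cols)
  pure ("insert into " ++ name ++ " (" ++ intercalate ", " cols
        ++ ") values (" ++ params ++ ");")

main :: IO ()
main = mapM_ (mapM_ putStrLn) [createTableSql 2, insertSql 2]
```

Generating the statements from the order keeps the four tables consistent. The `?` placeholders are meant to be bound to a parsed frequency and its words through a prepared statement, rather than spliced into the SQL string.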