{"id":15882462,"url":"https://github.com/yoeight/ngrams-loader","last_synced_at":"2025-03-17T13:31:23.658Z","repository":{"id":15307101,"uuid":"18036963","full_name":"YoEight/ngrams-loader","owner":"YoEight","description":"Ngrams loader based on http://www.ngrams.info format","archived":false,"fork":false,"pushed_at":"2014-03-25T09:45:04.000Z","size":204,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-15T00:33:52.313Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/YoEight.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-03-23T15:57:33.000Z","updated_at":"2022-12-10T19:47:23.000Z","dependencies_parsed_at":"2022-08-25T17:00:13.729Z","dependency_job_id":null,"html_url":"https://github.com/YoEight/ngrams-loader","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YoEight%2Fngrams-loader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YoEight%2Fngrams-loader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YoEight%2Fngrams-loader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YoEight%2Fngrams-loader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/YoEight","download_url":"https://codeload.github.com/YoEight/ngrams-loader/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github",
"repositories_count":243864808,"owners_count":20360357,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-06T04:03:32.645Z","updated_at":"2025-03-17T13:31:23.188Z","avatar_url":"https://github.com/YoEight.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"ngrams-loader\n=============\n\nNgrams loader based on http://www.ngrams.info format\n\n[![Build Status](https://travis-ci.org/YoEight/ngrams-loader.png?branch=master)](https://travis-ci.org/YoEight/ngrams-loader)\n\nInstallation\n------------\nAssuming you have at least `cabal 1.18` installed:\n\n```\n$ cabal sandbox init\n$ cabal install --only-dependencies\n$ cabal configure\n$ cabal install\n\n-- program located in ~/.cabal-sandbox/bin\n```\n\nUsage\n-----\n\n```\nusage: ngrams-loader [options] \u003cn-grams file\u003e \u003cSQLite file\u003e\n  [-2,--bigram]     Parses bigrams\n  [-3,--trigram]    Parses trigrams\n  [-4,--quadgram]   Parses 4-grams\n  [-5,--pentagram]  Parses 5-grams\n  [-c,--create]     Creates table before inserts\n  \u003cn-grams file\u003e    N-grams file\n  \u003cSQLite file\u003e     SQlite db file\n```\n\nExample\n-------\n\n```\nngrams-loader --bigram --create w2.txt bigram.db\n\n```\nIt parses each line of `w2.txt` as a bigram, creates the bigram table before performing inserts, and saves everything in `bigram.db`.\n\nFigures\n-------\n\nSpecs\n\n- Core i7 3770 @ 3.4GHz\n- Gentoo with 3.12.13 Linux kernel (64-bit)\n- 1,055,386-line bigram file\n \n`ngrams-loader --bigram --create w2.txt bigram.db` 
gets\n\n```\nreal\t0m16.244s\nuser\t0m15.597s\nsys\t  0m0.143s\n\n```\n\nSql Schemas\n-----------\n\nBigram\n\n```sql\ncreate table bigrams(\n  frequence int,\n  word1 varchar(100),\n  word2 varchar(100)\n);\n```\n\nTrigram\n\n```sql\ncreate table tridgrams(\n  frequence int,\n  word1 varchar(100),\n  word2 varchar(100),\n  word3 varchar(100)\n);\n```\n\n4-gram\n\n```sql\ncreate table quadgrams(\n  frequence int,\n  word1 varchar(100),\n  word2 varchar(100),\n  word3 varchar(100),\n  word4 varchar(100)\n);\n```\n\n5-gram\n\n```sql\ncreate table pentagrams(\n  frequence int,\n  word1 varchar(100),\n  word2 varchar(100),\n  word3 varchar(100),\n  word4 varchar(100),\n  word5 varchar(100)\n);\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyoeight%2Fngrams-loader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyoeight%2Fngrams-loader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyoeight%2Fngrams-loader/lists"}