{"id":17166652,"url":"https://github.com/willf/microsoft_ngram","last_synced_at":"2025-06-29T07:32:47.639Z","repository":{"id":1052830,"uuid":"885088","full_name":"willf/microsoft_ngram","owner":"willf","description":"Ruby code to access Microsoft's Ngram data","archived":false,"fork":false,"pushed_at":"2012-04-12T00:40:24.000Z","size":364,"stargazers_count":20,"open_issues_count":0,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-06-21T05:37:11.982Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/willf.png","metadata":{"files":{"readme":"README.md","changelog":"History.txt","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2010-09-03T06:18:05.000Z","updated_at":"2022-12-05T21:22:24.000Z","dependencies_parsed_at":"2022-08-16T11:55:15.218Z","dependency_job_id":null,"html_url":"https://github.com/willf/microsoft_ngram","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/willf/microsoft_ngram","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fmicrosoft_ngram","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fmicrosoft_ngram/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fmicrosoft_ngram/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fmicrosoft_ngram/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/willf","download_url":"https://codeload.github.com/willf/microsoft_ngram/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willf%2Fmicrosoft_ngram/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261475950,"owners_count":23164074,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-14T23:06:16.665Z","updated_at":"2025-06-29T07:32:47.614Z","avatar_url":"https://github.com/willf.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"microsoft_ngram\n===============\n\nThis is a simple ruby gem to access the Bing Ngram data. It's loosely based on Microsoft's Python library.\nSource code at [github.com/willf/microsoft_ngram](http://github.com/willf/microsoft_ngram).\n\nInstallation\n------------\n\nEmail [webngram@microsoft.com](mailto:webngram@microsoft.com?subject=Token%20Request) and request a token.\nWhen you get your token, add it to your .bashrc or .bash_profile:\n\n    export NGRAM_TOKEN=\"YOUR-TOKEN-HERE\"\n    \nThen install the gem:\n\n    gem install microsoft_ngram\n\nUsage\n-----\n\nTo get a list of currently available models:\n \n    \u003e\u003e Bing::Ngram.models\n    =\u003e [\"bing-anchor/jun09/1\", \"bing-anchor/jun09/2\", \"bing-anchor/jun09/3\", \"bing-anchor/jun09/4\", \"bing-body/jun09/1\", \"bing-body/jun09/2\", \"bing-body/jun09/3\", \"bing-title/jun09/1\", \"bing-title/jun09/2\", \"bing-title/jun09/3\", \"bing-title/jun09/4\", \"bing-query/jun09/1\", \"bing-query/jun09/2\", \"bing-query/jun09/3\", \"bing-title/apr10/1\", \"bing-title/apr10/2\", \"bing-title/apr10/3\", \"bing-title/apr10/4\", \"bing-title/apr10/5\", \"bing-anchor/apr10/1\", \"bing-anchor/apr10/2\", \"bing-anchor/apr10/3\", \"bing-anchor/apr10/4\", \"bing-anchor/apr10/5\", \"bing-body/apr10/1\", \"bing-body/apr10/2\", \"bing-body/apr10/3\", \"bing-body/apr10/4\", \"bing-body/apr10/5\"] \n \nTo see the default model:\n\n    \u003e MicrosoftNgram.default_model            \n    =\u003e \"bing-body/jun09/3\" \n\nParameters to the initializer are:\n\n    :model =\u003e \u003ci\u003estring\u003c/i\u003e (sets model)\n    :user_token =\u003e \u003ci\u003estring\u003c/i\u003e (sets user token)\n    :debug =\u003e \u003ci\u003eboolean\u003c/i\u003e (will show GET/POST calls)\n \nSo, to use the 2-gram title model:\n\n    \u003e model = MicrosoftNgram.new(:model =\u003e \"bing-title/jun09/2\")\n\nTo get a single joint probability, or multiple joint probabilities (If\nyou know you want multiple joint probabilities, it is better to ask\nfor several at once):\n\n    \u003e MicrosoftNgram.new.jps(['fish sticks', 'frog sticks'])\n    =\u003e [[\"fish sticks\", -6.853792], [\"frog sticks\", -9.91852]] \n    \u003e MicrosoftNgram.new.jp(\"fish sticks\")\n    =\u003e -6.853792 \n\nTo get a single conditional probability, or multiple conditional probabilities (If you know you want multiple conditional probabilities, it is better to ask for several at once):\n\n    \u003e MicrosoftNgram.new.cp(\"fish sticks\")\n    =\u003e -2.712575 \n    \u003e MicrosoftNgram.new.cps(['fish sticks', 'frog sticks'])\n    =\u003e [[\"fish sticks\", -2.712575], [\"frog sticks\", -4.788582]] \n\nTo yield the most probable next token using the default model:\n\n    \u003e MicrosoftNgram.new.generate(\"Microsoft Windows\",5)  {|x| puts x.join(' ')}\n    xp -0.6964428\n    vista -0.9242383\n    server -1.106876\n    2000 -1.145312\n    currentversion -1.168404\n\nTo use the query model for the same thing:\n\n    \u003e MicrosoftNgram.new(:model =\u003e 'bing-query/jun09/3').generate(\"Microsoft Windows\",5)  {|x| puts x.join(' ')}\n    xp -0.5429792\n    \u003c/s\u003e -1.062959\n    update -1.08291\n    vista -1.199022\n    installer -1.248958\n    \nYou can also get a list of the N most likely candidates (could be slower for long lists):\n\n    \u003e MicrosoftNgram.new(:model =\u003e 'bing-query/jun09/3').generate_list(\"Microsoft Windows\",5).each  {|x| puts x.join(' ')}\n    xp -0.5429792\n    \u003c/s\u003e -1.062959\n    update -1.08291\n    vista -1.199022\n    installer -1.248958 \n       \nSample Script\n-------------\n\n```ruby\nrequire 'rubygems'\nrequire 'microsoft_ngram'\nl = []\nBing::Ngram.new(:model =\u003e \"bing-body/apr10/5\").generate('a bum',50){ |w,_| l \u003c\u003c w }\nl.join(\"; \")\n```\n\nMore Info\n---------\n\nSee the [REST API](http://web-ngram.research.microsoft.com/info/rest.html) and the \n[terms of use](http://web-ngram.research.microsoft.com/info/TermsOfUse.htm) for accessing the Microsoft data.\n\nLicense\n-------\n\n(The MIT License)\n\nCopyright (c) 2010/2011\n\nPermission is hereby granted, free of charge, to any person obtaining\na copy of this software and associated documentation files (the\n'Software'), to deal in the Software without restriction, including\nwithout limitation the rights to use, copy, modify, merge, publish,\ndistribute, sublicense, and/or sell copies of the Software, and to\npermit persons to whom the Software is furnished to do so, subject to\nthe following conditions:\n\nThe above copyright notice and this permission notice shall be\nincluded in all copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,\nEXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\nMERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.\nIN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY\nCLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,\nTORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE\nSOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillf%2Fmicrosoft_ngram","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwillf%2Fmicrosoft_ngram","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillf%2Fmicrosoft_ngram/lists"}