{"id":13465551,"url":"https://github.com/keredson/wordninja","last_synced_at":"2025-05-14T14:08:29.009Z","repository":{"id":38375701,"uuid":"88914456","full_name":"keredson/wordninja","owner":"keredson","description":"Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies.","archived":false,"fork":false,"pushed_at":"2023-02-19T00:44:37.000Z","size":758,"stargazers_count":848,"open_issues_count":18,"forks_count":110,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-05-12T17:38:16.714Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/keredson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-04-20T22:05:42.000Z","updated_at":"2025-05-05T12:27:17.000Z","dependencies_parsed_at":"2022-07-14T03:20:42.423Z","dependency_job_id":"d7b4d4cf-e7e0-4713-826a-8392bd6c3275","html_url":"https://github.com/keredson/wordninja","commit_stats":{"total_commits":15,"total_committers":6,"mean_commits":2.5,"dds":"0.33333333333333337","last_synced_commit":"0421d148cd3d88e4f075aa9c703006f31809e9e5"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keredson%2Fwordninja","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keredson%2Fwordninja/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keredson%2Fwordninja/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keredson%2Fwordninja/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/keredson","download_url":"https://codeload.github.com/keredson/wordninja/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254159789,"owners_count":22024564,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T15:00:31.831Z","updated_at":"2025-05-14T14:08:28.987Z","avatar_url":"https://github.com/keredson.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"![image](https://user-images.githubusercontent.com/2049665/29219793-b4dcb942-7e7e-11e7-8785-761b0e784e04.png)\n\nWord Ninja\n==========\n\nSlice your munged-together words!  Seriously, take anything: `'imateapot'`, for example, becomes `['im', 'a', 'teapot']`.  Useful for humanizing stuff (like database tables when people don't like underscores).\n\nThis project repackages the excellent work from here: http://stackoverflow.com/a/11642687/2449774\n\nUsage\n-----\n```\n$ python\n\u003e\u003e\u003e import wordninja\n\u003e\u003e\u003e wordninja.split('derekanderson')\n['derek', 'anderson']\n\u003e\u003e\u003e wordninja.split('imateapot')\n['im', 'a', 'teapot']\n\u003e\u003e\u003e wordninja.split('heshotwhointhewhatnow')\n['he', 'shot', 'who', 'in', 'the', 'what', 'now']\n\u003e\u003e\u003e wordninja.split('thequickbrownfoxjumpsoverthelazydog')\n['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']\n```\n\nPerformance\n-----------\nIt's super fast!\n\n```\n\u003e\u003e\u003e import timeit\n\u003e\u003e\u003e def f():\n...   
wordninja.split('imateapot')\n... \n\u003e\u003e\u003e timeit.timeit(f, number=10000)\n0.40885152100236155\n```\n\nIt can handle long strings:\n```\n\u003e\u003e\u003e wordninja.split('wethepeopleoftheunitedstatesinordertoformamoreperfectunionestablishjusticeinsuredomestictranquilityprovideforthecommondefencepromotethegeneralwelfareandsecuretheblessingsoflibertytoourselvesandourposteritydoordainandestablishthisconstitutionfortheunitedstatesofamerica')\n['we', 'the', 'people', 'of', 'the', 'united', 'states', 'in', 'order', 'to', 'form', 'a', 'more', 'perfect', 'union', 'establish', 'justice', 'in', 'sure', 'domestic', 'tranquility', 'provide', 'for', 'the', 'common', 'defence', 'promote', 'the', 'general', 'welfare', 'and', 'secure', 'the', 'blessings', 'of', 'liberty', 'to', 'ourselves', 'and', 'our', 'posterity', 'do', 'ordain', 'and', 'establish', 'this', 'constitution', 'for', 'the', 'united', 'states', 'of', 'america']\n```\nAnd it scales well.  (This string takes ~7ms to compute.)\n\nHow to Install\n--------------\n\n```\npip3 install wordninja\n```\n\nCustom Language Models\n----------------------\n#1 most requested feature!  If you want to do something other than English (or want to specify your own model of English), this is how you do it.\n\n```\n\u003e\u003e\u003e lm = wordninja.LanguageModel('my_lang.txt.gz')\n\u003e\u003e\u003e lm.split('derek')\n['der','ek']\n```\n\nLanguage files must be gzipped text files with one word per line in decreasing order of probability.\n\nIf you want to make your model the default, set:\n\n```\nwordninja.DEFAULT_LANGUAGE_MODEL = wordninja.LanguageModel('my_lang.txt.gz')\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeredson%2Fwordninja","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkeredson%2Fwordninja","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeredson%2Fwordninja/lists"}