{"id":22591907,"url":"https://github.com/georgesalkhouri/l3wtransformer","last_synced_at":"2025-04-10T23:23:17.967Z","repository":{"id":62574920,"uuid":"96646578","full_name":"GeorgesAlkhouri/l3wtransformer","owner":"GeorgesAlkhouri","description":"A word hashing method based on vectors of letter n-grams. Currently transforms text into sequences of numbers.","archived":false,"fork":false,"pushed_at":"2018-02-27T15:35:41.000Z","size":23,"stargazers_count":10,"open_issues_count":2,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-10-14T09:40:58.399Z","etag":null,"topics":["bag-of-words","data-science","feature-extraction","letter-trigram-word-hashing","python","text-processing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GeorgesAlkhouri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-08T22:02:42.000Z","updated_at":"2021-11-19T09:08:11.000Z","dependencies_parsed_at":"2022-11-03T18:37:00.631Z","dependency_job_id":null,"html_url":"https://github.com/GeorgesAlkhouri/l3wtransformer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GeorgesAlkhouri%2Fl3wtransformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GeorgesAlkhouri%2Fl3wtransformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GeorgesAlkhouri%2Fl3wtransformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GeorgesAlkhouri%2Fl3wtransformer/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GeorgesAlkhouri","download_url":"https://codeload.github.com/GeorgesAlkhouri/l3wtransformer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248313230,"owners_count":21082821,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bag-of-words","data-science","feature-extraction","letter-trigram-word-hashing","python","text-processing"],"created_at":"2024-12-08T09:14:31.626Z","updated_at":"2025-04-10T23:23:17.918Z","avatar_url":"https://github.com/GeorgesAlkhouri.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"l3wtransformer\n==============\n\n\u003e A word hashing method to reduce the dimensionality of the bag-of-words term vectors. It is based on letter n-gram. Given a word (e.g. good), it first adds word starting and ending marks to the word (e.g. #good#). Then, breaks the word into letter n-grams (e.g. letter trigrams: #go, goo, ood, od#). 
Finally, the word is represented using a vector of letter n-grams.\n\n[Huang et al. 2013, Learning Deep Structured Semantic Models for Web Search using Clickthrough Data]\n\n---\n\nThis implementation supports the transformation from **text into sequences of numbers**, with the numbers indicating the descending word frequency.\n\nFor example:\n\n*Lorem ipsum dolor sit amet, consectetuer adipiscing elit...* is transformed into *23, 1, 80, 86, 47, 50001, 21, 59, 83, 93, 14, 50003, 4, 7*\n\nAlso, after each word, flags indicating lower case, upper case, mixed case, or initial capitalization are added.\n\n### To do\n\nThere will be an implementation supporting the transformation from **text into bag-of-words vectors**.\n\nInstall\n-------\n\n```\npip install l3wtransformer\n```\n\nUsage\n-----\n\n```\nfrom l3wtransformer import L3wTransformer\n\nl3wt = L3wTransformer()\n\nl3wt.fit_on_texts(['First example.', 'And one more!'])\nl3wt.texts_to_sequences(['One example', '2nd exa.'])\n\n# [[5, 18, 17, 50001, 2, 10, 24, 6, 15, 20, 50003], [16, 50003, 2, 10, 50003]]\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeorgesalkhouri%2Fl3wtransformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgeorgesalkhouri%2Fl3wtransformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeorgesalkhouri%2Fl3wtransformer/lists"}