{"id":27210804,"url":"https://github.com/rekram1-node/text-processor","last_synced_at":"2025-07-30T23:16:18.466Z","repository":{"id":65551595,"uuid":"594450823","full_name":"rekram1-node/text-processor","owner":"rekram1-node","description":"NLP utility library to interact with text. Initially this is to process English documents, essays, etc","archived":false,"fork":false,"pushed_at":"2023-01-29T19:03:54.000Z","size":165,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-07-28T15:48:02.922Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rekram1-node.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2023-01-28T15:41:52.000Z","updated_at":"2023-03-26T19:29:26.000Z","dependencies_parsed_at":"2023-02-15T19:30:56.504Z","dependency_job_id":null,"html_url":"https://github.com/rekram1-node/text-processor","commit_stats":null,"previous_names":[],"tags_count":6,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rekram1-node%2Ftext-processor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rekram1-node%2Ftext-processor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rekram1-node%2Ftext-processor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rekram1-node%2Ftext-processor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rekram1-node","download_url":"https://codeload.github.com/rekram1-node/text
-processor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248139724,"owners_count":21054163,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-10T01:27:39.708Z","updated_at":"2025-04-10T01:27:40.426Z","avatar_url":"https://github.com/rekram1-node.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text Processor\n\n[![Go Report](https://goreportcard.com/badge/github.com/rekram1-node/text-processor)](https://goreportcard.com/report/github.com/rekram1-node/text-processor) [![license](http://img.shields.io/badge/license-MIT-red.svg?style=flat)](https://github.com/rekram1-node/text-processor/blob/main/LICENSE) ![Build Status](https://github.com/rekram1-node/text-processor/actions/workflows/main.yml/badge.svg)\n\n\nNLP utility library to interact with text documents using a [Word2vec Model](https://developer.syn.co.in/tutorial/bot/oscova/pretrained-vectors.html). The library parses out sentences and paragraphs, removes stop words, and tokenizes sentences so that the word2vec comparison functions can consume them.\n\n## Features\n\n* Extract sentences and paragraphs from text\n* Show how similar sentences and paragraphs are using a word2vec model\n* Tokenization, stop word removal, vectorization (done internally)\n* Getting key phrases and general sentiment\n\n## How it works\n\n![text-processor](docs/assets/flowChart.png)\n\n## Getting Started\n\n### Prerequisites\n- [Go](https://go.dev/)\n- [Word2vec 
Model](https://developer.syn.co.in/tutorial/bot/oscova/pretrained-vectors.html) \n- Note: the model must be unzipped and in the working directory!\n\n## Installing Model\n\nGo to [Word2vec Model](https://developer.syn.co.in/tutorial/bot/oscova/pretrained-vectors.html) and select one of their models to download. I use the 300-dimension Google News model.\n\n### Getting text-processor\n\nWith [Go module](https://github.com/golang/go/wiki/Modules) support, simply add the following import:\n\n```go\nimport \"github.com/rekram1-node/text-processor/text\"\n```\n\nto your code, and then `go [build|run|test]` will automatically fetch the necessary dependencies.\n\nOtherwise, run the following to install the `text-processor` library:\n\n```shell\n$ go get -u github.com/rekram1-node/text-processor/text\n```\n\n## Usage\n\n### Basic Text Comparison\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\n\t\"github.com/rekram1-node/text-processor/text\"\n)\n\nfunc main() {\n    t1 := \"So much of modern-day life revolves around using opposable thumbs, from holding a hammer to build a home to ordering food delivery on our smartphones. But for our ancestors, the uses were much simpler. Strong and nimble thumbs meant that they could better create and wield tools, stones and bones for killing large animals for food\"\n    t2 := \"A lot of life today involves using opposable thumbs, from using a hammer to build a house to ordering something on our smartphones. But for our predecessors, the uses were much more simple. 
Powerful and dexterous thumbs meant that they could better make and use tools, stones and bones for killing large animals to eat\"\n\n    // load the word2vec model\n    m, err := text.LoadModel(\"yourModelFile\")\n    if err != nil {\n        log.Fatal(err)\n    }\n\n    // extract paragraphs and sentences from the first text\n    t1Paragraphs, t1Sentences, err := text.ExtractAll(t1)\n    if err != nil {\n        log.Fatal(err)\n    }\n\n    // extract paragraphs and sentences from the second text\n    t2Paragraphs, t2Sentences, err := text.ExtractAll(t2)\n    if err != nil {\n        log.Fatal(err)\n    }\n\n    // compare the two texts and get a map of sentences (from the 1st document)\n    // paired to sentences (from the 2nd document) along with a similarity score\n    sim, err := m.MostSimilarSentences(t1Sentences, t2Sentences)\n    if err != nil {\n        log.Fatal(err)\n    }\n\n    // iterate over the sentence slice and display each match\n    for _, sentence := range t1Sentences {\n        simSentence := sim[sentence]\n        if simSentence.Sentence != \"\" {\n            fmt.Println()\n            fmt.Println(sentence, \"is most similar to:\", simSentence.Sentence)\n            fmt.Printf(\"similarity: %v\\n\", simSentence.Score)\n            fmt.Println()\n        }\n    }\n\n    // compare the two texts and get a map of paragraphs (from the 1st document)\n    // paired to paragraphs (from the 2nd document) along with a similarity score\n    simPara, err := m.MostSimilarParagraphs(t1Paragraphs, t2Paragraphs)\n    if err != nil {\n        log.Fatal(err)\n    }\n\n    // iterate over the paragraph slice and display each match\n    for _, para := range t1Paragraphs {\n        simParagraph := simPara[para]\n        if simParagraph.Paragraph != \"\" {\n            fmt.Println()\n            fmt.Println(para, \"is most similar to:\", simParagraph.Paragraph)\n            fmt.Printf(\"similarity: %v\\n\", simParagraph.Score)\n            fmt.Println()\n        }\n    }\n}\n```\n\n## Issues\n\nIf you have an 
issue, report it on the [issue tracker](https://github.com/rekram1-node/text-processor/issues).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frekram1-node%2Ftext-processor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frekram1-node%2Ftext-processor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frekram1-node%2Ftext-processor/lists"}