{"id":13413558,"url":"https://github.com/go-ego/gse","last_synced_at":"2026-03-12T05:02:18.744Z","repository":{"id":38409207,"uuid":"95233790","full_name":"go-ego/gse","owner":"go-ego","description":"Go efficient multilingual NLP and text segmentation; support English, Chinese, Japanese and others.","archived":false,"fork":false,"pushed_at":"2024-08-22T19:12:20.000Z","size":17601,"stargazers_count":2647,"open_issues_count":17,"forks_count":219,"subscribers_count":62,"default_branch":"master","last_synced_at":"2025-04-29T14:17:41.979Z","etag":null,"topics":["chinese","english","go","gse","hmm","hmm-viterbi-algorithm","japanese","jieba","nlp","segment","trie"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/go-ego.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-23T15:42:35.000Z","updated_at":"2025-04-26T07:43:58.000Z","dependencies_parsed_at":"2023-02-10T17:30:40.045Z","dependency_job_id":"6d25b92b-2314-408d-ba2a-a877c305e0dc","html_url":"https://github.com/go-ego/gse","commit_stats":{"total_commits":586,"total_committers":6,"mean_commits":97.66666666666667,"dds":0.008532423208191142,"last_synced_commit":"82fc9e41d1b5a03820cf5a3548193e337cb8726a"},"previous_names":[],"tags_count":60,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/go-ego%2Fgse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/go-ego%2Fgse/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/go-ego%2Fgse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/go-ego%2Fgse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/go-ego","download_url":"https://codeload.github.com/go-ego/gse/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251516953,"owners_count":21601911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chinese","english","go","gse","hmm","hmm-viterbi-algorithm","japanese","jieba","nlp","segment","trie"],"created_at":"2024-07-30T20:01:43.142Z","updated_at":"2026-03-12T05:02:18.728Z","avatar_url":"https://github.com/go-ego.png","language":"Go","funding_links":[],"categories":["Microsoft Office","Uncategorized","开源类库","自然语言处理","Go","Open source library","Natural Language Processing","Bot Building","Relational Databases","Chinese NLP Toolkits 中文NLP工具","\u003cspan id=\"自然语言处理-natural-language-processing\"\u003e自然语言处理 Natural Language Processing\u003c/span\u003e"],"sub_categories":["Tokenizers","Uncategorized","搜索推荐","分词器","Search Recommendations","Advanced Console UIs","交流","Strings","Chinese Word Segment 中文分词","暂未分类","暂未分类这些库被放在这里是因为其他类别似乎都不适合。","\u003cspan id=\"高级控制台用户界面-advanced-console-uis\"\u003e高级控制台用户界面 Advanced Console UIs\u003c/span\u003e"],"readme":"# gse\r\n\r\nGo efficient multilingual NLP and text segmentation; support English, Chinese, Japanese and others.\r\nAnd supports with [elasticsearch](https://github.com/vcaesar/go-gse-elastic) and 
[bleve](https://github.com/vcaesar/gse-bleve).\r\n\r\n\u003c!--\u003cimg align=\"right\" src=\"https://raw.githubusercontent.com/go-ego/ego/master/logo.jpg\"\u003e--\u003e\r\n\u003c!--\u003ca href=\"https://circleci.com/gh/go-ego/ego/tree/dev\"\u003e\u003cimg src=\"https://img.shields.io/circleci/project/go-ego/ego/dev.svg\" alt=\"Build Status\"\u003e\u003c/a\u003e--\u003e\r\n\r\n[![Build Status](https://github.com/go-ego/gse/workflows/Go/badge.svg)](https://github.com/go-ego/gse/commits/master)\r\n[![CircleCI Status](https://circleci.com/gh/go-ego/gse.svg?style=shield)](https://circleci.com/gh/go-ego/gse)\r\n[![codecov](https://codecov.io/gh/go-ego/gse/branch/master/graph/badge.svg)](https://codecov.io/gh/go-ego/gse)\r\n\u003c!-- [![Build Status](https://travis-ci.org/go-ego/gse.svg)](https://travis-ci.org/go-ego/gse) --\u003e\r\n[![Go Report Card](https://goreportcard.com/badge/github.com/go-ego/gse)](https://goreportcard.com/report/github.com/go-ego/gse)\r\n[![GoDoc](https://godoc.org/github.com/go-ego/gse?status.svg)](https://godoc.org/github.com/go-ego/gse)\r\n[![GitHub release](https://img.shields.io/github/release/go-ego/gse.svg)](https://github.com/go-ego/gse/releases/latest)\r\n[![Join the chat at https://gitter.im/go-ego/ego](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/go-ego/ego?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\r\n\r\n\u003c!-- [![Release](https://github-release-version.herokuapp.com/github/go-ego/gse/release.svg?style=flat)](https://github.com/go-ego/gse/releases/latest) --\u003e\r\n\u003c!--\u003ca href=\"https://github.com/go-ego/ego/releases\"\u003e\u003cimg src=\"https://img.shields.io/badge/%20version%20-%206.0.0%20-blue.svg?style=flat-square\" alt=\"Releases\"\u003e\u003c/a\u003e--\u003e\r\n\r\n[简体中文](https://github.com/go-ego/gse/blob/master/README_zh.md) | [日本語](https://github.com/go-ego/gse/blob/master/README_ja.md)\r\n\r\nGse implements jieba in Go, and aims to add 
NLP support and more features.\r\n\r\n## Features:\r\n\r\n- Supports multiple word segmentation modes: common, search engine, full, precise, and HMM\r\n- Supports user and embedded dictionaries, part-of-speech (POS) tagging, segment analysis, and stop-word and trim-word handling\r\n- Supports multiple languages: English, Chinese, Japanese and others\r\n- Supports Traditional Chinese\r\n- Supports HMM text segmentation using the Viterbi algorithm\r\n- NLP support via TensorFlow (work in progress)\r\n- Named Entity Recognition (work in progress)\r\n- Integrates with [elasticsearch](https://github.com/vcaesar/go-gse-elastic) and bleve\r\n- Can run as a \u003ca href=\"https://github.com/go-ego/gse/blob/master/tools/server/server.go\"\u003eJSON RPC service\u003c/a\u003e.\r\n\r\n## Algorithm:\r\n\r\n- The [Dictionary](https://github.com/go-ego/gse/blob/master/dictionary.go) is implemented with a double-array trie\r\n- The [Segmenter](https://github.com/go-ego/gse/blob/master/dag.go) uses a shortest-path algorithm (based on word frequency and dynamic programming), along with DAG and HMM word segmentation.\r\n\r\n## Text Segmentation speed:\r\n\r\n- \u003ca href=\"https://github.com/go-ego/gse/blob/master/tools/benchmark/benchmark.go\"\u003esingle thread\u003c/a\u003e: 9.2MB/s\r\n- \u003ca href=\"https://github.com/go-ego/gse/blob/master/tools/benchmark/goroutines/goroutines.go\"\u003egoroutines concurrent\u003c/a\u003e: 26.8MB/s.\r\n- HMM text segmentation, single thread: 3.2MB/s. 
(2-core, 4-thread MacBook Pro).\r\n\r\n## Binding:\r\n\r\n[gse-bind](https://github.com/vcaesar/gse-bind) provides bindings for JavaScript and other languages.\r\n\r\n## Install / update\r\n\r\nWith Go module support (Go 1.11+), just import:\r\n\r\n```go\r\nimport \"github.com/go-ego/gse\"\r\n```\r\n\r\nOtherwise, to install the gse package, run the command:\r\n\r\n```\r\ngo get -u github.com/go-ego/gse\r\n```\r\n\r\n## Use\r\n\r\n```go\r\npackage main\r\n\r\nimport (\r\n\t_ \"embed\"\r\n\t\"fmt\"\r\n\r\n\t\"github.com/go-ego/gse\"\r\n)\r\n\r\n//go:embed testdata/test_en2.txt\r\nvar testDict string\r\n\r\n//go:embed testdata/test_en.txt\r\nvar testEn string\r\n\r\nvar (\r\n\ttext  = \"To be or not to be, that's the question!\"\r\n\ttest1 = \"Hiworld, Helloworld!\"\r\n)\r\n\r\nfunc main() {\r\n\t// keep the original capitalization\r\n\t// gse.ToLower = false\r\n\r\n\tvar seg1 gse.Segmenter\r\n\tseg1.DictSep = \",\"\r\n\terr := seg1.LoadDict(\"./testdata/test_en.txt\")\r\n\tif err != nil {\r\n\t\tfmt.Println(\"Load dictionary error: \", err)\r\n\t}\r\n\r\n\ts1 := seg1.Cut(text)\r\n\tfmt.Println(\"seg1 Cut: \", s1)\r\n\t// seg1 Cut:  [to be   or   not to be ,   that's the question!]\r\n\r\n\tvar seg2 gse.Segmenter\r\n\tseg2.AlphaNum = true\r\n\tseg2.LoadDict(\"./testdata/test_en_dict3.txt\")\r\n\r\n\ts2 := seg2.Cut(test1)\r\n\tfmt.Println(\"seg2 Cut: \", s2)\r\n\t// seg2 Cut:  [hi world ,   hello world !]\r\n\r\n\tvar seg3 gse.Segmenter\r\n\tseg3.AlphaNum = true\r\n\tseg3.DictSep = \",\"\r\n\terr = seg3.LoadDictEmbed(testDict + \"\\n\" + testEn)\r\n\tif err != nil {\r\n\t\tfmt.Println(\"loadDictEmbed error: \", err)\r\n\t}\r\n\ts3 := seg3.Cut(text + test1)\r\n\tfmt.Println(\"seg3 Cut: \", s3)\r\n\t// seg3 Cut:  [to be   or   not to be ,   that's the question! 
hi world ,   hello world !]\r\n\r\n\t// example2()\r\n}\r\n```\r\n\r\nExample2:\r\n\r\n```go\r\npackage main\r\n\r\nimport (\r\n\t\"fmt\"\r\n\t\"regexp\"\r\n\r\n\t\"github.com/go-ego/gse\"\r\n\t\"github.com/go-ego/gse/hmm/pos\"\r\n)\r\n\r\nvar (\r\n\ttext = \"Hello world, Helloworld. Winter is coming! こんにちは世界, 你好世界.\"\r\n\r\n\tnew, _ = gse.New(\"zh,testdata/test_en_dict3.txt\", \"alpha\")\r\n\r\n\tseg gse.Segmenter\r\n\tposSeg pos.Segmenter\r\n)\r\n\r\nfunc main() {\r\n\t// Loading the default dictionary\r\n\tseg.LoadDict()\r\n\t// Loading the default dictionary with embed\r\n\t// seg.LoadDictEmbed()\r\n\t//\r\n\t// Loading the Simplified Chinese dictionary\r\n\t// seg.LoadDict(\"zh_s\")\r\n\t// seg.LoadDictEmbed(\"zh_s\")\r\n\t//\r\n\t// Loading the Traditional Chinese dictionary\r\n\t// seg.LoadDict(\"zh_t\")\r\n\t//\r\n\t// Loading the Japanese dictionary\r\n\t// seg.LoadDict(\"jp\")\r\n\t//\r\n\t// Load the dictionary\r\n\t// seg.LoadDict(\"your gopath\"+\"/src/github.com/go-ego/gse/data/dict/dictionary.txt\")\r\n\r\n\tcut()\r\n\r\n\tsegCut()\r\n}\r\n\r\nfunc cut() {\r\n\thmm := new.Cut(text, true)\r\n\tfmt.Println(\"cut use hmm: \", hmm)\r\n\r\n\thmm = new.CutSearch(text, true)\r\n\tfmt.Println(\"cut search use hmm: \", hmm)\r\n\tfmt.Println(\"analyze: \", new.Analyze(hmm, text))\r\n\r\n\thmm = new.CutAll(text)\r\n\tfmt.Println(\"cut all: \", hmm)\r\n\r\n\treg := regexp.MustCompile(`(\\d+年|\\d+月|\\d+日|[\\p{Latin}]+|[\\p{Hangul}]+|\\d+\\.\\d+|[a-zA-Z0-9]+)`)\r\n\ttext1 := `헬로월드 헬로 서울, 2021年09月10日, 3.14`\r\n\thmm = seg.CutDAG(text1, reg)\r\n\tfmt.Println(\"Cut with hmm and regexp: \", hmm, hmm[0], hmm[6])\r\n}\r\n\r\nfunc analyzeAndTrim(cut []string) {\r\n\ta := seg.Analyze(cut, \"\")\r\n\tfmt.Println(\"analyze the segment: \", a)\r\n\r\n\tcut = seg.Trim(cut)\r\n\tfmt.Println(\"cut all: \", cut)\r\n\r\n\tfmt.Println(seg.String(text, true))\r\n\tfmt.Println(seg.Slice(text, true))\r\n}\r\n\r\nfunc cutPos() {\r\n\tpo := seg.Pos(text, true)\r\n\tfmt.Println(\"pos: 
\", po)\r\n\tpo = seg.TrimPos(po)\r\n\tfmt.Println(\"trim pos: \", po)\r\n\r\n\tpos.WithGse(seg)\r\n\tpo = posSeg.Cut(text, true)\r\n\tfmt.Println(\"pos: \", po)\r\n\r\n\tpo = posSeg.TrimWithPos(po, \"zg\")\r\n\tfmt.Println(\"trim pos: \", po)\r\n}\r\n\r\nfunc segCut() {\r\n\t// Text Segmentation\r\n\ttb := []byte(text)\r\n\tfmt.Println(seg.String(text, true))\r\n\r\n\tsegments := seg.Segment(tb)\r\n\t// Handle word segmentation results, search mode\r\n\tfmt.Println(gse.ToString(segments, true))\r\n}\r\n\r\n```\r\n\r\n[Look at a custom dictionary example](/examples/dict/main.go)\r\n\r\n```go\r\npackage main\r\n\r\nimport (\r\n\t_ \"embed\"\r\n\t\"fmt\"\r\n\r\n\t\"github.com/go-ego/gse\"\r\n)\r\n\r\n//go:embed test_en_dict3.txt\r\nvar testDict string\r\n\r\nfunc main() {\r\n\t// var seg gse.Segmenter\r\n\t// seg.LoadDict(\"zh, testdata/zh/test_dict.txt, testdata/zh/test_dict1.txt\")\r\n\t// seg.LoadStop()\r\n\tseg, err := gse.NewEmbed(\"zh, word 20 n\"+testDict, \"en\")\r\n\tif err != nil {\r\n\t\t// err must be used, otherwise the program does not compile\r\n\t\tfmt.Println(\"NewEmbed error: \", err)\r\n\t\treturn\r\n\t}\r\n\t// seg.LoadDictEmbed()\r\n\tseg.LoadStopEmbed()\r\n\r\n\ttext1 := \"Hello world, こんにちは世界, 你好世界!\"\r\n\ts1 := seg.Cut(text1, true)\r\n\tfmt.Println(s1)\r\n\tfmt.Println(\"trim: \", seg.Trim(s1))\r\n\tfmt.Println(\"stop: \", seg.Stop(s1))\r\n\tfmt.Println(seg.String(text1, true))\r\n\r\n\tsegments := seg.Segment([]byte(text1))\r\n\tfmt.Println(gse.ToString(segments))\r\n}\r\n```\r\n\r\n[Look at a Chinese example](/examples/main.go)\r\n\r\n[Look at a Japanese example](/examples/jp/main.go)\r\n\r\n## Elasticsearch\r\n\r\nHow to use it with elasticsearch?\r\n\r\n[go-gse-elastic](https://github.com/vcaesar/go-gse-elastic)\r\n\r\n## Authors\r\n\r\n- [Maintainers](https://github.com/orgs/go-ego/people)\r\n- [Contributors](https://github.com/go-ego/gse/graphs/contributors)\r\n\r\n## License\r\n\r\nGse is primarily distributed under the terms of \"the Apache License (Version 2.0)\".\r\nSee [LICENSE-APACHE](http://www.apache.org/licenses/LICENSE-2.0), 
[LICENSE](https://github.com/go-vgo/robotgo/blob/master/LICENSE).\r\n\r\nThanks to [sego](https://github.com/huichen/sego) and [jieba](https://github.com/fxsjy/jieba) ([jiebago](https://github.com/wangbin/jiebago)).\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgo-ego%2Fgse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgo-ego%2Fgse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgo-ego%2Fgse/lists"}