{"id":13380765,"url":"https://github.com/fangpenlin/loso","last_synced_at":"2025-04-15T20:31:54.161Z","repository":{"id":41092711,"uuid":"1619210","full_name":"fangpenlin/loso","owner":"fangpenlin","description":"Chinese segmentation library","archived":false,"fork":false,"pushed_at":"2011-04-15T15:01:44.000Z","size":118,"stargazers_count":82,"open_issues_count":0,"forks_count":23,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-29T02:04:35.799Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fangpenlin.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-04-15T14:17:20.000Z","updated_at":"2025-01-21T04:04:30.000Z","dependencies_parsed_at":"2022-07-30T21:08:03.165Z","dependency_job_id":null,"html_url":"https://github.com/fangpenlin/loso","commit_stats":null,"previous_names":["victorlin/loso"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fangpenlin%2Floso","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fangpenlin%2Floso/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fangpenlin%2Floso/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fangpenlin%2Floso/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fangpenlin","download_url":"https://codeload.github.com/fangpenlin/loso/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249148002,"owners_
count":21220459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T09:00:38.376Z","updated_at":"2025-04-15T20:31:53.840Z","avatar_url":"https://github.com/fangpenlin.png","language":"Python","funding_links":[],"categories":["Python","Natural Language Processing","Chinese NLP Toolkits 中文NLP工具"],"sub_categories":["General-Purpose Machine Learning","Chinese Word Segment 中文分词"],"readme":"What is loso?\n=============\n\nloso is a Chinese segmentation system written in Python.  It was developed by Victor Lin (bornstub@gmail.com) for Plurk Inc.\n\nCopyright \u0026 License\n===================\n\nThe copyright of loso is owned by Plurk Inc.  It is open source under the BSD license.\n\nSetup loso\n==========\n\nTo install loso, clone the repo and run the following commands\n\n::\n\n   cd loso\n   python setup.py develop\n\nYou also need to run a redis_ database to store the lexicon, and to copy the configuration template and modify it.\n\n::\n\n   cp default.yaml myconf.yaml\n   vim myconf.yaml\n\nTo use your configuration, set the LOSO_CONFIG_FILE environment variable to its path. For example:\n\n::\n\n   LOSO_CONFIG_FILE=myconf.yaml python setup.py serve\n\n.. 
_redis: http://redis.io/\n\nUse loso\n========\n\nLoso determines segmentation according to the lexicon database, using an algorithm based on a Hidden Markov Model. It is therefore not possible to use the service before building a lexicon database.\n\nTo feed a text file to the database, run\n\n::\n\n   python setup.py feed -f /home/victorlin/plurk_src/realtime_search/word_segment/sample_data/sample_tr_ch\n\n\nTo clear the database, run\n\n::\n\n   python setup.py reset\n\nTo test term splitting interactively, run\n\n::\n\n   python setup.py interact\n\n\nFor example\n\n::\n\n   Text: 留下鉅細靡遺的太空梭發射影片，供世人回味\n   ....\n   留下 鉅細靡遺 的 太空梭 發射 影片 供 世人 回味\n\n\nTo expose segmentation as an XML-RPC service, run\n\n\n::\n\n   python setup.py serve\n\n\nThe following simple Python program shows how to use it\n\n::\n\n   # Python 2 client; the service must already be running\n   import xmlrpclib\n   \n   proxy = xmlrpclib.ServerProxy(\"http://localhost:5566/\")\n   \n   terms = proxy.splitTerms(u'留下鉅細靡遺的太空梭發射影片，供世人回味')\n   print ' '.join(terms)\n\nThe output should be\n\n\n::\n\n  留下 鉅細靡遺 的 太空梭 發射 影片 供 世人 回味\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffangpenlin%2Floso","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffangpenlin%2Floso","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffangpenlin%2Floso/lists"}