https://github.com/fangpenlin/loso
Chinese segmentation library
- Host: GitHub
- URL: https://github.com/fangpenlin/loso
- Owner: fangpenlin
- License: bsd-3-clause
- Created: 2011-04-15T14:17:20.000Z (almost 15 years ago)
- Default Branch: master
- Last Pushed: 2011-04-15T15:01:44.000Z (almost 15 years ago)
- Last Synced: 2025-03-29T02:04:35.799Z (11 months ago)
- Language: Python
- Homepage:
- Size: 115 KB
- Stars: 82
- Watchers: 5
- Forks: 23
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
- awesome-machine-learning - loso - Another Chinese segmentation library. **[Deprecated]** (Python / General-Purpose Machine Learning)
- fucking-awesome-machine-learning - loso - Another Chinese segmentation library. **[Deprecated]** (Python / General-Purpose Machine Learning)
- awesome-chinese-nlp - loso 中文分词
README
What is loso?
=============
loso is a Chinese segmentation system written in Python. It was developed by Victor Lin (bornstub@gmail.com) for Plurk Inc.
Copyright & License
===================
The copyright of loso is owned by Plurk Inc. It is open source software released under the BSD license.
Setup loso
==========
To install loso, clone the repository and run the following commands:
::
cd loso
python setup.py develop
You also need a running redis_ server to store the lexicon database, and you need to copy the configuration template and modify it:
::
cp default.yaml myconf.yaml
vim myconf.yaml
To use your configuration, set the environment variable LOSO_CONFIG_FILE to the path of your configuration file. For example:
::
LOSO_CONFIG_FILE=myconf.yaml python setup.py serve
.. _redis: http://redis.io/
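How loso itself reads LOSO_CONFIG_FILE is not shown in this README. As a rough sketch of the general pattern, a program could resolve the configuration path as follows. The helper name ``resolve_config_path`` and the fallback to ``default.yaml`` when the variable is unset are assumptions made for illustration, not documented loso behavior.

```python
import os


def resolve_config_path(environ=None, default="default.yaml"):
    """Return the path named by LOSO_CONFIG_FILE, or a fallback path.

    Hypothetical helper: the fallback to default.yaml is an assumption,
    not something the loso README specifies.
    """
    environ = os.environ if environ is None else environ
    # An empty value is treated the same as an unset variable.
    return environ.get("LOSO_CONFIG_FILE") or default
```

Passing a plain dictionary as ``environ`` makes the lookup easy to try without touching the real environment.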
Use loso
========
Loso determines segmentation according to the lexicon database, using an algorithm based on a Hidden Markov Model. The service therefore cannot be used until a lexicon database has been built.
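To give a feel for how a Hidden Markov Model can segment text, here is a minimal generic sketch. It is not loso's actual implementation and does not use the redis lexicon. It assumes the common character-tagging formulation, where each character is tagged B, M, E, or S (begin, middle, or end of a multi-character word, or a single-character word), and it decodes the most likely tag sequence with the Viterbi algorithm. All probabilities are hand-picked toy values, and the name ``viterbi_segment`` is hypothetical.

```python
import math

STATES = "BMES"  # Begin / Middle / End of a multi-character word, Single-character word

# Toy parameters chosen by hand for illustration (assumption: a real system
# would estimate them from a corpus or lexicon).
START = {"B": 0.6, "S": 0.4}  # a sentence cannot start with M or E
TRANS = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}
EMIT = {
    "B": {"太": 0.5, "發": 0.5},
    "M": {"空": 1.0},
    "E": {"梭": 0.5, "射": 0.5},
    "S": {"的": 1.0},
}
UNSEEN = 1e-6  # smoothing probability for characters missing from EMIT


def _log(p):
    """Logarithm that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_segment(text):
    """Split text into words using the most likely B/M/E/S tag sequence."""
    if not text:
        return []
    # score[s] is the best log-probability of any tag path ending in state s.
    score = {s: _log(START.get(s, 0)) + _log(EMIT[s].get(text[0], UNSEEN))
             for s in STATES}
    back = []  # back[i][s] is the best predecessor of state s at position i + 1
    for ch in text[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + _log(TRANS[p].get(s, 0)))
            new_score[s] = (score[prev] + _log(TRANS[prev].get(s, 0))
                            + _log(EMIT[s].get(ch, UNSEEN)))
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # A complete segmentation must finish a word, so the last tag is E or S.
    tags = [max("ES", key=lambda s: score[s])]
    for pointers in reversed(back):
        tags.append(pointers[tags[-1]])
    tags.reverse()
    # Cut the text after every character tagged E or S.
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    return words
```

With these toy parameters, ``viterbi_segment("太空梭的發射")`` returns ``['太空梭', '的', '發射']``. Characters outside the toy tables fall back to the ``UNSEEN`` probability, so the result for other text is valid but not meaningful.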
To feed a text file into the database, run
::
python setup.py feed -f /home/victorlin/plurk_src/realtime_search/word_segment/sample_data/sample_tr_ch
To clear the database, run
::
python setup.py reset
To try out term splitting interactively, run
::
python setup.py interact
For example:
::
Text: 留下鉅細靡遺的太空梭發射影片,供世人回味
....
留下 鉅細靡遺 的 太空梭 發射 影片 供 世人 回味
To expose the segmentation service over XML-RPC, run
::
python setup.py serve
The following simple Python program shows how to use it:
::
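# Python 2 client for the XML-RPC service started with "python setup.py serve"
import xmlrpclib
# Connect to the service on localhost, port 5566
proxy = xmlrpclib.ServerProxy("http://localhost:5566/")
# splitTerms returns the segmented terms as a sequence of strings
terms = proxy.splitTerms(u'留下鉅細靡遺的太空梭發射影片,供世人回味')
print ' '.join(terms)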
The output should be:
::
留下 鉅細靡遺 的 太空梭 發射 影片 供 世人 回味
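The client above uses the Python 2 module ``xmlrpclib``. Since XML-RPC is language neutral, a Python 3 client should be able to talk to the same service through ``xmlrpc.client``. The sketch below assumes the service is running at the address shown above and exposes ``splitTerms`` as in the example. The wrapper name ``split_terms`` is hypothetical, and this has not been checked against a live loso server.

```python
from xmlrpc.client import ServerProxy


def split_terms(text, url="http://localhost:5566/"):
    """Ask an XML-RPC segmentation service to split text into terms.

    Hypothetical wrapper: it assumes the remote method is named
    splitTerms and returns a list of strings, as in the README example.
    """
    # ServerProxy works as a context manager, which closes the connection on exit.
    with ServerProxy(url) as proxy:
        return proxy.splitTerms(text)
```

Calling ``print(" ".join(split_terms("留下鉅細靡遺的太空梭發射影片,供世人回味")))`` against a running service should then print the terms separated by spaces.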