https://github.com/davidar/pytextcat
- Host: GitHub
- URL: https://github.com/davidar/pytextcat
- Owner: davidar
- License: lgpl-2.1
- Created: 2009-07-12T04:04:44.000Z (almost 17 years ago)
- Default Branch: master
- Last Pushed: 2009-09-25T09:00:05.000Z (over 16 years ago)
- Last Synced: 2025-02-09T23:48:13.261Z (over 1 year ago)
- Language: Python
- Homepage: http://da.vidr.cc/projects/pytextcat/
- Size: 402 KB
- Stars: 2
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README
- License: COPYING
README
PyTextCat[1] guesses which of over 70 different languages a given input text
is written in.
It is an implementation of the classification technique described by William B.
Cavnar & John M. Trenkle (1994) in N-Gram-Based Text Categorization[2], and is
based upon Gertjan van Noord's Perl implementation[3].
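For readers unfamiliar with the technique, the following is a minimal sketch
of rank-order n-gram classification in the spirit of Cavnar & Trenkle. It is
not PyTextCat's actual code or API: the function names (ngram_profile,
out_of_place, guess_language), the underscore padding, and the max_n and
top_k defaults are all assumptions made for illustration. Each text is reduced
to a list of its most frequent character n-grams, and the language whose
profile has the smallest total rank difference from the document's profile is
chosen.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first.

    max_n and top_k are illustrative defaults, not values taken from PyTextCat.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # assumed convention: mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles.

    An n-gram missing from lang_profile costs a fixed maximum penalty.
    """
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    total = 0
    for i, gram in enumerate(doc_profile):
        total += abs(i - rank[gram]) if gram in rank else max_penalty
    return total


def guess_language(text, profiles):
    """Return the key of the profile closest to the text.

    profiles is a hypothetical dict mapping a language name to its profile.
    """
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))
```

In this sketch, a profile would be built once per language by calling
ngram_profile on a training text, and guess_language would then be called
with a dict of those profiles. Very short inputs give unreliable guesses,
since their profiles contain too few n-grams to rank meaningfully.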
textcat.py provides a command-line interface to the library. Run it with no
arguments to see usage information.
PyTextCat is released under the LGPLv2.1 (see COPYING.LESSER and COPYING).
The lm files and test texts are from TextCat[3], and are licensed under the
same license.
[1] http://da.vidr.cc/projects/pytextcat/
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.9367
[3] http://www.let.rug.nl/vannoord/TextCat/