https://github.com/percyliang/brown-cluster
C++ implementation of the Brown word clustering algorithm.
- Host: GitHub
- URL: https://github.com/percyliang/brown-cluster
- Owner: percyliang
- Created: 2012-07-24T18:23:25.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2023-09-10T05:05:39.000Z (over 1 year ago)
- Last Synced: 2024-07-31T22:43:58.816Z (9 months ago)
- Language: C++
- Size: 57.6 KB
- Stars: 424
- Watchers: 32
- Forks: 136
- Open Issues: 15
Metadata Files:
- Readme: README
Awesome Lists containing this project
- low-resource-languages - brown-cluster - C++ implementation of the Brown word clustering algorithm. (Software / Utilities)
README
Implementation of the Brown hierarchical word clustering algorithm.
Percy Liang
Release 1.3
2012.07.24

Input: a sequence of words separated by whitespace (see input.txt for an example).
Output: for each word type, its cluster (see output.txt for an example).
In particular, each line is:
<cluster represented as a bit string> <word> <number of times word occurs in input>
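Below is a minimal C++ sketch of reading that output. It assumes each line of the paths file holds three whitespace-separated fields (bit string, word, count). The names PathEntry, parsePathsLine, readPaths and groupByPrefix are hypothetical and are not part of this package. Because the clustering is hierarchical, truncating every bit string to its first k bits gives a coarser clustering, and groupByPrefix does exactly that:

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One line of the paths file (assumed layout: <bit string> <word> <count>).
struct PathEntry {
    std::string path;  // cluster as a bit string, e.g. "0110"
    std::string word;  // the word type
    long count = 0;    // occurrences of the word in the input
};

// Parses one line; returns false if it lacks the three assumed fields.
bool parsePathsLine(const std::string& line, PathEntry& out) {
    std::istringstream fields(line);
    PathEntry e;
    if (!(fields >> e.path >> e.word >> e.count)) return false;
    out = e;
    return true;
}

// Reads every well-formed line from a paths stream, skipping the rest.
std::vector<PathEntry> readPaths(std::istream& in) {
    std::vector<PathEntry> entries;
    std::string line;
    while (std::getline(in, line)) {
        PathEntry e;
        if (parsePathsLine(line, e)) entries.push_back(e);
    }
    return entries;
}

// Groups words by the first `prefixLen` bits of their path (the whole path
// if it is shorter). A shorter prefix names a coarser cluster.
std::map<std::string, std::vector<std::string>>
groupByPrefix(const std::vector<PathEntry>& entries, std::size_t prefixLen) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const auto& e : entries)
        groups[e.path.substr(0, prefixLen)].push_back(e.word);
    return groups;
}
```

For example, opening input-c50-p1.out/paths with a std::ifstream and calling groupByPrefix(readPaths(file), 4) labels each word with the first 4 bits of its path. This yields a coarser clustering than the full paths without rerunning wcluster.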
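This README does not state which objective the merges optimize. In Brown et al. (1992), clusters are merged greedily, and each step picks the pair whose merge loses the least average mutual information (AMI) between the clusters of adjacent words. The following C++ sketch computes that quantity by brute force from a token sequence and a word-to-cluster map. averageMutualInformation and mergeLoss are hypothetical helpers. Recomputing AMI from scratch for every candidate pair is far slower than the O(N C^2) running time stated for this package, so the sketch illustrates the criterion only. It is not how wcluster computes it:

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Average mutual information (AMI) between the clusters of adjacent tokens:
//   sum over (c, d) of p(c, d) * log( p(c, d) / (pLeft(c) * pRight(d)) )
// where p(c, d) is the relative frequency of bigrams whose first word is in
// cluster c and whose second word is in cluster d. Every token must appear
// in `cluster` (std::map::at throws otherwise).
double averageMutualInformation(const std::vector<std::string>& tokens,
                                const std::map<std::string, int>& cluster) {
    std::map<std::pair<int, int>, double> joint;
    std::map<int, double> left, right;
    double total = 0.0;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        const int c = cluster.at(tokens[i]);
        const int d = cluster.at(tokens[i + 1]);
        joint[std::make_pair(c, d)] += 1.0;
        left[c] += 1.0;   // marginal of the first position
        right[d] += 1.0;  // marginal of the second position
        total += 1.0;
    }
    double ami = 0.0;
    for (const auto& entry : joint) {
        const double p = entry.second / total;
        const double pLeft = left[entry.first.first] / total;
        const double pRight = right[entry.first.second] / total;
        ami += p * std::log(p / (pLeft * pRight));
    }
    return ami;
}

// AMI lost by relabeling every word of cluster `y` as cluster `x`.
// A greedy Brown-style step would evaluate this for candidate pairs and
// merge the pair with the smallest loss.
double mergeLoss(const std::vector<std::string>& tokens,
                 const std::map<std::string, int>& cluster, int x, int y) {
    std::map<std::string, int> merged = cluster;
    for (auto& entry : merged)
        if (entry.second == y) entry.second = x;
    return averageMutualInformation(tokens, cluster) -
           averageMutualInformation(tokens, merged);
}
```

Merging clusters applies the same relabeling to both positions of each bigram, so it can only lose mutual information. mergeLoss is therefore never negative, up to rounding.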
Runs in $O(N C^2)$, where $N$ is the number of word types and $C$
is the number of clusters.

References:
Brown et al.: Class-Based n-gram Models of Natural Language
http://acl.ldc.upenn.edu/J/J92/J92-4003.pdf

Liang: Semi-supervised learning for natural language processing
http://cs.stanford.edu/~pliang/papers/meng-thesis.pdf

Compile:
make
Run:
# Clusters input.txt into 50 clusters:
./wcluster --text input.txt --c 50
# Output in input-c50-p1.out/paths

============================================================
Change Log

1.3: Compatibility updates for newer versions of g++ (courtesy of Chris Dyer).
1.2: Made compatible with MacOS (replaced timespec with timeval and changed order of linking).
1.1: Removed deprecated operators so it works with GCC 4.3.

============================================================

(C) Copyright 2007-2012, Percy Liang
http://cs.stanford.edu/~pliang
Permission is granted for anyone to copy, use, or modify these programs and
accompanying documents for purposes of research or education, provided this
copyright notice is retained, and note is made of any changes that have been
made.

These programs and documents are distributed without any warranty, express or
implied. As the programs were written for research purposes only, they have
not been tested to the degree that would be advisable in any important
application. All use of these programs is entirely at the user's own risk.