{"id":13740199,"url":"https://github.com/ldmt-muri/alignment-with-openfst","last_synced_at":"2025-05-08T19:36:39.838Z","repository":{"id":4366697,"uuid":"5503152","full_name":"ldmt-muri/alignment-with-openfst","owner":"ldmt-muri","description":null,"archived":false,"fork":false,"pushed_at":"2016-12-09T17:42:36.000Z","size":55741,"stargazers_count":21,"open_issues_count":63,"forks_count":8,"subscribers_count":19,"default_branch":"master","last_synced_at":"2024-11-15T10:41:16.619Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ldmt-muri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-08-22T01:07:14.000Z","updated_at":"2023-11-06T07:13:23.000Z","dependencies_parsed_at":"2022-09-14T15:50:19.569Z","dependency_job_id":null,"html_url":"https://github.com/ldmt-muri/alignment-with-openfst","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ldmt-muri%2Falignment-with-openfst","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ldmt-muri%2Falignment-with-openfst/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ldmt-muri%2Falignment-with-openfst/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ldmt-muri%2Falignment-with-openfst/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ldmt-muri","download_url":"https://codeload.github.com/ldmt-muri/alignment-with-openfst/tar.gz/refs/heads/master","host":{"name":"GitHub","url":
"https://github.com","kind":"github","repositories_count":253135782,"owners_count":21859687,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T04:00:44.336Z","updated_at":"2025-05-08T19:36:38.931Z","avatar_url":"https://github.com/ldmt-muri.png","language":"C++","funding_links":[],"categories":["Software"],"sub_categories":["Utilities"],"readme":"# disclaimer:\nThis is a work in progress. If you encounter any problems while compiling or using it, it is likely our mistake, not yours. Please contact wammar@cs.cmu.edu with questions, comments, and suggestions.\n\n\n# description:\nThis is an implementation of the CRF autoencoder framework for four tasks:\n* bitext word alignment\n* part-of-speech tagging\n* code switching\n* dependency parsing\n\nOur NIPS 2014 [paper](http://arxiv.org/pdf/1411.1147v2.pdf) describes the CRF autoencoder framework, as well as the bitext word alignment and part-of-speech induction tasks, in detail. 
Details on code-switching can be found in our EMNLP shared task [paper](http://www.aclweb.org/anthology/W14-3909).\n\n# dependencies:\n* [cdec](https://github.com/redpony/cdec)\n* [boost 1.54](http://www.boost.org/users/history/version_1_54_0.html)\n* [libLBFGS-1.10](https://github.com/downloads/chokkan/liblbfgs/liblbfgs-1.10.tar.gz)\n* [MPI-1.8](http://www.open-mpi.org/software/ompi/v1.8/)\n* [openfst-1.3.2](http://www.openfst.org/twiki/bin/view/FST/FstDownload)\n* [python 2.7](https://www.python.org/download/releases/2.7.3/)\n\n# how to build\nI'm assuming your default compiler is either gcc 4.6.3 or clang 3.1-8 (or later, \"fingers crossed\").\n* bitext word alignment: make -f Makefile-latentCrfAligner\n* part-of-speech tagging: make -f Makefile-latentCrfPosTagger\n* code switching: make -f Makefile-latentCrfPosTagger (this is not a typo)\n* dependency parsing: make -f Makefile-latentCrfParser (still in the works)\n\n# example invocations:\n## part of speech tagging:\n```\n# --output-prefix: just a filename prefix for files generated during training\n# --train-data: example file below\ntrain-latentCrfPosTagger \\\n  --output-prefix prefix \\\n  --train-data sent-per-line-space-delimited-tokens.txt \\\n  --feat LABEL_BIGRAM --feat PRECOMPUTED --feat EMISSION \\\n  --feat BOUNDARY_LABELS --feat PRECOMPUTED_XIM2 --feat PRECOMPUTED_XIM1 \\\n  --feat PRECOMPUTED_XI --feat PRECOMPUTED_XIP1 --feat PRECOMPUTED_XIP2 \\\n  --feat OTHER_ALIGNERS \\\n  --min-relative-diff 0.001 \\\n  --optimizer adagrad --minibatch-size 8000 \\\n  --max-iter-count 50 \\\n  --cache-feats true \\\n  --wordpair-feats word-level-features\n```\n\nFor a list of all options, run ``latentCrfAligner --help``.\n\n### snippet of the file ``sent-per-line-space-delimited-tokens.txt``\n```\nMs. Haag plays Elianti .\nRolls-Royce Motor Cars Inc. said it expects its U.S. 
sales to remain steady at about 1,200 cars in 1990 .\n```\n\n### snippet of the file ``word-level-features``\n```\nexpects  starts-with-e 1 starts-with-ex 1 ends-with-ts 1 ends-with-s 1\nplays  starts-with-p 1 starts-with-pl 1 ends-with-ys 1 ends-with-s 1\n```\n\n## using multiple processes:\n```\nmpirun -np 32 train-latentCrfAligner [options]\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fldmt-muri%2Falignment-with-openfst","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fldmt-muri%2Falignment-with-openfst","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fldmt-muri%2Falignment-with-openfst/lists"}