{"id":13538225,"url":"https://github.com/clab/fast_align","last_synced_at":"2025-04-09T11:06:59.651Z","repository":{"id":41308982,"uuid":"9280298","full_name":"clab/fast_align","owner":"clab","description":"Simple, fast unsupervised word aligner","archived":false,"fork":false,"pushed_at":"2022-07-19T17:12:12.000Z","size":64,"stargazers_count":750,"open_issues_count":37,"forks_count":161,"subscribers_count":24,"default_branch":"master","last_synced_at":"2025-04-02T10:08:27.357Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-04-07T17:32:37.000Z","updated_at":"2025-03-24T08:46:35.000Z","dependencies_parsed_at":"2022-08-04T10:30:23.894Z","dependency_job_id":null,"html_url":"https://github.com/clab/fast_align","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clab%2Ffast_align","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clab%2Ffast_align/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clab%2Ffast_align/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clab%2Ffast_align/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clab","download_url":"https://codeload.github.com/clab/fast_align/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248027407,"owners_count":21035594,"icon
_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T09:01:08.225Z","updated_at":"2025-04-09T11:06:59.622Z","avatar_url":"https://github.com/clab.png","language":"C++","readme":"fast_align\n==========\n\n`fast_align` is a simple, fast, unsupervised word aligner.\n\nIf you use this software, please cite:\n* [Chris Dyer](http://www.cs.cmu.edu/~cdyer), [Victor Chahuneau](http://victor.chahuneau.fr), and [Noah A. Smith](http://www.cs.cmu.edu/~nasmith). (2013). [A Simple, Fast, and Effective Reparameterization of IBM Model 2](http://www.ark.cs.cmu.edu/cdyer/fast_valign.pdf). In *Proc. of NAACL*.\n\nThe source code in this repository is provided under the terms of the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0.html).\n\n## Input format\n\nInput to `fast_align` must be tokenized and aligned into parallel sentences. Each line is a source language sentence and its target language translation, separated by a triple pipe symbol with leading and trailing white space (` ||| `). An example 3-sentence German–English parallel corpus is:\n\n    doch jetzt ist der Held gefallen . ||| but now the hero has fallen .\n    neue Modelle werden erprobt . ||| new models are being tested .\n    doch fehlen uns neue Ressourcen . ||| but we lack new resources .\n\n## Compiling and using `fast_align`\n\nBuilding `fast_align` requires a modern C++ compiler and the [CMake]() build system. 
Additionally, the following libraries can be used to obtain better performance:\n\n * OpenMP (included with some compilers, such as GCC)\n * libtcmalloc (part of Google's perftools)\n * libsparsehash\n\nTo install these on Ubuntu:\n    \n    sudo apt-get install libgoogle-perftools-dev libsparsehash-dev\n\nTo compile, do the following:\n\n    mkdir build\n    cd build\n    cmake ..\n    make\n\nRun `fast_align` to see a list of command line options.\n\n`fast_align` generates *asymmetric* alignments (i.e., by treating either the left or right language in the parallel corpus as the primary language being modeled, slightly different alignments will be generated). The usually recommended way to generate *source–target* (left language–right language) alignments is:\n\n    ./fast_align -i text.fr-en -d -o -v \u003e forward.align\n\nThe usually recommended way to generate *target–source* alignments is to just add the `-r` (“reverse”) option:\n\n    ./fast_align -i text.fr-en -d -o -v -r \u003e reverse.align\n\nThese can be symmetrized with the included `atools` command, using a variety of standard symmetrization heuristics, for example:\n\n    ./atools -i forward.align -j reverse.align -c grow-diag-final-and\n\n## Output\n\n`fast_align` produces outputs in the widely used `i-j` “Pharaoh format,” where a pair `i-j` indicates that the \u003ci\u003ei\u003c/i\u003eth word (zero-indexed) of the left language (by convention, the *source* language) is aligned to the \u003ci\u003ej\u003c/i\u003eth word of the right language (by convention, the *target* language). For example, a good alignment of the above German–English corpus would be:\n\n    0-0 1-1 2-4 3-2 4-3 5-5 6-6\n    0-0 1-1 2-2 2-3 3-4 4-5\n    0-0 1-2 2-1 3-3 4-4 5-5\n\n## Acknowledgements\n\nThe development of this software was sponsored in part by the U.S. 
Army Research Office under contract/grant number W911NF-10-1-0533.\n\n","funding_links":[],"categories":["AWESOME: Aligning Word Embedding Spaces of Multilingual Encoders","Software","Mehrsprachiges Korpus-Alignment"],"sub_categories":["Model performance","Utilities","Wort-Alignment-Tools"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclab%2Ffast_align","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclab%2Ffast_align","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclab%2Ffast_align/lists"}