{"id":24315475,"url":"https://github.com/asb-capfan/alpaco","last_synced_at":"2025-10-13T22:08:53.224Z","repository":{"id":136337825,"uuid":"43304966","full_name":"asb-capfan/Alpaco","owner":"asb-capfan","description":"Alpaco is a Perl/Tk based GUI that allows users to manually align parallel text. It is based on the Blinker project and uses their format for parallel text","archived":false,"fork":false,"pushed_at":"2015-10-01T20:32:44.000Z","size":216,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-13T22:08:52.958Z","etag":null,"topics":["alignment-files","application","blinker","computational-linguistics","perl"],"latest_commit_sha":null,"homepage":"http://www.d.umn.edu/~tpederse/parallel.html","language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/asb-capfan.png","metadata":{"files":{"readme":"README.md","changelog":"Changes","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-09-28T14:12:45.000Z","updated_at":"2021-03-25T10:50:45.000Z","dependencies_parsed_at":"2023-03-13T10:42:38.259Z","dependency_job_id":null,"html_url":"https://github.com/asb-capfan/Alpaco","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/asb-capfan/Alpaco","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asb-capfan%2FAlpaco","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asb-capfan%2FAlpaco/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asb-capfan%2FAlpaco/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asb-capfan%2FAlpaco/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/asb-capfan","download_url":"https://codeload.github.com/asb-capfan/Alpaco/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asb-capfan%2FAlpaco/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279017152,"owners_count":26085983,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment-files","application","blinker","computational-linguistics","perl"],"created_at":"2025-01-17T11:15:47.953Z","updated_at":"2025-10-13T22:08:53.172Z","avatar_url":"https://github.com/asb-capfan.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ALPACO\n\n## Aligner for Parallel Corpora\n\nCopyright (C) 2003\n\n* Brian Rassier, rass0028@d.umn.edu\n* Ted Pedersen, tpederse@umn.edu\n* University of Minnesota, Duluth\n    \nhttp://www.d.umn.edu/~tpederse/parallel.html\n\n\n## 1. Introduction:\n\nAlpaco (Aligner for Parallel Corpora) is a program that is designed to \nalign parallel texts.  If two files are known to be translations \nof each other, Alpaco can be used to manually align them (word-by-word or\nphrase-by-phrase) and save the alignments for future reference.  
\n\nAlpaco can take the following as input:  raw text files, Blinker data \n(explained in section 3), and previously aligned text files (Alpaco format).  \nAlpaco format is basically a superset of the Blinker data, which is explained\nin sections 3 and 4.  Alpaco also has the ability to read in raw text files \nline-by-line for easier use with large text files.  This gets a bit more \ncomplicated with the naming scheme, which is explained later in the README\n(sections 5 and 6.1.2).\n\nThis README continues with brief notes about Alpaco, and how a user would \ntypically use this alignment tool.  \n\n\n\n## 2. Packages Needed:\n\nAlpaco was written using Perl v5.6 and Tk v800.023.  Versions at this \nlevel or higher should work with Alpaco.  Earlier versions may work, but \nthey have not been tested.\n\nAlpaco depends on two modules that make the tool easier to use.  \nThe required modules are Tk::HistEntry (version 0.37 or higher)\nand Tk::SimpleFileSelect (version 0.66 or higher).   They can both be found\nat: http://www.perl.com/CPAN/modules/by-module/Tk/.\nThey are both distributed with Readme files, which will explain the \ninstallation.  
To see if they were installed correctly, make a simple perl/Tk\nscript which contains the following:\n\n    #!/usr/local/bin/perl\n    use Tk;\n    use Tk::HistEntry;\n    use Tk::SimpleFileSelect;\n    \n    my $mw = MainWindow-\u003enew;\n    $e1 = $mw-\u003eHistEntry(\n        -textvariable =\u003e \\$file1,\n        -background =\u003e \"white\",\n        -takefocus =\u003e 1,\n    )-\u003epack(-side,'top', -anchor,'w');\n    $mw-\u003eButton(\n        -text =\u003e 'Find A File',\n        -command =\u003e \\\u0026find,\n    )-\u003epack(-side =\u003e 'bottom');\n    $mw-\u003eButton(\n        -text =\u003e 'Close',\n        -command =\u003e sub{exit;},\n    )-\u003epack(-side =\u003e 'bottom');\n    \n    MainLoop;\n    \n    sub find{\n        $top = $mw-\u003eSimpleFileSelect;\n        $found = $top-\u003eShow;\n        $file1 = $found;\n        $e1-\u003ehistoryAdd($found);\n    }\n\nThis script will make a simple interface that uses both modules.  If \neverything is working properly, this program will work also.  If there are \nerrors, there may be a problem with the installation of the modules.  In that\ncase, try visiting www.perl.org or www.cpan.org. \n\n\n\n## 3. Blinker Data:\n\nThere was a similar project done at NYU, which was named the Blinker.  The \nAlpaco project is based on the Blinker project.  Information about the \nBlinker research, and the Blinker data is available at: \nhttp://www.cs.nyu.edu/cs/projects/proteus/blinker. The file structure to \nAlpaco is based on the Blinker system, and Alpaco can also read and edit \nthe Blinker data.  The Blinker data is stored in a format of two columns of \nnumbers.  Each column of numbers refers to the word number in the respective \ntext.  The numbers in each row are alignments of each other.  A simple \nexample will make this more clear.  
A sample blinker file may consist of two\ncolumns as follows:\n\n    1 4\n    3 3\n    2 1\n    0 2\n\nThe first column would correspond to words in the source text, and the second \ncolumn would correspond to words in the target text.  The connections would be\nas follows: \n\n* The first word in the source text would be connected to the fourth word in the target text.\n* The third word in the source text would be connected to the third word in the target text.\n* The second word in the source text would be connected to the first word in the target text.\n* The second word in the target text would have no connection to the source text (Null Connection).\n\nThe last alignment is a bit different. If there is a zero in this format, it \nindicates a null connection.  This means there is no word that is an alignment \nin the other text.  For example in row four, it shows that word two of the \ntarget has no alignment to the source.  Null connections are explained a bit \nmore in sections 4 and section 6.5.2.\n \nThere was a complex naming scheme to the Blinker data, which is related to \nAlpaco's usage in the file blinker_data.txt included with this package.  It \nis also explained next.\n\nThe Blinker was used to align verses of the English Bible to the French \nBible.  There were seven annotators that worked with this tool, and they \naligned 25 files with 10 verses in each file.  The 10 verses are broken into\nsub-sections of the larger text file, and are separated by a newline character.\nThe aligned files are named in the format samp*.SentPair? where the \ncorresponding verse number is calculated by: \n\n    verse # = (samp# - 1) * 10 + SentPair# + 1  \n\nThis is the equation that is referred to throughout the README.  The parallel \ntext files are named EN.sample.* for English and FR.sample.* for French.  Here \nis an example of some Blinker data from the given web-site.  These samples, and\nthe 10 alignment files are included with the Alpaco package.  
The 10 \nalignment files are in the A1 directory.  They can be opened as Alpaco files \nto see a typical alignment.\n\n### EN.sample.1 (only verse one (sub-section one) is shown here)\n\n    After all , if you were cut out of an olive tree that is wild by nature , and contrary to nature were grafted into a cultivated olive tree , how much more readily will these , the natural branches , be grafted into their own olive tree !\n\n### FR.sample.1 (only verse one(sub-section one) is shown here)\n\n    Si toi , tu as été coupé de le olivier naturellement sauvage , et enté contrairement à ta nature sur le olivier franc , à plus forte raison eux seront - ils entés selon leur nature sur leur propre olivier .\n\n\n### A1/samp1.SentPair0 ---\u003e (connections for the first Annotator, for the first verse of the given text samples)\n\n    4 1\n    5 2\n    5 4\n    7 7\n    9 8\n    8 8\n    10 9\n    12 10\n    11 10\n    16 11\n    17 11\n    13 12\n    14 12\n    15 12\n    18 13\n    19 14\n    24 15\n    20 16\n    21 17\n    22 18\n    22 19\n    23 5\n    23 6\n    6 5\n    6 6\n    25 20\n    26 21\n    28 22\n    29 22\n    27 23\n    30 24\n    1 0\n    2 0\n    3 3\n    49 41\n    47 40\n    48 40\n    46 39\n    45 38\n    44 37\n    43 33\n    32 26\n    33 26\n    34 27\n    34 28\n    36 29\n    35 30\n    42 30\n    39 34\n    39 35\n    39 36\n    40 32\n    37 0\n    38 0\n    41 0\n    31 0\n    0 31\n    0 25\n\nThese are all the connections from Annotator 1 for the verses (sub-sections) \nfound by the equation given above.  The order to these numbers does not \nmatter, it is just in the order that the alignments were made.  This file\nshows an alignment of the 4th word in the English text to the 1st word in the\nFrench text etc.  There are cases of a single word aligning to multiple words \nin this file. For example, word 22 in English (left column) is aligned to two \nwords in French (right column), the 18th and 19th words.  
Multiple words in one\nlanguage can be aligned to multiple words in the other (phrase-by-phrase \nalignments), but this file does not have an example of this.  The zeros in each\ncolumn, as explained earlier, represent that there is no alignment for that \nword to the other file.  These alignments are much easier seen when the file is\nloaded into Alpaco.  To view this particular alignment in Alpaco, click the \n\"file\" menu, then the \"Open an Alpaco File\" option.  In the box that appears, \ntype \"A1/samp1.SentPair0\".  This will open the files from above, and load the \nalignments for viewing/editing. \n\n\n\n## 4. Alpaco data:\n\nAs stated earlier, Alpaco data is stored much like the Blinker data.  The \nBlinker files are a subset of the Alpaco files.  This just means that Alpaco\ncan use the blinker files explained earlier, plus one other type.  The second\ntype is identical to Blinker files, except it has a different first line. The \nfirst line consists of two file names.  These lines are then loaded so an \nequation is not needed to find the appropriate files.  This file type is very\nuseful if text files are aligned as a whole.  The file naming then becomes \nmuch simpler, which is explained more in section 5.  After this first line,\nthere are two columns of numbers which are used the exact same way as explained\nearlier.  Another note to make is that if there is ever a zero in a column, \nit means that the opposite column has no connection, or a null connection (as \nexplained earlier).  An example of a typical Alpaco file is as follows:\n\n    file1 file2\n    1 1\n    2 4\n    3 2\n    0 3\n\nIn this file, there would be two files named file1 and file2.  
They would \nhave the following manual alignments:\n\n* The first word in file1 would be connected to the first word in file2.\n* The second word in file1 would be connected to the fourth word in file2.\n* The third word in file1 would be connected to the second word in file2\n* The third word in file2 would have no connection to file1 (Null Connection).\n\nThese may not be the only words in the files, but they are the only words \nthat have been manually aligned.\n\nThere is one case where Alpaco files are not saved with file names at the top.\nAlpaco can be used to break up text files into sub-sections by newline \ncharacters.   This way the alignments can be made on a smaller scale (explained\nmore in section 6).  In this case, Alpaco will use an equation similar to the \nBlinker, so names are not stored with the connections.  The Alpaco equation is:\n\n    sub-section # = (samp# - 1) * sub-sections + SentPair# + 1  \n\nThe only difference between this equation and the Blinker equation is that \nthere is a variable (sub-sections) in the place of the number 10 in the Blinker\nequation.  This is for flexibility so that users can break files up into any \nnumber of sub-sections.  If files are not broken up into 10 sub-sections then \nthe data limits must be changed, which is explained in more detail in section \n6.12.  Throughout the Readme, if files are said to be in the \"Blinker\" format, \nit just means that the naming scheme follows this equation or the similar \nBlinker equation given earlier. \n\nAlpaco adjusts for the two types of files when files are opened.  It looks for \nfile names at the top of the Alpaco file, but if they are not found it tries \nloading with the given equation.  If Alpaco is used with the equation, naming \nstandards for the files are very strict, which is explained in the next \nsection.  This is so the equation can be used properly.  
This format is very \nuseful when looking at connections, because users can skip through to the next \nfile/annotator by simply pressing a button.  This means that they do not have \nto type the whole file path in order to open the next file.\n\n\n\n## 5. File Naming Standards:\n\nThere are two different ways Alpaco can be used, as far as file naming is \nconcerned.  When deciding which naming standard to choose, users should \ndetermine if there will be a large amount of data to be aligned, or if just a \nfew files will be aligned.  The choice is up to the user, but both standards \nwill be explained next.  Throughout this section it is said that parallel text \nfiles are broken up into \"sub-sections\".  This simply means that there is a \nnewline character that separates sections of the larger text file.  These \nsub-sections must be made by the user, and are assumed to represent alignments \nat a sentence level.  This means that the sub-sections are known to be \nalignments of each other, and Alpaco will help to make alignments at the \nword/phrase level.\n\n\n### 5.1. File Naming with the Alpaco Format\n\nThis format is very useful if there will only be small amounts of data/files\nto be aligned.  Files can be aligned as a whole or line-by line in this \nformat.  In order to save in the line-by-line format with this naming scheme,\nusers must save with the \"Save Current Work to File\" option from the\nfile menu (see section 6.3.2).  In the non-line-by-line format, simply use the \n\"Save an Alpaco File\" option from the file menu.  Users will then be prompted \nfor a name for the file.  In both of these situations the tool will take the \nnames of the two text files, insert them at the top of the saved Alpaco file, \nthen save the connections after the file names.  This will ensure that the user\nwon't have to keep track of a naming scheme.  The user can name the files what \nthey choose, and load/edit them by that name.\n\n\n### 5.2. 
File Naming With the Blinker Format\n\nThis format is helpful when large amounts of data/files are to be aligned.  The \nformat is very strict, but if followed can be very beneficial.  \n\nA major benefit of this format is that large files can be broken up to be \naligned in smaller sections.  This must be done in line-by-line mode.  When in \nline-by-line mode, the tool will look for a newline character, and load \nsegments separated by this character.  The input files must be split up by \nthese newlines by the user before the file is loaded.  This way the user can \nseparate the files how they wish.  A good example of how to break up the files \nis seen in the Blinker data included with Alpaco (EN.sample.1 and FR.sample.1).\n  \nThe first naming stipulation is that all of the following items are in the same\ndirectory:\n\n* annotators' directories\n* parallel text files\n\nThe annotators' directories must be named in the format A# with the # being \nthe annotator's number.  If only one annotator is used, the user will \nsimply have one directory named A1. \n\nIn these directories will be the files with the connection information listed\npreviously.  The format listed previously holds here too.  They will be named\nsamp*.SentPair? where the samp number is the text file associated with this \nalignment, and the SentPair number is the sub-section number (starting at 0) \nwithin this text file.\n\nThe default data values for Alpaco are 7 annotators (7 directories named\nA1 - A7), 25 text samples (25 different prefixes named similar to \nA1/samp1.SentPair? - A1/samp25.SentPair? for all 7 annotators), and 10 \nsub-sections per text sample (each text file is separated by 10 newlines, thus \nbroken into 10 smaller alignments. Naming follows: A1/samp1.SentPair0 - A1/samp1.SentPair9).\nThese defaults are from the Blinker data, so Alpaco is set up to\nread/edit Blinker data by default.  
These data limits can be changed by \nchoosing \"Change Data Limits\" from Alpaco's options menu.  This must be done \nevery time Alpaco starts if the data limits differ from Blinker's standards.  \nThis is explained further in section 6.\n\nThe parallel text files must be in the main directory, but outside the \nannotators' directories.  They must be separated by the two languages that are \nrepresented.  One language must be named EN.sample.?, and the other must be\nFR.sample.?. These question marks must match up with parallel text files \nbeing the same.  The default number for this, as stated earlier is 25 files. \n(EN.sample.1 - EN.sample.25 and FR.sample.1 - FR.sample.25).  This data limit\ncan be changed the same way as listed previously, and is explained further in\nsection 6.\n\nHere is an example of a naming standard.  There is one directory, in this \nexample named test_data.  In this directory should be: annotators' directories\nand parallel text files.  In this example, if there were 2 annotators, \nthere would be 2 directories named A1 and A2.  If there were 10 parallel text \nfiles, they would be within the test_data directory, and named \ntest_data/EN.sample.1 - test_data/EN.sample.10 and test_data/FR.sample.1 - \ntest_data/FR.sample.10.  Within the annotators' folders would be the aligned \nfiles.  If each text sample was separated into 3 sub-sections, then the aligned\nfiles would be named test_data/A1/samp1.SentPair0 - \ntest_data/A1/samp10.SentPair2.  The same would be done for the A2 directory.  \nWith this example, the files to be loaded are the SentPair files (aligned \nfiles).  Simply choose \"Open an Alpaco File\" from the file menu and enter a \nfile similar to test_data/A1/samp1.SentPair0.  The tool will then find the \ncorrect text and connection information, and load it.\n\nAnother example of this standard is with the included files.  EN.sample.1 and\nFR.sample.1 are included, which are Blinker sample texts.  
These text samples \nare broken up into 10 sub-sections, so in A1 there are 10 example alignment \nfiles (samp1.SentPair0 - samp1.SentPair9).  The files in the A1 directory can \nactually be opened as Alpaco files, and then the alignments can be seen/edited.\n\nAlthough this is a very strict naming strategy, it must be done for the Blinker\nequation to be used.  It is also a must for the benefits that come from it \n(next/previous file and annotator etc).  These benefits are further explained\nin section 6.\n\n\n\n## 6. Alpaco Usage:\n\nAfter the file naming standards are learned, the rest of Alpaco is very simple\nto use.  This section will go through the main abilities Alpaco has, and the \nthings a typical user would do with Alpaco.\n\n\n### 6.1. Loading Raw Text Files \n\nLoading raw text files is a great way to start learning the Alpaco tool.  This\nis also the main way that alignments can be made with Alpaco. The other way \nalignments can be made is by loading previously aligned files, and editing\nthem.  The second option is explained later in this section.  If raw text \nfiles are used, Alpaco treats anything separated by a space as a token.  If\npunctuation, or anything else, is to be considered as its own token, the file\nmust be separated in this fashion before loading into Alpaco.  Included in \nthis package is a helper tool that can separate these tokens for a user.  See\nsection 7 for more information about this helper tool.\n\n#### 6.1.1. Loading Raw Text Files as a Whole\n\nLoading a raw text file as a whole is very simple.  Just type in the file name\nin the entry box (for source or target), and press enter.  The terms source \nand target are used occasionally with Alpaco.  The source file is on \nthe left side of the interface, and the target file is on the right.  Once \nsource and target texts are loaded, the alignment process can begin.\n\n#### 6.1.2. 
Loading Raw Text Files Line-by-Line\n\nLoading text files line-by-line is a little more complicated. First the \noption must be enabled by pressing the button to the right of the file entry\nboxes.  Then files can be loaded exactly like a whole file, but it will only\nread one line at a time (separated by a newline character).  Files can be \nsaved in the Blinker format by saving as an Alpaco file.  The strict \nnaming standards must be followed in this form, so name the files \naccordingly.  Alignments can also be saved in the Alpaco format in line-by-line\nmode.  To do this the user must choose the \"Save Current Work to File\" option.\nThis will allow users to enter separate file names for the current lines of\ntext that they are aligning.  Alpaco will then save these file names at the top\nof the alignment file.  This way Alpaco doesn't need to use the equation to \nfind the text associated with the alignment.\n\n\n### 6.2. Opening an Alpaco File\n\nThere are two types of Alpaco files.  One which is exactly like the Blinker \nfiles and will use the Blinker equation to find the associated text files, and \nanother that is similar to Blinker files, only it lists the text file names\nat the top, before the alignment information.  Both types, if saved and named \ncorrectly, can be loaded by selecting the \"Open an Alpaco File\" from the file \nmenu (or use the shortcut by pressing Ctrl + o).  Alpaco will find the \nassociated files, and load the connections.  Then the user can view the \nalignments, or edit them where needed.\n\nThere is also a find button when loading an Alpaco file.  If the user doesn't\nknow where the file is saved, they can press this button.  A window will pop\nup with the current directory information in it.  The user can find the Alpaco\nfile that is desired, and the tool will load it once the file is accepted.\n\n\n### 6.3. Saving an Alpaco File\n\nThere are two ways to save an Alpaco file.  
One is just by giving a name for\nit, in which case Alpaco has the necessary information to retrieve the files\nfor this alignment.  The other way is by specifying filenames, then the name\nfor the Alpaco file.  Both situations are explained next.\n\n#### 6.3.1. \"Save an Alpaco File\" \n\nThis option is used when Alpaco knows how to get the necessary files for the\nalignment.  An example of this use is when a whole file is aligned, and the\nuser wants to save with the more simple naming standard (see section 5).  \nThis will load the filenames given at the top of the file, and then load the\nalignments.  The other time this option is used is when files are aligned\nline-by-line, and the user wants to save with the equation.  In this case the \nuser must save according to the naming standards in section 5.2.  Then Alpaco\nwill be able to find the needed files using the Blinker equation, and it only \nneeds the name of the actual Alpaco file. \n\n#### 6.3.2. \"Save Current Work to File\"\n\nThis option is used in two different situations.  The first is if there is a \nfile that has been saved with the Blinker format, and the user wants to \nchange it over to the Alpaco format (see section 5).  Then simply load the file\nand select this option from the file menu.  The user will have to enter two \nfile names for the text files, and a name for the Alpaco file.  The Alpaco file\nwill no longer need the equation, because it made new files for the current \ntext, and will record these file names at the top of the Alpaco file.  The \nsecond way to use this option is if a user is aligning line-by-line, and wants \nto save without the naming standards explained in section 5.2.  Then the user \nmust give names for the two separate lines from the larger file, and Alpaco \nwill save them as their own files.  Then give a name for the Alpaco file, and \nthe tool will access the files in this format instead of with the Blinker \nequation.\n\n\n### 6.4. 
Edit/Browse mode\n\nThese two modes determine whether or not the user has the ability to change\nthe alignment data.  Browse mode will show the alignments, but won't allow \nchanges to the data.  Edit mode will show the alignments, and will allow the\nuser to change the data.  Please use caution in the Edit mode, because \nalignment information can be lost or altered.  The mode can be changed by \nclicking on the desired button on the interface, or by selecting \"Change Mode\" \nfrom the options menu.  Browse mode is the default when Alpaco is started.\n\n\n### 6.5. Making Connections\n\nThere are two different kinds of connections: regular connections\nand null connections.  Both are explained here.\n\n#### 6.5.1. Regular Connections\n\nThis is used when there is an alignment that should be made between word(s) \nin the source text and word(s) in the target text. To make the alignment \nAlpaco must be in edit mode.  If the tool is in edit mode, simply click the \nword(s) in the source list along with the word(s) in the target list.  The \nwords should change color, indicating that they were selected.  Then hit the \nConnect button on the interface, or choose connect from the options menu.  \nThere is also a shortcut, which is dependent on the mouse being used.  Either \nhit the right mouse button, or button 1 + 2, which will also make the \nconnection.  There is also a keyboard shortcut: \npress Ctrl+c to make the connection.  Lines will be drawn between the aligned \nwords, indicating the connection.  The words will also change color, to help \nindicate which words have not been aligned yet.  When a file is saved, the \nconnections will be saved with it.\n\n#### 6.5.2. Null Connections\n\nThis option is used when there are word(s) in one file that have no matches\nin the other file.  To make a null connection, Alpaco must also be in edit\nmode.  
Then simply select the word(s) with no match, and they will change \ncolor indicating their selection.  Then hit the Null Connect button, or \nchoose Null Connect from the options menu.  There is a shortcut  for this \noption also.  When the correct words are selected, press Ctrl + n, and the\nnull connection will be done.  These null connections will be indicated by a\nchange in color.  If a word has a null connection, it will turn black, just \nlike the null connect button itself.  When a file is saved, the null \nconnections will also be saved with it.\n\n\n### 6.6. Undo and Redraw Connections\n\nOnce connections are drawn, they are not necessarily permanent.  They can\nbe undone in a few different ways.  The undone connections can also be \nredrawn.\n\n#### 6.6.1. Undoing a Connection\n\nThere are three different ways to undo connections, which make undoing them\nmuch more convenient.  All three can be done the same way, by pressing the \nUndo button, by selecting Undo Connection from the options menu, or using the \nshortcut.  The shortcut is done by pressing Ctrl + u.  \n\nThe three ways to undo differ by how many words are selected.  If two words \nare selected (one from the target and one from the source), Alpaco will look \nfor a single connection between the two words, and remove it if it exists.  \nThis is helpful if only one connection needs to be removed, but words have \nmultiple connections associated with them.  If only one word is selected before\nan undo, then all the connections associated with that word will be removed.  \nThis is helpful if a word is aligned poorly, and the user just wants to start \nover with its alignment.  The final way to undo a connection is by selecting \nzero words.  This will undo the last connection that was done, one alignment at\na time.  This is useful if there were some recent mistakes that were made. \n\n#### 6.6.2. 
Redrawing a Connection\n\nWhen connections are undone, Alpaco will remember them until a new file is \nloaded, the files/connections are cleared, or Alpaco is exited.  To redraw an \nundone connection, simply press the Redraw button on the interface, or select \n\"Redraw Connection\" from the options menu.  This option will redraw a \nconnection, one line at a time, until all the undone connections are redrawn.\n\n\n### 6.7. View Sentences\n\nThis option is very helpful when making alignments.  It helps sometimes to \nread the sentences that are to be aligned, which is what this option was made\nfor.  To view the sentences, select \"View Sentences\" from the options menu.\n\n\n### 6.8. Clear Connections/All\n\nThe first of these options, Clear Connections, is helpful if the file that is\nbeing aligned is aligned very poorly.  This way the connections that have been \nmade are removed, yet the files can still be worked on.  This will start the \naligning process from the beginning with the two files.  To clear the \nconnections, choose \"Clear Connections\" from the file menu.\n\nThe second option, Clear All, is used when the user wants to start from\nscratch.  This will remove all the connections, and the files that were being \nworked with.  To clear the entry completely, choose the \"Clear All\" \nchoice from the file menu.\n\n\n### 6.9. Finding Text Files\n\nThis choice is very helpful if the user is not very familiar with their files\nand file structures.  For this option, the user must have the SimpleFileSelect\nmodule installed (see section 2).  To find a text file, choose \"Find Text \nFiles\" from the file menu.  A window will appear with the current directory \ninformation in it.  The user can then browse the directories to look for a file\nto use.  Once a file is chosen, press the accept button.  Another window will \npop up. This window will give choices of how to use the found file.  Choose whether \nit is to be used for the source (left) or target (right).  
These options will \nopen the file just as if the regular open functions were used in its place.\n\n\n### 6.10. Resizing\n\nThis option can be useful if many connections are in a small area, and it is \ndifficult to see them.  It is best used in browse mode only. This is because \nresizing can change the files being used to temporary sizing files, thus \nconfusing the saving process.  To resize, simply click on the + or - sign by\nthe \"Resize\" section on the right portion of the interface.  This will \nadd or reduce the space between the words, in the vertical direction.\n\n\n### 6.11. Next/Previous File and Annotator\n\nThese options are available if the more strict naming standards are used. \nThese options are a great benefit that comes from the pain of the strict \nnaming standards. For example, if a user is looking at the Blinker data set,\nthey can go to the next/previous file by pressing the respective button\non the interface.  They can also see how other annotators aligned the text by \npressing the next/previous annotator button.  These options are only available\nonce a file saved with the Blinker format has been opened.  Buttons for these \noptions will appear in this situation. \n\n\n### 6.12. Change Data Limits\n\nIf a data set that does not follow the Blinker standards is used, then this \noption is a must.  The defaults for Alpaco are seven data annotators, 25 parallel \ntexts, and 10 sub-sections per text file (an explanation of a file and its \nsub-sections is given in section 5).  If a data set has different limits than \nthis, then they can be changed.  Choose the \"Change Data Limits\" choice from\nthe options menu.  In the corresponding entry boxes enter the new data limits\nfor: Number of sub-sections per text file, Number of parallel texts, and Number\nof annotators.  This option must be set every time Alpaco starts, if limits \nother than the defaults are to be used.\n\n\n\n## 7. 
Alpaco_helper.pl:\n\nIncluded in the Alpaco package is alpaco_helper.pl.  Alpaco considers tokens\nas anything separated by a space in the input file, and many times the text \nis not prepared this way.  Many corpus alignments need punctuation \nand other sequences to be treated as their own tokens, and the text may not \nhave spaces to separate these sequences.  This is where alpaco_helper.pl \ncomes in.  It is a simple text editor that can open files, and separate \ndifferent sequences of characters given by the user.  It will separate the \ncharacter sequences by a space, then the user can save the file as desired.  \nThis way the text is prepared to load into Alpaco, and aligned how the user\ndesires.\n\nTo split up a file, first load it into alpaco_helper.  To do this\nsimply type the name into the entry box and press enter, or select the \"Open\nText File\" option from the file menu.  Then to split up the file by sequences\nof characters, select \"Split up Tokens\" from the options menu.  A window will\npop up.  In the entry box in this window, enter the sequences of characters \nthat should be considered their own tokens.  They must be entered with a \nspace between them in the entry box.  For example, if a user wanted question \nmarks, quotation marks, and exclamation points to be their own tokens, type\n{? \" !} in the entry box (without the curly braces).  The file will then \nshow the change with spaces separating these sequences.  Then save the file\nas desired, and it is better prepared for aligning with Alpaco.  alpaco_helper\nhas a default splitting rule, just so users can see an example.  \n\n\n\n## 8. Known Problems / Future Enhancements:\n\n* When saving Alpaco files in the line-by-line format, the user still has to\nkeep track of the naming standard.  Because this naming standard is very \nstrict, and may cause confusion, we plan on enabling the tool to save \nautomatically.  
Alpaco will be able to come up with the correct file name\ndepending on where the user is in the input files.  This will make the tool\nmuch more user friendly in later releases.  \n\n* If large files are loaded into Alpaco, memory usage can become a problem.  \nThe tool itself takes about 10MB to run, and the more options are used, the \nmore memory it needs.  Also, each word in the file is its own button.  This \nmeans that a 10,000 word file loaded as a whole will create 10,000 buttons.\nThese buttons have many options, and need memory to use.  This will slow down\nthe use of the tool in some cases, depending on the file size and how much \nmemory is available.  If the larger files are broken up into smaller sections,\nthe tool should have no memory issues.  Our tests have shown that Alpaco takes\nroughly 3MB per 2,000 words loaded into it.\n\n* Because of the memory usage problem described above, there may be a new \ndesign in the future.  The alternative design idea that we have includes\nusing text in the place of the buttons for each word.  This may be less user\nfriendly, but with larger files it would be much more convenient with respect\nto the memory issue.    \n\n## 9. Acknowledgements\n\nThis work has been partially supported by a National Science Foundation Faculty Early CAREER Development award (#0092784) and by a grant from the Undergraduate Research Opportunities Program (UROP) of the University of Minnesota.  
\n\n## Copying\n\nThis program package is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.\n\nThis program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\nSee the GNU General Public License for more details.\n\nYou should have received a copy of the GNU General Public License along with this program;\nif not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.\n\nNote: The text of the GNU General Public License is provided in the file LICENSE that you should have received with this distribution. ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasb-capfan%2Falpaco","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fasb-capfan%2Falpaco","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasb-capfan%2Falpaco/lists"}