{"id":13740422,"url":"https://github.com/LowResourceLanguages/champollion","last_synced_at":"2025-05-08T20:31:25.690Z","repository":{"id":45501478,"uuid":"53846757","full_name":"LowResourceLanguages/champollion","owner":"LowResourceLanguages","description":"Import of https://sourceforge.net/projects/champollion","archived":false,"fork":false,"pushed_at":"2016-03-14T10:22:28.000Z","size":2697,"stargazers_count":18,"open_issues_count":0,"forks_count":8,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-08-04T04:06:33.837Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LowResourceLanguages.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-03-14T10:21:54.000Z","updated_at":"2024-04-09T19:35:14.000Z","dependencies_parsed_at":"2022-07-15T08:51:32.482Z","dependency_job_id":null,"html_url":"https://github.com/LowResourceLanguages/champollion","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LowResourceLanguages%2Fchampollion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LowResourceLanguages%2Fchampollion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LowResourceLanguages%2Fchampollion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LowResourceLanguages%2Fchampollion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LowResourceLanguages","download_url":"https://codeload.github.com
/LowResourceLanguages/champollion/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224765461,"owners_count":17366123,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T04:00:47.653Z","updated_at":"2024-11-15T10:30:39.832Z","avatar_url":"https://github.com/LowResourceLanguages.png","language":"Perl","readme":"                     Champollion Tool Kit V1.2\n\n\n\nAbout CTK\n----------\n\nChampollion Tool Kit (CTK) is a tool kit aiming to provide\nready-to-use parallel text sentence alignment tools for as many\nlanguage pairs as possible.\n\nBuilt around the LDC champollion sentence aligner kernel, the tool kit\nprovides essential components required for accurate sentence\nalignment, including sentence breakers, stemmers, pre-processing\nscripts, dictionaries (if possible), post-processing scripts, etc.\n\nCurrently, CTK includes tools to align English text with Arabic, Chinese,\nand Hindi translations. It can be easily expanded to other language pairs.\n\nCTK welcomes contributions from other researchers.\n\nCTK is written in Perl.\n\nInstallation\n------------\n\nAfter unpacking the CTK distribution, you need to set the environment\nvariable CTK to the directory where the package is unpacked, which is\nthis directory if you haven't done anything funny. 
And that's it.\n\nTo test the installation, run the following command:\n\n./test_installation\n\nIt reports whether the installation is good or bad; if it is bad,\nminimal diagnostics are given.\n\nPlease note that the first time you run champollion (or test_installation,\nwhich runs champollion internally), the program needs to build certain\ndatabases, which can take up to five minutes.\n\nInput and Output\n----------------\n\nThe input files for both sides should contain one segment (sentence) per\nline.\n\nThe output (alignment file) looks like the following:\n\nomitted \u003c=\u003e 1\nomitted \u003c=\u003e 2\nomitted \u003c=\u003e 3\n1 \u003c=\u003e 4\n2 \u003c=\u003e 5\n3 \u003c=\u003e 6\n4,5 \u003c=\u003e 7\n6 \u003c=\u003e 8\n7 \u003c=\u003e 9\n8 \u003c=\u003e 10\n9 \u003c=\u003e omitted\n\n\nEvery alignment has the format:\n\n        language1 sentence ids \u003c=\u003e language2 sentence ids\n\nwhere each side (language1 or language2) may contain up to four sentence ids\ndelimited by commas, or the word \"omitted\", indicating that no translation\nwas found. Sentence ids start at 1.\n\nLanguages\n---------\n\nCTK v1.2 supports three language pairs, with Chinese in two encodings:\n\n\tEnglish Chinese (GB)\n\tEnglish Chinese (UTF8)\n\tEnglish Arabic (UTF8)\n\tEnglish Hindi (UTF8)\n\n\nIMPORTANT: Because we don't have the IPRs to distribute the dictionaries\nwe use internally, the dictionaries included in this package\nare rather small: an English Chinese dictionary (about 5K headwords)\nand an English Arabic dictionary (about 4K headwords). 
Our experiments\nshow that a bigger dictionary usually leads to better performance, which\nmeans that you may want to use your own dictionary if it has better\ncoverage than the one we provide.\n\nCommand Line\n------------\n\nCommand line to run the English Chinese sentence aligner:\n\nyour_CTK_path/bin/champollion.EC_GB \u003cenglish sentence file\u003e \u003cchinese sentence file\u003e \u003calignment file\u003e\n\nor\n\nyour_CTK_path/bin/champollion.EC_utf8 \u003cenglish sentence file\u003e \u003cchinese sentence file\u003e \u003calignment file\u003e\n\nCommand line to run the English Arabic sentence aligner:\n\nyour_CTK_path/bin/champollion.EA \u003cenglish sentence file\u003e \u003carabic sentence file\u003e \u003calignment file\u003e\n\nCommand line to run the English Hindi sentence aligner:\n\nyour_CTK_path/bin/champollion.EH \u003cenglish sentence file\u003e \u003chindi sentence file\u003e \u003calignment file\u003e\n\nIn addition, there is champollion.generic, which can align unknown\nlanguage pairs. To run it:\n\nyour_CTK_path/bin/champollion.generic \u003cLX sentence file\u003e \u003cLY sentence file\u003e \u003calignment file\u003e \u003cdictionary\u003e\n\nFor languages not included in the package, it's strongly recommended\nthat you write a tokenizer following the existing examples, and a stemmer\nif possible. Even an imperfect stemmer will make a big difference.\n\nEvaluation Corpus\n-----------------\n\nTo facilitate the development of better sentence aligners, this\npackage also includes manually aligned English Chinese data as an\nevaluation corpus. 
The evaluation corpus is in the 'eval' directory.\n\nThe data were selected from three sources: UN, Sinorama, and Hong Kong Hansards.\n\nThe Chinese files are:\n\n198706005.c.txt         921008fc.txt            UN19990209_010.c.txt\n200110006.c.txt         930422fc.txt\n890621fc.txt            UN19930101_020.c.txt\n\nThe English files are:\n\n198706005.e.txt         921008fe.txt            UN19990209_010.e.txt\n200110006.e.txt         930422fe.txt\n890621fe.txt            UN19930101_020.e.txt\n\nThe gold alignment files are:\n\n198706005.gold.align            930422f.gold.align\n200110006.gold.align            UN19930101_020.gold.align\n890621f.gold.align              UN19990209_010.gold.align\n921008f.gold.align\n\n\nCOPYRIGHT\n---------\n\nThis software is licensed under the Common Public License; see LICENSE for\ndetails.\n\nContributions, Questions, Bug reports, etc.\n-------------------------------------------\n\nPlease contact Xiaoyi Ma at xma@ldc.upenn.edu.\n\n\nXiaoyi Ma 6/20/2011\nLinguistic Data Consortium\nxma@ldc.upenn.edu\n","funding_links":[],"categories":["Software"],"sub_categories":["Utilities"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLowResourceLanguages%2Fchampollion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLowResourceLanguages%2Fchampollion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLowResourceLanguages%2Fchampollion/lists"}