{"id":15519111,"url":"https://github.com/danieldk/citar-cxx","last_synced_at":"2025-06-25T00:36:40.391Z","repository":{"id":137794154,"uuid":"1038548","full_name":"danieldk/citar-cxx","owner":"danieldk","description":"Citar part of speech tagger","archived":false,"fork":false,"pushed_at":"2016-03-28T15:29:09.000Z","size":683,"stargazers_count":40,"open_issues_count":0,"forks_count":13,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-23T04:18:45.796Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://github.com/danieldk/citar/wiki","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danieldk.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2010-10-30T23:32:13.000Z","updated_at":"2023-06-16T12:13:02.000Z","dependencies_parsed_at":"2023-05-22T14:15:28.957Z","dependency_job_id":null,"html_url":"https://github.com/danieldk/citar-cxx","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/danieldk/citar-cxx","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fcitar-cxx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fcitar-cxx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fcitar-cxx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fcitar-cxx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danieldk","download_url":"https://codeload.git
hub.com/danieldk/citar-cxx/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldk%2Fcitar-cxx/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261782608,"owners_count":23208905,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T10:20:16.732Z","updated_at":"2025-06-25T00:36:40.348Z","avatar_url":"https://github.com/danieldk.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"Citar - A simple Trigram HMM part-of-speech tagger\n\n== Introduction ==\n\nCitar is a simple part-of-speech tagger, based on a trigram Hidden\nMarkov Model (HMM). It (partly) implements the ideas set forth in\n[1]. Citar is written in C++.\n\nCitar is licensed under the GNU Lesser General Public License\nversion 3.0.\n\n== Warning ==\n\nThe Citar API will be highly unstable for the first few versions!\n\n== Building Citar ==\n\nBuilding Citar requires a C++ standard library with TR1 extensions,\nsuch as a recent version of libstdc++ as included with GNU g++. This\nrelease was tested with g++ 4.3.2. cmake is used for creating build\ninfrastructure.\n\nYou can create the build infrastructure by running \"ccmake .\" in the\ntop-level Citar directory. This will allow you to configure various\nsettings. The WITH_TRIGRAM_CACHE setting is used to enable/disable the\ntrigram cache for linear interpolation smoothing. 
This may give a\nperformance gain in some situations, but is currently not thread-safe.\n\nAfter configuring Citar with cmake you can invoke \"make\" on Unix\nsystems to build Citar. Command-line utilities for training and\nevaluating the tagger will be produced. Compilation will also produce\nthe 'libcitar.a' library, which you can use to integrate the tagger in\nyour own programs.\n\n== Training ==\n\nThe language model and lexicon can be created with the 'train' utility:\n\n---\n$ ./train corpus-train lexicon ngrams\n---\n\nThis will create the 'lexicon' and 'ngrams' files. The trainer will read\ncorpora in the Brown format (one sentence per line, words and tags are\nseparated with a forward slash). You can now test the tagger with the\ncommand-line 'tag' utility, which reads tokenized sentences from the\nstandard input and prints the most probable tag sequence:\n\n---\n$ echo \"The cat is on the mat .\" | ./tag lexicon ngrams\nThe/AT cat/NN is/BEZ on/IN the/AT mat/NN ./.\n---\n\n== Authors ==\n\nDaniel de Kok \u003cme@danieldk.eu\u003e\n\n== FAQ ==\n\n- \"What's up with the name?\"\n\n  Citar is not an abbreviation. If you do prefer abbreviations,\n  let's make it \"C++ sImple TAgging Redux\" :).\n\n[1] TnT - a statistical part-of-speech tagger, Thorsten Brants, 2000\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldk%2Fcitar-cxx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanieldk%2Fcitar-cxx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldk%2Fcitar-cxx/lists"}