{"id":19457646,"url":"https://github.com/mgproduction/mgtagger","last_synced_at":"2026-06-17T11:31:17.938Z","repository":{"id":79204830,"uuid":"110285870","full_name":"MGProduction/mgtagger","owner":"MGProduction","description":"A small, generic, single-C-source-code POS tagger, featuring ngrams with most common word spice, with Viterbi-like code.","archived":false,"fork":false,"pushed_at":"2017-11-11T17:24:14.000Z","size":8057,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-25T11:46:47.058Z","etag":null,"topics":["c","part-of-speech-tagger","postagger","tagger"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MGProduction.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-10T19:41:27.000Z","updated_at":"2018-06-23T13:11:26.000Z","dependencies_parsed_at":"2023-02-28T19:46:10.950Z","dependency_job_id":null,"html_url":"https://github.com/MGProduction/mgtagger","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MGProduction/mgtagger","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MGProduction%2Fmgtagger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MGProduction%2Fmgtagger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MGProduction%2Fmgtagger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MGProdu
ction%2Fmgtagger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MGProduction","download_url":"https://codeload.github.com/MGProduction/mgtagger/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MGProduction%2Fmgtagger/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34447264,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-17T02:00:05.408Z","response_time":127,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","part-of-speech-tagger","postagger","tagger"],"created_at":"2024-11-10T17:23:16.307Z","updated_at":"2026-06-17T11:31:17.920Z","avatar_url":"https://github.com/MGProduction.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"mgtagger\n=====\n\n*mgtagger* is a small, generic, single-C-source-code POS tagger, featuring ngrams with most common word spice, with Viterbi-like code.\nIt can learn languages from conllu files or from inline-tagged ones.\n\nThe source code in this repository is provided under the terms of the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0.html).\n\n## Information\n\n*mgtagger* can learn the information needed for POS tagging from an inline POS-tagged file (the/DT cat/NN is/VBZ on/IN the/DT table/NN) or from conllu files (in which case you can select which 
feature set to use, and you'll also get base forms in the output).\nAfter the quick learning phase it generates (and can later load) a text .mg file (lex + ngrams).\n\nIt works natively in *utf8*, but you can switch it to a codepage by changing this setting in the code.\n\nTo use it in your project, simply add *mgtagger_postag.c* + *mgtagger_private.h* / *mgtagger.h* to it.\n\n*mgtagger* doesn't do tokenization at the moment (although it has a built-in basic tokenizer that may fit some languages, though surely not\nJapanese, Chinese or Thai); it just assigns a POS to each token after its analysis.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgproduction%2Fmgtagger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmgproduction%2Fmgtagger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgproduction%2Fmgtagger/lists"}