{"id":18541814,"url":"https://github.com/cltk/tibetan_pos_tdc","last_synced_at":"2026-02-13T23:06:54.140Z","repository":{"id":29972600,"uuid":"33519569","full_name":"cltk/tibetan_pos_tdc","owner":"cltk","description":"POS tagged corpora from Tibetan in Digital Communication","archived":false,"fork":false,"pushed_at":"2015-04-07T03:35:31.000Z","size":1112,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-15T04:09:40.493Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://larkpie.net/tibetancorpus","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cltk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-04-07T03:20:51.000Z","updated_at":"2020-11-13T15:35:52.000Z","dependencies_parsed_at":"2022-08-24T08:10:05.344Z","dependency_job_id":null,"html_url":"https://github.com/cltk/tibetan_pos_tdc","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cltk/tibetan_pos_tdc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cltk%2Ftibetan_pos_tdc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cltk%2Ftibetan_pos_tdc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cltk%2Ftibetan_pos_tdc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cltk%2Ftibetan_pos_tdc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cltk","download_url":"https://codeload.github.com/cltk/tibetan_pos_tdc/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cltk%2Ftibetan_pos_tdc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275108231,"owners_count":25406827,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-14T02:00:10.474Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T20:06:22.819Z","updated_at":"2026-02-13T23:06:49.106Z","avatar_url":"https://github.com/cltk.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# About\n\nPart of speech tagged text from the Tibetan in Digital Communication project ([larkpie.net/tibetancorpus](http://larkpie.net/tibetancorpus/)).\n\nThese tagged texts include all three of the projects groups of corpora:\n\n\u003e At present, the corpus consists of three distinct collections. First, there is the Classical corpus.\n\n\u003e Second, there is the Saint Petersburg corpus, consisting of texts assembled and tagged by Pavel Grokhovski from Saint-Petersburg State University. We are re-tagging these texts in order to make them consistent with our scheme.\n\n\u003e Third, there is the Berlin corpus, kindly provided to us by Michael Balk. This corpus comprises the entire Tibetan catalogue of the Berlin State Library. It came to us fully segmented, but not tagged. We are now tagging the corpus after having converted it from Extended Wylie to Unicode.\n\n\u003e Together, these three collections constitute the \"Complete corpus\".\n\n\n# Tags\n\n```\nThe Classical corpus is the authoritative reference for our tagset, so should be the first point of reference. As noted above, the Saint Petersburg corpus comes via Pavel Grokhovsky, who used a different tagset. We have been converting his tagset to our system, but this work is not yet complete. Finally, the Berlin corpus is only partially tagged, and so includes a great many words with dummy tags (for example, xxx).\n\nThe following list provides further documentation on how to use our tagset.\n\nadj\nadjectives\nThose words that can occur in attributive position and are not verbal nouns we define as adjectives; they are almost always multisyllablic. The paragon of adjectives are those that in -po, -mo, -pa, -ma, -can, -ldan, and -med. Etymologically -can, -ldan 'with', and -med 'without' are verbs, but when they occur in forms that are functioning nominally with the regards to syntax, they cannot be analyzed as verbs. So, we treat them together with the preceding syllable as an adjective. The formation of adjectives is a complex; which we intend to write about. In the mean time a search for words with this tag in the corpus will reveal some patterns. (Although words such as nag 'black', gsar 'new', and che 'big' are frequently treated like adjectives for pedagogical purposes, a single syllable in predicate position before verbal suffixes is a verb. These words may appear to occur attributively, but we see this as the formation of compounds. The compound blon-chen 'prime minister' contrasts beautifully with noun and adjective pair blon-po chen-po 'great minister', etc.) \nadv.dir\ndirectional adverbs\nWe use this tag for adverbs that end in -cad, i.e. phyin-cad 'after', sṅon-cad 'before', man-cad 'below', yan-cad 'above', slan-cad 'after'. We also include phan-tshun 'mutually' in this category, for lack of a better place to put it.  Words tagged with this tag include: ཡན་ཆད་, མན་ཆད་, ཕན་ཚུན་, ཕྱིན་ཅད་, ཚུན་, ཚུན་ཅད་, སྔོན་ཅད་, སྔོན་ཆད་, སླན་ཅད་, ཕན་ཆད་, མན་ཅད་, ཡན་, ཡན་ཅད་  \nadv.intense\nintensive adverbs\nThis tag was created for the adverbs rab (tu) 'very' and śin (tu) 'very', because as uninflected stems they do not occur independently. Subsequently the tag has also been applied to ཅུང་ཟད་, ཡེ་, ཧ་ཅང་, ཆེས་, ནན་ཏ, and ཅུང་ཙམ་.   \nadv.mim\nmimetic adverb\nThis tag is used for onomatopoeia and phonophors. Examples include ཁོར་ཁོར་ཡུག་, ནར་ནར་, ཙབ་ཙོབ་, རྩོག་རྩོག་, ཁྲེ་ལོག་ལོག་, གཅེར་རེ་, ཅེ་རེ་, ཏབ་ཏབ་པོ, ཏབ་ཏོབ་, ཏུར་ཏུར་པོ་, རིག་རིག་, ལིང་, and ཧྲིག་ཧྲིག་.  \nadv.proclausal\nproclausal adverbs\nA small number of words occur clause initially and refer to the content of the previous clause. Such adverbs often begin with a demonstrative stem. These words are classed as 'proclausal adverbs'. In the following list they are presented together with their affixal suffixes, but in our tagging we divide off these suffixes in order to be consistent with their treatment elsewhere: de (nas) 'then', de (ste) 'thereafter', gal (te) 'if', ḥo (na) 'in that case', ḥon (te) 'nevertheless', yaṅ (na) 'alternatively'. Words tagged with this tag include: དེ་, ཡང་, གལ་, ཁྱད་པ, འོ་, འོན་, དེ་བ, ལ, འང་, and སྒོས་. \nadv.temp\ntemporal adverbs\nTemporal adverbs are those that occur in syntactic positions or have morphological structure that suggests they are nouns, that refer to time, and that are not followed by case markers. In our corpus so far the temporal adverbs are sṅon 'previously', da 'now', deṅ 'these days', mdaṅ 'yesterday', gdod 'at first', da-ruṅ 'still', phyi-ñin 'the next day', phyi-dro 'in the afternoon', and saṅ 'the next day'. There are also nouns that refer to time such as źag 'day' and gdugs 'noon', but these behave syntactically as nouns, for example by being suffixed with case markers. In the phrase saṅ gi gdugs la, where saṅ also appears to function as a noun, we have hesitatingly decided that for the time being it is best to tag saṅ as a temporal adverb. Words tagged with this tag include: ད་, སྔོན་, སྔ་, ད་དུང་, ནང་པ, ཕྱི་ཉིན་, ཁ་སང་, དོ་ནུབ་, གདོད་, ཉིན་, དེ་རིང་, དེང་, ནུབ་, མཚན་མོ་, ཅིག་ཅར་, གཟོད་, ཐོག་མ་, དབྱར་, དིང་སང་, མདང་, མཚན་, སང་, དགུན་, དེང་སང་, ནངས་པ, ཕྱི་, ཡུན་རིང་, སྲོད་, གདུགས་ཚོད་, གནངས་, གུང་, གློ་བུ, ཉིན་མཚན་, ད་ལོ, ནང་, ནངས་, ཕྱི་དྲོ་, མཚན་ཡལ་, འདང་, སླན་ཆད་, ཁར་རྩང་, ཁྲུགས་, གནའ་, གཟོད་མ་, ཉིན་ཞག་ཕྲུགས་, ད་ནང་, ད་རུང་, དོ་ནུབ, ནང་བ, ནངས་བ, ནམ་བཞིག་, ནུབ་མོ, ཕྱི་ཕྱི, མཚན་ཐོག་ཐག་, འཆར་ཁ་, འདང་གསུམ་, འཕྲོ་, རྒྱ་, སང་ཉིན་, སང་དགོང་, སང་ནང་པ, སྐྱ་རེངས་, སྔ་དྲོ་, སྔ་སོ, སྔ་སོ་, སྔོན, སྔོན་མ་. \ncase.abl\nablative case\nThe ablative case is the morpheme las when it follows a substantive.\ncase.agn\nagentive case\nThe agentive case is marked by any of the allomorphs -kyis, -gis, -gyis, -s, and -yis when suffixed to a substantive.\ncase.all\nallative case\nThe allative case is the clitic morpheme la when it occurs suffixed to substantives.\ncase.ass\nassociative case\nThe associative case is the clitic morpheme daṅ when suffixed to a substantive.\ncase.comp\ncomparative case\nThe comparative case is either of the allomophs -bas or -pas suffixed to an substantive.\ncase.ela\nelative case\nThe elative case is the clitic morpheme -nas when suffixed to substantives.\ncase.gen\ngenitive case\nThe genitive case is any of the allomorphs -kyi, -gyi, -gi, -yi, or -ḥi, when suffixed to a substantive.\ncase.loc\nlocative case\nThe locative case is the clitic morpheme -na when suffixed to a substantive. Note however that when -na appears after a verbal noun ending in -pa or -ba in the meaning 'if/when', we tag this -na as 'cv.loc'.\ncase.nare\nthe case -na-re\nThe case suffix -na-re which introduces direct speach.\ncase.term\nterminative case\nThe terminative case is any of the allomorphs -tu, -du, -ru, -su, -r when suffixed to substantives.\ncl.focus\nFocus clitics\nThis tag was invented for the focus clitics ni and kyaṅ ~ yaṅ. The words currently tagged with this tag include: ནི་, ཡང་, ཀྱང་, འང་, ཅང་, and ཕྱིར་ཡང་. It is probably not correct to tag ཕྱིར་ཡང་ with this tag, but we must consider the matter further. The word lta when it is not tagged as a verb, or a relator noun, is also tagged as a focus clitic.  \ncl.quot\nThe quotative clitic\nWe invented this tag for the 'quotative clitic' with the allomorphs ces, źes, and śes. However, the forms ce-na, ces-pa, etc. reveal that more properly this should be called a 'quotative verb' and tagged with a tag that begins with 'v'. Nonetheless, it is now inconvenient to change our approach. The words tagged with this tag are: ཞེས་, ཅེས་, ཞེས་པ་, ཞེ་, ཅེས་པ་, ཤེ་, ཅེ་, ཅེས་བ་, ཞིས་ (a misspelling of ཞེས་), and ཞེས་བ. \ncv.abl\nablative converb\nThe ablative converb is the morpheme las when it follows a verb.\ncv.agn\nagentive case\nThe agentive coverb is marked by any of the allomorphs -kyis, -gis, -gyis, -s, and -yis when suffixed to a verb.\ncv.all\nallative converb\nThe allative converb is the clitic morpheme la when it occurs suffixed to verbs.\ncv.are\nThe converb -a-re\nThis tag is used to tag the converb -ta-re and its allomorphs (cf. Walter Wimon, 1967, 'The Tibetan Particle \"re\",' Bulletin of the School of Oriental and African Studies, 30.1, pp. 117-126). \ncv.ass\nassociative converb\nThe associative converb is the clitic morpheme daṅ when suffixed to a verbs. Note that this tag is particularly used with the polite imperative meaning of daṅ. One also sees daṅ in the meaning 'and' directly suffixed to morphological verb stems. In such cases we see daṅ as evidence that the preceding form is funtioning nominally and should be tagged [n.count], barring reasons to do otherwise.\ncv.cont\ncontinuing converb\nThe continuing converb, unknown in the early literature, is formally similar to the genitive followed by -n and functionally is similar to the imperfective converb in earlier strata of literature. The words tagged so far with this tag are: གིན་, ཀྱིན་, གྱིན་. \n \ncv.ela\nelative converb\nThe elative converb is the clitic morpheme -nas when suffixed to verbs.\ncv.fin\nfinal converb\nThe final converb is any allomorph of the marker -ḥo, which is an explicit marker of a finitie verb in the indicative mood. Words tagged with this tag include: སོ, འོ, དོ, ངོ, ནོ, ཏོ, རོ, ལོ, གོ, བོ, མོ, and པོ. \ncv.gen\ngenitive converb\nThe genitive converb is any of the allomorphs -kyi, -gyi, -gi, -yi, or -ḥi, when suffixed to a verb.\ncv.imp\nimperative converb\nThe imperative converb is the morpheme that explicitly marks the imperative or prohibitive mood, which has one of the three allomorphs -śig, -cig, and -źig.  Our corpus yields the unexpected śig-pa, which we have also tagged with this tag.  \ncv.impf\nimperfective converb\nThe semi-final converb is the morpheme that has the three allomorphs ཞིང་, ཅིང་, and ཤིང་.  \ncv.loc\nlocative case\nThe locative converb is the clitic morpheme -na when suffixed to a verbs, or when suffixed to a verbal noun ending in -pa or -ba in the meaning 'if/when'.\ncv.odd\nodd converbs\nOdd converbs are any that we discovered after fixing the tagset. So far we the following words with this tag: ལས་ཆེ, ཨང་, ཚད་. \ncv.ques\nquestion converb\nThe question converb is any allomorph of the marker -ḥam, which is an explicit marker of a polar question. Words tagged with this tag include: འམ, སམ, ནམ, དམ, གམ, ངམ, ཏམ, རམ, ལམ, བམ, མམ, ངེ་, and མཾ་   \ncv.rung\nThe converb rung\nWhen the morpheme ruṅ appears after a verb meaning 'however' it is tagged with this tag.\ncv.sem\nsemi-final converb\nThe semi-final converb is the morpheme that has the three allomorphs ཏེ, སྟེ, and དེ.  \ncv.term\nterminative converb\nThe terminative converb is any of the allomorphs -tu, -du, -ru, -su, -r when suffixed to a verb.\nd.dem\ndemonstratives\nThis tag is used for the demonstratives ḥdi 'this' and de 'that'. These two words are tagged as demonstratives also when used as determiners (i.e. we do not distinguish rgyal-po de 'that king' from de 'that one, him'). Words tagged with this tag include: དེ་, འདི་, ཡ, ཚུ, མ, ཕ, ཕྱི་, ཕ་གི་, མ་ཀི, མ་གི, མ་གི་, མ་ཀི་, འདི་པ, ཡ་གི་.  \nd.det\ndeterminer\nThe most frequent determiner is gźan 'other'. In addition, we identify ya-re 'each one (of two)' as a determiner on the basis of the following sentence: Brgya-byin daṅ Tshaṅs-paḥi rgyal-pos lag-pa ya-re nas zin te 'The kings Indra and Brahma each took him by one of his hands'. We reckon ḥbaḥ 'sole' as a determiner on the basis of sentences such as rus-pa daṅ khrag ḥbaḥ źig gis sa rtsog-rtsog ltar ḥdug-pa mthoṅ 'They saw the ground besmirched with only bone and blood'. Words tagged with this tag include: གཞན་, སོགས་, རང་, ཀ་, འབའ་, རེ་, རེ་རེ་, ག་, ཤ་སྟག་, ཁ་, གཞིན་, ཡ་རེ་, ཤས་དག་, སྣ་རེ་. We must check gzhin. \nd.emph\nemphasis\nWe initially invented this category for ñid in phrases such as rgyal-poñid 'that very king' or lusñid 'this body'. This syntactic use of ñid must be distinguished form its use in Buddhist terminology -ñid inside of words, e.g. stoṅ-pa-ñid 'emptiness'.Apart from ñid, we have categorized kho-na 'the very, same' and re-re 'each' as emphatics. This use of kho-na should not be confused with its function as a third person pronoun in Old Tibetan. In one case kho appears not as a personal pronoun but as what seems to be a variant of kho-na; this kho we also classify as an emphatic, viz. smras-paḥi tshig ḥdi bden na bden-paḥi tshig bden-paḥi tshig smras-pas / bdag gi lus ḥdi sṅa-ma kho bźin du rma med-par gyur cig 'If these words that I have said are true, then because of saying true words, let this my body be without wounds like before' (vol 74, page 137b). Words tagged with this tag include: ཉིད་, རབ་, ཁོ་ན་, and ཁོ་.  \nd.indef\nindefinitie\nThis category is used for the allomorphs of the indefinite marker cig, źig, and śig as in pho-ña cig 'a messenger'. The indefinite marker, which occurs inside of noun phrases, must be distinguished from the identically looking imperative converb (see below), which occurs suffixed to the imperative stems of verbs.  \nd.plural\nplurals\nThe plural markers rnams, dag, kun, thams-cad, ḥo-cog (and its variants) and tsho are tagged as their own category 'plural'. However, plural pronouns (bdag-cag, khyed-cag, ḥu-bu-cag) are treated as one word. The plural marker -cag is not removed because to do so would result in pronominal stems which are not mutually comparable (viz. bdag is a singular pronoun, khyed a plural pronoun, and ḥu-bu has no independent life outside of ḥu-bu-cag). We also tag ḥgaḥ 'some' as a plural, although in the abstract one would perhaps prefer to call it a 'quantifier'.  The three verbs (la) sogs-pa 'etc', (daṅ) ldan-pa 'having', (daṅ) bcas-pa 'together with' could be seen as similar to quantifiers or otherwise to be treated as parts of the noun phrase, however, we have chosen to treat them etymologically as verbs. Words tagged with this tag include: རྣམས་, དག་, ཐམས་ཅད་, ཀུན་, ཡོངས་, སྣ་ཚོགས་, འགའ་, སྤྱི་, ཅུང་ཟད་, ཚོ་, དུ་མ, ཤས་, མཐའ་དག་, ཆུང་ཟད་, དགུ་ཅོག་, འགའ་རྫི་ར་, སོ་ཅོག་, སྣ་དགུ་. The inclusion of འགའ་རྫི་ར་ is probably a mistake.   \nd.tsam\ntsam and words with similar distribution\nIn an earlier version of the tag set we had tagged tsam as [cl.tsam], but then it became clear that it only ever followed nominals, so it clearly deserves the [d.xxx] category. Words tagged with this tag include ཙམ་, སྙེད་, ཙམ་པ་, རྙེད་ (a misspelling for སྙེད་), ཙམ་པ, སྙད་, and སྙེད་པ.  \ninterj\ninterjections\n This tag is used for interjections. Words tagged with this tag include: ཀྱེ་མ་, ཀྱེ་, ཨ་ལ་ལ་, ཀྱི་ཧུད་, ཁོ་དེ, མ་ལ་, འོ་དོད་, ཧེ་, ཨེ་མ་, and ཨེ་ཨེ་.  \nn.count\ncount nouns\nTo identify lexical nouns we rely on the syntactic ability of the word in question to head a noun phrase, the dictionary meaning, and (when possible) the presence of nominal suffixes such as -mo, -po and -bu. Because we treat grammatical affixes as separate words, a single word normally does not not include grammatical affixes such as case markers and converbs. Nonetheless, there are well-motivated exceptions to this policy like gaṅ-na-ba 'whereabouts' and bdag-gi-ba 'that which is mine'. (We treat a -pa or -ba as part of the preceding word, regardless of the part-of-speech of the preceding word.) Another frequent category of exceptions is calques of Sanskrit terms, e.g. kun-tu-rgyu 'parivrājaka', which must be treated as a noun together because it looks like a verb phrase, but functions syntactically as a noun.  In general, when two nouns occur in succession they are understood as a compound; the dandva compound pha-ma 'parents' is treated as one noun rather than two (pha 'father' and ma 'mother') and the tatpuruṣa compound khyim-bdag 'householder' is likewise treated as one noun rather than two (khyim 'home' and bdag 'lord'). When an adjective precedes its head this is also treated as a compound. Thus, because dug btsan-po would be the expected order for 'mighty poison', we treat btsan-dug 'mighty poison' (vol. 74, page 147a) as a single word. Apposition is the one category of exceptions when the concatenation of two nouns is not treated as a compound. An example of this type is the two words bu khyeḥu in the following sentence: deḥi tshe yul de na khyim-bdag cig la bu khyeḥu źig btsas na / 'At that time, in that land, when a child, a son, was born to a householder'. Rather than understanding khyeḥu as an adjective modifying bu, or taking bu-khyeḥu as one word, the second word simply sits after the first one to add greater specificity. The reason to not treat apposition as compounding is because apposition occurs with proper nouns. For example, in the sentence deḥi tshe na rgyal-po Gsal-rgyal gyi btsun-mo chen-po Ḥbar-li źes bya-ba la bu-mo źig btsas-nas 'At that time a daughter was born to Ḥbar-li, the main queen of king Prasenajit', rgyal-po 'king' and Gsal-rgyal 'Prasenajit' are in apposition; to unite the two as a compound would lead to unintuitive, unwieldy, and therefore unacceptable consequences. \nn.mass\nmass nouns\n We divide out mass nouns from normal lexical nouns on the basis of two instances in our corpus where otherwise two nouns not in apposition would follow each other: nor-bu sbar gaṅ 'a handful of jewels' and chu sñim-pa gaṅ 'a handful of water'. Knowing that there exists this syntactic difference between normal lexical nouns and mass nouns, we tag all plausible mass nouns on the basis of their meaning (e.g. zaṅs 'copper'). A final list of mass nouns can only be securely put forward after the syntactic behavior of these words is better investigated. Words tagged with this tag include:  གསེར་, ཆུ་, རིན་པོ་ཆེ་, ནོར་, ཤ་, ཆུ, ས་, ཁྲག་, ཆང་, སྨན་, སེར་བ་, ནོར་བུ་, མར་, ངུར་སྨྲིག་, དངུལ་, ཡི་གེ་, ནས་, ལྕགས་, སྤུ་, གངས་, ཙན་དན་, གསེར་གཡུ་, གོས་, དར་, ཤེལ་, གསེར་དངུལ་, པདྨ་རཱ་ག་, མར་ཁུ, ཟངས་, ཡིག་ (I would think that this should only occur in compounds), གོས་དར་, ཆུ་སྡོར་, ཐེར་ཕྲུག་, བཟའ་, བསུ་ཆང་, བུམ་ཆུ, བཻདཱུརྱ, མཐོ་ཁྱད་, རིན་ཆེན་, སྣམ་བུ་, ཁ་བ་, གསེར་གྱི་མེ་ཏོག་ (a mistake), ཆུ་མེ་ཤིང་, ཉ་ཕྱིས་, ཐལ་བ་, ཐུད་, དབུས་ཕྲུག་, བུར་, མཐོན་མཐིང་, མར་གད་, མར་སར་, ཚོས་, ཟ་མ་བཅུད་, འཇིམ་པ་, འདག་, རིན་ཆེན་གསེར་, རྒྱགས་ཆུ་, རྡུལ་ཚོན་, རྣག་ཁྲག་, ལ་ཅ, ལ་ཆ, ལེ་བརྒན་, ཤ་ཁྲག་, སེང་ལྟེང་, སེང་ལྡེང་, སེར་, སྤེན་, and སྤེན་བད་.  \nn.prop\nproper nouns\nThis tag is used for personal and place names.    \nn.rel\nrelator nouns\nA relator noun is a noun, normally one syllable, which has a genitive before it and a spatial case (allative, locative, terminative) after it, e.g. deḥi naṅ na 'inside of that', deḥi druṅ du 'before him', deḥi ḥog tu 'under that', deḥi tshe na 'at that time'; relator nouns are not quantified and are not suffixed with adjectives, determiners or demonstratives. After identifying the class of relator nouns, these words are tagged as relator nouns even in syntactic contexts missing the genitive to left or the spatial case to the right. For example, in the sentence deḥi tshe blon-po źig phyi-rol nas naṅ du ḥoṅs-pa las/ mi btson du bzuṅ-ba mthoṅ-ba daṅ / 'Then the minister went inside from outside and saw the man who had been taken to prison', the relator noun tshe is not followed by a spatial case and the relator noun naṅ is not preceded by a genitive.  In the phrase bar ḥgaḥ 'sometimes' we do not consider bar a relator noun because it undergoes quantification, e.g. … bar ḥgaḥ ni gti-mug gi phyir lus btaṅ yaṅ chos kyi phyir bsod-nams kyi źiṅ daṅ lan ḥgaḥ yaṅ ma phrad-paḥi lus ḥdi ci ruṅ ? 'What is the use of this body which … sometimes has been used because of ignorance, but has not yet met an occasion (to serve) as a field of merit'. Words tagged with this tag include: ལྟ, ཕྱི, བཞིན་, ཚེ་, སྐད་, བར་, དུས་, ནང་, རྗེས་, དྲུང་, ལྟ་བུ, འདྲ་, འོག་, སྟེང་, ལྟ་བུ་, སླད་, ངང་, ཕྱོགས་, མཐའ་, སྒོ་, བཞིན, ས, སླ, མདུན, དབུས་, ཐོག་, ཁོང་, ཕྱི་བཞིན་, ཁ, ཆེད་, ཐོག་མ, སྐབས་, གོང་, རྟིང་, ཐད་, གན་, མཇུག་, འཕྲལ་, ལོགས་, ཁྲོད་, རྟེན་, སྨད་, ཁོངས་, གསེབ་, ཐ་མ, དུས, ཕུ་, དྲུད་, བསེབ་, མགོ, ཙ་, རིང་, རྩ, ལྟག་, སྐོར་, དཀྱིལ་, དབུང་, བྱང་, མཐའ་ལོགས་, ཚུན་, ཞོར་, འཁྲིས་, འོག་བ, རྒྱབ་, སྐེ་, སྐེད་, སྙིང་ཁ, སྟེངས་, སྟོད་. Several questions of delimiting relator nouns require further consideration, in particular, the approach to lta versus lta-bu.  \nn.v.aux\nNominalization of auxilliary verbs\nThis tag is used for the nominalized version of verbs that would be tagged [v.aux]. Forms currently tagged with this tag include དགོས་པ་, ནུས་པ་, ཤེས་པ་, དཀའ་བ་, ཕོད་པ་, ཐག་པ་, and རན་པ་.  \nn.v.cop\nNominalization of copulas\nThis tag is used for nominalized versions of verbs that would be tagged [v.cop], q.v.\nn.v.fut\nNominalization of a future verb stem\nThis tag is used for the nominalized versions of future verb stems, i.e. a future verb stem followed by pa, ba, tshul, rgyu, mkhan, mi, or bya.\nn.v.fut.n.v.past\nNominalization of a future or past stem of a verb\nThis tag is used for nominalized forms of verb stems that would be tagged [v.fut.v.past], q.v. \nn.v.fut.n.v.pres\nNominalization of a future or present stem of a verb\nThis tag is used for the nominalized version of a verb stem that would be tagged [v.fut.v.pres], q.v.\nn.v.imp\nNominalization of an imperative stem of a verb\nThe imperative stem of a verb is supposed not to be nominalized. However, in rare cases there are nominalized forms which appear to come from imperative stems, and we mark them accordingly. The form ཐོངས་པ་ is the only case that has come up so far.\nn.v.invar\nNominalization of [v.invar]\nThis tag is used for nominalized forms of verb stems that would be tagged [v.invar] q.v.\nn.v.neg\nNominalized form of the negative verb med\nThis tag is used for the nominalized forms of the inherently negated verb med, such as med-pa and med-sa. \nn.v.past\nNominalization of a past verb stem\nThis tag is used for the nominalized versions of past verb stems, i.e. a past verb stem followed by pa, ba, tshul, rgyu, mkhan, or mi.\nn.v.past.n.v.pres\nNominalization of a past or present stem of a verb\nThis tag is used for the nominalized version of a verb stem that would be tagged [v.past.v.pres], q.v. \nn.v.pres\nNominalization of a present verb stem\nThis tag is used for the nominalized versions of present verb stems, i.e. a present verb stem followed by pa, ba, tshul, rgyu, mkhan, or mi.\nneg\nnegation\nThe two negation prefixes ma and mi are classified together in their own category. The polar question prefix e is also categorized under this tag because it occupies the same syntactic position as ma and mi. For the two verbs min and med negation is inherent to their meaning, consequently 'neg' is also added to their POS-tags (i.e. min|v.cop.neg and med|v.neg see above).  \nnum.card\ncardinal numbers\nIn numbers we distinguish cardinals (gcig, gñis, gsum, etc.) and ordinals (daṅ-po, gñis-pa, gsum-pa, etc.). Other derivatives of numerals are treated according to their respective syntax, thus gcig-pa 'sole' is an adjective, gñi-ga 'both' is an indefinite pronoun, etc. In higher numbers each component digit is tagged separately, to do otherwise would prevent the computer from learning pattens by virtue of having to independently learn each possible cardinal number of the infinite possibility. When a numeral follows a noun we regard the two as separate words. In addition to obvious cases like mi lṅa 'five men', we also treat dkon-mchog gsum 'triratna' as two words. While it is true that one will almost never encounter any other numeral after the word dkon-mchog this fact says as much about Buddhism as it does about syntax.  The treatment of phrag well exemplifies our pragmatic attitude toward part-of-speech tagging. Although not a cardinal number itself, this syllable occurs inside cardinal numbers, effectively marking a certain place with a zero, e.g. stoṅ phrag drug cu '1060'. Because the internal structure of numerals is not of interest to our project and adding a new tag for phrag would add unnecessary complications to our tag-set, we treat phrag itself as a cardinal number. There are occasions when the morphology of a word suggests that it might contain a numeral (e.g. mṅon-sum 'real', phun-sum-tshogs 'marvelous'), but there is no reason to see such cases as synchronically analyzable. Numbers that end with -po are also tagged [num.card].  \nnum.ord\nordinal numbers\nIn numbers we distinguish cardinals (gcig, gñis, gsum, etc.) and ordinals (daṅ-po, gñis-pa, gsum-pa, etc.). Other derivatives of numerals are treated according to their respective syntax, thus gcig-pa 'sole' is an adjective, gñi-ga 'both' is an indefinite pronoun, etc. In higher numbers each component digit is tagged separately, to do otherwise would prevent the computer from learning pattens by virtue of having to independently learn each possible cardinal number of the infinite possibility. Some adjectives can be distinguished from ordinal numbers only in context. For example, in the phrase rdo-rje rtse-lṅa-pa 'a five-pronged vajra' the noun phrase syntax and the overall meaning of the passage dictates that lṅa-pa is part of the word rtse-lṅa-pa 'five-pronged' rather than a word 'fifth'.   \np.indef\nindefinite pronoun\nThe words la-la 'some', so-so 'each', and gñi-ga 'both' are used as indefinite pronouns in our pilot corpus. Words tagged with this tag include: ཁ་, སོ་སོ་, ལ་ལ་, རེ་རེ་, རེ་. (It looks like we are no longer putting gnyi-ga in here. I wonder why?) \np.interrog\ninterrogative pronouns\n This is the tag used for interrogative pronouns such as su 'who', nam 'when', and gaṅ 'where'. Words tagged with this tag include: ཅི་, གང་, ཇི་, ཅི, སུ་, ནམ་, ཇི་སྲིད་, ག་, ག་རེ་, ཇི་ཙམ་, ཅི་སྲིད་, and ཙུག་. \np.pers\npersonal pronouns\nThis tag is used for personal pronouns. Words tagged with this tag include: བདག་, ཁྱོད་, ང་, ཁྱེད་, བདག་ཅག་, ངེད་, ཁོ་, ཁོང་, རང་རེ་, ཁོ་བོ་, རང་, མོ་, ཁོ་མོ་, ཁྱེད་ཅག་, འོ་སྐོལ་, འུ་ཅག་, འོ་ཅག་. The word རང་ is only tagged as [p.pers] when it functions as a full independent pronoun in its own right; in ང་རང་ and ཁྱེད་རང་ the word རང་ is tagged [p.pers]. The word བདག་ is tagged as [p.pers] when it is a humble equivalent of ང་. When this word means 'lord' or ' the self' (philosophy) it is tagged [n.count] and when it means 'himself' it is tagged as [p.refl]. \n  \np.refl\nreflexive pronouns\nThis tag is used for reflexive pronouns. The only word tagged with this tag རང་ and བདག་. The word རང་ is tagged as [p.refl] when it functions as a reflexive, in a sentence (invented) such as ཁོས་ དེ་རིང་ རང་ གི་ ཟན་ ཟོས་ སོ་ ' Today, he ate his own food' or in long form personal pronouns such as ང་རང་ and ཁྱེད་རང་. When རང་ is full independent pronoun meaning 'I'  or ' you' it is tagged [p.pers]. The word བདག་ is tagged as [p.pers] when it is a humble equivalent of ང་; when it means 'lord' or ' the self' (philosophy) it is tagged [n.count] and when it means 'himself' it is tagged as [p.refl]. \n  \np.refl\nReflexive pronoun\nThis tag is used for the words རང་ and བདག་ when they are used as reflexive pronouns, i.e. in the meaning 'himself, herself'. The word རང་ when it immediately appears after a personal pronoun is always tagged [p.refl].\npunc\npunctuation\nWhile a tsheg is considered part of the preceding syllable, all other punctuation marks are tagged as punctuation. Such marks include །, ༑, །།, ༄༅༅།, and །།།།. Punctuation marks are broken apart to aid tokenization, e.g. we tag ༄|punc ༅|punc ༅|punc །|punc. \nskt\nSanskrit or other metalingustic elements\nThis tag was invented for the odd Sanskrit word that occurs in Tibetan (e.g. in mantras) not as a syntactic word in its own right, but as something metalinguisitc at least vis à vis Tibetan. On analogy to this use of Sanskrit words in Tibetan we have extended the use of the tag to include all metalinguistic forms, e.g. words used not as the words they are but as the name of that word, or the name of a letter, words in Roman script, etc.\nv.aux\nAuxiliary verbs\nThis tag is used for auxilliary verbs. By this we understand only thos verbs that show no morphological stem variation and may occur directly after another verb stem. The words we tag with this tag are: ཐག་, དགོས་, ནུས་, མོད་, འདོད་, ཤེས་, སྲིད་, ཤེས་, རན་, གྲགས་, ཐང་, and ཕོད་. Note that this set is far fewer than the total number of verbs that might be considered auxiliary verb in classical or modern Tibetan. In fact, we now feel that it was a mistake the give these tags a separate encoding, instead they should have been tagged on the basis of their syntax (e.g. mi [neg] nus [v.pres.v.fut]) and the question of auxiliation and other syntactic uses of verb stems in Tibetan addresssed systematically in a future project. However, once we arrived at this understanding it was inconvenient to change our approach. \nv.cop\nCopula verbs\nThis tag is used for copula verbs, in particular yin and lags which do not distinguish various stems. In our current view, it would have been better not to have included this tag, but rather to have tagged these verbs according to their syntactic position, e.g. ma [neg] yin [v.past], yin [v.past.v.pres] nas [cv.ela], etc. However, when we made this determination it was already inconvenient to go back on the inclusion of the tag [v.cop]. \nv.cop.neg\nThe negative copula verb\nThis tag is used for the inherently negated copula verb min.\nv.fut\nFuture stem of a verb\nThis tag is used for future stem verbs, e.g. bya, gzuṅ, sgrubs.\nv.fut.v.past\nFuture or past stem of a verb\nThis tag is used for verb stems that are both morphologically and syntactically ambiguous between the past and futur, e.g. bskol.\nv.fut.v.pres\nFuture or present stem of a verb\nThis tag is used for verb stems that are both morphologically and syntactically ambiguous between the past and present, e.g. gśegs in the phrase mi gśegs.\nv.imp\nImperative stem of a verb\nThis tag is used for imperative verb stems, e.g. byos, zuṅ, sgrubs.\nv.invar\nPresent, past, or future stem of a verb\nThis tag is used for verb stems that are morphologically and syntactically ambiguous among present, past, and future, e.g. gśegs in the phrase gśegs so.\nv.neg\nThe negative verb med\nThis tag is used for the inherently negated verb med.\nv.past\nPast stem of a verb\nThis tag is used for past stem verbs, e.g. byas, bzuṅ, bsgrubs.\nv.past.v.pres\nPast or present stem of a verb\nA verb stem that is both morphologically and syntactically ambiguous between the past and present, e.g. gśegs in the prhase gśegs nas.\nv.pres\nPresent stem of a verb\nThis tag is used for present stem verbs, e.g. byed, ḥdzin, sgrub.\n```\n\n# License\n\n\u003e All software written as part of this project will be released under an open source license. Our Tibetan pos tagging software, incorporating a word segmentation component, will be released on Tibetan NLP, a GitHub page collecting tools and resources related to natural language processing of Tibetan. The page also has contributions from the Tibetan Buddhist Resource Center and the Tibetan \u0026 Himalayan Library.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcltk%2Ftibetan_pos_tdc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcltk%2Ftibetan_pos_tdc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcltk%2Ftibetan_pos_tdc/lists"}