{"id":15519768,"url":"https://github.com/oroszgy/phd-dissertation","last_synced_at":"2026-01-15T23:02:54.451Z","repository":{"id":18642848,"uuid":"21849442","full_name":"oroszgy/phd-dissertation","owner":"oroszgy","description":"Hybrid algorithms for preprocessing agglutinative languages and less-resourced domains","archived":false,"fork":false,"pushed_at":"2017-02-24T13:42:30.000Z","size":2915,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-04T23:28:21.184Z","etag":null,"topics":["hungarian","nlp","phd-dissertation","phd-thesis"],"latest_commit_sha":null,"homepage":"http://gyorgy.orosz.link/","language":"TeX","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oroszgy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-07-15T06:46:17.000Z","updated_at":"2017-02-24T13:37:49.000Z","dependencies_parsed_at":"2022-09-05T22:50:19.674Z","dependency_job_id":null,"html_url":"https://github.com/oroszgy/phd-dissertation","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/oroszgy/phd-dissertation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oroszgy%2Fphd-dissertation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oroszgy%2Fphd-dissertation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oroszgy%2Fphd-dissertation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oroszgy%2Fphd-dissertation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/oroszgy","download_url":"https://codeload.github.com/oroszgy/phd-dissertation/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oroszgy%2Fphd-dissertation/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28473974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-15T22:27:41.514Z","status":"ssl_error","status_checked_at":"2026-01-15T21:54:47.910Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hungarian","nlp","phd-dissertation","phd-thesis"],"created_at":"2024-10-02T10:22:40.024Z","updated_at":"2026-01-15T23:02:54.436Z","avatar_url":"https://github.com/oroszgy.png","language":"TeX","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hybrid algorithms for preprocessing agglutinative languages and less-resourced domains effectively\n\nThis thesis deals with text processing applications examining methods suitable for\nless-resourced and agglutinative languages, thus presenting accurate preprocessing\nalgorithms.\n\nThe first part of this study describes morphological tagging algorithms which can\ncompute both the morpho-syntactic tags and lemmata of words accurately. 
A tool (called\nPurePos) was developed that was shown to produce precise annotations for Hungarian\ntexts and also to serve as a good base for rule-based domain adaptation scenarios.\nIn addition, we present a methodology for combining tagger systems that raises the overall\naccuracy of Hungarian annotation systems.\n\nNext, an application of the presented tagger is described that aims to produce\nmorphological annotation for speech transcripts, and thus, the first morphological\ndisambiguation tool for spoken Hungarian is introduced. Following this, a method is\ndescribed which utilizes the adapted PurePos system for estimating the morpho-syntactic\ncomplexity of Hungarian speech transcripts automatically.\n\nThe third part of the study deals with the preprocessing of electronic health records.\nOn the one hand, a hybrid algorithm is presented for segmenting clinical texts into words\nand sentences accurately. On the other hand, domain-specific enhancements of PurePos\nare described, showing that the resulting tagger has satisfactory performance on noisy\nmedical records.\n\nFinally, the main results of this study are summarized by presenting the author’s\ntheses. Furthermore, applications of the presented methods that target\nless-resourced languages are listed.\n\n*Continue reading [here](https://github.com/oroszgy/phd-dissertation/releases/download/Final/thesis.pdf).*\n\n--- \n\nIt uses [this template](https://github.com/kks32/phd-thesis-template)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foroszgy%2Fphd-dissertation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foroszgy%2Fphd-dissertation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foroszgy%2Fphd-dissertation/lists"}