{"id":21524171,"url":"https://github.com/klaragtknst/bachelor-thesis","last_synced_at":"2026-02-09T10:32:07.802Z","repository":{"id":195339656,"uuid":"659637887","full_name":"KlaraGtknst/bachelor-thesis","owner":"KlaraGtknst","description":"This repository contains the written work on the Bachelor thesis 'Identification of key information with topic analysis on large unstructured text data'.","archived":false,"fork":false,"pushed_at":"2024-07-03T13:42:49.000Z","size":98379,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-24T05:11:27.433Z","etag":null,"topics":["eigendocs","eigenfaces","nlp","pca"],"latest_commit_sha":null,"homepage":"","language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KlaraGtknst.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-28T08:46:29.000Z","updated_at":"2024-07-03T13:44:10.000Z","dependencies_parsed_at":"2023-10-15T22:57:34.665Z","dependency_job_id":"1d7adaaa-477e-4aa2-a977-5c009b2d9513","html_url":"https://github.com/KlaraGtknst/bachelor-thesis","commit_stats":null,"previous_names":["klaragtknst/bachelor-thesis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KlaraGtknst%2Fbachelor-thesis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KlaraGtknst%2Fbachelor-thesis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KlaraGtknst%2Fbachelor-thesis/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KlaraGtknst%2Fbachelor-thesis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KlaraGtknst","download_url":"https://codeload.github.com/KlaraGtknst/bachelor-thesis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244082579,"owners_count":20395297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["eigendocs","eigenfaces","nlp","pca"],"created_at":"2024-11-24T01:21:20.318Z","updated_at":"2026-02-09T10:32:06.287Z","avatar_url":"https://github.com/KlaraGtknst.png","language":"TeX","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Abstract\nThe goal of this thesis is to investigate the applicability of computational means to the exploration of large unstructured text corpora.\nFinding relevant documents and interconnections between documents becomes significantly more difficult due to the sheer number of documents available.\nInstitutes, such as the German tax offices, have access to leaked data, for instance, the _Panama Papers_ or the _Bahamas leak_,\ncontaining huge numbers of documents and valuable information yet to be extracted.\nHowever, these institutes, companies and individuals do not have sufficient resources to explore individual documents\nin order to find a specific one or to identify inherent key topics.\nHence, computational means, such as text mining or topic analysis, may help to overcome this obstacle.\nThis thesis proposes an approach to finding relevant documents which share common topics from a 
large unstructured text corpus.\nThe approach combines several methods, such as textual embeddings, transformation of images and clustering techniques.\nAs a result of this work, a web interface is provided that allows the examined methods to be compared by querying a database for similar documents.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fklaragtknst%2Fbachelor-thesis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fklaragtknst%2Fbachelor-thesis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fklaragtknst%2Fbachelor-thesis/lists"}