Clustering of DBPedia Subjects
==============================

Seminar Map/Reduce Algorithms on Hadoop
---------------------------------------

Class project (course page: hpi-web.de/teaching/lehrangebot/veranstaltung/mapreduce_algorithms_on_hadoop.html).

## Step 1: Compile the project
Build the jar with Ant:

    ant make-jar

## Step 2: Create the sequence file
A file containing the names must be generated from the first pivot file. It consists of that file's last line:

    tail -n 1 infobox_pivot_part1 > names

Then call the class BitsToSeqFile with the binary pivot file, the names file, and the name of the desired output file for the subjects:

    java -jar dist/clustering.jar de.myhpi.BitsToSeqFile infobox_pivot_part2 names subjects.seq

## Step 3: Create the cluster centers
Call the class GenerateClusters with the subject file and the name of the desired output file for the cluster centers. The remaining required arguments are the number of attributes and the number of clusters to generate:

    java -jar dist/clustering.jar de.myhpi.GenerateClusters subjects.seq centers.seq 42644 100

## Step 4: Copy the input files into HDFS
Next, copy the subject file, the cluster center file, and the file config.xml into HDFS (for example with `hadoop fs -put`). If necessary, config.xml can be adjusted.

## Step 5: Run the jobs
Call `hadoop jar` with the program name "k-means", the subject file, the center file, and the output path:

    hadoop jar dist/clustering.jar k-means subjects.seq centers.seq output-dir

## Step 6: Copy the output data from HDFS
Once the program has finished running the jobs, the output data can be copied to the local file system (for example with `hadoop fs -get`), where it is human-readable.