{"id":16776120,"url":"https://github.com/ahmadassaf/kbe","last_synced_at":"2026-01-03T02:50:31.884Z","repository":{"id":15017793,"uuid":"17743451","full_name":"ahmadassaf/KBE","owner":"ahmadassaf","description":"Node.js application to extract the knowledge represented in Google infoboxes (aka Google Knowlege Graph Panel)","archived":false,"fork":false,"pushed_at":"2017-02-28T13:56:37.000Z","size":5768,"stargazers_count":26,"open_issues_count":0,"forks_count":6,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-01-23T05:14:08.348Z","etag":null,"topics":["knowledge-graph","knowledgebase","web-ontology-language"],"latest_commit_sha":null,"homepage":"http://kbe.ahmadassaf.com","language":"Web Ontology Language","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ahmadassaf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-03-14T10:54:54.000Z","updated_at":"2024-10-02T12:44:13.000Z","dependencies_parsed_at":"2022-09-04T23:01:54.838Z","dependency_job_id":null,"html_url":"https://github.com/ahmadassaf/KBE","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmadassaf%2FKBE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmadassaf%2FKBE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmadassaf%2FKBE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmadassaf%2FKBE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ahmadassaf","download_url":"https://codeload.github.com/ahmadas
saf/KBE/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243910821,"owners_count":20367545,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["knowledge-graph","knowledgebase","web-ontology-language"],"created_at":"2024-10-13T07:09:02.500Z","updated_at":"2026-01-03T02:50:31.845Z","avatar_url":"https://github.com/ahmadassaf.png","language":"Web Ontology Language","funding_links":[],"categories":[],"sub_categories":[],"readme":"Knowledge-base Extractor\n=======================\n\nThis is a Node.js application that extracts the knowledge represented in Google infoboxes (aka Google Knowledge Graph Panel).\n\nThe algorithm implemented is the following:\n - Query DBpedia for all concepts (types) for which there is at least one instance that has a \u003csameAs\u003e link to a Freebase ID\n - For each of these concepts, pick (n) instances randomly\n - For each instance, issue a Google Search query:\n     + if an infobox is available -\u003e scrape the infobox to extract the properties\n     + if no infobox is available, check if Google suggests \"do you mean ... 
?\" and if so, traverse the link and look for an infobox\n     + if no infobox or correction is available, disambiguate the concept (type) used in the search query and check if an infobox is returned\n     + if Google suggests disambiguation in an infobox, parse all the links in it -\u003e it is best to find which suggestion maps to the current data type we are using -\u003e check the Freebase - DBpedia mappings\n - Cluster properties for each concept\n\n### Notes\n- The result of our experiment is in the results folder ```results/dbpedia.json```\n- For a more detailed view of each DBpedia class, check the files in ```results/dbpedia```\n\n## How to run?\n - Clone the repo to your local machine\n - Run ```npm install``` in the root of the local project directory\n\nThe application will automatically create all the required cache folders:\n\n- Main cache folder \"cache\" in the root folder of the application\n    + folder called ```GKB``` inside the cache folder: This will hold the aggregated Google Knowledge boxes extracted for a DBpedia concept (type)\n    + folder called ```instances_GKB``` inside the cache folder: This will hold the Google Knowledge box for a single instance\n    + folder called ```instances``` inside the cache folder: This will hold the DBpedia instances for each concept (type)\n    + folder called ```instance_properties``` inside the cache folder: This will hold the distinct list of properties for all the instances of a certain concept\n\n- Run ```node KBE.js``` in the console\n\nThe application runs in the console and the output will be available in cache/result.json\n\n## Crawling Configuration\nThe file ```options.json``` contains a set of options that you can change:\n```js\ncache_dbpedia_concepts       : true,\nlimit_dbpedia_concepts       : true,\nlimit_dbpedia_instances      : true,\nlimit_dbpedia_concepts_value : 10,\nlimit_dbpedia_instances_value: 10,\nproxy                        : null\n```\n- ```cache_dbpedia_concepts``` 
cache the concepts retrieved from DBpedia.\n- ```limit_dbpedia_concepts``` limit the number of concepts retrieved from DBpedia; false will retrieve all the concepts\n- ```limit_dbpedia_instances``` limit the number of instances retrieved for each concept; false will retrieve all the instances\n- ```limit_dbpedia_concepts_value``` the number of concepts you wish to retrieve\n- ```limit_dbpedia_instances_value``` the number of instances you wish to retrieve for each concept\n- ```proxy``` the proxy address string, including the port, e.g. ```http://proxy:8080```\n\nFor our experiment the parameters are:\n```js\ncache_dbpedia_concepts       : true,\nlimit_dbpedia_concepts       : false,\nlimit_dbpedia_instances      : true,\nlimit_dbpedia_concepts_value : null,\nlimit_dbpedia_instances_value: 100,\nproxy                        : null\n```\n\nMoreover, you can always check the corresponding CSS class name selectors for the Google Knowledge Panel and edit them if needed in the same ```options.json``` file.\n\nCurrently the CSS selectors are:\n```\n\"knowledgeBox\"                : \"#kno-result\",\n\"knowledgeBox_disambiguate\"   : \".kp-blk\",\n\"property\"                    : \"._Nl\",\n\"property_value\"              : \".kno-fv\",\n\"label\"                       : \".kno-ecr-pt\",\n\"description\"                 : \".kno-rdesc\",\n\"type\"                        : \"._kx\",\n\"images\"                      : \".bicc\",\n\"special_property\"            : \".kno-sh\",\n\"special_property_value\"      : \"._Zh\",\n\"special_property_value_link\" : \"a._dt\"\n```\n## Updates\n\n - Properties now have direct links to the DBpedia ontology\n - Property scores are normalized\n\n## Sample Result\n```\n  \"Band\": {\n  \t\"summary\": {\n  \t\t\"label\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/label\",\n  \t\t\t\"count\": 100\n  \t\t},\n  \t\t\"description\": {\n  \t\t\t\"uri\": \"http://purl.org/dc/elements/1.1/description\",\n  \t\t\t\"count\": 100\n  \t\t},\n  
\t\t\"type\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/type\",\n  \t\t\t\"count\": 100\n  \t\t},\n  \t\t\"origin\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/origin\",\n  \t\t\t\"count\": 88.17204301075269\n  \t\t},\n  \t\t\"members\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/members\",\n  \t\t\t\"count\": 88.17204301075269\n  \t\t},\n  \t\t\"albums\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/albums\",\n  \t\t\t\"count\": 87.09677419354838\n  \t\t},\n  \t\t\"leadSingers\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/leadSingers\",\n  \t\t\t\"count\": 6.451612903225806\n  \t\t},\n  \t\t\"recordLabel\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/recordLabel\",\n  \t\t\t\"count\": 12.903225806451612\n  \t\t},\n  \t\t\"awards\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/awards\",\n  \t\t\t\"count\": 13.978494623655912\n  \t\t},\n  \t\t\"nominations\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/nominations\",\n  \t\t\t\"count\": 7.526881720430108\n  \t\t},\n  \t\t\"born\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/born\",\n  \t\t\t\"count\": 2.1505376344086025\n  \t\t},\n  \t\t\"nationality\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/nationality\",\n  \t\t\t\"count\": 2.1505376344086025\n  \t\t},\n  \t\t\"height\": {\n  \t\t\t\"uri\": \"http://dbpedia.org/property/height\",\n  \t\t\t\"count\": 1.0752688172043012\n  \t\t}\n  \t},\n  \t\"infoboxless\": [\n  \t\t\"!Action Pact!\",\n  \t\t\"Allele (band)\",\n  \t\t\"Anti-Pasti\",\n  \t\t\"Armageddon (A\u0026M band)\",\n  \t\t\"Banket (band)\",\n  \t\t\"Battlelore\",\n  \t\t\"Ben Folds Five\"\n  \t],\n  \t\"Unmapped_Properties\": {\n  \t\t\"leadSinger\": 1,\n  \t\t\"recordLabels\": 1,\n  \t\t\"songs\": 1,\n  \t\t\"upcomingEvents\": 1,\n  \t\t\"peopleAlsoSearchFor\": 1,\n  \t\t\"activeFrom\": 1,\n  \t\t\"filmMusicCredits\": 1,\n  \t\t\"activeUntil\": 1,\n  \t\t\"moviesAndTvShows\": 1\n  \t}\n  }\n 
```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmadassaf%2Fkbe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fahmadassaf%2Fkbe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmadassaf%2Fkbe/lists"}