{"id":14959391,"url":"https://github.com/msorkhpar/wiki-entity-summarization","last_synced_at":"2025-05-02T12:31:58.015Z","repository":{"id":244321533,"uuid":"802731420","full_name":"msorkhpar/wiki-entity-summarization","owner":"msorkhpar","description":"This repository hosts a comprehensive suite for graph-based entity summarization dataset generating from user-selected Wikipedia pages. Utilizing a series of interconnected modules, it leverages Wikidata and Wikipedia dumps to construct a dataset, alongside auto-generated ground truths.","archived":false,"fork":false,"pushed_at":"2024-06-24T16:29:18.000Z","size":37037,"stargazers_count":21,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-13T02:23:43.266Z","etag":null,"topics":["dataset","dataset-generator","entity-summarization","neo4j","networkx","python","wiki-entity-summarization","wikies"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/msorkhpar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-05-19T05:33:03.000Z","updated_at":"2024-12-30T22:29:43.000Z","dependencies_parsed_at":"2024-09-22T09:02:30.324Z","dependency_job_id":"1e1fc046-a11c-4938-8971-d70deba7386d","html_url":"https://github.com/msorkhpar/wiki-entity-summarization","commit_stats":{"total_commits":59,"total_committers":5,"mean_commits":11.8,"dds":0.2542372881355932,"last_synced_commit":"11d091c20e4a378c6386cc5360d6978d1fee02b4"},"previous_names":["msorkhpar/wiki-
entity-summarization"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msorkhpar%2Fwiki-entity-summarization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msorkhpar%2Fwiki-entity-summarization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msorkhpar%2Fwiki-entity-summarization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msorkhpar%2Fwiki-entity-summarization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/msorkhpar","download_url":"https://codeload.github.com/msorkhpar/wiki-entity-summarization/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252038217,"owners_count":21684654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","dataset-generator","entity-summarization","neo4j","networkx","python","wiki-entity-summarization","wikies"],"created_at":"2024-09-24T13:19:36.184Z","updated_at":"2025-05-02T12:31:53.005Z","avatar_url":"https://github.com/msorkhpar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![arXiv](https://img.shields.io/badge/arXiv-2406.08435-B31B1B.svg)](https://doi.org/10.48550/arXiv.2406.08435)![GitHub License](https://img.shields.io/github/license/msorkhpar/wiki-entity-summarization)![GitHub Release](https://img.shields.io/github/v/release/msorkhpar/wiki-entity-summarization)\n\n# Wiki Entity Summarization Benchmark (WikES)\n\nThis 
repository leverages\nthe [wiki-entity-summarization-preprocessor](https://github.com/msorkhpar/wiki-entity-summarization-preprocessor)\nproject to construct an Entity Summarization Graph based on a given set of nodes. The project tries to\nmaintain the structure of the Wikidata knowledge graph by performing random walk sampling with a depth of `K`, starting\nfrom seed nodes after all the summary edges have been added to the result.\nIt then checks if the expanded graph is a single weakly connected component. If not, it finds `B` paths\nto connect the components. The final result is a heterogeneous graph consisting of the seed nodes,\ntheir summary edges, (1..K)-hop neighbors of the seed nodes and their edges, and any intermediary nodes added to ensure\ngraph connectivity. Each node and edge in the graph is enriched with metadata obtained from Wikidata, Wikipedia, and\npredicate information, providing additional context and details about the entities and their relationships.\n\u003cbr/\u003e\n\u003cbr/\u003e\n![A single root entity with its summary edges and other expanded edges by random walk](/WikES-example.png)\n\n## Loading the Datasets\n\n### Load Using `wikes-toolkit`\n\nTo load the dataset, we have introduced a toolkit that can be used to download, load, work with, and evaluate 48\nWiki-Entity-Summarization datasets. 
The toolkit is available as a Python package and can be installed using pip:\n\n```bash\npip install wikes-toolkit\n```\n\nA simple example of how to use the toolkit is as follows:\n\n```python\nfrom wikes_toolkit import WikESToolkit, V1, WikESGraph\n\ntoolkit = WikESToolkit(save_path=\"./data\")  # save_path is optional\nG = toolkit.load_graph(\n    WikESGraph,\n    V1.WikiLitArt.SMALL,\n    entity_formatter=lambda e: f\"Entity({e.wikidata_label})\",\n    predicate_formatter=lambda p: f\"Predicate({p.label})\",\n    triple_formatter=lambda\n        t: f\"({t.subject_entity.wikidata_label})-[{t.predicate.label}]-\u003e ({t.object_entity.wikidata_label})\"\n)\n\nroot_nodes = G.root_entities()\nnodes = G.entities()\n\n```\n\nPlease refer to the [Wiki-Entity-Summarization-Toolkit](https://github.com/msorkhpar/wiki-entity-summarization-toolkit)\nrepository for more information.\n\n### Using mlcroissant\n\nTo load WikES datasets, you can use [mlcroissant](https://github.com/mlcommons/croissant/) as well. You can find the\nmetadata JSON files in the [dataset details table](#Pre-generated-Datasets). 
\u003c/br\u003e\n\nHere is an example of loading our dataset using mlcroissant:\n\n```python\nfrom mlcroissant import Dataset\n\n\ndef print_first_item(record_name):\n    for record in dataset.records(record_set=record_name):\n        for key, val in record.items():\n            if isinstance(val, bytes):\n                val = str(val, \"utf-8\")\n            print(f\"{key}=[{val}]({type(val)})\", end=\", \")\n        break\n    print()\n\n\ndataset = Dataset(\n    jsonld=\"https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s.json\")\n\nprint(dataset.metadata.record_sets)\n\nprint_first_item(\"entities\")\nprint_first_item(\"root-entities\")\nprint_first_item(\"predicates\")\nprint_first_item(\"triples\")\nprint_first_item(\"ground-truths\")\n\"\"\" The output of the above code:\nwikes-dataset\n[RecordSet(uuid=\"entities\"), RecordSet(uuid=\"root-entities\"), RecordSet(uuid=\"predicates\"), RecordSet(uuid=\"triples\"), RecordSet(uuid=\"ground-truths\")]\nid=[0](\u003cclass 'int'\u003e), entity=[Q6387338](\u003cclass 'str'\u003e), wikidata_label=[Ken Blackwell](\u003cclass 'str'\u003e), wikidata_description=[American politician and activist](\u003cclass 'str'\u003e), wikipedia_id=[769596](\u003cclass 'int'\u003e), wikipedia_title=[Ken_Blackwell](\u003cclass 'str'\u003e), \nentity=[9](\u003cclass 'int'\u003e), category=[singer](\u003cclass 'str'\u003e), \nid=[0](\u003cclass 'int'\u003e), predicate=[P1344](\u003cclass 'str'\u003e), predicate_label=[participant in](\u003cclass 'str'\u003e), predicate_desc=[event in which a person or organization was/is a participant; inverse of P710 or P1923](\u003cclass 'str'\u003e), \nsubject=[1](\u003cclass 'int'\u003e), predicate=[0](\u003cclass 'int'\u003e), object=[778](\u003cclass 'int'\u003e), \nroot_entity=[9](\u003cclass 'int'\u003e), subject=[9](\u003cclass 'int'\u003e), predicate=[8](\u003cclass 'int'\u003e), object=[31068](\u003cclass 'int'\u003e), \n\"\"\"\n```\n\n## Loading the 
Pre-processed Databases\n\nAs described\nin [wiki-entity-summarization-preprocessor](https://github.com/msorkhpar/wiki-entity-summarization-preprocessor), we\nhave imported en-wikidata items as a graph with their summaries into a Neo4j database using Wikipedia and Wikidata XML\ndump files. Additionally, all the other related metadata was imported into a Postgres database.\n\nIf you want to create your own dataset but do not want to run the pre-processor again, you can download and load the\nexported files from these two databases. Please refer to the release notes of the current version `1.0.0` (\nenwiki-2023-05-1 and wikidata-wiki-2023-05-1).\n\n- [PostgreSQL-1.0.0 (wiki 2023-05-01)](https://github.com/msorkhpar/wiki-entity-summarization-preprocessor/releases/tag/PostgreSQL-1.0.0)\n- [Neo4j-1.0.0 (wiki 2023-05-01)](https://github.com/msorkhpar/wiki-entity-summarization-preprocessor/releases/tag/Neo4j-1.0.0)\n\n## Process Overview\n\n### 1. **Building the Summary Graph**\n\n- Create a summary graph where each seed node is expanded with its summary edges.\n\n### 2. **Expanding the Summary Graph**\n\n- Perform random walks starting from the seed nodes to mimic the structure of the Wikidata graph.\n- Scale the number of walks based on the degree of the seed nodes.\n- Add new edges to the graph from the random walk results.\n\n### 3. **Connecting Components**\n\n- Check if the expanded graph forms a single weakly connected component.\n- If not, iteratively connect smaller components using the shortest paths until a single component is achieved.\n\n### 4. 
**Adding Metadata**\n\n- Enhance the final graph with additional metadata for each node and edge.\n- Include labels, descriptions, and other relevant information from Wikidata, Wikipedia, and predicate information.\n\n### Pre-generated Datasets\n\nWe have generated datasets using [A Brief History of Human Time project](https://medialab.github.io/bhht-datascape/).\nThese datasets contain different sets of seed nodes, categorized by various human arts and professions.\n\n| dataset (variant, size, None/train/val/test)                                                                                                                                                                                                                                                                                                                                          | #roots | #smmaries | #nodes | #edges | #labels | roots category distribution                                                                                                                                  | Running Time(sec) |\n|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------|-----------|--------|--------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|\n| WikiLitArt-s \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s.graphml), 
[croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s.json)                          | 494    | 10416     | 85346  | 136950 | 547     | actor=150\u003cbr/\u003e composer=35\u003cbr/\u003e film=41\u003cbr/\u003e novelist=24\u003cbr/\u003e painter=59\u003cbr/\u003e poet=39\u003cbr/\u003e screenwriter=17\u003cbr/\u003e singer=72\u003cbr/\u003e writer=57                     | 91.934            |\n| WikiLitArt-s-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s-train.json)  | 346    | 7234      | 61885  | 96497  | 508     | actor=105\u003cbr/\u003e composer=24\u003cbr/\u003e film=29\u003cbr/\u003e novelist=17\u003cbr/\u003e painter=42\u003cbr/\u003e poet=27\u003cbr/\u003e screenwriter=12\u003cbr/\u003e singer=50\u003cbr/\u003e writer=40                     | 66.023            |\n| WikiLitArt-s-val \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s-val.json)          | 74     | 1572      | 14763  | 20795  | 340     | actor=23\u003cbr/\u003e composer=5\u003cbr/\u003e film=6\u003cbr/\u003e novelist=4\u003cbr/\u003e painter=9\u003cbr/\u003e poet=6\u003cbr/\u003e screenwriter=2\u003cbr/\u003e singer=11\u003cbr/\u003e writer=8                             | 14.364            |\n| WikiLitArt-s-test \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s-test.zip), 
[graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-s-test.json)      | 74     | 1626      | 15861  | 22029  | 350     | actor=22\u003cbr/\u003e composer=6\u003cbr/\u003e film=6\u003cbr/\u003e novelist=3\u003cbr/\u003e painter=8\u003cbr/\u003e poet=6\u003cbr/\u003e screenwriter=3\u003cbr/\u003e singer=11\u003cbr/\u003e writer=9                             | 14.6              |\n| WikiLitArt-m \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m.json)                          | 494    | 10416     | 128061 | 220263 | 604     | actor=150\u003cbr/\u003e composer=35\u003cbr/\u003e film=41\u003cbr/\u003e novelist=24\u003cbr/\u003e painter=59\u003cbr/\u003e poet=39\u003cbr/\u003e screenwriter=17\u003cbr/\u003e singer=72\u003cbr/\u003e writer=57                     | 155.368           |\n| WikiLitArt-m-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m-train.json)  | 346    | 7234      | 93251  | 155667 | 566     | actor=105\u003cbr/\u003e composer=24\u003cbr/\u003e film=29\u003cbr/\u003e novelist=17\u003cbr/\u003e painter=42\u003cbr/\u003e poet=27\u003cbr/\u003e screenwriter=12\u003cbr/\u003e singer=50\u003cbr/\u003e writer=40                     | 111.636           |\n| WikiLitArt-m-val 
\u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m-val.json)          | 74     | 1572      | 22214  | 33547  | 375     | actor=23\u003cbr/\u003e composer=5\u003cbr/\u003e film=6\u003cbr/\u003e novelist=4\u003cbr/\u003e painter=9\u003cbr/\u003e poet=6\u003cbr/\u003e screenwriter=2\u003cbr/\u003e singer=11\u003cbr/\u003e writer=8                             | 22.957            |\n| WikiLitArt-m-test \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-m-test.json)      | 74     | 1626      | 24130  | 35980  | 394     | actor=22\u003cbr/\u003e composer=6\u003cbr/\u003e film=6\u003cbr/\u003e novelist=3\u003cbr/\u003e painter=8\u003cbr/\u003e poet=6\u003cbr/\u003e screenwriter=3\u003cbr/\u003e singer=11\u003cbr/\u003e writer=9                             | 26.187            |\n| WikiLitArt-l \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l.json)                          | 494    | 10416     | 239491 | 466905 | 703     | actor=150\u003cbr/\u003e composer=35\u003cbr/\u003e film=41\u003cbr/\u003e novelist=24\u003cbr/\u003e painter=59\u003cbr/\u003e poet=39\u003cbr/\u003e screenwriter=17\u003cbr/\u003e 
singer=72\u003cbr/\u003e writer=57                     | 353.113           |\n| WikiLitArt-l-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l-train.json)  | 346    | 7234      | 176057 | 332279 | 661     | actor=105\u003cbr/\u003e composer=24\u003cbr/\u003e film=29\u003cbr/\u003e novelist=17\u003cbr/\u003e painter=42\u003cbr/\u003e poet=27\u003cbr/\u003e screenwriter=12\u003cbr/\u003e singer=50\u003cbr/\u003e writer=40                     | 244.544           |\n| WikiLitArt-l-val \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l-val.json)          | 74     | 1572      | 42745  | 71734  | 446     | actor=23\u003cbr/\u003e composer=5\u003cbr/\u003e film=6\u003cbr/\u003e novelist=4\u003cbr/\u003e painter=9\u003cbr/\u003e poet=6\u003cbr/\u003e screenwriter=2\u003cbr/\u003e singer=11\u003cbr/\u003e writer=8                             | 57.263            |\n| WikiLitArt-l-test \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiLitArt-l-test.json)      | 74     | 1626      | 46890  | 77931  | 493     | actor=22\u003cbr/\u003e composer=6\u003cbr/\u003e film=6\u003cbr/\u003e 
novelist=3\u003cbr/\u003e painter=8\u003cbr/\u003e poet=6\u003cbr/\u003e screenwriter=3\u003cbr/\u003e singer=11\u003cbr/\u003e writer=9                             | 60.466            |\n| WikiCinema-s \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s.json)                          | 493    | 11750     | 70753  | 126915 | 469     | actor=405\u003cbr/\u003e film=88                                                                                                                                       | 118.014           |\n| WikiCinema-s-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s-train.json)  | 345    | 8374      | 52712  | 89306  | 437     | actor=284\u003cbr/\u003e film=61                                                                                                                                       | 84.364            |\n| WikiCinema-s-val \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s-val.json)          | 73     | 1650      | 13362  | 19280  | 305     | actor=59\u003cbr/\u003e film=14                                                                                          
                                              | 18.651            |\n| WikiCinema-s-test \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-s-test.json)      | 75     | 1744      | 14777  | 21567  | 313     | actor=62\u003cbr/\u003e film=13                                                                                                                                        | 19.851            |\n| WikiCinema-m \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m.json)                          | 493    | 11750     | 101529 | 196061 | 541     | actor=405\u003cbr/\u003e film=88                                                                                                                                       | 196.413           |\n| WikiCinema-m-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m-train.json)  | 345    | 8374      | 75900  | 138897 | 491     | actor=284\u003cbr/\u003e film=61                                                                                                                                       | 142.091           |\n| WikiCinema-m-val 
\u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m-val.json)          | 73     | 1650      | 19674  | 30152  | 344     | actor=59\u003cbr/\u003e film=14                                                                                                                                        | 31.722            |\n| WikiCinema-m-test \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-m-test.json)      | 75     | 1744      | 22102  | 34499  | 342     | actor=62\u003cbr/\u003e film=13                                                                                                                                        | 33.674            |\n| WikiCinema-l \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l.json)                          | 493    | 11750     | 185098 | 397546 | 614     | actor=405\u003cbr/\u003e film=88                                                                                                                                       | 475.679           |\n| WikiCinema-l-train  
\u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l-train.json) | 345    | 8374      | 139598 | 284417 | 575     | actor=284\u003cbr/\u003e film=61                                                                                                                                       | 333.148           |\n| WikiCinema-l-val  \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l-val.json)         | 73     | 1650      | 37352  | 63744  | 412     | actor=59\u003cbr/\u003e film=14                                                                                                                                        | 68.62             |\n| WikiCinema-l-test  \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiCinema-l-test.json)     | 75     | 1744      | 43238  | 74205  | 426     | actor=62\u003cbr/\u003e film=13                                                                                                                                        | 87.07             |\n| WikiPro-s \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s.zip), 
[graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s.json)                                      | 493    | 9853      | 79825  | 125912 | 616     | actor=58\u003cbr/\u003e football=156\u003cbr/\u003e journalist=14\u003cbr/\u003e lawyer=16\u003cbr/\u003e painter=23\u003cbr/\u003e player=25\u003cbr/\u003e politician=125\u003cbr/\u003e singer=27\u003cbr/\u003e sport=21\u003cbr/\u003e writer=28  | 126.119           |\n| WikiPro-s-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s-train.json)              | 345    | 6832      | 57529  | 87768  | 575     | actor=41\u003cbr/\u003e football=109\u003cbr/\u003e journalist=10\u003cbr/\u003e lawyer=11\u003cbr/\u003e painter=16\u003cbr/\u003e player=17\u003cbr/\u003e politician=87\u003cbr/\u003e singer=19\u003cbr/\u003e sport=15\u003cbr/\u003e writer=20   | 89.874            |\n| WikiPro-s-val \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s-val.json)                      | 74     | 1548      | 15769  | 21351  | 405     | actor=9\u003cbr/\u003e football=23\u003cbr/\u003e journalist=2\u003cbr/\u003e lawyer=3\u003cbr/\u003e painter=3\u003cbr/\u003e player=4\u003cbr/\u003e politician=19\u003cbr/\u003e singer=4\u003cbr/\u003e sport=3\u003cbr/\u003e writer=4            | 21.021            |\n| 
WikiPro-s-test  \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-s-test.json)                 | 74     | 1484      | 15657  | 21145  | 384     | actor=8\u003cbr/\u003e football=24\u003cbr/\u003e journalist=2\u003cbr/\u003e lawyer=2\u003cbr/\u003e painter=4\u003cbr/\u003e player=4\u003cbr/\u003e politician=19\u003cbr/\u003e singer=4\u003cbr/\u003e sport=3\u003cbr/\u003e writer=4            | 21.743            |\n| WikiPro-m \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m.json)                                      | 493    | 9853      | 119305 | 198663 | 670     | actor=58\u003cbr/\u003e football=156\u003cbr/\u003e journalist=14\u003cbr/\u003e lawyer=16\u003cbr/\u003e painter=23\u003cbr/\u003e player=25\u003cbr/\u003e politician=125\u003cbr/\u003e singer=27\u003cbr/\u003e sport=21\u003cbr/\u003e writer=28  | 208.157           |\n| WikiPro-m-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m-train.json)              | 345    | 6832      | 86434  | 138676 | 633     | actor=41\u003cbr/\u003e football=109\u003cbr/\u003e journalist=10\u003cbr/\u003e lawyer=11\u003cbr/\u003e painter=16\u003cbr/\u003e 
player=17\u003cbr/\u003e politician=87\u003cbr/\u003e singer=19\u003cbr/\u003e sport=15\u003cbr/\u003e writer=20   | 141.563           |\n| WikiPro-m-val \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m-val.json)                      | 74     | 1548      | 24230  | 34636  | 463     | actor=9\u003cbr/\u003e football=23\u003cbr/\u003e journalist=2\u003cbr/\u003e lawyer=3\u003cbr/\u003e painter=3\u003cbr/\u003e player=4\u003cbr/\u003e politician=19\u003cbr/\u003e singer=4\u003cbr/\u003e sport=3\u003cbr/\u003e writer=4            | 36.045            |\n| WikiPro-m-test \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-m-test.json)                  | 74     | 1484      | 24117  | 34157  | 462     | actor=8\u003cbr/\u003e football=24\u003cbr/\u003e journalist=2\u003cbr/\u003e lawyer=2\u003cbr/\u003e painter=4\u003cbr/\u003e player=4\u003cbr/\u003e politician=19\u003cbr/\u003e singer=4\u003cbr/\u003e sport=3\u003cbr/\u003e writer=4            | 36.967            |\n| WikiPro-l \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l.json)                                      | 493    | 9853      | 230442 | 412766 | 769     | 
actor=58\u003cbr/\u003e football=156\u003cbr/\u003e journalist=14\u003cbr/\u003e lawyer=16\u003cbr/\u003e painter=23\u003cbr/\u003e player=25\u003cbr/\u003e politician=125\u003cbr/\u003e singer=27\u003cbr/\u003e sport=21\u003cbr/\u003e writer=28  | 489.409           |\n| WikiPro-l-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l-train.json)              | 345    | 6832      | 166685 | 290069 | 725     | actor=41\u003cbr/\u003e football=109\u003cbr/\u003e journalist=10\u003cbr/\u003e lawyer=11\u003cbr/\u003e painter=16\u003cbr/\u003e player=17\u003cbr/\u003e politician=87\u003cbr/\u003e singer=19\u003cbr/\u003e sport=15\u003cbr/\u003e writer=20   | 334.864           |\n| WikiPro-l-val \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l-val.json)                      | 74     | 1548      | 48205  | 74387  | 549     | actor=9\u003cbr/\u003e football=23\u003cbr/\u003e journalist=2\u003cbr/\u003e lawyer=3\u003cbr/\u003e painter=3\u003cbr/\u003e player=4\u003cbr/\u003e politician=19\u003cbr/\u003e singer=4\u003cbr/\u003e sport=3\u003cbr/\u003e writer=4            | 84.089            |\n| WikiPro-l-test \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l-test.graphml), 
[croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiPro-l-test.json)                  | 74     | 1484      | 47981  | 72845  | 546     | actor=8\u003cbr/\u003e football=24\u003cbr/\u003e journalist=2\u003cbr/\u003e lawyer=2\u003cbr/\u003e painter=4\u003cbr/\u003e player=4\u003cbr/\u003e politician=19\u003cbr/\u003e singer=4\u003cbr/\u003e sport=3\u003cbr/\u003e writer=4            | 92.545            |\n| WikiProFem-s \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s.json)                          | 468    | 8338      | 79926  | 123193 | 571     | actor=141\u003cbr/\u003e athletic=25\u003cbr/\u003e football=24\u003cbr/\u003e journalist=16\u003cbr/\u003e painter=16\u003cbr/\u003e player=32\u003cbr/\u003e politician=81\u003cbr/\u003e singer=69\u003cbr/\u003e sport=18\u003cbr/\u003e writer=46 | 177.63            |\n| WikiProFem-s-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s-train.json)  | 330    | 5587      | 58329  | 87492  | 521     | actor=98\u003cbr/\u003e athletic=18\u003cbr/\u003e football=17\u003cbr/\u003e journalist=9\u003cbr/\u003e painter=13\u003cbr/\u003e player=22\u003cbr/\u003e politician=57\u003cbr/\u003e singer=48\u003cbr/\u003e sport=14\u003cbr/\u003e writer=34   | 127.614           |\n| WikiProFem-s-val 
\u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s-val.json)          | 68     | 1367      | 14148  | 19360  | 344     | actor=21\u003cbr/\u003e athletic=4\u003cbr/\u003e football=3\u003cbr/\u003e journalist=4\u003cbr/\u003e painter=1\u003cbr/\u003e player=5\u003cbr/\u003e politician=13\u003cbr/\u003e singer=11\u003cbr/\u003e sport=1\u003cbr/\u003e writer=5         | 29.081            |\n| WikiProFem-s-test \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-s-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-test.json)          | 70     | 1387      | 13642  | 18567  | 360     | actor=22\u003cbr/\u003e athletic=3\u003cbr/\u003e football=4\u003cbr/\u003e journalist=3\u003cbr/\u003e painter=2\u003cbr/\u003e player=5\u003cbr/\u003e politician=11\u003cbr/\u003e singer=10\u003cbr/\u003e sport=3\u003cbr/\u003e writer=7         | 27.466            |\n| WikiProFem-m \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m.json)                          | 468    | 8338      | 122728 | 196838 | 631     | actor=141\u003cbr/\u003e athletic=25\u003cbr/\u003e football=24\u003cbr/\u003e journalist=16\u003cbr/\u003e painter=16\u003cbr/\u003e player=32\u003cbr/\u003e 
politician=81\u003cbr/\u003e singer=69\u003cbr/\u003e sport=18\u003cbr/\u003e writer=46 | 301.718           |\n| WikiProFem-m-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m-train.json)  | 330    | 5587      | 89922  | 140505 | 600     | actor=98\u003cbr/\u003e athletic=18\u003cbr/\u003e football=17\u003cbr/\u003e journalist=9\u003cbr/\u003e painter=13\u003cbr/\u003e player=22\u003cbr/\u003e politician=57\u003cbr/\u003e singer=48\u003cbr/\u003e sport=14\u003cbr/\u003e writer=34   | 217.699           |\n| WikiProFem-m-val \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m-val.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m-val.json)          | 68     | 1367      | 21978  | 31230  | 409     | actor=21\u003cbr/\u003e athletic=4\u003cbr/\u003e football=3\u003cbr/\u003e journalist=4\u003cbr/\u003e painter=1\u003cbr/\u003e player=5\u003cbr/\u003e politician=13\u003cbr/\u003e singer=11\u003cbr/\u003e sport=1\u003cbr/\u003e writer=5         | 46.793            |\n| WikiProFem-m-test \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-m-test.json)      | 70     | 1387      | 21305  | 29919  | 394     | actor=22\u003cbr/\u003e 
athletic=3\u003cbr/\u003e football=4\u003cbr/\u003e journalist=3\u003cbr/\u003e painter=2\u003cbr/\u003e player=5\u003cbr/\u003e politician=11\u003cbr/\u003e singer=10\u003cbr/\u003e sport=3\u003cbr/\u003e writer=7         | 46.317            |\n| WikiProFem-l \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l.json)                          | 468    | 8338      | 248012 | 413895 | 722     | actor=141\u003cbr/\u003e athletic=25\u003cbr/\u003e football=24\u003cbr/\u003e journalist=16\u003cbr/\u003e painter=16\u003cbr/\u003e player=32\u003cbr/\u003e politician=81\u003cbr/\u003e singer=69\u003cbr/\u003e sport=18\u003cbr/\u003e writer=46 | 768.99            |\n| WikiProFem-l-train \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l-train.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l-train.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l-train.json)  | 330    | 5587      | 183710 | 297686 | 676     | actor=98\u003cbr/\u003e athletic=18\u003cbr/\u003e football=17\u003cbr/\u003e journalist=9\u003cbr/\u003e painter=13\u003cbr/\u003e player=22\u003cbr/\u003e politician=57\u003cbr/\u003e singer=48\u003cbr/\u003e sport=14\u003cbr/\u003e writer=34   | 544.893           |\n| WikiProFem-l-val \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l-val.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l-val.graphml), 
[croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l-val.json)          | 68     | 1367      | 46018  | 67193  | 492     | actor=21\u003cbr/\u003e athletic=4\u003cbr/\u003e football=3\u003cbr/\u003e journalist=4\u003cbr/\u003e painter=1\u003cbr/\u003e player=5\u003cbr/\u003e politician=13\u003cbr/\u003e singer=11\u003cbr/\u003e sport=1\u003cbr/\u003e writer=5         | 116.758           |\n| WikiProFem-l-test \u003c/br\u003e[csv](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l-test.zip), [graphml](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l-test.graphml), [croissant.json](https://github.com/msorkhpar/wiki-entity-summarization/releases/download/1.0.5/WikiProFem-l-test.json)      | 70     | 1387      | 44193  | 63563  | 472     | actor=22\u003cbr/\u003e athletic=3\u003cbr/\u003e football=4\u003cbr/\u003e journalist=3\u003cbr/\u003e painter=2\u003cbr/\u003e player=5\u003cbr/\u003e politician=11\u003cbr/\u003e singer=10\u003cbr/\u003e sport=3\u003cbr/\u003e writer=7         | 118.524           |\n\n**Keep in mind that by providing a new set of seed nodes, you can generate the output for your own dataset.**\n\n### Dataset Parameters\n\n| Parameter                     | Value |\n|-------------------------------|-------|\n| Min valid summary edges       | 5     |\n| Random walk depth length      | 3     |\n| Min random walk number-small  | 100   |\n| Min random walk number-medium | 150   |\n| Min random walk number-large  | 300   |\n| Max random walk number-small  | 300   |\n| Max random walk number-medium | 600   |\n| Max random walk number-large  | 1800  |\n| Bridges number                | 5     |\n\n## Graph Structure\n\nIn the following you can see a sample of the graph format (we highly recommend using our toolkit to load the datasets):\n\n### CSV Format\n\nAfter unzipping `{variant}-{size}-{dataset_type}.zip` file, you 
will find the following CSV files:\n\n`{variant}-{size}-{dataset_type}-entities.csv` contains entities. An entity is a Wikidata item (node) in our\ndataset.\n\n| Field           | Description                          | datatype |\n|-----------------|--------------------------------------|----------| \n| id              | incremental integer starting at zero | int      |\n| entity          | Wikidata QID, e.g. `Q76`             | string   |\n| wikidata_label  | Wikidata label (nullable)            | string   |\n| wikidata_desc   | Wikidata description (nullable)      | string   |\n| wikipedia_title | Wikipedia title (nullable)           | string   |\n| wikipedia_id    | Wikipedia page id (nullable)         | long     |\n\n`{variant}-{size}-{dataset_type}-root-entities.csv` contains root entities. A root entity is a seed node\ndescribed previously.\n\n| Field    | Description                                              | datatype |\n|----------|----------------------------------------------------------|----------|\n| entity   | id key in `{variant}-{size}-{dataset_type}-entities.csv` | int      |\n| category | category                                                 | string   |\n\n`{variant}-{size}-{dataset_type}-predicates.csv` contains predicates. A predicate is a Wikidata property\ndescribing a connection.\n\n| Field           | Description                              | datatype |\n|-----------------|------------------------------------------|----------| \n| id              | incremental integer starting at zero     | int      |\n| predicate       | Wikidata Property id, e.g. `P121`        | string   |\n| predicate_label | Wikidata Property label (nullable)       | string   |\n| predicate_desc  | Wikidata Property description (nullable) | string   |\n\n`{variant}-{size}-{dataset_type}-triples.csv` contains triples. 
A triple is an edge between two entities with a\npredicate.\n\n| Field     | Description                                                | datatype |\n|-----------|------------------------------------------------------------|----------| \n| subject   | id key in `{variant}-{size}-{dataset_type}-entities.csv`   | int      |\n| predicate | id key in `{variant}-{size}-{dataset_type}-predicates.csv` | int      |\n| object    | id key in `{variant}-{size}-{dataset_type}-entities.csv`   | int      |\n\n`{variant}-{size}-{dataset_type}-ground-truths.csv` contains ground truth triples. A ground truth triple is an\nedge that is marked as a summary for a root entity.\n\n| Field       | Description                                                   | datatype |\n|-------------|---------------------------------------------------------------|----------| \n| root_entity | entity in `{variant}-{size}-{dataset_type}-root-entities.csv` | int      |\n| subject     | id key in `{variant}-{size}-{dataset_type}-entities.csv`      | int      |\n| predicate   | id key in `{variant}-{size}-{dataset_type}-predicates.csv`    | int      |\n| object      | id key in `{variant}-{size}-{dataset_type}-entities.csv`      | int      |\n\n**Note: in this file, either `subject` or `object` is equal to `root_entity`.**\n\n### Example of CSV Files\n\n```csv\n# entities.csv\nid,entity,wikidata_label,wikidata_desc,wikipedia_title,wikipedia_id\n0,Q43416,Keanu Reeves,Canadian actor (born 1964),Keanu_Reeves,16603\n1,Q3820,Beirut,capital and largest city of Lebanon,Beirut,37428\n2,Q639669,musician,person who composes, conducts or performs music,Musician,38284\n3,Q219150,Constantine,2005 film directed by Francis Lawrence,Constantine_(film),1210303\n```\n\n```csv\n# root-entities.csv\nentity,category\n0,actor\n```\n\n```csv\n# predicates.csv\nid,predicate,predicate_label,predicate_desc\n0,P19,place of birth,location where the subject was born\n1,P106,occupation,occupation of a person; 
see also \"field of work\" (Property:P101), \"position held\" (Property:P39)\n2,P161,cast member,actor in the subject production [use \"character role\" (P453) and/or \"name of the character role\" (P4633) as qualifiers] [use \"voice actor\" (P725) for voice-only role]\n```\n\n```csv\n# triples.csv\nsubject,predicate,object\n0,0,1\n0,1,2\n3,2,0\n```\n\n```csv\n# ground-truths.csv\nroot_entity,subject,predicate,object\n0,0,0,1\n0,3,2,0\n```\n\n### GraphML Example\n\nThe same graph can be represented in GraphML format; the GraphML files are linked in the [dataset details table](#Pre-generated-Datasets).\n\n```xml\n\u003c?xml version=\"1.0\" encoding=\"UTF-8\"?\u003e\n\u003cgraphml xmlns=\"http://graphml.graphdrawing.org/xmlns\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n         xsi:schemaLocation=\"http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd\"\u003e\n    \u003ckey id=\"d9\" for=\"edge\" attr.name=\"summary_for\" attr.type=\"string\"/\u003e\n    \u003ckey id=\"d8\" for=\"edge\" attr.name=\"predicate_desc\" attr.type=\"string\"/\u003e\n    \u003ckey id=\"d7\" for=\"edge\" attr.name=\"predicate_label\" attr.type=\"string\"/\u003e\n    \u003ckey id=\"d6\" for=\"edge\" attr.name=\"predicate\" attr.type=\"string\"/\u003e\n    \u003ckey id=\"d5\" for=\"node\" attr.name=\"category\" attr.type=\"string\"/\u003e\n    \u003ckey id=\"d4\" for=\"node\" attr.name=\"is_root\" attr.type=\"boolean\"/\u003e\n    \u003ckey id=\"d3\" for=\"node\" attr.name=\"wikidata_desc\" attr.type=\"string\"/\u003e\n    \u003ckey id=\"d2\" for=\"node\" attr.name=\"wikipedia_title\" attr.type=\"string\"/\u003e\n    \u003ckey id=\"d1\" for=\"node\" attr.name=\"wikipedia_id\" attr.type=\"long\"/\u003e\n    \u003ckey id=\"d0\" for=\"node\" attr.name=\"wikidata_label\" attr.type=\"string\"/\u003e\n    \u003cgraph edgedefault=\"directed\"\u003e\n        \u003cnode id=\"Q43416\"\u003e\n            \u003cdata key=\"d0\"\u003eKeanu Reeves\u003c/data\u003e\n            
\u003cdata key=\"d1\"\u003e16603\u003c/data\u003e\n            \u003cdata key=\"d2\"\u003eKeanu_Reeves\u003c/data\u003e\n            \u003cdata key=\"d3\"\u003eCanadian actor (born 1964)\u003c/data\u003e\n            \u003cdata key=\"d4\"\u003eTrue\u003c/data\u003e\n            \u003cdata key=\"d5\"\u003eactor\u003c/data\u003e\n        \u003c/node\u003e\n        \u003cnode id=\"Q3820\"\u003e\n            \u003cdata key=\"d0\"\u003eBeirut\u003c/data\u003e\n            \u003cdata key=\"d1\"\u003e37428\u003c/data\u003e\n            \u003cdata key=\"d2\"\u003eBeirut\u003c/data\u003e\n            \u003cdata key=\"d3\"\u003ecapital and largest city of Lebanon\u003c/data\u003e\n        \u003c/node\u003e\n        \u003cnode id=\"Q639669\"\u003e\n            \u003cdata key=\"d0\"\u003emusician\u003c/data\u003e\n            \u003cdata key=\"d1\"\u003e38284\u003c/data\u003e\n            \u003cdata key=\"d2\"\u003eMusician\u003c/data\u003e\n            \u003cdata key=\"d3\"\u003eperson who composes, conducts or performs music\u003c/data\u003e\n        \u003c/node\u003e\n        \u003cnode id=\"Q219150\"\u003e\n            \u003cdata key=\"d0\"\u003eConstantine\u003c/data\u003e\n            \u003cdata key=\"d1\"\u003e1210303\u003c/data\u003e\n            \u003cdata key=\"d2\"\u003eConstantine_(film)\u003c/data\u003e\n            \u003cdata key=\"d3\"\u003e2005 film directed by Francis Lawrence\u003c/data\u003e\n        \u003c/node\u003e\n        \u003cedge source=\"Q43416\" target=\"Q3820\" id=\"P19\"\u003e\n            \u003cdata key=\"d6\"\u003eP19\u003c/data\u003e\n            \u003cdata key=\"d7\"\u003eplace of birth\u003c/data\u003e\n            \u003cdata key=\"d8\"\u003elocation where the subject was born\u003c/data\u003e\n            \u003cdata key=\"d9\"\u003eQ43416\u003c/data\u003e\n        \u003c/edge\u003e\n        \u003cedge source=\"Q43416\" target=\"Q639669\" id=\"P106\"\u003e\n            \u003cdata key=\"d6\"\u003eP106\u003c/data\u003e\n            \u003cdata 
key=\"d7\"\u003eoccupation\u003c/data\u003e\n            \u003cdata key=\"d8\"\u003eoccupation of a person; see also \"field of work\" (Property:P101), \"position held\"\n                (Property:P39)\n            \u003c/data\u003e\n        \u003c/edge\u003e\n        \u003cedge source=\"Q219150\" target=\"Q43416\" id=\"P161\"\u003e\n            \u003cdata key=\"d6\"\u003eP161\u003c/data\u003e\n            \u003cdata key=\"d7\"\u003ecast member\u003c/data\u003e\n            \u003cdata key=\"d8\"\u003eactor in the subject production [use \"character role\" (P453) and/or \"name of the character\n                role\" (P4633) as qualifiers] [use \"voice actor\" (P725) for voice-only role]\n            \u003c/data\u003e\n            \u003cdata key=\"d9\"\u003eQ43416\u003c/data\u003e\n        \u003c/edge\u003e\n    \u003c/graph\u003e\n\u003c/graphml\u003e\n```\n\n## Usage\n\n### Generate a New Dataset\n\nTo get started with this project, first clone this repository and install the necessary\ndependencies using Poetry.\n\n```bash\ngit clone https://github.com/msorkhpar/wiki-entity-summarization.git\ncd wiki-entity-summarization\ncurl -sSL https://install.python-poetry.org | python3 -\npoetry config virtualenvs.in-project true\npoetry install\npoetry shell\n\n# You can set the parameters via a .env file instead of providing command line arguments.\ncp .env_sample .env\n\npython3 main.py [-h] [--min_valid_summary_edges MIN_VALID_SUMMARY_EDGES] \n                [--random_walk_depth_len RANDOM_WALK_DEPTH_LEN] [--bridges_number BRIDGES_NUMBER] \n                [--max_threads MAX_THREADS] [--output_path OUTPUT_PATH] [--db_name DB_NAME] [--db_user DB_USER] \n                [--db_password DB_PASSWORD] [--db_host DB_HOST] [--db_port DB_PORT] [--neo4j_user NEO4J_USER] \n                [--neo4j_password NEO4J_PASSWORD] [--neo4j_host NEO4J_HOST] [--neo4j_port NEO4J_PORT]\n                [dataset_name] [min_random_walk_number] [max_random_walk_number] [seed_node_ids] 
[categories]\n                \n        options:\n                -h, --help                Show this help message and exit\n                --min_valid_summary_edges Minimum number of valid summaries for a seed node\n                --random_walk_depth_len   Depth length of random walks (number of nodes in each random walk)\n                --bridges_number          Number of connecting path bridges between components\n                --max_threads             Maximum number of threads\n                --output_path             Path to save output data\n                --db_name                 Database name\n                --db_user                 Database user\n                --db_password             Database password\n                --db_host                 Database host\n                --db_port                 Database port\n                --neo4j_user              Neo4j user\n                --neo4j_password          Neo4j password\n                --neo4j_host              Neo4j host\n                --neo4j_port              Neo4j port\n\n        Positional arguments:\n                dataset_name              The name of the dataset to process (required)\n                min_random_walk_number    Minimum number of random walks for each seed node (required)\n                max_random_walk_number    Maximum number of random walks for each seed node (required)\n                seed_node_ids             Seed node ids in comma-separated format (required)\n                categories                Seed node categories in comma-separated format (optional)\n\n```\n\n### Re-generate WikES Dataset\n\nTo re-construct our pre-generated datasets, you can use the following command:\n\n```bash \npython3 human_history_dataset.py\n```\n\n**This project uses our [pre-processor project](https://github.com/msorkhpar/wiki-entity-summarization-preprocessor)\ndatabases. 
Make sure you have loaded the data and that the databases are running.**\n\n## Citation\n\nIf you use this project in your research, please cite the following paper:\n\n```bibtex\n@misc{javadi2024wiki,\n    title = {Wiki Entity Summarization Benchmark},\n    author = {Saeedeh Javadi and Atefeh Moradan and Mohammad Sorkhpar and Klim Zaporojets and Davide Mottin and Ira Assent},\n    year = {2024},\n    eprint = {2406.08435},\n    archivePrefix = {arXiv},\n    primaryClass = {cs.IR}\n}\n```\n\n## License\n\nThis project and its released datasets are licensed under the CC BY 4.0 License. See the [LICENSE](LICENSE)\nfile for details.\n\nThe external services, libraries, and software used by this project are covered by the licenses listed below.\nBy using this project, you accept these third-party licenses.\n\n1. Wikipedia:\n    - https://www.gnu.org/licenses/fdl-1.3.html\n    - https://creativecommons.org/licenses/by-sa/3.0/\n    - https://foundation.wikimedia.org/wiki/Policy:Terms_of_Use\n2. Wikidata:\n    - https://creativecommons.org/publicdomain/zero/1.0/\n    - https://creativecommons.org/licenses/by-sa/3.0/\n3. Python:\n    - https://docs.python.org/3/license.html#psf-license\n    - https://docs.python.org/3/license.html#bsd0\n    - https://docs.python.org/3/license.html#otherlicenses\n4. DistilBERT:\n    - https://github.com/RayWilliam46/FineTune-DistilBERT/blob/main/LICENSE\n5. Networkx:\n    - https://github.com/networkx/nx-guides/blob/main/LICENSE\n6. Postgres:\n    - https://opensource.org/license/postgresql\n7. Neo4j:\n    - https://www.gnu.org/licenses/quick-guide-gplv3.html\n8. Docker:\n    - https://github.com/moby/moby/blob/master/LICENSE\n9. PyTorch:\n    - https://github.com/intel/torch/blob/master/LICENSE.md\n10. Scikit-learn:\n    - https://github.com/scikit-learn/scikit-learn/blob/main/COPYING\n11. Pandas:\n    - https://github.com/pandas-dev/pandas/blob/main/LICENSE\n12. Numpy:\n    - https://numpy.org/doc/stable/license.html\n13. 
Java-open:\n    - https://github.com/openjdk/jdk21/blob/master/LICENSE\n14. Spring framework:\n    - https://github.com/spring-projects/spring-boot/blob/main/LICENSE.txt\n15. Other libraries:\n    - https://github.com/tatuylonen/wikitextprocessor/blob/main/LICENSE\n    - https://github.com/aaronsw/html2text/blob/master/COPYING\n    - https://github.com/earwig/mwparserfromhell/blob/main/LICENSE\n    - https://github.com/more-itertools/more-itertools/blob/master/LICENSE\n    - https://github.com/siznax/wptools/blob/master/LICENSE\n    - https://github.com/tqdm/tqdm/blob/master/LICENCE\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsorkhpar%2Fwiki-entity-summarization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmsorkhpar%2Fwiki-entity-summarization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsorkhpar%2Fwiki-entity-summarization/lists"}