# Network from Wikipedia

`to_graphml.py` can be used to construct a network from Wikipedia pages.
This script is based on a Colab notebook from [ivanovitchm/network_analysis](https://github.com/ivanovitchm/network_analysis)
(Week 07, Directed networks: case study of Wikipedia pages).
An example output file is available [here](output.graphml).

The snowballing process starts from the page given in the `source` argument.

> "When you start the snowballing, you will eventually (and quite soon) bump
  into the pages describing ISBN and ISSN numbers, the arXiv, PubMed, and the
  like. Almost all other Wikipedia pages refer to one or more of those pages.
  This hyper-connectedness transforms any network into a collection of almost
  perfect gigantic stars, making all Wikipedia-based networks look similar. To
  avoid the stardom syndrome, treat the known 'star' pages as stop words in
  information retrieval—in other words, ignore any links to them.
  Constructing the black list of stop words, STOPS, is a matter of trial and
  error. We put thirteen subjects on it; you may want to add more when you
  come across other “stars.” We also excluded pages whose names begin with
  "List of", because they are simply lists of other
  subjects." - Ivanovitch's Jupyter notebook

You can change the `STOPS` list in [`constants.py`](https://github.com/alvarofpp/dataset-network-from-wikipedia/blob/main/utils/constants.py#L4).

## Examples of use

- [Javier Barriuso (Barri) - Graphs of the different leaders of each political party in Argentina.](https://x.com/BarriPdmx/status/1437720971746631680)

## Requirements

The script is written in Python 3. Its dependencies can be installed with:

```shell
pip install -r requirements.txt
```

## How to run

Run `to_graphml.py` with the following arguments:

<!-- markdownlint-disable MD013 -->

| Argument | Required | Description | Default value |
| -------- | -------- | ----------- | ------------- |
| `-s` <br/> `--source` | Yes | URL or title of a Wikipedia page. | - |
| `-d` <br/> `--degree` | No | Minimum node degree: only nodes whose degree is equal to or greater than this value are kept. | `2` |
| `-l` <br/> `--layers` | No | Number of search layers (snowballing depth). | `2` |
| `-o` <br/> `--output` | No | Output filename, without the `.graphml` extension. | `'output'` |
| `-v` <br/> `--verbose` | No | Increase output verbosity. | `False` |

<!-- markdownlint-enable MD013 -->

### Examples

Basic usage:

```shell
python3 to_graphml.py --source='Complex network'
# Or pass a URL (tested only on `en.wikipedia`)
python3 to_graphml.py --source='https://en.wikipedia.org/wiki/Complex_network'
```

Verbose mode:

```shell
python3 to_graphml.py --source='Complex network' --verbose
```

Search more layers, raise the minimum degree, and change the output file:

```shell
python3 to_graphml.py --source='Complex network' --layers=5 --degree=10 --output=graph_5l_10d
# The output file is `graph_5l_10d.graphml`
```

## Contributing

Contributions are welcome: fork the repository, make your changes, and open a pull request.
For bugs, ideas for improvement, or anything else, please open an [issue](https://github.com/alvarofpp/dataset-network-from-wikipedia/issues).

## License

This project is licensed under the GNU Affero General Public License. See
the [LICENSE](LICENSE) file for details.
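## Sketch of the crawling logic

The snowballing, the stop-word rule, and the degree filter described at the top of this README can be sketched in plain Python as a breadth-first crawl. This is a minimal sketch under stated assumptions, not the script's actual code: `snowball`, `filter_by_degree`, `get_links`, and the toy `LINKS` map are hypothetical names, the crawl reads from an in-memory link map instead of querying Wikipedia so it runs offline, the `STOPS` entries are illustrative rather than the list in `constants.py`, and "degree" is assumed to mean in-degree plus out-degree. It needs Python 3.9+ for the built-in generic type hints.

```python
from collections import deque
from typing import Callable, Iterable

# Illustrative stop words only (assumption); the real list is STOPS in utils/constants.py.
STOPS = {"ISBN (identifier)", "ISSN (identifier)", "ArXiv (identifier)", "PubMed"}


def snowball(
    source: str,
    layers: int,
    get_links: Callable[[str], Iterable[str]],
    stops: set[str] = STOPS,
) -> set[tuple[str, str]]:
    """Hypothetical BFS snowball: expand only pages fewer than `layers` hops from `source`."""
    edges: set[tuple[str, str]] = set()
    seen = {source}
    queue = deque([(0, source)])
    while queue:
        layer, page = queue.popleft()
        if layer >= layers:
            break  # BFS order: every page still queued is at least this deep
        for link in get_links(page):
            # Ignore "star" pages and "List of ..." pages, as the quoted notebook suggests.
            if link in stops or link.startswith("List of"):
                continue
            edges.add((page, link))
            if link not in seen:
                seen.add(link)
                queue.append((layer + 1, link))
    return edges


def filter_by_degree(
    edges: set[tuple[str, str]], min_degree: int = 2
) -> tuple[set[str], set[tuple[str, str]]]:
    """Keep nodes whose in + out degree is >= min_degree, plus the edges between them."""
    degree: dict[str, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    nodes = {n for n, d in degree.items() if d >= min_degree}
    kept = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return nodes, kept


# Hypothetical toy "wiki": page title -> titles it links to.
LINKS = {
    "Complex network": ["Graph theory", "Scale-free network", "ISBN (identifier)"],
    "Graph theory": ["Complex network", "Scale-free network", "List of graph theory topics"],
    "Scale-free network": ["Complex network", "Power law"],
    "Power law": ["Scale-free network"],
}

edges = snowball("Complex network", layers=2, get_links=lambda p: LINKS.get(p, []))
nodes, kept = filter_by_degree(edges, min_degree=2)
print(sorted(nodes))  # ['Complex network', 'Graph theory', 'Scale-free network']
```

In this toy run, "Power law" is reached in the last layer but never expanded, so it ends up with degree 1 and the filter drops it; this shows why a degree threshold prunes the unexplored frontier of the snowball. If networkx is available, the kept edges could be loaded into an `nx.DiGraph` and saved with `nx.write_graphml`, which is presumably close to how a `.graphml` file is produced, though that is an assumption about the script.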