{"id":34729316,"url":"https://github.com/felsenhower/top500-dataloader","last_synced_at":"2026-05-25T06:31:34.247Z","repository":{"id":318914293,"uuid":"1072182734","full_name":"felsenhower/top500-dataloader","owner":"felsenhower","description":"TOP500 scraper / downloader / dataloader","archived":false,"fork":false,"pushed_at":"2025-10-15T15:28:33.000Z","size":124,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-16T12:17:56.542Z","etag":null,"topics":["excel","package","polars","python","scraper","top500","xml"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/felsenhower.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-08T11:22:38.000Z","updated_at":"2025-10-15T15:28:36.000Z","dependencies_parsed_at":"2025-10-17T06:42:14.570Z","dependency_job_id":null,"html_url":"https://github.com/felsenhower/top500-dataloader","commit_stats":null,"previous_names":["felsenhower/top500-dataloader"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/felsenhower/top500-dataloader","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felsenhower%2Ftop500-dataloader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felsenhower%2Ftop500-dataloader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felsenho
wer%2Ftop500-dataloader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felsenhower%2Ftop500-dataloader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/felsenhower","download_url":"https://codeload.github.com/felsenhower/top500-dataloader/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/felsenhower%2Ftop500-dataloader/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33462836,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-25T06:15:14.662Z","status":"ssl_error","status_checked_at":"2026-05-25T06:14:31.284Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["excel","package","polars","python","scraper","top500","xml"],"created_at":"2025-12-25T02:55:30.507Z","updated_at":"2026-05-25T06:31:34.242Z","avatar_url":"https://github.com/felsenhower.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# top500-dataloader\n\nThis repository contains a scraper / downloader / dataloader for the [TOP500](https://top500.org/) website.\n\n## Usage\n\n⚠️ Since the module isn't on PyPI, I will use `uv` and `uvx` in all examples because they work quite well with packages installed from git.\n\n### As an Executable\n\nWhen installed, you can 
invoke the CLI via\n\n```shell\n$ python -m top500 --help\nusage: top500 [-h] [-d dir] {list-online,list-local,download,download-all,display} ...\n\nDownload or view TOP500 lists.\n\npositional arguments:\n  {list-online,list-local,download,download-all,display}\n    list-online         List TOP500 list issues that are available online.\n    list-local          List TOP500 list issues that are available locally.\n    download            Download a TOP500 list issue (see \"download --help\" for more info).\n    download-all        Download all TOP500 list issues that are available online.\n    display             Display a TOP500 list on the console (see \"display --help\" for more info).\n\noptions:\n  -h, --help            show this help message and exit\n  -d, --download-dir dir\n                        Set the download dir. Defaults to \"/home/ruben/.local/share/top500\".\n```\n\nYou can also do the same thing like this with `uvx`:\n```shell\n$ uvx git+https://github.com/felsenhower/top500-dataloader.git --help\n```\n\nIf you simply want to download all TOP500 issues to `~/.local/share/top500`, you can do this:\n```shell\n$ uvx git+https://github.com/felsenhower/top500-dataloader.git download-all\n```\n\nTo get a nice tabular view of the available lists online:\n```shell\n$ uvx git+https://github.com/felsenhower/top500-dataloader.git list-online\nFetching https://top500.org/lists/top500/...\nshape: (65, 6)\n┌─────────┬───────────────┬────────┬──────────────┬───────────────────────────┬─────────────────────────────────────────┐\n│ key     ┆ title         ┆ number ┆ published_on ┆ published_at              ┆ url                                     │\n│ ---     ┆ ---           ┆ ---    ┆ ---          ┆ ---                       ┆ ---                                     │\n│ str     ┆ str           ┆ i64    ┆ date         ┆ str                       ┆ object                                  
│\n╞═════════╪═══════════════╪════════╪══════════════╪═══════════════════════════╪═════════════════════════════════════════╡\n│ 2025-06 ┆ June 2025     ┆ 65     ┆ 2025-06-14   ┆ Hamburg, Germany          ┆ https://top500.org/lists/top500/2025/06 │\n│ 2024-11 ┆ November 2024 ┆ 64     ┆ 2024-11-19   ┆ Atlanta, GA, USA          ┆ https://top500.org/lists/top500/2024/11 │\n│ 2024-06 ┆ June 2024     ┆ 63     ┆ 2024-06-01   ┆ Hamburg, Germany          ┆ https://top500.org/lists/top500/2024/06 │\n│ 2023-11 ┆ November 2023 ┆ 62     ┆ 2023-11-14   ┆ Denver, CO, USA           ┆ https://top500.org/lists/top500/2023/11 │\n│ 2023-06 ┆ June 2023     ┆ 61     ┆ 2023-06-01   ┆ Hamburg, Germany          ┆ https://top500.org/lists/top500/2023/06 │\n│ 2022-11 ┆ November 2022 ┆ 60     ┆ 2022-11-15   ┆ Dallas, TX, USA           ┆ https://top500.org/lists/top500/2022/11 │\n[...]\n```\n\nThe `key` may be used to get a glimpse of a list like this:\n```\n$ uvx git+https://github.com/felsenhower/top500-dataloader.git display 2025-06\nshape: (500, 7)\n┌──────┬─────────────────────────────────┬──────────────────────┬─────────────────────────────────┬────────────────┬─────────────────┬────────────┐\n│ Rank ┆ System Name                     ┆ Country              ┆ Manufacturer                    ┆ Rmax [GFlop/s] ┆ Rpeak [GFlop/s] ┆ Power [kW] │\n│ ---  ┆ ---                             ┆ ---                  ┆ ---                             ┆ ---            ┆ ---             ┆ ---        │\n│ i64  ┆ str                             ┆ str                  ┆ str                             ┆ f64            ┆ f64             ┆ f64        │\n╞══════╪═════════════════════════════════╪══════════════════════╪═════════════════════════════════╪════════════════╪═════════════════╪════════════╡\n│ 1    ┆ El Capitan                      ┆ United States        ┆ HPE                             ┆ 1.7420e9       ┆ 2.7464e9        ┆ 29581.0    │\n│ 2    ┆ Frontier                        ┆ United States        
┆ HPE                             ┆ 1.3530e9       ┆ 2.0557e9        ┆ 24607.0    │\n│ 3    ┆ Aurora                          ┆ United States        ┆ Intel                           ┆ 1.0120e9       ┆ 1.9800e9        ┆ 38698.4    │\n│ 4    ┆ JUPITER Booster                 ┆ Germany              ┆ EVIDEN                          ┆ 7.934e8        ┆ 9.3e8           ┆ 13088.2    │\n│ 5    ┆ Eagle                           ┆ United States        ┆ Microsoft Azure                 ┆ 5.612e8        ┆ 8.468352e8      ┆ null       │\n│ 6    ┆ HPC6                            ┆ Italy                ┆ HPE                             ┆ 4.779e8        ┆ 6.0696576e8     ┆ 8460.9     │\n│ 7    ┆ Supercomputer Fugaku            ┆ Japan                ┆ Fujitsu                         ┆ 4.4201e8       ┆ 5.37212e8       ┆ 29899.2    │\n│ 8    ┆ Alps                            ┆ Switzerland          ┆ HPE                             ┆ 4.349e8        ┆ 5.7484128e8     ┆ 7124.0     │\n│ 9    ┆ LUMI                            ┆ Finland              ┆ HPE                             ┆ 3.797e8        ┆ 5.3150515e8     ┆ 7106.8     │\n[...]\n```\n\n### As a Python Module\n\n```python\nimport top500\n\nfor list_info in top500.iter_lists_online():\n    df = top500.read_list(list_info)\n    fastest_computer = df[\"name\"][0]\n    if fastest_computer is None:\n        continue\n    print(f\"In {list_info.title}, the fastest computer was {fastest_computer}.\")\n```\n\nThe module exports these functions (see [`__init__.py`](https://github.com/felsenhower/top500-dataloader/blob/main/src/top500/__init__.py) for their docstrings):\n\n```python\ndef set_download_dir(download_dir: str | os.PathLike) -\u003e None:\ndef get_download_dir() -\u003e Path:\ndef iter_lists_online(newest_first: bool = True) -\u003e Iterator[Top500ListInfo]:\ndef iter_lists_local(newest_first: bool = True) -\u003e Iterator[Top500ListInfo]:\ndef download_list(list_info_or_key: str | Top500ListInfo) -\u003e None:\ndef 
download_all_lists() -\u003e None:\ndef read_list(list_info_or_key: str | Top500ListInfo, allow_download: bool = True, source: str = \"normalized\") -\u003e pl.DataFrame:\n```\n\nSome Python examples are located in the [examples](examples) directory.\n\nThe `read_list` function returns a `polars.DataFrame` for the TOP500 list you request.\nYou can pass either the key as a `str` or a `Top500ListInfo` object (in the first case, the TOP500 overview page may be fetched).\nIf a list has not been downloaded yet, it is downloaded automatically unless `allow_download` is set to `False`.\nThe `source` argument can be `excel`, `xml`, `normalized` or `normalized-pretty`:\n- `excel` gives you the data as in the Excel file (the columns are not stable).\n- `xml` gives you the data as in the XML file (the columns are not stable).\n- `normalized` gives you a merge of `excel` and `xml` with stable and sane columns.\n- `normalized-pretty` is like `normalized`, but with prettier column names (similar to `excel`).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffelsenhower%2Ftop500-dataloader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffelsenhower%2Ftop500-dataloader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffelsenhower%2Ftop500-dataloader/lists"}