{"id":13615897,"url":"https://github.com/ESA-PhiLab/Major-TOM","last_synced_at":"2025-04-13T21:31:54.444Z","repository":{"id":225865816,"uuid":"764083955","full_name":"ESA-PhiLab/Major-TOM","owner":"ESA-PhiLab","description":"Expandable Datasets for Earth Observation","archived":false,"fork":false,"pushed_at":"2025-01-16T08:57:04.000Z","size":9245,"stargazers_count":162,"open_issues_count":6,"forks_count":13,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-01-16T10:11:50.665Z","etag":null,"topics":["dataset","earth-observation","multi-spectral","remote-sensing","sentinel-1","sentinel-2"],"latest_commit_sha":null,"homepage":"https://huggingface.co/Major-TOM","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ESA-PhiLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-27T13:00:47.000Z","updated_at":"2025-01-16T08:57:05.000Z","dependencies_parsed_at":"2024-03-07T11:40:37.259Z","dependency_job_id":"a01db7e9-cb0a-4107-9b45-b9b3019a583b","html_url":"https://github.com/ESA-PhiLab/Major-TOM","commit_stats":null,"previous_names":["esa-philab/major-tom"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ESA-PhiLab%2FMajor-TOM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ESA-PhiLab%2FMajor-TOM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ESA-PhiLab%2FMajor-TOM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/ESA-PhiLab%2FMajor-TOM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ESA-PhiLab","download_url":"https://codeload.github.com/ESA-PhiLab/Major-TOM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248786504,"owners_count":21161457,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","earth-observation","multi-spectral","remote-sensing","sentinel-1","sentinel-2"],"created_at":"2024-08-01T20:01:20.059Z","updated_at":"2025-04-13T21:31:49.428Z","avatar_url":"https://github.com/ESA-PhiLab.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"\n![image/png](https://cdn-uploads.huggingface.co/production/uploads/6304c06eeb6d777a838eab63/BJKsLwX0GG4W3-gdf40TJ.png)\n\n# 🗺️ Major TOM: Expandable Datasets for Earth Observation\n[![HF](https://img.shields.io/badge/%F0%9F%A4%97-Datasets-yellow)](https://www.huggingface.co/Major-TOM) [![HF](https://img.shields.io/badge/%F0%9F%A4%97-Spaces_Demo-yellow)](https://huggingface.co/spaces/Major-TOM/MajorTOM-Core-Viewer) [![paper](https://img.shields.io/badge/arXiv-2402.12095-D12424)](https://www.arxiv.org/abs/2402.12095) \u003ca href=\"https://colab.research.google.com/github/ESA-PhiLab/Major-TOM/blob/main/03-Filtering-in-Colab.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e \u003ca href='https://www.youtube.com/watch?v=KonWxQ1mCpA'\u003e\u003cimg 
src='https://img.shields.io/badge/YouTube-%23FF0000.svg?style=flat\u0026logo=youtube\u0026logoColor=white' /\u003e\u003c/a\u003e \u003ca href='https://open.spotify.com/episode/6R3Rsi3iMwvuumlqFKJbIM'\u003e\u003cimg src='https://img.shields.io/badge/Spotify-1ED760?\u0026logo=spotify\u0026logoColor=white' /\u003e\u003c/a\u003e\n\n\nA standard for curating large-scale (Terabyte-scale) EO datasets.\n\nThis repo currently provides some basic functionality and examples for interacting with Major TOM datasets. This will grow and change as more datasets are created.\n\n## 📊 Available Datasets\n| Dataset    | Modality            | Number of Patches | Sensing Type | Comments |\n|------------|---------------------|-------------------|--------------|----------|\n| [Core-S2L2A](https://huggingface.co/datasets/Major-TOM/Core-S2L2A) | Sentinel-2 Level 2A | 2,245,886 | Multi-Spectral | General-Purpose Global (about 23 TB) |\n| [Core-S2L1C](https://huggingface.co/datasets/Major-TOM/Core-S2L1C) | Sentinel-2 Level 1C | 2,245,886 | Multi-Spectral | General-Purpose Global (about 23 TB) |\n| [Core-S1RTC](https://huggingface.co/datasets/Major-TOM/Core-S1RTC) | Sentinel-1 RTC | 1,469,955 | SAR | General-Purpose Global (about 16 TB) |\n| [Core-DEM](https://huggingface.co/datasets/Major-TOM/Core-DEM) | Copernicus DEM 30 | 1,837,843 | Digital Surface Model (DSM) | General-Purpose Global (about 1 TB) |\n\n## 🔭 Demo\nYou can view samples from the `MajorTOM-Core` dataset instantly in your browser here:\nhttps://huggingface.co/spaces/Major-TOM/MajorTOM-Core-Viewer\n[![huggingface co_spaces_Major-TOM_MajorTOM-Core-Viewer (1)](https://github.com/ESA-PhiLab/Major-TOM/assets/13435425/6a82c999-23f6-42e7-ad12-1e86869d17a4)](https://huggingface.co/spaces/Major-TOM/MajorTOM-Core-Viewer)\n\n## 🌐 Generate uniform grid across the globe\nIn case 10 km grid is not appropriate, you can easily define Major TOM grids at different scales by using 
the [grid.py](https://github.com/ESA-PhiLab/Major-TOM/blob/main/src/grid.py) script to generate a uniform grid across the globe:\n\n```python\nfrom src.grid import Grid  # assumes the repository root is on the Python path\n\ndist = 100  # grid spacing in km\ngrid = Grid(dist)  # builds the grid point coordinates\n```\n\n## 📌 Open Access Manuscript\nThis project has been outlined in [https://arxiv.org/abs/2402.12095/](https://arxiv.org/abs/2402.12095/).\n\u003cdetails\u003e\n\u003csummary\u003eRead Abstract\u003c/summary\u003e\n  \n  \u003e Deep learning models are increasingly data-hungry, requiring significant resources to collect and compile the datasets needed to train them, with Earth Observation (EO) models being no exception. However, the landscape of datasets in EO is relatively atomised, with interoperability made difficult by diverse formats and data structures. If ever larger datasets are to be built, and duplication of effort minimised, then a shared framework that allows users to combine and access multiple datasets is needed. Here, Major TOM (Terrestrial Observation Metaset) is proposed as this extensible framework. Primarily, it consists of a geographical indexing system based on a set of grid points and a metadata structure that allows multiple datasets with different sources to be merged. Besides the specification of Major TOM as a framework, this work also presents a large, open-access dataset, MajorTOM-Core, which covers the vast majority of the Earth's land surface. 
This dataset provides the community with both an immediately useful resource, as well as acting as a template for future additions to the Major TOM ecosystem.\n\u003c/details\u003e\n\n![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6304c06eeb6d777a838eab63/dfw2xVubATDEGj9--4i1D.jpeg)\n\n### If you found this useful for your research, please cite accordingly as:\n```latex\n@inproceedings{Major_TOM,\n  title={Major TOM: Expandable Datasets for Earth Observation}, \n  author={Alistair Francis and Mikolaj Czerkawski},\n  year={2024},\n  eprint={2402.12095},\n  archivePrefix={arXiv},\n  primaryClass={cs.CV}\n}\n```\n\nPowered by [Φ-lab, European Space Agency (ESA) 🛰️](https://huggingface.co/ESA-philab)\n\n---\n## FAQ\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eIs Major TOM just another EO dataset?\u003c/b\u003e\u003c/summary\u003e\n\n  Almost. Major TOM is **not a dataset**, but a project aiming to standardize some of the future EO datasets. As an example of what such a dataset could be like, **MajorTOM-Core** is released as a nearly global dataset of Sentinel-2 data.\n  \n  Scroll up to the 📊 **Available Datasets** section of this file to see the list of current datasets.\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eWho is going to contribute to upcoming Major TOM datasets?\u003c/b\u003e\u003c/summary\u003e\n  \n  Anyone can contribute. The original authors of the Major TOM paper are already working on a few other datasets that will join the Major TOM initiative.\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eCan I join Major TOM organisation on HuggingFace?\u003c/b\u003e\u003c/summary\u003e\n  \n  Anyone can join the organisation with reading rights. 
To gain contributor rights, contact one of the admins and verify who you are and how you would like to contribute (you should be allowed to contribute any dataset that follows the Major TOM standard).\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FESA-PhiLab%2FMajor-TOM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FESA-PhiLab%2FMajor-TOM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FESA-PhiLab%2FMajor-TOM/lists"}