{"id":32398191,"url":"https://github.com/niyiyu/pnw-ml","last_synced_at":"2025-10-25T07:59:57.528Z","repository":{"id":44693467,"uuid":"470042054","full_name":"niyiyu/PNW-ML","owner":"niyiyu","description":"A ML-ready curated data set for a wide range of seismic signals from Pacific Northwest","archived":false,"fork":false,"pushed_at":"2025-06-03T21:20:16.000Z","size":49812,"stargazers_count":20,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-09-05T04:39:50.178Z","etag":null,"topics":["dataset","machine-learning","seismology"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/niyiyu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"citation.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-03-15T07:01:48.000Z","updated_at":"2025-08-21T11:28:21.000Z","dependencies_parsed_at":"2025-04-13T19:58:23.289Z","dependency_job_id":"5bc36d1c-f7ea-4a7a-b562-91e7723a8dbf","html_url":"https://github.com/niyiyu/PNW-ML","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/niyiyu/PNW-ML","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niyiyu%2FPNW-ML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niyiyu%2FPNW-ML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niyiyu%2FPNW-ML/releases","manifests_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/niyiyu%2FPNW-ML/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/niyiyu","download_url":"https://codeload.github.com/niyiyu/PNW-ML/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niyiyu%2FPNW-ML/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280923460,"owners_count":26414236,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-25T02:00:06.499Z","response_time":81,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","machine-learning","seismology"],"created_at":"2025-10-25T07:57:50.023Z","updated_at":"2025-10-25T07:59:57.513Z","avatar_url":"https://github.com/niyiyu.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pacific Northwest Curated Seismic Dataset\n[![DOI](https://zenodo.org/badge/470042054.svg)](https://zenodo.org/badge/latestdoi/470042054) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n## A curated dataset for a wide range of sources from the Pacific Northwest\n\n![map](./figures/README_overview.png)\n\n## Overview\nEach dataset is made by two files: waveform (in HDF5 format) and metadata (in CSV format). All follow the structure of [seisbench format](https://seisbench.readthedocs.io/en/latest/). 
See [here](https://seisbench.readthedocs.io/en/latest/pages/data_format.html) to learn more about the file structure.\n\n## Datasets\nWe are hosting two copies of the dataset: one on Google Drive, another on UW-ESS server. All datasets are also available through [SeisBench](https://seisbench.readthedocs.io/en/latest/pages/benchmark_datasets.html#pnw).\n\n### 1. ComCat Events\n- EH, BH, and HH channel (velocity)\n  - waveform (62.7 GB): [[GDrive](https://drive.google.com/file/d/10UCLyJSRibvhon9CuUTfns3fObNFKDer/view?usp=sharing)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/comcat_waveforms.hdf5)]\n  - metadata (50.4 MB): [[GDrive](https://drive.google.com/file/d/1bKDITx8KiDGZUaUoWQSZilpo7GhdWxKv/view?usp=sharing)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/comcat_metadata.csv)]\n\n- EN (accelerometer)\n  - waveform (2.1 GB): [[GDrive](https://drive.google.com/file/d/1I16psU3YJ7CFFNWZiaAGPlw1M3BmvuT8/view?usp=sharing)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/accelerometer_waveforms.hdf5)]\n  - metadata (1.7 MB): [[GDrive](https://drive.google.com/file/d/1xpeaoC3NsZqyICIbNHF2J46WsfZwwF6K/view?usp=sharing)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/accelerometer_metadata.csv)]\n\n### 2. Noise Waveform (EH, BH, and HH)\n  - waveform (~18 GB): [[GDrive](https://drive.google.com/file/d/1Z55WTcoyy-bR-WwWbedlZJrSo6tkRLlJ/view?usp=sharing)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/noise_waveforms.hdf5)]\n  - metadata (4.9 MB): [[GDrive](https://drive.google.com/file/d/1Ou5AKRczEqnNRsSEUSafIRlGcXTvLLUW/view?usp=sharing)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/noise_metadata.csv)]\n  \n### 3. 
Exotic Events (EH, BH, and HH)\n  - waveform (3.9 GB): [[GDrive](https://drive.google.com/file/d/1pxGQnLnAwXf9Zhc8xfh1HXEOsXjga2sG/view?usp=sharing)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/exotic_waveforms.hdf5)]\n  - metadata (1.4 MB): [[GDrive](https://drive.google.com/file/d/1brCZkrKjRtToLxBX5ob7qHX6EBq00nAM/view?usp=sharing)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/exotic_metadata.csv)]\n\n### 4. Northern California Sequence (December 2022)\n  - waveform (346 MB): [[GDrive](https://drive.google.com/file/d/15UxIbxacloPlY2DUTDBEnBaMYvh2eXVI/view?usp=sharing)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/norcal_waveforms.hdf5)]\n  - metadata (126 KB): [[GDrive](https://drive.google.com/file/d/1BhLVODzlu407JDZ0OteoPgZlTE-o469O/view?usp=sharing)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/norcal_metadata.csv)]\n\n### 5. ML-enhanced catalog\n  - CSV (~93 MB): [[GDrive](https://drive.google.com/file/d/16qUT_3-duVuKwfmPmvtH5EifL4eeyRvv/view?usp=sharing)] \n\n## Access\n### Quick tour to the dataset\nHere are several ways to use the PNW dataset.\n\n1. Jupyter Notebook\n   \n  A jupyter notebook is available to load and plot PNW dataset at [here](./notebooks/inspect_pnw_dataset.ipynb). Download and run it on a local machine to enable the interactive plotting (e.g., zoom in/out for checking the picks).\n\n2. A notebook is available [here](./notebooks/curated_pnw_dataset_seisbench.ipynb) on accessing the dataset with SeisBench APIs.\n\n3. Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/17Qu54ZI_HxJjIgLgo9K18-vwpXWoIeYM?usp=sharing)\n\n  If you are more familiar with Google Colab, go to the link above. Note that interactive plotting is not available on Colab.\n\n### Demo sets\n1. 
A micro version of the dataset, which contains 10 earthquake streams, 10 explosion streams, 10 sonic boom streams, 10 thunder streams, and 10 surface event streams. See [data/microPNW](https://github.com/niyiyu/PNW-ML/tree/main/data/microPNW).\n   \n2. A mini version of the dataset, which contains 500 earthquake streams, 500 explosion streams, 500 surface event streams, 126 sonic boom streams, and 94 thunder quake streams.\n  - waveform (640 MB): [[Google Drive](https://drive.google.com/file/d/1Yq6n8R0sb338OaT0KTwW2XFDb9x_LG6g/view?usp=share_link)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/miniPNW_waveforms.hdf5)]\n  - metadata (424 KB): [[Google Drive](https://drive.google.com/file/d/1Y0nK6ObBVABuoTopaWRNg2lPqjQsXa7e/view?usp=share_link)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/miniPNW_metadata.csv)]\n\n3. A meso version with 10% of the full ComCat dataset (only earthquake + explosion).\n  - waveform (6.3 GB): [[Google Drive](https://drive.google.com/file/d/1SrbiQpBpU6mPq5Un_lJpPcfBekwczLzp/view?usp=share_link)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/mesoPNW_waveforms.hdf5)]\n  - metadata (4.7 MB): [[Google Drive](https://drive.google.com/file/d/1HK2AuWPQj3dCdKShYcJ7a5E577XASrab/view?usp=share_link)] | [[UW-ESS](https://dasway.ess.washington.edu/shared/niyiyu/PNW-ML/mesoPNW_metadata.csv)]\n\n## Metadata\n| Attribute | Description | Example |\n| ----------- | ----------- |-------|\n| event_id | Event identifier | uw10564613 |\n| source_origin_time | Source origin time in UTC | 2002-10-03T01:56:49.530000Z |\n| source_latitude_deg | - | 48.553 |\n| source_longitude_deg | - | -122.52 |\n| source_type | - | earthquake |\n| source_type_pnsn_label | PNSN AQMS event type | eq |\n| source_depth_km | - | 14.907 |\n| source_magnitude_preferred | - | 2.1 |\n| source_magnitude_type_preferred | - | Md |\n| source_magnitude_uncertainty_preferred | - | 0.03 |\n| source_local/duration/hand_magnitude | 
Ml, Md, and Mh if available | 1.32 |\n| source_local/duration_magnitude_uncertainty | magnitude uncertainty if available | 0.15 |\n| source_depth_uncertainty_km | - | 1.69 |\n| source_horizontal_uncertainty_km | - |0.694 |\n| station_network_code | FDSN network code | UW |\n| station_code | FDSN station code | GNW |\n| station_location_code | FDSN location code | 01 |\n| station_channel_code | FDSN channel code (first two digits) | BH |\n| station_latitude_deg | - | 47.5641 |\n| station_longitude_deg | - | -122.825 |\n| station_elevation_m | - | 220.0 |\n| trace_name | Bucket and array index | bucket1\\$0,:3:15001 |\n| trace_sampling_rate_hz | All traces resampled to 100 Hz | 100 |\n| trace_start_time |  Trace start time in UTC | 2002-10-03T01:55:59.530000Z |\n| trace_P/S_arrival_sample | Closest sample index of arrival  | 8097 |\n| trace_P/S_arrival_uncertainty_s | Picking uncertainty in second |  0.02 |\n| trace_P/S_onset | - |  emergent |\n| trace_P_polarity | P-wave onset polarity | positive, negative, or undecidable |\n| trace_has_offset | Any visible offset in the trace | 1 |\n| trace_missing_channel | Number of missing channel of the trace | 2 |\n| trace_snr_db | SNR for each component |  6.135|3.065|11.766 |\n\n## Reference\nNi, Y., Hutko, A., Skene, F., Denolle, M., Malone, S., Bodin, P., Hartog, R., \u0026 Wright, A. (2023). Curated Pacific Northwest AI-ready Seismic Dataset. *Seismica*, 2(1). 
https://doi.org/10.26443/seismica.v2i1.368\n\nBibTeX:\n```bibtex\n@article{ni2023pnw, \n  title={Curated Pacific Northwest AI-ready Seismic Dataset}, \n  volume={2}, \n  url={https://seismica.library.mcgill.ca/article/view/368}, \n  number={1}, \n  journal={Seismica}, \n  author={Ni, Yiyu and Hutko, Alexander and Skene, Francesca and Denolle, Marine and Malone, Stephen and Bodin, Paul and Hartog, Renate and Wright, Amy}, \n  year={2023}, \n  month={05},\n  doi={10.26443/seismica.v2i1.368}\n}\n```\n\n## Known issues\n* [August 2023] Very few events (~15) in the ComCat dataset may have inconsistent `event_type_pnsn_label` and `event_type`. This issue comes from outdated ComCat event metadata. Please prioritize the PNSN label when such an inconsistency occurs.\n* [June 2025] The `trace_start_time` field in the exotic metadata was delayed by 50 seconds. The metadata has now been corrected for all affected files.\n\n## Report bugs\nIf you find any issue in the dataset, please report it through [GitHub Issues](https://github.com/niyiyu/PNW-ML/issues) or [Email](mailto:niyiyu@uw.edu). \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniyiyu%2Fpnw-ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fniyiyu%2Fpnw-ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniyiyu%2Fpnw-ml/lists"}