{"id":13739085,"url":"https://github.com/smousavi05/STEAD","last_synced_at":"2025-05-08T18:31:49.610Z","repository":{"id":50672350,"uuid":"180492532","full_name":"smousavi05/STEAD","owner":"smousavi05","description":"STanford EArthquake Dataset (STEAD):A Global Data Set of Seismic Signals for AI","archived":false,"fork":false,"pushed_at":"2023-07-07T23:46:55.000Z","size":4820,"stargazers_count":282,"open_issues_count":2,"forks_count":67,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-11-15T09:42:40.719Z","etag":null,"topics":["dataset","deep-learning","earthquake","stanford","stead"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/smousavi05.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-04-10T03:13:28.000Z","updated_at":"2024-11-13T07:22:53.000Z","dependencies_parsed_at":"2022-08-31T20:02:00.572Z","dependency_job_id":"8425f008-4ed7-44df-a8ea-2452e9ee311e","html_url":"https://github.com/smousavi05/STEAD","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smousavi05%2FSTEAD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smousavi05%2FSTEAD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smousavi05%2FSTEAD/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smousavi05%2FSTEAD/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/smousavi05","download_url":"https://codeload.github.
com/smousavi05/STEAD/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253127091,"owners_count":21858191,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","deep-learning","earthquake","stanford","stead"],"created_at":"2024-08-03T04:00:24.775Z","updated_at":"2025-05-08T18:31:47.925Z","avatar_url":"https://github.com/smousavi05.png","language":"Python","funding_links":[],"categories":["Training Data","Seismic data access"],"sub_categories":[],"readme":"#  STanford EArthquake Dataset (STEAD):A Global Data Set of Seismic Signals for AI                                                                                           \n\n![GitHub last commit](https://img.shields.io/github/last-commit/smousavi05/STEAD?style=for-the-badge) \n![GitHub forks](https://img.shields.io/github/forks/smousavi05/STEAD?style=for-the-badge)\n![GitHub stars](https://img.shields.io/github/stars/smousavi05/STEAD?style=for-the-badge)\n![GitHub watchers](https://img.shields.io/github/watchers/smousavi05/STEAD?style=for-the-badge)\n![Twitter Follow](https://img.shields.io/twitter/follow/smousavi05?style=social)\n\n-----------------------------------------\n\n![map](map2.png)\n\n![map](stations2.png)\n\nhttps://www.youtube.com/watch?v=Nn8KJFJu-V0\n\n----------------------------------------- \n## Table of Contents: \n* [Downloading the dataset](#You_can_get_the_wavefoms_from_here) \n* [Description of the dataset](#You_can_get_the_paper_from_here)                                \n* [How to access the earthquake 
waveforms](#Example_of_data_selection_and_accessing_(earthquake_waveforms)) \n* [How to access the noise waveforms](#Example_of_data_selection_and_accessing_(noise_waveforms))                   \n* [How to convert raw waveforms into acceleration, velocity, or displacement](#How_to_convert_raw_waveforms_into_Acceleration,_Velocity,_or_Displacement)  \n* [Studies that used STEAD](#These_are_some_of_the_studies_that_used_STEAD.)            \n-----------\n## Note:\n#### Please note that some of the back azimuths in the current version are misplaced. If you plan to use the back azimuth labels, you can recalculate them from the station and event locations. Here is code to do so using ObsPy:\n\n```python\nfrom obspy.geodetics.base import gps2dist_azimuth\n\n# event_* and station_* are the event and station coordinates in degrees\ndistance_m, azimuth, back_azimuth = gps2dist_azimuth(\n    float(event_lat),\n    float(event_lon),\n    float(station_lat),\n    float(station_lon),\n    a=6378137.0,               # WGS84 semi-major axis in meters\n    f=0.0033528106647474805)   # WGS84 flattening\n```            \n\n### You can get the waveforms from here: \n\n##### Each of the following files contains one hdf5 (data) file and one CSV (metadata) file for ~ 200k 3C waveforms. 
You can download the chunks you need and then merge them into a single file using the code provided in the repository.\n\nhttps://rebrand.ly/chunk1  (chunk1 ~ 14.6 GB) Noise\n\nhttps://rebrand.ly/chunk2  (chunk2 ~ 13.7 GB) Local Earthquakes\n\nhttps://rebrand.ly/chunk3  (chunk3 ~ 13.7 GB) Local Earthquakes\n\nhttps://rebrand.ly/chunk4  (chunk4 ~ 13.7 GB) Local Earthquakes\n\nhttps://rebrand.ly/chunk5  (chunk5 ~ 13.7 GB) Local Earthquakes\n\nhttps://rebrand.ly/chunk6  (chunk6 ~ 15.7 GB) Local Earthquakes \n\n#### If you have a fast internet connection, you can download the entire dataset as a single file using the following link:\n\nhttps://rebrand.ly/whole (merged ~ 85 GB) Local Earthquakes + Noise\n\n* Note1: some of the unzipping programs for Windows and Linux have size limits. Try 7-Zip if you have problems unzipping the files. \n\n* Note2: all the metadata are also available in the hdf5 file (as attributes associated with each waveform).\n\n* Note3: For some of the noise data, the waveforms are identical on all 3 components. These come from single-channel stations, where we duplicated the vertical channel for the horizontal ones. However, these make up less than 4% of the noise data. For the rest, the noise is different on each channel.\n\n#### If you have trouble downloading the data from the links above or unzipping them, you can get the dataset from SeisBench\n\n-----------------------------------\n\n### You can get the paper from here:\n\nhttps://rebrand.ly/STEADrg   or   https://rebrand.ly/STEADac\n\n### You can use QuakeLabeler (https://maihao14.github.io/QuakeLabeler/) or SeisBench (https://github.com/seisbench/seisbench) to label and convert your data into STEAD format.\n\n### Last Update in the Dataset:\nMay 25, 2020 \n\n\n### Reporting Bugs:\n\nReport bugs at https://github.com/smousavi05/STEAD/issues\n\nor send me an email: smousavi05@gmail.com\n\n-------------------------------------\nReference:\n\n`Mousavi, S. M., Sheng, Y., Zhu, W., Beroza, G. C. (2019). 
\nSTanford EArthquake Dataset (STEAD): A Global Data Set of Seismic Signals for AI, \nIEEE Access, doi:10.1109/ACCESS.2019.2947848` \n\n\nBibTeX:\n\n    @article{mousavi2019stanford,\n      title={STanford EArthquake Dataset (STEAD): A Global Data Set of Seismic Signals for AI},\n      author={Mousavi, S Mostafa and Sheng, Yixiao and Zhu, Weiqiang and Beroza, Gregory C},\n      journal={IEEE Access},\n      year={2019},\n      publisher={IEEE}\n    }\n\n-------------------------------------\n\nYou can use the CSV file to select a specific part of the dataset and, for efficiency, read only the associated waveforms from the hdf5 file.\n\n### Example of data selection and accessing (earthquake waveforms):\n\n```python\nimport pandas as pd\nimport h5py\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nfile_name = \"merge.hdf5\"\ncsv_file = \"merge.csv\"\n\n# reading the csv file into a dataframe:\ndf = pd.read_csv(csv_file)\nprint(f'total events in csv file: {len(df)}')\n# filtering the dataframe\ndf = df[(df.trace_category == 'earthquake_local') \u0026 (df.source_distance_km \u003c= 20) \u0026 (df.source_magnitude \u003e 3)]\nprint(f'total events selected: {len(df)}')\n\n# making a list of trace names for the selected data\nev_list = df['trace_name'].to_list()\n\n# retrieving selected waveforms from the hdf5 file: \ndtfl = h5py.File(file_name, 'r')\nfor c, evi in enumerate(ev_list):\n    dataset = dtfl.get('data/'+str(evi)) \n    # waveforms, 3 channels: first column: E channel, second column: N channel, third column: Z channel \n    data = np.array(dataset)\n\n    fig = plt.figure()\n    ax = fig.add_subplot(311)         \n    plt.plot(data[:,0], 'k')\n    plt.rcParams[\"figure.figsize\"] = (8, 5)\n    legend_properties = {'weight':'bold'}    \n    plt.tight_layout()\n    ymin, ymax = ax.get_ylim()\n    pl = plt.vlines(dataset.attrs['p_arrival_sample'], ymin, ymax, color='b', linewidth=2, label='P-arrival')\n    sl = plt.vlines(dataset.attrs['s_arrival_sample'], 
ymin, ymax, color='r', linewidth=2, label='S-arrival')\n    cl = plt.vlines(dataset.attrs['coda_end_sample'], ymin, ymax, color='aqua', linewidth=2, label='Coda End')\n    plt.legend(handles=[pl, sl, cl], loc = 'upper right', borderaxespad=0., prop=legend_properties)        \n    plt.ylabel('Amplitude counts', fontsize=12) \n    ax.set_xticklabels([])\n\n    ax = fig.add_subplot(312)         \n    plt.plot(data[:,1], 'k')\n    plt.rcParams[\"figure.figsize\"] = (8, 5)\n    legend_properties = {'weight':'bold'}    \n    plt.tight_layout()\n    ymin, ymax = ax.get_ylim()\n    pl = plt.vlines(dataset.attrs['p_arrival_sample'], ymin, ymax, color='b', linewidth=2, label='P-arrival')\n    sl = plt.vlines(dataset.attrs['s_arrival_sample'], ymin, ymax, color='r', linewidth=2, label='S-arrival')\n    cl = plt.vlines(dataset.attrs['coda_end_sample'], ymin, ymax, color='aqua', linewidth=2, label='Coda End')\n    plt.legend(handles=[pl, sl, cl], loc = 'upper right', borderaxespad=0., prop=legend_properties)        \n    plt.ylabel('Amplitude counts', fontsize=12) \n    ax.set_xticklabels([])\n\n    ax = fig.add_subplot(313)         \n    plt.plot(data[:,2], 'k')\n    plt.rcParams[\"figure.figsize\"] = (8,5)\n    legend_properties = {'weight':'bold'}    \n    plt.tight_layout()\n    ymin, ymax = ax.get_ylim()\n    pl = plt.vlines(dataset.attrs['p_arrival_sample'], ymin, ymax, color='b', linewidth=2, label='P-arrival')\n    sl = plt.vlines(dataset.attrs['s_arrival_sample'], ymin, ymax, color='r', linewidth=2, label='S-arrival')\n    cl = plt.vlines(dataset.attrs['coda_end_sample'], ymin, ymax, color='aqua', linewidth=2, label='Coda End')\n    plt.legend(handles=[pl, sl, cl], loc = 'upper right', borderaxespad=0., prop=legend_properties)        \n    plt.ylabel('Amplitude counts', fontsize=12) \n    ax.set_xticklabels([])\n    plt.show() \n\n    for at in dataset.attrs:\n        print(at, dataset.attrs[at])    \n\n    inp = input(\"Press a key to plot the next waveform!\")\n    
if inp == \"r\":\n        continue             \n```\n\n![event](eventSample.png)\n\n![event](eventSample2.png)\n\n-----------------------------------------\n\n### Example of data selection and accessing (noise waveforms):\n\n```python\n# this example reuses the imports, file_name, and csv_file from the earthquake example above\n\n# reading the csv file into a dataframe:\ndf = pd.read_csv(csv_file)\nprint(f'total events in csv file: {len(df)}')\n# filtering the dataframe\ndf = df[(df.trace_category == 'noise') \u0026 (df.receiver_code == 'PHOB') ]\nprint(f'total events selected: {len(df)}')\n\n# making a list of trace names for the selected data (first 200 traces only)\nev_list = df['trace_name'].to_list()[:200]\n\n# retrieving selected waveforms from the hdf5 file: \ndtfl = h5py.File(file_name, 'r')\nfor c, evi in enumerate(ev_list):\n    dataset = dtfl.get('data/'+str(evi)) \n    # waveforms, 3 channels: first column: E channel, second column: N channel, third column: Z channel \n    data = np.array(dataset)\n\n    fig = plt.figure()\n    ax = fig.add_subplot(311)         \n    plt.plot(data[:,0], 'k')\n    plt.rcParams[\"figure.figsize\"] = (8, 5)\n    legend_properties = {'weight':'bold'}    \n    plt.tight_layout()\n    plt.ylabel('Amplitude counts', fontsize=12) \n    ax.set_xticklabels([])\n\n    ax = fig.add_subplot(312)         \n    plt.plot(data[:,1], 'k')\n    plt.rcParams[\"figure.figsize\"] = (8, 5)\n    legend_properties = {'weight':'bold'}    \n    plt.tight_layout()     \n    plt.ylabel('Amplitude counts', fontsize=12) \n    ax.set_xticklabels([])\n\n    ax = fig.add_subplot(313)         \n    plt.plot(data[:,2], 'k')\n    plt.rcParams[\"figure.figsize\"] = (8,5)\n    legend_properties = {'weight':'bold'}    \n    plt.tight_layout()     \n    plt.ylabel('Amplitude counts', fontsize=12) \n    ax.set_xticklabels([])\n    plt.show() \n\n    for at in dataset.attrs:\n        print(at, 
dataset.attrs[at])    \n\n    inp = input(\"Press a key to plot the next waveform!\")\n    if inp == \"r\":\n        continue       \n```\n\n![event](noise.png)\n\n-----------------------------------------\n\n### How to convert raw waveforms into Acceleration, Velocity, or Displacement:\n\n```python\nimport obspy\nimport h5py\nfrom obspy import UTCDateTime\nimport numpy as np\nfrom obspy.clients.fdsn.client import Client\nimport matplotlib.pyplot as plt\n\n# path to the merged hdf5 file (same file as in the examples above)\nfile_name = \"merge.hdf5\"\n\ndef make_stream(dataset):\n    '''\n    input: hdf5 dataset\n    output: obspy stream\n\n    '''\n    data = np.array(dataset)\n\n    tr_E = obspy.Trace(data=data[:, 0])\n    tr_E.stats.starttime = UTCDateTime(dataset.attrs['trace_start_time'])\n    tr_E.stats.delta = 0.01\n    tr_E.stats.channel = dataset.attrs['receiver_type']+'E'\n    tr_E.stats.station = dataset.attrs['receiver_code']\n    tr_E.stats.network = dataset.attrs['network_code']\n\n    tr_N = obspy.Trace(data=data[:, 1])\n    tr_N.stats.starttime = UTCDateTime(dataset.attrs['trace_start_time'])\n    tr_N.stats.delta = 0.01\n    tr_N.stats.channel = dataset.attrs['receiver_type']+'N'\n    tr_N.stats.station = dataset.attrs['receiver_code']\n    tr_N.stats.network = dataset.attrs['network_code']\n\n    tr_Z = obspy.Trace(data=data[:, 2])\n    tr_Z.stats.starttime = UTCDateTime(dataset.attrs['trace_start_time'])\n    tr_Z.stats.delta = 0.01\n    tr_Z.stats.channel = dataset.attrs['receiver_type']+'Z'\n    tr_Z.stats.station = dataset.attrs['receiver_code']\n    tr_Z.stats.network = dataset.attrs['network_code']\n\n    stream = obspy.Stream([tr_E, tr_N, tr_Z])\n\n    return stream\n\n\ndef make_plot(tr, title='', ylab=''):\n    '''\n    input: trace, plot title, y-axis label\n    \n    '''\n    \n    fig = plt.figure()\n    ax = fig.add_subplot(1, 1, 1)\n    ax.plot(tr.times(\"matplotlib\"), tr.data, \"k-\")\n    ax.xaxis_date()\n    fig.autofmt_xdate()\n    plt.ylabel(ylab)\n    
plt.title(title)\n    plt.show()\n    \n    \nif __name__ == '__main__': \n\n    # reading one sample trace from STEAD\n    dtfl = h5py.File(file_name, 'r')\n    dataset = dtfl.get('data/109C.TA_20061103161223_EV') \n\n    # converting the hdf5 dataset into an obspy stream\n    st = make_stream(dataset)\n    \n    # plotting the vertical component of the raw data\n    make_plot(st[2], title='Raw Data', ylab='counts')\n\n```\n\n![raw](1_raw.png)\n\n```python\n\n    # downloading the instrument response of the station from IRIS\n    client = Client(\"IRIS\")\n    inventory = client.get_stations(network=dataset.attrs['network_code'],\n                                    station=dataset.attrs['receiver_code'],\n                                    starttime=UTCDateTime(dataset.attrs['trace_start_time']),\n                                    endtime=UTCDateTime(dataset.attrs['trace_start_time']) + 60,\n                                    loc=\"*\", \n                                    channel=\"*\",\n                                    level=\"response\")  \n\n    # converting into displacement\n    st = make_stream(dataset)\n    st = st.remove_response(inventory=inventory, output=\"DISP\", plot=False)\n\n    # plotting the vertical component\n    make_plot(st[2], title='Displacement', ylab='meters')\n    \n```\n\n![disp](1_disp.png)\n\n```python\n    # converting into velocity\n    st = make_stream(dataset)\n    st = st.remove_response(inventory=inventory, output='VEL', plot=False) \n    \n    # plotting the vertical component\n    make_plot(st[2], title='Velocity', ylab='meters/second')\n```\n        \n![vel](1_vel.png)\n\n```python\n    # converting into acceleration\n    st = make_stream(dataset)\n    st = st.remove_response(inventory=inventory, output=\"ACC\", plot=False) \n    \n    # plotting the vertical component\n    make_plot(st[2], title='Acceleration', ylab='meters/second**2')\n```\n\n![acc](1_acc.png)\n\n------------------------------------------\n\n### These are some of 
the studies that used STEAD. \nYou can check out the code repositories of these studies for examples of how a Keras or TensorFlow model can be trained on STEAD in a memory-efficient fashion:\n\n*   Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking,\n    SM Mousavi, WL Ellsworth, W Zhu, LY Chuang, GC Beroza, Nature Communications 11 (1), 1-12.\n\n*   Bayesian-deep-learning estimation of earthquake location from single-station observations,\n    SM Mousavi, GC Beroza, IEEE Transactions on Geoscience and Remote Sensing, 1 - 14.\n    \n*   A machine‐learning approach for earthquake magnitude estimation,\n    SM Mousavi, GC Beroza, Geophysical Research Letters 47 (1), e2019GL085976.\n\n*   Complex Neural Networks for Estimating Epicentral Distance, Depth, and Magnitude of Seismic Waves,\n    Ristea, Nicolae-Cătălin, and Anamaria Radoi, IEEE Geoscience and Remote Sensing Letters.\n    \n*   Earthquake detection and P-wave arrival time picking using capsule neural network,\n    Saad and Chen, IEEE Transactions on Geoscience and Remote Sensing, 59(7), 6234-6243.\n    \n*   Prediction of intensity and location of seismic events using deep learning,\n    Nicolis, Plaza, \u0026 Salas, Spatial Statistics, 42, 100442.\n\n## License\nFor more details on the license of this repository see [LICENSE](https://github.com/smousavi05/STEAD/blob/master/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmousavi05%2FSTEAD","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmousavi05%2FSTEAD","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmousavi05%2FSTEAD/lists"}