{"id":24623408,"url":"https://github.com/schipp/fast_beamforming","last_synced_at":"2025-04-10T21:13:55.229Z","repository":{"id":192364858,"uuid":"684053669","full_name":"schipp/fast_beamforming","owner":"schipp","description":"Fast and efficient beamforming in Python - educational notebooks","archived":false,"fork":false,"pushed_at":"2024-10-23T13:39:55.000Z","size":1743,"stargazers_count":37,"open_issues_count":0,"forks_count":7,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-10T21:13:49.500Z","etag":null,"topics":["beamformer","beamforming","jupyter","python","seismic-interferometry","seismic-source","seismology"],"latest_commit_sha":null,"homepage":"https://doi.org/10.5281/zenodo.8315028","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/schipp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-28T11:05:57.000Z","updated_at":"2025-03-03T01:23:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"c37611a5-2a32-4430-b647-9cb51872f3cf","html_url":"https://github.com/schipp/fast_beamforming","commit_stats":null,"previous_names":["schipp/fast_beamforming"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schipp%2Ffast_beamforming","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schipp%2Ffast_beamforming/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schipp%2Ffast_beamforming/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/schipp%2Ffast_beamforming/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/schipp","download_url":"https://codeload.github.com/schipp/fast_beamforming/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248298312,"owners_count":21080320,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beamformer","beamforming","jupyter","python","seismic-interferometry","seismic-source","seismology"],"created_at":"2025-01-25T03:57:33.797Z","updated_at":"2025-04-10T21:13:55.204Z","avatar_url":"https://github.com/schipp.png","language":"Jupyter Notebook","funding_links":[],"categories":["Array seismology"],"sub_categories":[],"readme":"# Fast beamforming in Python\n\n[![DOI](https://zenodo.org/badge/684053669.svg)](https://zenodo.org/badge/latestdoi/684053669)\n\n\u003cimg align=\"left\" src=\"beampowers.png\" width=\"400px\"\u003e\n\nCross-correlation beamforming can be realised in a few lines of matrix operations in Python, making use of `pytorch` linear algebra optimisations for speed. For small problems (i.e., those that fit in memory), this is fast, efficient and fully parallel. For large problems, this approach fails, specifically when the matrices containing cross correlations become too large for memory. 
To solve this, we can employ `dask` to divide the computations automatically into multiple tasks that can run across arbitrary infrastructure while retaining much of the same syntax and logic.\n\nWe demonstrate this with the notebooks in this repository:\n\n* `beamforming_naive.ipynb`: naive beamforming code, **SLOW** \"pure\" Python version for teaching purposes only\n* `beamforming_numpy.ipynb`: same as above, rewritten in `numpy` using broadcasting etc.\n* `beamforming_pytorch.ipynb`: same as above, replacing `numpy` functions with `pytorch` equivalents\n* `beamforming_dask.ipynb`: same as above, moving memory-limited computations to `dask`.\n\nOther variants:\n\n* `beamforming_geo.ipynb`: same as the `pytorch` version, but using geographical coordinates. Requires [geokernels](https://github.com/sigmaterra/geokernels) for distance calculations.\n* `beamforming_planewave_data.ipynb`: plane-wave beamforming of seismic field data. Requires [obspy](https://docs.obspy.org) for handling seismograms.\n\nIn these notebooks, logic and processing are not abstracted away into a package of functions. Instead, all processing happens within the notebooks for instructional purposes.\n\n## Performance statistics\n\nThese are the runtimes of the cell that performs beamforming (under \"3. Beamforming\") on a machine with 2x Intel Xeon Gold 6326 (16C/32T) and 512 GB RAM, using the parameters indicated below.\n\n**Is your code faster? 
Let me know!**\n\n### `n_sensors = 100`\n\n| notebook version | runtime  | speed-up |\n| ---------------- | -------- | -------- |\n| `naive`          | 43.9 sec | 1x       |\n| `numpy`          | 11.7 sec | 3.75x    |\n| `pytorch`        | 0.9 sec  | 48.8x    |\n| `dask`           | 1.9 sec  | 23.1x    |\n\n### `n_sensors = 1000`\n\n| notebook version | runtime    | speed-up |\n| ---------------- | ---------- | -------- |\n| `naive`          | 4861.3 sec | 1x       |\n| `numpy`          | fail       | fail     |\n| `pytorch`        | fail       | fail     |\n| `dask`           | 47.0 sec   | 103.4x   |\n\nThe `numpy` and `pytorch` versions fail because `S` would require `2.1TiB` of memory.\n\nOther parameters in both tests: `grid_limit = 100`, `grid_spacing = 5`, `window_length = 100`, `sampling_rate = 10`, `fmin, fmax = 0.1, 1.0`.\n\n## Python performance for scientific computing\n\nThis repository is also intended as a case study to teach students and researchers about a) the potential of making the best use of already existing computing libraries for significant speed-ups compared to naively written Python code and b) how much performance can be gained simply by moving from `numpy` to equivalent `pytorch` code. Note that `pytorch` is significantly faster here because cross-correlation beamforming involves large tensors. Further note that any linear system of equations, no matter what physics it expresses in your specific context, can be written in matrix form, allowing you to exploit the linear algebra optimisations developed in the machine learning community for your research.\n\n## Notes on `dask`\n\n`dask` allows one to use the same algorithm and largely the same syntax as the `pytorch` version, which means one doesn't have to develop a different algorithm that is not memory-limited. However, `dask` also introduces a new optimisation problem: the choice of \"good\" chunk sizes for the specific system at hand. 
Good chunk sizes depend on the compute infrastructure used. On the bright side, they need to be optimised only once for a given problem geometry (number of stations, grid points, frequencies). Even with arbitrarily chosen chunk sizes (100, 100, 100 in the notebook here), performance is good. See the [dask documentation](https://docs.dask.org/en/stable/understanding-performance.html) for more details.\n\n## Methodological background\n\n### What is beamforming?\n\nBeamforming is a phase-matching algorithm commonly used to estimate the origin and local phase velocity of a wavefront propagating across an array of sensors. The most basic beamformer is the delay-and-sum beamformer, where recordings across the sensors are phase-shifted and summed (forming the beam) to test for the best-fitting source origin and medium velocity (Rost and Thomas, 2002).\n\n### Cross-correlation beamforming\n\nThe cross-correlation beamformer (also known as the Bartlett or conventional beamformer, among other names) applies the same delay-and-sum idea to correlation functions between all sensor pairs (Ruigrok et al., 2017). This has the major advantage that only the coherent part of the wavefield is taken into account. The major disadvantage is that computing cross correlations between all sensor pairs quickly becomes expensive, scaling with $n^2$ for $n$ sensors.\n\nSeveral formulations of this beamformer exist. We write it in the frequency domain as\n\n$B = \\sum_\\omega \\sum_j \\sum_{k\\neq j} K_{jk}(\\omega) S_{kj}(\\omega),$\n\nwith $B$ the beampower, $K_{jk}(\\omega) = d_j(\\omega) d^H_k(\\omega)$ the cross-spectral density matrix of recorded signals $d$, $S_{jk}(\\omega) = s_j(\\omega) s^H_k(\\omega)$ the cross-spectral density matrix of synthetic signals $s$, $j$ and $k$ the sensor indices, and $^H$ the complex conjugate. We exclude auto-correlations ($j=k$) because they contain no phase information. 
Consequently, the beampower can become negative, which indicates anti-correlation.\n\nThe synthetic signals $s$ (often called replica vectors or Green's functions) are the expected wavefield for a given source origin and medium velocity, most often computed for an acoustic, homogeneous half-space as $s_j = \\exp(-i \\omega t_j)$, where $t_j$ is the travel time from the source to receiver $j$.\n\n### Plane-wave beamforming\n\nIn seismology, \"beamforming\" is often synonymous with plane-wave beamforming. In plane-wave beamforming, $t_j$ is the travel time of a given plane wave from a reference point (commonly the array center) to sensor $j$:\n\n$t_j = \\boldsymbol{r_j} \\cdot \\boldsymbol{u_h}$,\n\nwith $\\boldsymbol{r_j} = (r_x, r_y)$ the coordinates of sensor $j$ relative to the reference point, and $\\boldsymbol{u_h} = u_h(\\sin(\\theta), \\cos(\\theta))$ the horizontal slowness vector of the plane wave, with $u_h$ the horizontal slowness and $\\theta$ the direction of arrival. The parameters tested for are $u_h$ and $\\theta$ (or, equivalently, $u_x$ and $u_y$). Because plane waves are assumed, the source must be far enough away for the plane-wave assumption to be adequate. The advantage is that the source location reduces to a single spatial parameter (the direction of arrival), which is cheap to compute.\n\n### Matched field processing\n\nWhen curved wavefronts are allowed, sources may be located within the sensor array, and the tested grid is defined in space instead of the slowness domain, adding at least one extra dimension. This is called matched field processing (MFP; e.g., Baggeroer et al., 1988). 
In practice, the difference between plane-wave beamforming and matched field processing lies in the computation of the Green's functions $s_j$, or more precisely of the expected travel times $t_j$.\n\nIn MFP, the travel time is computed as\n\n$t_j = |\\boldsymbol{r}_j - \\boldsymbol{r}_s| / c$,\n\nwith $|\\boldsymbol{r}_j - \\boldsymbol{r}_s|$ the Euclidean distance between sensor $j$ and the source and $c$ the medium velocity. The parameters tested for in MFP are the source position $\\boldsymbol{r}_s$ (2D or 3D) and, sometimes, the medium velocity $c$. An alternative name for MFP that may be more intuitive to seismologists is curved-wave beamforming.\n\nThe beamforming in the notebooks here is matched field processing, except for the plane-wave beamforming in `beamforming_planewave_data.ipynb`.\n\n### References\n\nRost, S. \u0026 Thomas, C., 2002. Array seismology: Methods and applications. *Reviews of Geophysics*, **40**, 2–1–2–27. doi:10.1029/2000RG000100\n\nRuigrok, E., Gibbons, S. \u0026 Wapenaar, K., 2017. Cross-correlation beamforming. *Journal of Seismology*, **21**, 495–508. doi:10.1007/s10950-016-9612-6\n\nBaggeroer, A.B., Kuperman, W.A. \u0026 Schmidt, H., 1988. Matched field processing: Source localization in correlated noise as an optimum parameter estimation problem. *The Journal of the Acoustical Society of America*, **83**, 571–587. 
doi:10.1121/1.396151\n\n### Requirements\n\nTo run these notebooks, the following are required:\n\n* Python\n* scientific Python stack (numpy, scipy, matplotlib)\n* notebook\n* [torch](https://pytorch.org)\n* [dask](https://www.dask.org)\n* [geokernels](https://github.com/sigmaterra/geokernels) for distances on geographical grids (needed by `beamforming_geo.ipynb`)\n* [obspy](https://docs.obspy.org) for handling seismograms (needed by `beamforming_planewave_data.ipynb`)\n\nA working installation can be set up, e.g., via conda:\n\n```bash\nconda create -n fast_beamforming python=3.11\nconda activate fast_beamforming\nconda install pytorch dask scipy matplotlib notebook\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fschipp%2Ffast_beamforming","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fschipp%2Ffast_beamforming","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fschipp%2Ffast_beamforming/lists"}