{"id":22733946,"url":"https://github.com/mhw32/persistent-homology","last_synced_at":"2025-06-30T19:33:08.705Z","repository":{"id":82938364,"uuid":"42668152","full_name":"mhw32/persistent-homology","owner":"mhw32","description":"Statistically Quantifying Difference in the Observable Universe under Warm and Cold Dark Matter Assumptions","archived":false,"fork":false,"pushed_at":"2017-10-31T06:24:38.000Z","size":355728,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-30T02:11:48.323Z","etag":null,"topics":["astrostatistics","hypothesis-testing","persistent-homology","simulation"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mhw32.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-09-17T16:36:51.000Z","updated_at":"2018-12-27T20:01:48.000Z","dependencies_parsed_at":"2023-03-12T17:05:56.688Z","dependency_job_id":null,"html_url":"https://github.com/mhw32/persistent-homology","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mhw32/persistent-homology","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhw32%2Fpersistent-homology","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhw32%2Fpersistent-homology/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhw32%2Fpersistent-homology/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/mhw32%2Fpersistent-homology/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mhw32","download_url":"https://codeload.github.com/mhw32/persistent-homology/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhw32%2Fpersistent-homology/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262838500,"owners_count":23372541,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["astrostatistics","hypothesis-testing","persistent-homology","simulation"],"created_at":"2024-12-10T20:17:05.939Z","updated_at":"2025-06-30T19:33:08.678Z","avatar_url":"https://github.com/mhw32.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Persistent-Homology\n\nAn ongoing research project to use statistical analysis to extrapolate topological structure from simulations of the observable Universe under certain sets of assumptions. Specific focus lies on distinguishing conceptualized theories of dark matter. The research will be published in the Royal Astronomy Journal.\n\n## How to Run\n \nInstructions are provided for the following tasks:\n\n- Generating voronoi tessellations.\n- Running the testing library. \n- Applying the library to real simulation studies.\n\n### Voronoi Approximations\n\nIn order to gauge the effectiveness of the hypothesis tests, voronoi approximations are necessary since there are no measures of correctness for the real dataset. 
In one simulation, I generate a set of Voronoi tessellations, called a Voronoi group. Each tessellation has its own customizable filament percentage, wall percentage, clutter percentage, etc. When generating a Voronoi group, all percentages are held constant except the filament percentage, which is varied from `0.1` to `0.9`. This variation mimics the differences between the observable Universe under warm and cold dark matter assumptions. A Voronoi group takes the following settings:\n\n- `N` : The number of particles per Voronoi foam.\n- `Boxlim` : The volume of the foam, e.g. `c(0, 50)`.\n- `res` : The resolution (\"fineness\") of the foam.\n- `perturb` : The amount of noise to add.\n- `groupN` : The number of tessellations per group.\n- `baseline` : Which filament percentage to treat as the baseline, e.g. `0.1`. \n- `nameId` : The name to give this group; it will be saved under this name.\n\nTo run the generation process, see `main_tests/run_voronoi.r`. Given some settings, the function to call is:\n\n    \u003e\u003e source('VoronoiFoam.r')\n    \u003e\u003e voronoi_compilation(N=N, Boxlim=Boxlim, res=res, perturb=perturb, groupN=groupN, baseline=baseline, nameId=i)\n\n### Hypothesis Testing Library\n\nWith the testing library, we want to create two sets of structures: the `foam` and the `base`. The `base` represents the baseline against which we compare each structure in the `foam`. In our case, we consider an independently generated base of 0.1 percFil and an alternative base of 0.9 percFil. We would like the tests to be repeatable, so they are run 100 times. 
See `cluster_scripts/run_vtests.r` for more details.\n\n    \u003e\u003e source('testlib.r')\n    \u003e\u003e foam \u003c- readRDS(paste('./saved_states/large_set/foam', i, '.rds', sep=''))\n    \u003e\u003e base \u003c- readRDS(paste('./saved_states/large_set/baseline', i, '-0.1.rds', sep=''))\n    \u003e\u003e test_wrapper(foam, base, paste(i, '0.1baseNormFalse', sep='-'), FALSE)\n\n### Real Data\n\nThere are two options when trying to analyze the full Eagle dataset. Either one can use the KDE test, for which only 1 sample is needed, or one can slice the data set into different cubes in order to bootstrap a large enough set to use the traditional `testlib.r`. \n\nTo load the Eagle data:\n\n    \u003e\u003e source(\"process_eagle.r\")\n    \u003e\u003e cdm \u003c- load_CDM()\n    \u003e\u003e wdm \u003c- load_WDM()\n\nTo create the respective persistence diagrams:\n\n    \u003e\u003e res \u003c- 2\n    \u003e\u003e boxlim \u003c- c(0, 100) \n    \u003e\u003e cdm_diag \u003c- gridDiag(cdm, dtm, lim=cbind(boxlim, boxlim, boxlim), by=res, sublevel=T, printProgress=T, m0=0.001)\n    \u003e\u003e wdm_diag \u003c- gridDiag(wdm, dtm, lim=cbind(boxlim, boxlim, boxlim), by=res, sublevel=T, printProgress=T, m0=0.001)\n\nAll that remains is to calculate the p-value:\n\n    \u003e\u003e pval \u003c- ks::kde.test(cdm_diag$diagram, wdm_diag$diagram)$pvalue\n\nAlternatively, slicing is done with the following procedure. From here the same testing framework is applicable:\n\n    \u003e\u003e cdm_slices \u003c- slice_cube_robust(cdm, 2)\n    \u003e\u003e wdm_slices \u003c- slice_cube_robust(wdm, 2)\n    \u003e\u003e cdm_diags \u003c- persistify_set(cdm_slices)\n    \u003e\u003e wdm_diags \u003c- persistify_set(wdm_slices)\n\n## Repository Folders\n\n- `cluster_scripts` : Scripts designed to run the simulations and topological analysis on the Yale Grace cluster. This uses the LSF platform so all scripts are `*.bsub`. 
Each `*.bsub` file has a corresponding `*.r` file.\n- `main_tests` : Important scripts for utilizing the hypothesis testing framework to compare simulations.\n- `simple_tests` : Development scripts for analyzing properties of Voronoi tessellations and persistent homologies.\n- `saved_states` : A temporary database for results and figures prior to processing.\n- `simulations` : A sample of the Eagle simulations of the observable Universe under WDM and CDM assumptions.\n- `sketch` : Contains images created by an illustrator-type program.\n- `images` : Holds images generated for the paper writing process as well as small demonstrations of topological analysis. Also holds results from visualizing hypothesis test p-values.\n- `web` : A static web page to more-or-less showcase the work.\n- `writeup` : Contains many different Overleaf repositories including the proposal, the complete writeup, kernel-density analysis, etc.\n\n### Repository R Scripts\n\n- `distance.r` : Contains functions for different distance measurements (Bottleneck, Wasserstein, etc.).\n- `distribution.r` : Contains functions for deriving a statistic from the distribution and contour hypothesis tests.\n- `euler.r` : Implementation of the Euler characteristic, which in theory is a topological invariant. \n- `localtest.r` : Functions using the local KDE test.\n- `multiassign.r` : Shortcut for assigning multiple variables in a single line.\n- `NHST.r` : Implementation of randomization-style null hypothesis significance tests for persistence diagrams.\n- `process_eagle.r` : Implementation of cubic slicing for Eagle data.\n- `summarize.r` : Functions for processing Euler, Silhouette, and Landscape hypothesis tests. \n- `testlib.r` : Wrapper library around nearly all the individual functions. This is where the API should hook in. 
\n- `tools.r` : Miscellaneous (possibly useful) functions.\n- `twosample.r` : Functions for permutation testing (Gaussian kernel).\n- `Voronoi3Dfct.r` : Implementation of Voronoi tessellation generation with spacing derived from KNN. \n- `VoronoiFoam.r` : Wrapper to test `Voronoi3Dfct.r`. \n\n### Repository Python Scripts\n\n- `parse.py` : Given the messy output of `testlib.r`, this is a robust way of organizing the data in a format that is easy to work with. For example:\n\n        \u003e\u003e import numpy as np\n        \u003e\u003e import parse\n        \u003e\u003e start = '../saved_states/third_push/results-' \n        \u003e\u003e end   = '-0.1base.txt'\n        \u003e\u003e filepaths = [start+str(i)+end for i in range(1, 21)]\n        \u003e\u003e resArr = np.array([parse.parse(f) for f in filepaths])\n\n- `parse.py` also contains two important functions, `parse.prepare1d` and `parse.prepare2d`, that take `resArr` and further parse it into a plot-representable form. Certain tests are performed per dimension and require `prepare2d`; others only need `prepare1d`. \n\n        \u003e\u003e singles = ['all-silh', 'euler']\n        \u003e\u003e doubles = ['indiv_silh', 'contour', 'global-kde']\n        \u003e\u003e bighash = {}\n        \u003e\u003e for characteristic in singles:\n        \u003e\u003e     bighash[characteristic] = parse.prepare1d(resArr, characteristic)\n        \u003e\u003e for characteristic in doubles:\n        \u003e\u003e     for dim in [0,1,2]:\n        \u003e\u003e         bighash[characteristic+'-dim-'+str(dim)] = parse.prepare2d(resArr, characteristic, dim)\n\n## Contributing / Questions\n\nFeel free to contribute to the code. If you have questions, email me at `me@mikewuis.me`. If you submit a pull request, I'd be happy to review it and will most likely accept it. Would love feedback too! 
If you spot a bug, let me know!\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhw32%2Fpersistent-homology","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmhw32%2Fpersistent-homology","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhw32%2Fpersistent-homology/lists"}