{"id":14066897,"url":"https://github.com/gersteinlab/shiny-dim-reduction","last_synced_at":"2026-01-05T13:02:50.099Z","repository":{"id":146424826,"uuid":"289967127","full_name":"gersteinlab/shiny-dim-reduction","owner":"gersteinlab","description":"This project applies dimensionality reduction to tabular data and generates applications for visualizing the results.","archived":false,"fork":false,"pushed_at":"2024-07-28T03:22:05.000Z","size":18067,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-01-27T10:51:12.752Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gersteinlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-24T15:32:46.000Z","updated_at":"2024-07-28T03:22:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"15e459f5-65d4-4823-a3ff-2115e92ad8f0","html_url":"https://github.com/gersteinlab/shiny-dim-reduction","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gersteinlab%2Fshiny-dim-reduction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gersteinlab%2Fshiny-dim-reduction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gersteinlab%2Fshiny-dim-reduction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gersteinlab%2Fshiny-dim-reducti
on/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gersteinlab","download_url":"https://codeload.github.com/gersteinlab/shiny-dim-reduction/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244947011,"owners_count":20536545,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-13T07:05:19.215Z","updated_at":"2026-01-05T13:02:45.069Z","avatar_url":"https://github.com/gersteinlab.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"\u003c!---\n---\ntitle: \"Shiny Dimensionality Reduction\"\npagetitle: Shiny Dimensionality Reduction\n---\n--\u003e\n\n# Shiny Dimensionality Reduction\n\n\u003ci\u003eJustin Chang, Joel Rozowsky, Mark Gerstein\u003c/i\u003e\n\n##### Table of Contents  \n[Overview](#overview)  \n[Installing R](#installing-r)  \n[Running App Code](#running-app-code)  \n[App Instructions](#app-instructions)  \n[Installing RStudio](#installing-rstudio)  \n[Installing Rtools](#installing-rtools)  \n[Installing Anaconda](#installing-anaconda)  \n[Performing Reduction](#performing-reduction)  \n[AWS Integration](#aws-integration)  \n[Running Tests](#running-tests)  \n[Contributors](#contributors) \n\n\u003ca name=\"overview\"/\u003e\n\n## Overview\n\nThis shiny dimensionality reduction (SDR) project performs dimensionality reduction on tabular data and generates R Shiny apps for interactive, precomputed visualization of the results. 
Supported analyses include:\n\n* Principal Component Analysis (PCA)\n* Variational Autoencoders (VAE)\n* Uniform Manifold Approximation and Projection (UMAP)\n* Potential of Heat diffusion for Affinity-based Transition Embedding (PHATE)\n* t-Distributed Stochastic Neighbor Embedding (tSNE)\n* Set Thresholding and Intersection with UpSetR\n\nThere are several ways to use this project, each with varying system requirements:\n\n* \u003cb\u003eOnline App:\u003c/b\u003e If you received a URL, no downloads are necessary - visit the provided link with any browser.\n* \u003cb\u003eLocal App: \u003c/b\u003e If you received a ZIP representing an application's Store, you will need a way to unzip files (e.g. 7Zip) and R \u003e= 4.0.0 (see [Installing R](#installing-r)). First, unzip the Store. Second, obtain this repository locally by downloading / unzipping, using \"git clone\", or using RStudio's \"Create Project (Version Control)\" functionality. Third, follow the instructions in [Running App Code](#running-app-code).\n* \u003cb\u003ePipeline:\u003c/b\u003e If you intend to run a dimensionality reduction pipeline on tabular data, please read the rest of this document before proceeding.  \n\n\u003ca name=\"installing-r\"/\u003e\n\n## Installing R\n\nTo run app source code or perform dimensionality reduction, you must have R \u003e= 4.0.0 installed.\n\nYou can download an installer for R from https://cran.r-project.org.  \nFor reproducibility, the following settings were used in development:  \n\n* Run the installer as an administrator.  \n* Keep the default installation location.  \n* Select the 64-bit user installation.  \n* Select customized startup, MDI, plain text help, no start menu folder.  
\n* Keep the defaults for additional tasks.\n\nYou can customize your home / library locations with the environment variables \"HOME\" and \"R_LIBS_USER\":\n\n* \u003cb\u003eWindows:\u003c/b\u003e Go to your environment variables (\"env\" in search) under \"System Variables\".\n* \u003cb\u003eMacOS / Linux:\u003c/b\u003e Edit your environment variables in your shell's startup file (e.g. \".bashrc\").\n\n\u003ca name=\"running-app-code\"/\u003e\n\n## Running App Code\n\nIn R, set the working directory to the \"app\" folder in the repo and run the following code to install the packages that the app needs:\n\n```R\nsource(\"install.R\")\n```\n\nThe above code does not need to be run on subsequent app launches. To launch the app, set the working directory to the \"app\" folder in the repo and run the following code:\n\n```R\nshiny::runApp()\n```\n\nThe application data file (\"app_data.rds\") and a store file (\"local_store.rds\" or \"cloud_store.rds\") are required for the application to run properly. If the application is run locally and no store files are present, you will be prompted to specify the store location. Relevant files will then be copied over.\n\n\u003ca name=\"app-instructions\"/\u003e\n\n## App Instructions\n\nIn Shiny, values are reactive and observe their dependencies. If one of their\ndependencies is invalidated - meaning one of the inputs has changed - then they\nrecalculate their values. Once a reactive value is recalculated, the functions\nthat depend on it will be invalidated, causing their own downstream recalculation.\nThis approach allows the app to be highly responsive, without constantly performing\nupstream calculations. Users can therefore freely change parameters as they go.\nHidden outputs are also not recalculated, which saves drawing time.\n\nIf you want to save plotly output as an image, use the camera icon. 
To save any other output, right-click the image and use 'Save image as ...'\n\n\u003cb\u003eControls Glossary\u003c/b\u003e\n\n* \u003cu\u003eStart Plotting:\u003c/u\u003e Links inputs to the reactive system, so that plots update as soon as an input changes.\n* \u003cu\u003eStop Plotting:\u003c/u\u003e Unlinks inputs from the reactive system, which freezes plotting.\n* \u003cu\u003eRequest:\u003c/u\u003e Submits a custom analysis request.\n* \u003cu\u003eRefresh:\u003c/u\u003e Checks the application's store for new user requests.\n* \u003cu\u003eBookmark:\u003c/u\u003e Creates a URL that replicates this session.\n* \u003cu\u003eNumeric Data:\u003c/u\u003e Downloads the numeric data used to produce the current plot.\n* \u003cu\u003eMetadata:\u003c/u\u003e Downloads the metadata used to produce the current plot.\n* \u003cu\u003eNotes:\u003c/u\u003e Shows documentation, including a brief description of each category.\n\n\u003cb\u003eParameters Glossary\u003c/b\u003e\n\n* \u003cu\u003eCategory:\u003c/u\u003e A category is a set of samples with conserved features. Since categories do not necessarily share common metadata characteristics, distinct inputs exist for each category in row subsets, column subsets, colors, shapes, labels, filters, selections, and thresholds.\n* \u003cu\u003eSample Subset:\u003c/u\u003e A sample subset is a row subset for a category.\n* \u003cu\u003eFeature Subset:\u003c/u\u003e A feature subset is a column subset for a category.\n* \u003cu\u003eScaling:\u003c/u\u003e Logarithmic data underwent a transformation of f(x) = log2(x+1). Linear data was not transformed by this function.\n* \u003cu\u003eNormalization:\u003c/u\u003e Global methods normalize over the whole matrix. Local methods normalize over each feature. Normalization generally benefits neural networks, so it is suggested for VAE. 
On the other hand, normalization can be detrimental to PCA, as it removes the relative magnitudes of all features.\n* \u003cu\u003eMethod of Dimensionality Reduction:\u003c/u\u003e\n  * PCA: Principal Component Analysis.\n  * VAE: Variational Auto-Encoder.\n  * UMAP: Uniform Manifold Approximation and Projection.\n  * PHATE: Potential of Heat diffusion for Affinity-based Transition Embedding.\n  * Sets: Uses a user-calculated threshold to analyze characteristic intersections.\n* \u003cu\u003eMethod of Visualization:\u003c/u\u003e\n  * Explore: Plots combinations of principal components.\n  * Summarize: Generates a filter-free summary, with best-fit lines in ggplot2 and plotly3.\n  * tSNE: Flattens all components into a t-distributed Stochastic Neighbor Embedding.\n* \u003cu\u003ePerplexity:\u003c/u\u003e Perplexity is a hyperparameter used in dimensionality reduction methods that create clusters. It is analogous to the expected number of neighbors for any data point.\n* \u003cu\u003eThreshold:\u003c/u\u003e Let a sample's range be normalized to (0,1) after the scaling transformation. Then a sample will be considered present in a categorical characteristic if it is expressed above this threshold. The bounds of this slider were selected to ensure a broad range of expression patterns without overflowing memory.\n\n\u003cb\u003eFilters Glossary\u003c/b\u003e\n\n* \u003cu\u003eFraction of Samples:\u003c/u\u003e Suppose S samples belong to a characteristic. Suppose a feature is present at the threshold level in G of those samples. Then this slider determines the acceptable values of G/S to be displayed.\n* \u003cu\u003eNumber of Characteristics:\u003c/u\u003e This slider determines the number of possible sets that must contain a feature for the feature to be displayed. 
A feature is included if it is present in an appropriate number of samples.\n* \u003cu\u003eMaximum Features:\u003c/u\u003e The maximum number of features displayed by set-based approaches.\n* \u003cu\u003eColor By, Shape By, Label By:\u003c/u\u003e\nWhat category should points on the graph be colored / shaped / labeled by?\n(Note: depends on the category selected.)\n* \u003cu\u003eCurrent Filter, Filter By:\u003c/u\u003e\n'Current Filter' lists all available categories for which filters can be applied.\nSamples satisfying the intersection of all filters will be plotted. Filters consist\nof including / excluding factors of a selected metadata characteristic.\n(Note: depends on the category selected.)\n\n\u003cb\u003eSettings Glossary\u003c/b\u003e\n\n* \u003cu\u003eSettings Menu:\u003c/u\u003e\nIf 'Embed Title' is checked, then the title of the plot will be included within the\nplot graphic. Otherwise, it will be displayed as actual text. If 'Embed Legend' is\nchecked, then the legend of the plot will be included within the plot graphic.\nOtherwise, the legend will be displayed as an external table. If 'Boost Graphics' is\nchecked, certain plots will be drawn with more expensive methods. If 'Separate Colors' is\nunchecked, then colors / shapes / labels will all be bound to the current color.\nIf 'Uninverted Colors' is unchecked, then color scales will be reversed.\n* \u003cu\u003eColor Palette:\u003c/u\u003e\nPlots support 12 color scales. The custom color scales are\n'Custom' and 'Grayscale'. The color scales from R are\n'Rainbow', 'Heat', 'Terrain', 'Topography', and 'CM'. The color scales from Viridis are\n'Viridis', 'Magma', 'Plasma', 'Inferno', and 'Cividis'. Cividis\n'enables nearly-identical visual-data interpretation' for color-deficient vision,\n'is perceptually uniform in hue and brightness, and increases in brightness linearly'.\n* \u003cu\u003eGraph Height:\u003c/u\u003e\nThe height of the plotting graphic. 
If a data table's UI does not appear responsive, try\nincreasing this value.\n* \u003cu\u003eNotification Time:\u003c/u\u003e\nThe time it takes for notifications to fade away. Set to a nonpositive value to hide\nall notifications.\n* \u003cu\u003eNumber of Columns:\u003c/u\u003e\nThe number of bars in the UpSetR histogram plot.\n* \u003cu\u003eDisplayed Components:\u003c/u\u003e\nDisplayed Component 1 denotes the component, after dimensionality reduction,\nthat will usually be shown on the x-axis. Displayed Component 2 denotes the component,\nafter dimensionality reduction, that will usually be shown on the y-axis. For plotly3,\nDisplayed Component 3 denotes the component, after dimensionality reduction,\nthat will usually be shown on the z-axis. The same component may be selected for more than one axis.\n* \u003cu\u003eConsole Output:\u003c/u\u003e A tool used to see the precise inputs being passed to the plotting system.\n\n\u003ca name=\"installing-rstudio\"/\u003e\n\n## Installing RStudio\n\nTo perform dimensionality reduction, we recommend RStudio \u003e= 1.3.0.\n\nYou can download an installer for RStudio from https://rstudio.com/products/rstudio/download.  \nFor reproducibility, the following settings were used in development:  \n\n* Run the installer as an administrator. Keep the default installation location.  \n* Do not create start menu shortcuts. Do not allow automated crash reporting.  \n\n\u003ca name=\"installing-rtools\"/\u003e\n\n## Installing Rtools\n\nTo perform dimensionality reduction on Windows, we recommend Rtools40 or later.\n\nYou can download an installer for Rtools40 from https://cran.r-project.org/bin/windows/Rtools.  \nFor reproducibility, the following settings were used in development:  \n\n* Run the installer as an administrator. Keep the default installation location.  \n* Save version history to registry and don't create start menu icons.  
\n\nTo add Rtools to your PATH, add the following code to your .Renviron file:\n\n```bash\nPATH=\"${RTOOLS40_HOME}\\usr\\bin;${PATH}\"\n```\n\nRestart R and run the following code to test functionality:\n\n```R\nSys.which(\"make\")\n```\n\n\u003ca name=\"installing-anaconda\"/\u003e\n\n## Installing Anaconda\n\nTo perform dimensionality reduction, a specialized environment in Anaconda (a Python distribution that includes the conda package manager) is necessary. To install Anaconda, download an installer from https://anaconda.com, ensure RStudio is closed, and install it to a path without spaces, such as \"C:/Anaconda\".\n\nOnce Anaconda is installed, ensure that no existing environments are named \"r-reticulate\". You can list existing environments and, if necessary, remove one with that name through the following commands:\n\n```bash\nconda info --envs\nconda env remove --name r-reticulate\n```\n\nThen set up r-reticulate in the Anaconda Command Prompt:  \n\n```bash\nconda create --name r-reticulate\nconda activate r-reticulate\nconda install tensorflow\npip install phate\n```\n\n\u003ca name=\"performing-reduction\"/\u003e\n\n## Performing Reduction\n\nFor each application that you intend to create, you should have a single corresponding dimensionality reduction workflow. To set up these workflows, please perform the following steps in RStudio:\n\n* Navigate to \"File\" -\u003e \"New Project...\" -\u003e \"Version Control\" -\u003e \"Git\".\n* Set the URL to https://github.com/gersteinlab/shiny-dim-reduction.git\n* Name the Project Directory and select the parent directory.\n* Press \"Create Project\" and wait for the project to open.\n* To install necessary packages, open \"install.R\" and source the file. \n\nDuring installation, the following warning(s) can be safely ignored:  \n```\nYour CPU supports instructions that this TensorFlow binary was not compiled to use ...\n```\n\nOnce installation completes, you will need to designate a folder for storing your workflows, which we will call the workflows folder. 
This folder should initially be empty. To set it as your workflows folder, open \"workflows.R\" and source the file. You will be prompted to enter a directory path and should enter the path of your empty folder. After doing so, we strongly recommend against modifying the folder through methods not listed below. You will subsequently be given an interactive prompt to change your workflows folder, open an existing workflow, or create a new workflow. If you move the workflows folder to a new location, you will need to change your workflows folder in the prompt so that it can be found by the project.\n\n\u003ca name=\"aws-integration\"/\u003e\n\n## AWS Integration\n\nTo create applications for visualizing workflow results, select a supported storage method:\n\n* \u003cb\u003eLocal Storage:\u003c/b\u003e Store all generated data in a folder (usually named \"reference\") on a file system. This is useful for portable executables, but may not be hostable through Shiny if the folder size is too large. \n* \u003cb\u003eAWS.S3:\u003c/b\u003e Store all generated data in a bucket on AWS.S3. This substantially decreases app bundle size, but requires more setup. Although AWS offers a free plan, the onus is on the user to ensure that their AWS usage does not exceed their budget.\n\nTo upload data to AWS.S3, create an AWS.IAM account with full AWS.S3 permissions. Use the functions in \"storage.R\" to save a master key for that account in the project folder. This master account uploads data, and its credentials should not be distributed with the generated app.\n\nGenerated apps should each be distributed with an AWS key (id, secret, S3 bucket) that links to an account with limited permissions. These permissions generally ought to include list / get / put, but the overall budget and permissions are in the hands of the developer. 
Please see example_aws_json.txt for an example policy.\n\n\u003ca name=\"running-tests\"/\u003e\n\n## Running Tests\n\nThe tests folder contains scripts for testing various components of the project. Please feel free to share ideas for increasing test coverage.\n\nWe suggest running the tests in the following order:\n\n1. test_install.R\n2. test_plotting.R\n3. test_find_replace.R\n4. test_text_work.R\n5. test_ui_functions.R\n6. test_storage.R\n7. test_preprocess.R\n8. test_make_requests.R\n9. test_workflows.R\n10. test_sca_nor_fun.R\n11. test_validation.R\n12. test_red_methods.R\n13. test_red_requests.R\n14. test_converter.R\n\n\u003ca name=\"contributors\"/\u003e\n\n## Contributors\n\nWe thank the following contributors:\n\n* Suchen Zheng (Yale)\n* Ran Meng (Yale)\n* Emily LaPlante (Baylor)\n* David Chen (Baylor)\n* Roger Alexander (PNDRI)\n* Matthew Roth (Baylor)\n* Aleks Milosavljevic (Baylor)\n* Abhinav Godavarthi (Yale)\n* Ana Berthel (Yale)\n* Max Sun (Yale)\n* Smita Krishnaswamy (Yale)\n* Rob Kitchen (Harvard)\n\nAdditionally, the following R packages were used:\n\n* **Rtsne:** Jesse H. Krijthe (2015). Rtsne: T-Distributed Stochastic Neighbor\nEmbedding using a Barnes-Hut Implementation, URL:\n[https://github.com/jkrijthe/Rtsne](https://github.com/jkrijthe/Rtsne)\n* **Keras:** Keras, (2018), GitHub repository,\n[https://github.com/keras-team/keras](https://github.com/keras-team/keras)\n* **UMAP:** McInnes et al., (2018). UMAP: Uniform Manifold Approximation and\nProjection. Journal of Open Source Software, 3(29), 861,\n[https://doi.org/10.21105/joss.00861](https://doi.org/10.21105/joss.00861)\n* **PHATE:** Moon, K.R., van Dijk, D., Wang, Z. 
et al.\nVisualizing structure and transitions in high-dimensional biological data.\nNat Biotechnol 37, 1482-1492 (2019).\n[https://doi.org/10.1038/s41587-019-0336-3](https://doi.org/10.1038/s41587-019-0336-3)\n* **UpSetR:** Jake R Conway, Alexander Lex, Nils Gehlenborg. UpSetR: An R Package\nfor the Visualization of Intersecting Sets and their Properties. doi:\n[https://doi.org/10.1093/bioinformatics/btx364](https://doi.org/10.1093/bioinformatics/btx364)\n* **heatmaply:** Galili, Tal, O'Callaghan, Alan, Sidi, Jonathan, Sievert,\nCarson (2017). \"heatmaply: an R package for creating interactive cluster heatmaps\nfor online publishing.\" Bioinformatics. doi:\n[http://dx.doi.org/10.1093/bioinformatics/btx657](http://dx.doi.org/10.1093/bioinformatics/btx657)\n* **Cividis:** Nuñez, Jamie R., Christopher R. Anderton, and Ryan S. Renslow.\n\"Optimizing colormaps with consideration for color vision deficiency to enable\naccurate interpretation of scientific data.\" PloS one 13.7 (2018): e0199239.\n\nFurther Reading:\n\n* **Optimizing tSNE:** Wattenberg, et al., \"How to Use t-SNE Effectively\",\nDistill, 2016. [http://doi.org/10.23915/distill.00002](http://doi.org/10.23915/distill.00002)\n* **James Diao's ERCC Plotting Tool:**\n[https://github.com/jamesdiao/ERCC-Plotting-Tool](https://github.com/jamesdiao/ERCC-Plotting-Tool)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgersteinlab%2Fshiny-dim-reduction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgersteinlab%2Fshiny-dim-reduction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgersteinlab%2Fshiny-dim-reduction/lists"}