{"id":17177919,"url":"https://github.com/robnewman/data-studios-datasources","last_synced_at":"2026-01-06T05:01:37.904Z","repository":{"id":248105581,"uuid":"763719684","full_name":"robnewman/data-studios-datasources","owner":"robnewman","description":null,"archived":false,"fork":false,"pushed_at":"2024-07-12T09:13:58.000Z","size":15,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-30T02:12:31.440Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/robnewman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-26T19:53:06.000Z","updated_at":"2024-07-12T09:56:12.000Z","dependencies_parsed_at":"2024-07-12T12:27:27.310Z","dependency_job_id":null,"html_url":"https://github.com/robnewman/data-studios-datasources","commit_stats":null,"previous_names":["robnewman/data-studios-datasources"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robnewman%2Fdata-studios-datasources","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robnewman%2Fdata-studios-datasources/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robnewman%2Fdata-studios-datasources/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robnewman%2Fdata-studios-datasources/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/
robnewman","download_url":"https://codeload.github.com/robnewman/data-studios-datasources/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245359255,"owners_count":20602322,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T00:05:28.746Z","updated_at":"2026-01-06T05:01:37.898Z","avatar_url":"https://github.com/robnewman.png","language":null,"readme":"# data-studios-datasources\n\n## RStudio\n\n### Example 1 - RNASeq\n\nNote that the CSV file paths may be different for you.\n\n```R\n# Adapted from source: https://combine-australia.github.io/RNAseq-R/09-applying-rnaseq-solutions.html\n\ninstall.packages(\"BiocManager\")\nBiocManager::install(c(\"limma\"))\nBiocManager::install(c(\"edgeR\"))\nBiocManager::install(c(\"org.Dm.eg.db\"))\nBiocManager::install(c(\"gplots\"))\nBiocManager::install(c(\"RColorBrewer\"))\nlibrary(limma)\nlibrary(edgeR)\nlibrary(gplots)\nlibrary(RColorBrewer)\nlibrary(org.Dm.eg.db)\n\ncounts \u003c- read.delim(file=\"/workspace/data/datastudios-demo-rstudio/input/2024-05-13/counts_Drosophila.txt\")\ntargets \u003c- read.delim(file=\"/workspace/data/datastudios-demo-rstudio/input/2024-05-13/SampleInfo_Drosophila.txt\")\nhead(counts)\ntargets\ntable(targets$Group)\nmycpm \u003c- cpm(counts)\nplot(counts[,1],mycpm[,1],xlim=c(0,20),ylim=c(0,5))\nabline(v=10,col=2)\nabline(h=2,col=4)\nthresh \u003c- mycpm \u003e 2\nkeep \u003c- rowSums(thresh) \u003e= 3\ntable(keep)\ncounts.keep \u003c- counts[keep,]\ndim(counts.keep)\ny \u003c- 
DGEList(counts.keep)\nbarplot(y$samples$lib.size)\n\npar(mfrow=c(1,1))\n# Get log2 counts per million\nlogcpm \u003c- cpm(y$counts,log=TRUE)\n# Check distributions of samples using boxplots\nboxplot(logcpm, xlab=\"\", ylab=\"Log2 counts per million\",las=2,outline=FALSE)\n# Let's add a blue horizontal line that corresponds to the median logCPM\nabline(h=median(logcpm),col=\"blue\")\ntitle(\"Boxplots of logCPMs (unnormalised)\")\n\npar(mfrow=c(1,2),oma=c(2,0,0,0))\ngroup.col \u003c- c(\"red\",\"blue\")[targets$Group]\nboxplot(logcpm, xlab=\"\", ylab=\"Log2 counts per million\",las=2,col=group.col,\n        pars=list(cex.lab=0.8,cex.axis=0.8))\nabline(h=median(logcpm),col=\"blue\")\ntitle(\"Boxplots of logCPMs\\n(coloured by groups)\",cex.main=0.8)\n\nlib.col \u003c- c(\"light pink\",\"light green\")[targets$Library]\nboxplot(logcpm, xlab=\"\", ylab=\"Log2 counts per million\",las=2, col=lib.col,\n        pars=list(cex.lab=0.8,cex.axis=0.8))\nabline(h=median(logcpm),col=\"blue\")\ntitle(\"Boxplots of logCPMs\\n(coloured by library prep)\",cex.main=0.8)\n\npar(mfrow=c(1,2))\nplotMDS(y,col=group.col)\nlegend(\"topright\",legend=levels(targets$Group),fill=c(\"red\",\"blue\"))\nplotMDS(y,col=lib.col)\nlegend(\"topleft\",legend=levels(targets$Library),fill=c(\"light pink\",\"light green\"))\n\nlogcounts \u003c- cpm(y,log=TRUE)\nvar_genes \u003c- apply(logcounts, 1, var)\nselect_var \u003c- names(sort(var_genes, decreasing=TRUE))[1:500]\n\nhighly_variable_lcpm \u003c- logcounts[select_var,]\ndim(highly_variable_lcpm)\nmypalette \u003c- brewer.pal(11,\"RdYlBu\")\nmorecols \u003c- colorRampPalette(mypalette)\nheatmap.2(highly_variable_lcpm,col=rev(morecols(50)),trace=\"none\", main=\"Top 500 most variable genes across samples\",ColSideColors=group.col,scale=\"row\",margins=c(10,5))\n```\n\n### Example 2 - 
Shiny\n\n```R\ninstall.packages(\"dplyr\")\ninstall.packages(\"gapminder\")\ninstall.packages(\"ggplot2\")\ninstall.packages(\"shiny\")\n\nlibrary(shiny)\nlibrary(dplyr)\nlibrary(ggplot2)\nlibrary(gapminder)\n\n# Specify the application port\noptions(shiny.host = \"0.0.0.0\")\noptions(shiny.port = 8180)\n\nui \u003c- fluidPage(\n  sidebarLayout(\n    sidebarPanel(\n      tags$h4(\"Gapminder Dashboard\"),\n      tags$hr(),\n      selectInput(inputId = \"inContinent\", label = \"Continent\", choices = unique(gapminder$continent), selected = \"Europe\")\n    ),\n    mainPanel(\n      plotOutput(outputId = \"outChartLifeExp\"),\n      plotOutput(outputId = \"outChartGDP\")\n    )\n  )\n)\n\nserver \u003c- function(input, output, session) {\n  # Filter data and store as reactive value\n  data \u003c- reactive({\n    gapminder %\u003e%\n      filter(continent == input$inContinent) %\u003e%\n      group_by(year) %\u003e%\n      summarise(\n        AvgLifeExp = round(mean(lifeExp)),\n        AvgGdpPercap = round(mean(gdpPercap), digits = 2)\n      )\n  })\n\n  # Common properties for charts\n  chart_theme \u003c- ggplot2::theme(\n    plot.title = element_text(hjust = 0.5, size = 20, face = \"bold\"),\n    axis.title.x = element_text(size = 15),\n    axis.title.y = element_text(size = 15),\n    axis.text.x = element_text(size = 12),\n    axis.text.y = element_text(size = 12)\n  )\n\n  # Render Life Exp chart\n  output$outChartLifeExp \u003c- renderPlot({\n    ggplot(data(), aes(x = year, y = AvgLifeExp)) +\n      geom_col(fill = \"#0099f9\") +\n      geom_text(aes(label = AvgLifeExp), vjust = 2, size = 6, color = \"#ffffff\") +\n      labs(title = paste(\"Average life expectancy in\", input$inContinent)) +\n      theme_classic() +\n      chart_theme\n  })\n\n  # Render GDP chart\n  output$outChartGDP \u003c- renderPlot({\n    ggplot(data(), aes(x = year, y = AvgGdpPercap)) +\n      geom_line(color = \"#f96000\", size = 2) +\n      geom_point(color = \"#f96000\", size = 5) 
+\n      geom_label(\n        aes(label = AvgGdpPercap),\n        nudge_x = 0.25,\n        nudge_y = 0.25\n      ) +\n      labs(title = paste(\"Average GDP per capita in\", input$inContinent)) +\n      theme_classic() +\n      chart_theme\n  })\n}\n\nshinyApp(ui = ui, server = server)\n```\n\n### Example 3 - Volcano Plot using Shiny\n\nNote that the CSV file path may be different for you.\n\n```R\ninstall.packages(\"shiny\")\ninstall.packages(\"plotly\")\ninstall.packages(\"tidyverse\")\n\nlibrary(shiny)\nlibrary(plotly)\nlibrary(tidyverse)\n\nui \u003c- fluidPage(\n  titlePanel(\"Volcano Plotly\"),\n  fluidRow(\n    column(\n      width = 7,\n      plotlyOutput(\"volcanoPlot\", height = \"500px\")\n    ),\n    column(\n      width = 5,\n      dataTableOutput(\"selectedProbesTable\")\n    )\n  )\n)\n\ncsv_file = \"/workspace/data/datastudios-demo-rstudio/input/2024-05-20_volcano-examples/NKI-DE-results.csv\"\n\nserver \u003c- function(input, output) {\n\n  differentialExpressionResults \u003c-\n    read.csv(csv_file, stringsAsFactors = FALSE) %\u003e%\n    mutate(\n      probe.type = factor(ifelse(grepl(\"^Contig\", probe), \"EST\", \"mRNA\")),\n      minusLog10Pvalue = -log10(adj.P.Val),\n      tooltip = ifelse(is.na(HUGO.gene.symbol), probe, paste(HUGO.gene.symbol, \" (\", probe, \")\", sep = \"\"))\n    ) %\u003e%\n    sample_n(1000)\n\n  output$volcanoPlot \u003c- renderPlotly({\n\n    plot \u003c- differentialExpressionResults %\u003e%\n      ggplot(aes(x = logFC,\n                 y = minusLog10Pvalue,\n                 colour = probe.type,\n                 text = tooltip,\n                 key = row.names(differentialExpressionResults))) +\n      geom_point() +\n      xlab(\"log fold change\") +\n      ylab(\"-log10(P-value)\")\n\n    plot %\u003e%\n      ggplotly(tooltip = \"tooltip\") %\u003e%\n      layout(dragmode = \"select\")\n  })\n\n  output$selectedProbesTable \u003c- renderDataTable({\n\n    eventData \u003c- event_data(\"plotly_selected\")\n\n   
 selectedData \u003c- differentialExpressionResults %\u003e% slice(0)\n    if (!is.null(eventData)) selectedData \u003c- differentialExpressionResults[eventData$key,]\n\n    selectedData %\u003e%\n      transmute(\n        probe,\n        gene = HUGO.gene.symbol,\n        `log fold change` = signif(logFC, digits = 2),\n        `p-value` = signif(adj.P.Val, digits = 2)\n      )\n  },\n    options = list(dom = \"tip\", pageLength = 10, searching = FALSE)\n  )\n}\n\nshinyApp(ui, server, options = list(height = 600))\n```\n\n### Example 4 - Load JSON lib, change working directory on startup, read CSV to data frame, export data frame to JSON file\n\nNote that the CSV file path may be different for you.\n\n```R\ninstall.packages(\"RJSONIO\")\nlibrary(RJSONIO)\nsetwd(\"/workspace/data\")\ndf \u003c- read.csv(\"input/2024-01-16/polling_places.csv\")\nexportJson \u003c- toJSON(df)\nwrite(exportJson, \"output/output.json\")\n```\n\n## JupyterLab\n\n### Example 1 - Install additional packages\n\n```python\n!pip install pandas[pyarrow] jupytext scipy jupyterlab-git qgrid seaborn nb_black\n```\n\n### Example 2 - Read CSV to data frame, export data frame to JSON file\n\nNote that the CSV file path may be different for you.\n\n```python\nimport pandas as pd\ndf = pd.read_csv('2024-01-16/polling_places.csv', low_memory=False)\ndf\ndf.to_json('output.json')\n```\n\n### Example 3 - Use IGV\n\nAdd the following customization via the Conda packages field:\n\n```yaml\nchannels:\n  - conda-forge\ndependencies:\n  - ipyigv\n```\n\nThen, once the session has launched, run the following Python code:\n\n```python\nfrom ipyigv import IgvBrowser as Browser, PUBLIC_GENOMES\nfrom ipyigv.options import ReferenceGenome, Track\n\ngenome = ReferenceGenome(**PUBLIC_GENOMES.hg38)\n\nbrowser = Browser(genome=genome)\nbrowser\n```\n\nYou will see the following:\n\n\u003cimg width=\"2862\" height=\"998\" alt=\"CleanShot 2025-09-24 at 14 36 41@2x\" 
src=\"https://github.com/user-attachments/assets/ad3b68b8-43fa-46ea-8b35-ccec0e8ff04a\" /\u003e\n\n## VSCode\n\nTBD\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobnewman%2Fdata-studios-datasources","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobnewman%2Fdata-studios-datasources","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobnewman%2Fdata-studios-datasources/lists"}