https://github.com/rudeboybert/jsm2018_ecosystem

JSM 2018 "An Emerging Ecosystem for Data Science/Statistics Education"

---
title: "An Emerging Ecosystem for Data Science/Statistics Education"
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

* **What**: Slides and speaker information for JSM 2018 Vancouver Session 212 [An Emerging Ecosystem for Data Science/Statistics Education](https://ww2.amstat.org/meetings/jsm/2018/onlineprogram/ActivityDetails.cfm?sessionid=214992)
* **When**: Monday, July 30th, 2018, 2:00 PM-3:50 PM
* **Where**: Vancouver Convention Centre [CC-West 109](https://www.vancouverconventioncentre.com/facility/floor-plans-and-specs)

The shortened URL [bit.ly/JSM_2018](https://bit.ly/JSM_2018) redirects here.

### 2:05 PM Version Control: The Gain You Get for Your Pain — Jennifer Bryan, RStudio, University of British Columbia

(Slides: [PDF](Bryan/bryan-jsm-version-control.pdf)) Version control is a system for managing the evolution of a set of files across different people, computers, and time. Its roots are in software development, but it is increasingly important in both the practice and teaching of data science. I'll give an accessible description of what version control is and what it feels like to use it. We'll compare and contrast this to alternatives such as collaboration via Google Drive. Version control is important for educators for at least two reasons. First, it facilitates the exchange of code-rich documents between instructor and student. Second, it is a valid learning objective in and of itself, since version control is widely used by potential employers. I'll provide general information, as well as specifics relevant to the statistical programming environment R, the RStudio IDE, Git, and the GitHub hosting service.

+ Jennifer Bryan - Software Engineer at RStudio, on leave from the University of British Columbia
+ Email:
+ Webpage:
+ GitHub: [jennybc](https://github.com/jennybc)
+ Twitter: [\@JennyBryan](https://twitter.com/JennyBryan)
+ **Suggested simple action item to get started**: Take a simple but real project and try to put it on GitHub; just dip your toe in the water
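
As a rough illustration of the action item above, the following sketch puts a toy project under version control from R by calling the `git` command line tool. It assumes `git` is installed and on the `PATH`; the project folder, file name, and commit identity are placeholders invented for this example, and the GitHub step is left as comments because it needs your own account and repository URL.

```r
# Sketch only: assumes the git CLI is installed; all names below are placeholders.
stopifnot(nzchar(Sys.which("git")))

# A toy project in a temporary folder (hypothetical file name and contents)
project <- file.path(tempdir(), "toy-project")
dir.create(project, showWarnings = FALSE)
writeLines("x <- rnorm(10)", file.path(project, "analysis.R"))

# Small helper: run a git command inside the project folder and capture its output
git <- function(...) {
  system2("git", c("-C", shQuote(project), ...), stdout = TRUE, stderr = TRUE)
}

git("init")                       # start tracking the folder
git("add", "analysis.R")          # stage the file
git("-c", "user.name=Example",    # placeholder identity for this one commit
    "-c", "user.email=example@example.com",
    "commit", "-m", shQuote("First commit"))

log <- git("log", "--oneline")    # one line per commit
print(log)

# To publish on GitHub, create an empty repository there first, then
# (placeholder URL, not run here):
#   git("remote", "add", "origin", "https://github.com/<your-user>/<your-repo>.git")
#   git("push", "-u", "origin", "HEAD")
```

The same steps can be done through the Git pane in the RStudio IDE; the sketch only shows what happens underneath.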

### 2:25 PM Using Data to Drive Curriculum Development — Chester Ismay, DataCamp

(Slides: [PDF](Ismay/ismay_data-driven-curriculum-development.pdf), [PowerPoint](Ismay/ismay_data-driven-curriculum-development.pptx)) Data science is a relatively new and fast-changing field that borrows from many disciplines, which makes it hard to define and even harder to teach. Having now trained over 1.5 million (aspiring) data scientists, we at DataCamp aim to equip our students with the tools and skills needed to operate effectively in the data science field. For this reason, we do as any good data scientist would do and use data to determine what our students need to know. By combining internal data sources (student surveys, course engagement data, website search trends, etc.) with external data sources (job posts, Stack Overflow trends, package downloads, etc.), we're able to inform curricular decisions that reflect not only what our students want to learn but also the skill sets that are in demand for the 21st-century data scientist. DataCamp is among the first to focus on using large amounts of data and corresponding analyses to help drive data science curriculum.

+ Chester Ismay - Senior Curriculum Lead at DataCamp (headquarters in New York City, based in Portland, OR)
+ Email:
+ Webpage:
+ GitHub: [ismayc](https://github.com/ismayc)
+ Twitter: [\@old_man_chester](https://twitter.com/old_man_chester)
+ **Suggested simple action item to get started**: The best way to learn GitHub is to go fix a typo. You can even do this in the browser.

### 2:45 PM Authoring and Utilizing Open Source, Reproducible Statistics/Data Science Textbooks — Alicia Johnson, Macalester College

(Slides: [PDF](Johnson/Alicia Johnson JSM 2018 .pdf), [PowerPoint](Johnson/Alicia Johnson JSM 2018 .pptx)) Open source, open access, and reproducible materials are essential to the modern practice of statistics and data science. Combined, these provide (free!) code, data, and software that users can directly implement, modify, and use to verify findings. In this talk we will discuss how the statistics / data science textbook publication model can align with and utilize these best practices. The release of the RStudio Bookdown package facilitates this process. Through Bookdown, authors can publish reproducible and fully customizable textbooks that integrate markdown, code, visualizations, interactivity, and text. We will present first steps for authors of, and users looking to adopt, open source and open access textbooks, as well as existing examples ranging from an introductory textbook (ModernDive) to an advanced undergraduate textbook on Bayesian statistics.

+ Alicia Johnson - Associate Professor of Statistics at Macalester College
+ Email:
+ Webpage:
+ GitHub: [ajohns24](https://github.com/ajohns24)
+ **Suggested simple action item to get started**: Pick one project that you're writing right now and convert it to `bookdown`.

### 3:05 PM Aligning Inference with the Tidyverse: Development of the Infer Package — Andrew Bray, Reed College

(Slides: [PDF](Bray/infer-jsm-2018.pdf), [Keynote](Bray/infer-jsm-2018.key)) How do you teach your students to implement a permutation test? What about an analysis of variance? Do you focus on approximation techniques or utilize computational methods like the bootstrap? The infer package was created to unite these and other common statistical inference tasks into a single expressive framework that emphasizes their shared concepts. This talk will focus on the design principles of the package, which are firmly motivated by Hadley Wickham's tidy tools manifesto. It will also discuss the implementation, centered on the common conceptual threads that link a wide range of hypothesis tests and confidence intervals. Finally, it will serve as a case study for anyone curious about developing an R package.
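
To make the permutation-test question above concrete, here is a minimal base-R sketch of a two-sample permutation test for a difference in means. It is not the infer package's API and not taken from the slides; the built-in `mtcars` data, the choice of statistic, the seed, and the number of shuffles are all illustrative assumptions.

```r
# Sketch only: a permutation test in base R (illustrative choices throughout).
set.seed(2018)                    # arbitrary seed so the shuffles are reproducible

mpg <- mtcars$mpg                 # response: miles per gallon
am  <- mtcars$am                  # group label: 0 = automatic, 1 = manual

# Test statistic: mean(manual) - mean(automatic)
diff_in_means <- function(y, g) mean(y[g == 1]) - mean(y[g == 0])

# Observed statistic on the real labels
obs_stat <- diff_in_means(mpg, am)

# Null distribution: under "no association", the labels are exchangeable,
# so shuffle them and recompute the statistic many times
null_stats <- replicate(2000, diff_in_means(mpg, sample(am)))

# Two-sided p-value: share of shuffles at least as extreme as what was observed
p_value <- mean(abs(null_stats) >= abs(obs_stat))

print(obs_stat)
print(p_value)
```

The infer package expresses these same steps as a pipeline of verbs (`specify()`, `hypothesize()`, `generate()`, `calculate()`), so the declare-the-null, shuffle, and summarize structure visible in the comments carries over directly.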

+ Andrew Bray - Assistant Professor of Statistics at Reed College
+ Email:
+ Webpage:
+ GitHub: [andrewpbray](https://github.com/andrewpbray)
+ **Suggested simple action item to get started**: If you're comfortable with R and may be thinking about package development, specifically contributing to the `tidyverse`, watch Mara Averick's rstudio::conf talk [Contributing to tidyverse packages](https://www.rstudio.com/resources/videos/contributing-to-tidyverse-packages/).

### 3:25 PM Streamline Your Class with RStudio — Garrett Grolemund, RStudio Inc.

(Slides: [PDF](Grolemund/JSM-2018.pdf)) RStudio's free tools provide more than an ecosystem for doing data science; they provide an ecosystem for _teaching data science_. RStudio Cloud provides an easy-to-access compute environment that your students can use from minute one -- no downloads or installation required. You can use RStudio Cloud to create a shared class space stocked with projects to fork and data sets to use. Packed into RStudio Cloud is a curriculum of self-paced interactive tutorials that cover the basics of the Tidyverse, which frees you up to focus on teaching how to apply R to your subject area (the tutorials cover the same ground as _R for Data Science_ and are written by its authors). You can create your own interactive web tutorials with the learnr package, which transforms R Markdown documents into Shiny apps that let students write, run, and submit their own code for feedback. Supporting all of this is the Tidyverse, a clear, coherent system for doing data science in R that is easy to teach and easy to use. This talk will provide a quickstart to using these tools as a foundation for your own data science course.

+ Garrett Grolemund - Data Scientist at RStudio
+ Email:
+ GitHub: [garrettgman](https://github.com/garrettgman)
+ Twitter: [\@statgarrett](https://twitter.com/statgarrett)
+ **Suggested simple action item to get started**: Go to , log in, and read the guide.

### 3:45 PM Floor Discussion - Albert Y. Kim, Smith College

+ Albert Y. Kim - Assistant Professor of Statistical & Data Sciences at Smith College, Northampton MA
+ Email:
+ Webpage:
+ GitHub: [rudeboybert](https://github.com/rudeboybert)
+ Twitter: [\@rudeboybert](https://twitter.com/rudeboybert)