https://github.com/centerforassessment/ncme_training_session_2015
Documentation and Materials for 2015 NCME Training Session
- Host: GitHub
- URL: https://github.com/centerforassessment/ncme_training_session_2015
- Owner: CenterForAssessment
- License: lgpl-3.0
- Created: 2015-03-27T19:49:45.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2015-04-15T16:54:46.000Z (about 11 years ago)
- Last Synced: 2025-01-29T00:52:50.547Z (over 1 year ago)
- Homepage: http://centerforassessment.github.io/NCME_Training_Session_2015/
- Size: 7.21 MB
- Stars: 2
- Watchers: 4
- Forks: 9
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.md
2015 NCME Training Session
==========================
## Leveraging Open Source Software and Tools for Statistics/Measurement Research
[Gitter chat](https://gitter.im/CenterForAssessment/NCME_Training_Session_2015?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [License](https://github.com/CenterForAssessment/NCME_Training_Session_2015/blob/master/LICENSE.md)
Welcome to the GitHub repository for the 2015 NCME Training Session: Leveraging Open Source Software and Tools for Statistics/Measurement Research.
This version controlled repository contains or links to all the resources associated with the training session. During this training session you'll be introduced
to ideas and technologies associated with [open science](http://en.wikipedia.org/wiki/Open_science) and [reproducible research](http://en.wikipedia.org/wiki/Reproducibility).
Traditional work-flows associated with statistics/measurement research currently run counter to many of the principles associated with open science and reproducible research.
In this training session, we introduce participants to some widely used [open source](http://en.wikipedia.org/wiki/Open_source) tools that overcome many of the limitations of
the closed traditional statistics/measurement research work-flow. Modern development tools and practices can be utilized as part of statistics/measurement research.
As the title of the training session suggests, participants will need a laptop computer with some open source programs installed to participate in the
training session. The information that follows introduces the software/tools and provides information on how to install them. The programs/tools we'll
use include:
* [:octocat: GitHub](https://github.com/), a web-based [Git](http://en.wikipedia.org/wiki/Git_(software)) repository hosting service.
* [RStudio](http://www.rstudio.com/), a free and open source [integrated development environment (IDE)](http://en.wikipedia.org/wiki/Integrated_development_environment)
for [**R**](http://cran.r-project.org/).
* [pandoc](http://johnmacfarlane.net/pandoc/), a universal document/format converter that utilizes [LaTeX](http://johnmacfarlane.net/pandoc/installing.html).
One of the most difficult parts of using modern software/tools for this type of research is just getting your work environment (i.e., your laptop) set up with the tools. But once
set up, the benefits are huge. Welcome to the bleeding edge. :smile:
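As a quick check that the command-line pieces are in place, a minimal sketch along the following lines may help. The command names (`git`, `Rscript`, `pandoc`, `pdflatex`) are assumptions about how the tools end up on your `PATH`; a GUI client or RStudio itself will not show up this way, and some GUI clients bundle their own copy of Git.

```shell
#!/bin/sh
# Report whether a command-line tool is available on the PATH.
# Returns 0 when found and 1 when missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumed command names: Rscript ships with R, and pdflatex is one
# common LaTeX engine. Adjust the list to match your installation.
missing=0
for tool in git Rscript pandoc pdflatex; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

A tool reported as missing may still be installed but not on the `PATH`, so treat the output as a hint rather than a verdict.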
### GitHub
Version control of content is fundamental to reproducible results. Version control has been used by software developers for decades as a way of collaborating and managing the chaos that
ensues when multiple programmers are developing using the same software codebase. [Git](http://en.wikipedia.org/wiki/Git_(software)) is a modern distributed version control system created by
[Linus Torvalds](http://en.wikipedia.org/wiki/Linus_Torvalds), the creator of the [Linux](http://en.wikipedia.org/wiki/Linux) operating system. [GitHub](https://github.com/) is a
web-based hosting service built upon [Git](http://en.wikipedia.org/wiki/Git_(software)), with many other bells and whistles, that is extremely popular for open source development.
[Netflix](http://netflix.github.io/#repo) uses [GitHub](https://github.com/) for its source code development. Beyond source code development, the whole idea of “version control” has been
applied to [German law](http://bundestag.github.io/gesetze/), where all laws are hosted on [GitHub](https://github.com/) in a version controlled fashion so people can examine the law and its development.
For this training session, we'll be using [GitHub](https://github.com/) for version control. You'll need to do two things:
* Create a [GitHub](https://github.com/) account. Public repositories are free.
* Install a [GitHub](https://github.com/) client on your computer: [Mac](https://mac.github.com/), [Windows](https://windows.github.com/), or [Linux](http://www.maketecheasier.com/6-useful-graphical-git-client-for-linux/). Git is a command line application, but it's easier to start with a GUI client.
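For those curious about what a GUI client does under the hood, here is a minimal sketch of basic version control with the Git command line. It works entirely in a throwaway directory; the repository name, file name, and identity are hypothetical and only there for illustration.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing else on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"

git init --quiet demo-repo
cd demo-repo

# Hypothetical identity, stored only in this repository's configuration.
git config user.name "Workshop Participant"
git config user.email "participant@example.com"

# First version of a document.
echo "First draft of the analysis notes." > notes.md
git add notes.md
git commit --quiet -m "Add first draft of notes"

# Second version: Git records the change rather than overwriting history.
echo "Second draft, with a revised conclusion." > notes.md
git commit --quiet -am "Revise notes"

# Show the recorded history (newest first) and what changed last.
git log --oneline
git diff HEAD~1 HEAD -- notes.md
```

Every committed version stays recoverable, which is what makes it possible to examine a document and its development over time.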
### RStudio
[**R**](http://cran.r-project.org/) is an open source statistical analysis/computing environment. [**R**](http://cran.r-project.org/)'s rapid growth in use places it among the most [popular
statistical analysis software](http://r4stats.com/articles/popularity/). Much of [**R**](http://cran.r-project.org/)'s popularity stems from it being open source (and free!) together with
its extensible nature. As a programming language, [**R**](http://cran.r-project.org/) has become prominent in recent years as a tool for performing high level analytics, and
it has begun borrowing many tools from the programming world. One of the most prominent among them is [RStudio](http://www.rstudio.com/), a
free and open source [integrated development environment (IDE)](http://en.wikipedia.org/wiki/Integrated_development_environment). An integrated development environment is a tool designed
specifically for developing tools using a computer language. In the case of [RStudio](http://www.rstudio.com/), the environment is specifically designed for developing "data products" using
[**R**](http://cran.r-project.org/).
For this training session, we'll be using [RStudio](http://www.rstudio.com/) for creating open and re-usable code for statistical analysis. You'll need to:
* Install [RStudio](http://www.rstudio.com/) on your laptop.
### pandoc
[pandoc](http://johnmacfarlane.net/pandoc/) is a universal document/format converter that utilizes [LaTeX](http://johnmacfarlane.net/pandoc/installing.html). Open research in the 21st
century requires distribution of results across multiple media and platforms. "Create once and distribute everywhere" is the dream of content creators. To make this dream a reality requires
conversion from a base content document to multiple other formats. The [pandoc](http://johnmacfarlane.net/pandoc/) library allows for such conversions and is used extensively.
For this training session, we'll be using [pandoc](http://johnmacfarlane.net/pandoc/) for export to multiple formats. [pandoc](http://johnmacfarlane.net/pandoc/) itself requires
[LaTeX](http://johnmacfarlane.net/pandoc/installing.html) for some conversions. You'll need to:
* Install [pandoc](http://johnmacfarlane.net/pandoc/) for your operating system.
* Using the instructions provided on the [pandoc](http://johnmacfarlane.net/pandoc/) page, install [LaTeX](http://johnmacfarlane.net/pandoc/installing.html) for your operating system.
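Once pandoc is installed, a conversion from one source document to several formats might look like the sketch below. The file name and contents are hypothetical. The PDF step is attempted only if `pdflatex` is found, on the assumption that it is the LaTeX engine pandoc will use.

```shell
#!/bin/sh
set -eu

workdir="$(mktemp -d)"
cd "$workdir"

# A small Markdown source document (hypothetical content).
cat > report.md <<'EOF'
---
title: Growth Analysis
---

Results are written **once** in Markdown and converted to other formats.
EOF

if command -v pandoc >/dev/null 2>&1; then
    # --standalone produces a complete HTML page rather than a fragment.
    pandoc --standalone report.md -o report.html
    # The output format is inferred from the file extension.
    pandoc report.md -o report.docx
    # PDF output goes through LaTeX, so only attempt it when pdflatex exists.
    if command -v pdflatex >/dev/null 2>&1; then
        pandoc report.md -o report.pdf || echo "PDF conversion failed" >&2
    fi
else
    echo "pandoc not found; install it first" >&2
fi
```

The same `report.md` feeds every output, so a correction made in the source carries over to all formats on the next run.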
## Training Session Schedule
**8:00 to 9:15** Overview of open research/dissemination and the tools/platforms that support it. The first hour will be a presentation/overview, and in the last 15 minutes we’ll
confirm everyone is set up with the appropriate software on their computers. Advance instructions will be sent out regarding software needed for the training session.
**9:15 to 9:30** Break
**9:30 to 10:30** Introduction to [GitHub](https://github.com/) and version control. This part of the training session will introduce users to version control via GitHub. Users will learn how to
fork a repository, modify that fork, and make and accept pull requests.
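The command-line side of this workflow can be sketched locally. In the sketch below a bare repository stands in for your fork on GitHub, which is an assumption made so the example runs without a network connection; the branch name and identity are hypothetical. Forking itself, and opening and accepting the pull request, happen on the GitHub website.

```shell
#!/bin/sh
set -eu

workdir="$(mktemp -d)"
cd "$workdir"

# A bare repository stands in for your fork on GitHub (assumption for
# this sketch; in practice the URL would point at github.com).
git init --quiet --bare fork.git

# Clone the fork to get a local working copy.
git clone --quiet fork.git working-copy
cd working-copy
git config user.name "Workshop Participant"
git config user.email "participant@example.com"

# Make the change on its own branch (hypothetical branch name).
git checkout --quiet -b fix-typo
echo "Corrected text." > README.md
git add README.md
git commit --quiet -m "Fix typo in README"

# Publish the branch to the fork; a pull request is then opened from it.
git push --quiet origin fix-typo
```

Keeping each change on its own branch means the eventual pull request contains only that change, which makes it easier for the repository owner to review and accept.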
**10:30 to 12:00** An application of [GitHub](https://github.com/) to document production. Document production and dissemination in the 21st century requires that the document be available to users
in multiple formats. Users access documents using multiple media (e.g., paper and digital) and on multiple devices (e.g., laptop, phone, e-reader), so modern document production requires
flexibility to create content once and distribute it everywhere. This part of the training session will introduce users to templates that allow users to realize the "write once, publish everywhere"
goal.
**12:00 to 1:00** Lunch
**1:00 to 2:00** Introduction to [GitHub Wikis](https://help.github.com/articles/about-github-wikis/) and [GitHub Pages](https://pages.github.com/). [GitHub](https://github.com/) includes
two great options for helping users understand the nature of the project you're hosting in your repository. [GitHub Wikis](https://help.github.com/articles/about-github-wikis/) is a
[markdown](https://help.github.com/categories/writing-on-github/) based wiki that allows for collaborative wiki building around a project. [GitHub Pages](https://pages.github.com/) is
the static HTML website component of all GitHub repositories. [GitHub Pages](https://pages.github.com/) allows projects to produce associated websites that can include highly
sophisticated components including interactive graphics, blogs, and vector math fonts, all with a basic distributed version control repository as their foundation.
**2:00 to 3:00** Introduction to [**R**](http://cran.r-project.org/) and its system of user created packages. [**R**](http://cran.r-project.org/) is an open source environment for statistical
analysis/computing whose use has skyrocketed over the last decade. A major part of its success is its extensible nature: Users can create their own packages that extend the functionality of
[**R**](http://cran.r-project.org/). These packages can be distributed via [**R**](http://cran.r-project.org/)'s [CRAN](http://cran.r-project.org/) based package installation
or can utilize [GitHub](https://github.com/) as the package repository. One is *standing on the shoulders of giants*!
**3:00 to 3:15** Break
**3:15 to 4:30** Introduction to creating a source code repository and package on [GitHub](https://github.com/). Reproducible and open research allows those using the research to reproduce the
results. Version control repositories allow one to provide the source code used and, if using software like [**R**](http://cran.r-project.org/), package that source code in a fashion allowing
others to utilize it.
**4:30 to 5:00** For the final half hour, we will discuss how open source software tools can be combined to create an open source project that can be used as the basis for multi-state/national
analytics, reporting and visualization work. The presentation will be based upon the SGP package which is used by over 25 states for large scale state growth analyses. As part of the project,
state project websites are set up on GitHub for hosting analyses and documentation associated with work performed so that it is reproducible using cloud based computing resources like AWS/EC2.