Web Scraping The Social Networks (SOCNET) Listserv
https://github.com/usccana/socnet

rpackage rstats sna social-network-analysis webscraping


---
output: github_document
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

[![Travis build status](https://travis-ci.org/USCCANA/socnet.svg?branch=master)](https://travis-ci.org/USCCANA/socnet)
[![Coverage status](https://codecov.io/gh/USCCANA/socnet/branch/master/graph/badge.svg)](https://codecov.io/github/USCCANA/socnet?branch=master)

# socnet

This R package provides access to the archives of the SOCNET mailing list,
https://lists.ufl.edu/cgi-bin/wa?A0=SOCNET, which the University of Florida
hosts on its Listserv website.

## Installation

This package is currently under development and is only available as the bleeding-edge version on GitHub. You can use `devtools` to install it:

```r
devtools::install_github("USCCANA/socnet")
```

## Example

Before we start, let's load the package.

```{r}
library(socnet)
```

Suppose that you want to look at the SOCNET archives but don't know where to start. The function `socnet_list_archives` returns the list of archives available on the Listserv.

```{r example-archives}
# Getting the URLs to the archives per month
archives <- socnet_list_archives(cached = TRUE)
head(archives)
```

Now that we have the list of archives, we can pick one and list the subjects (emails) filed under it with the `socnet_list_subjects` function.

```{r example-subjects}
# What was discussed during Oct 17?: Getting the subjects during that time
subjects <- socnet_list_subjects(archives$url[1], cached = TRUE)
```

Let's take a look at the output:

```{r}
str(subjects)
head(subjects[,-1])
```
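
Once the subjects are in a data frame, ordinary subsetting is enough to narrow them down. The following is a minimal sketch, assuming only that the table has `subject` and `url` columns (both are used elsewhere in this README). The rows are made-up stand-ins rather than real SOCNET data, and `find_subjects` is a hypothetical helper that is not part of the package:

```r
# Hypothetical helper (not part of socnet): keep the rows whose subject
# line matches a pattern, ignoring case.
find_subjects <- function(subjects, pattern) {
  hits <- grepl(pattern, subjects$subject, ignore.case = TRUE)
  subjects[hits, , drop = FALSE]
}

# Made-up stand-in for the output of socnet_list_subjects()
toy_subjects <- data.frame(
  url     = c("https://example.org/a", "https://example.org/b", "https://example.org/c"),
  subject = c("Call for papers", "Re: call for papers", "Job posting"),
  stringsAsFactors = FALSE
)

# Rows whose subject mentions a call for papers
cfp <- find_subjects(toy_subjects, "call for papers")
cfp$url
```

With real data, the same call would take the `subjects` object created above in place of `toy_subjects`.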

Now we can use the function `socnet_parse_subject` to retrieve the contents of a particular subject. Let's try the subject titled ``r subjects$subject[1]``:

```{r example-fetch-subject}
socnet_parse_subject(subjects$url[1])
```

As you can see, the function returned a list with two elements: a vector of meta information and the actual email.
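
To retrieve more than one email, the same call can be repeated over several URLs. The sketch below is one possible way to do it, not something the package provides: `fetch_many` is a hypothetical helper that takes the fetching function as an argument, pauses between requests, and stores `NULL` for any URL that fails instead of stopping. The stub and its element names (`meta`, `body`) are placeholders for illustration only:

```r
# Hypothetical helper (not part of socnet): apply a fetching function to
# several URLs, pausing between calls. A failed fetch leaves NULL in the
# corresponding slot rather than aborting the whole loop.
fetch_many <- function(urls, fetch_fun, pause = 1) {
  out <- vector("list", length(urls))
  names(out) <- urls
  for (i in seq_along(urls)) {
    res <- tryCatch(fetch_fun(urls[i]), error = function(e) NULL)
    if (!is.null(res)) out[[i]] <- res
    if (i < length(urls)) Sys.sleep(pause)
  }
  out
}

# Stand-in fetcher so the sketch runs offline; it fails on purpose for
# any URL containing "bad".
stub <- function(u) {
  if (grepl("bad", u)) stop("could not fetch ", u)
  list(meta = c(url = u), body = "...")
}

res <- fetch_many(c("ok-1", "bad-2", "ok-3"), stub, pause = 0)

# With the package loaded, the call would presumably look like:
# emails <- fetch_many(subjects$url[1:3], socnet_parse_subject)
```

Passing the fetcher as an argument keeps the loop testable without network access, and the pause avoids sending requests to the Listserv server in quick succession.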

# Most active users (compose side)

```{r}
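rankfun <- function(x, colnames, maxn = 100) {
  # Frequency table of x, sorted from most to least frequent
  x <- as.data.frame(table(x))
  x <- x[order(-x$Freq), ]
  dimnames(x) <- list(seq_len(nrow(x)), colnames)
  # head() avoids NA-filled rows when there are fewer than maxn entries
  knitr::kable(head(x, maxn), row.names = TRUE)
}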

# Getting the from column and removing weird characters
data("subjects")
from <- subjects$from
from <- iconv(from, to="ASCII//TRANSLIT")

# Removing the <[log in to unmask]> placeholder and any whitespace left behind
from <- trimws(tolower(gsub("[<].+", "", from)))

# Fixing some names...
regexp <- "Th?om(as)?( W)?\\.? Valente"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Thomas W. Valente"

regexp <- "Valdis( Krebs)?"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Valdis Krebs"

regexp <- "Steve Borgatti|Borgatti, Steve"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Steve Borgatti"

regexp <- "Snijders, T\\.A\\.B\\.|Tom A\\.B\\. Snijders|T\\.A\\.B\\.Snijders"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Tom Snijders"

regexp <- "Kathleen( M\\.)? Carley"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Kathleen M. Carley"

# Capitalizing the first letter
# I learned (copied) this from stackoverflow!
# https://stackoverflow.com/questions/6364783/capitalize-the-first-letter-of-both-words-in-a-two-word-string
# from <- gsub("(^|[[:space:]])([[:alpha:]])", "\\1\\U\\2", from, perl=TRUE)
from <- stringr::str_to_title(from)

# Creating the table
rankfun(from, colnames=c("User", "Count"))
```
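
The cleanup steps above are easier to follow on a small input. The sketch below applies the same regular expressions to four made-up sender strings (illustrative only, not taken from the archive) and uses base R throughout. It omits the `stringr::str_to_title` step, since every name in this toy input is replaced by its canonical form:

```r
# Made-up sender strings in the shapes the cleanup above targets
from <- c(
  "Tom Valente <[log in to unmask]>",
  "Thomas W. Valente <[log in to unmask]>",
  "Borgatti, Steve <[log in to unmask]>",
  "Steve Borgatti"
)

# Drop everything from "<" onward, then trim and lower-case
from <- trimws(tolower(gsub("[<].+", "", from)))

# Collapse known spelling variants into one canonical form
regexp <- "Th?om(as)?( W)?\\.? Valente"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Thomas W. Valente"

regexp <- "Steve Borgatti|Borgatti, Steve"
from[grepl(regexp, from, ignore.case = TRUE)] <- "Steve Borgatti"

# Tally the senders, most frequent first
counts <- sort(table(from), decreasing = TRUE)
counts
```

Each of the two people ends up with a count of two, because both spellings of each name collapse into a single entry.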

# Latest version of the cache data

```{r, results='asis', echo=FALSE}
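# cat() writes the raw lines; printing the vector would add [1] indices and quotes
cat(readLines("inst/cache/readme.md", warn = FALSE), sep = "\n")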
```