https://github.com/usccana/socnet

Web Scraping The Social Networks (SOCNET) Listserv
rpackage rstats sna social-network-analysis webscraping


Host: GitHub
URL: https://github.com/usccana/socnet
Owner: USCCANA
License: other
Created: 2017-11-23T00:17:33.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2022-08-16T17:14:15.000Z (almost 4 years ago)
Last Synced: 2025-04-06T20:25:40.835Z (about 1 year ago)
Topics: rpackage, rstats, sna, social-network-analysis, webscraping
Language: R
Homepage:
Size: 3.02 MB
Stars: 4
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: ChangeLog
- License: LICENSE

README

---
output: github_document
---

```{r setup, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "100%"

)

```

[![Travis build status](https://travis-ci.org/USCCANA/socnet.svg?branch=master)](https://travis-ci.org/USCCANA/socnet)

 [![Coverage status](https://codecov.io/gh/USCCANA/socnet/branch/master/graph/badge.svg)](https://codecov.io/github/USCCANA/socnet?branch=master)

# socnet

This R package provides access to the data available in the SOCNET mailing list archives (https://lists.ufl.edu/cgi-bin/wa?A0=SOCNET), which the University of Florida hosts on its Listserv website.

## Installation

This package is currently under development and is only available as the bleeding-edge version on GitHub. You can use `devtools` to install it:

```r

devtools::install_github("USCCANA/socnet")

```

## Example

Before starting, let's load the package.

```{r}

library(socnet)

```

Suppose that you want to look at the SOCNET archives but don't know where to start. You can use the function `socnet_list_archives` to get a list of the archives available in the Listserv.

```{r example-archives}

# Getting the URLs to the archives per month

archives <- socnet_list_archives(cached = TRUE)

head(archives)

```

Now that we have the list of archives, we can pick one and list the subjects (emails) posted under it with the `socnet_list_subjects` function.

```{r example-subjects}

# What was discussed during Oct 17?: Getting the subjects during that time

subjects <- socnet_list_subjects(archives$url[1], cached = TRUE)

```

Let's take a look at the output:

```{r}

str(subjects)

head(subjects[,-1])

```
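
Because `subjects` is indexed with `$url`, `$subject`, and `$from` in this README, it can presumably be filtered like any data frame. The sketch below illustrates this on a small made-up stand-in (`toy_subjects`, with invented rows and example.org URLs) rather than on real Listserv data; only the column names are taken from the package output.

```r
# Hypothetical stand-in for the output of socnet_list_subjects();
# the rows are invented, only the column names follow the README.
toy_subjects <- data.frame(
  url     = c("https://example.org/1", "https://example.org/2", "https://example.org/3"),
  subject = c("Call for papers: Sunbelt", "Re: ERGM convergence", "ERGM software question"),
  from    = c("Ann Lee", "Bo Chen", "Ann Lee"),
  stringsAsFactors = FALSE
)

# Keep the threads whose subject mentions a keyword (case-insensitive)
keyword <- "ergm"
hits <- toy_subjects[grepl(keyword, toy_subjects$subject, ignore.case = TRUE), ]

# URLs of the matching threads
hits$url
```

With real data, the URLs kept in `hits$url` are the kind of values that get passed on to the parsing step.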

Now we can use the function `socnet_parse_subject` to retrieve the data for a particular subject. Let's try the subject titled ``r subjects$subject[1]``:

```{r example-fetch-subject}

socnet_parse_subject(subjects$url[1])

```

As you can see, the function returns a list with two elements: a vector of meta information and the actual email.
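
A two-element list like this can be taken apart by position. The following is a minimal sketch using a hand-made stand-in object (`parsed`); the element names `meta` and `email`, the meta fields, and the message text are assumptions for illustration, not the documented output of `socnet_parse_subject`.

```r
# Hypothetical stand-in for the value returned by socnet_parse_subject();
# element names and contents are assumptions made for illustration.
parsed <- list(
  meta  = c(subject = "ERGM software question", from = "Ann Lee"),
  email = "Dear all,\nDoes anyone know a good tutorial?\nBest,\nAnn"
)

# Access by position so the code does not depend on the element names
meta_info <- parsed[[1]]
body_text <- parsed[[2]]

length(parsed)      # number of top-level elements
names(meta_info)    # which meta fields are present

# Split the message body into lines and peek at the first two
body_lines <- strsplit(body_text, "\n", fixed = TRUE)[[1]]
head(body_lines, 2)
```

Indexing with `[[1]]` and `[[2]]` keeps the sketch valid even if the real element names differ.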

# Most active users (compose side)

```{r}

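rankfun <- function(x, colnames, maxn = 100) {

  # Frequency table, sorted from most to least frequent

  x <- as.data.frame(table(x))

  x <- x[order(-x$Freq), ]

  dimnames(x) <- list(seq_len(nrow(x)), colnames)

  # head() avoids NA-filled rows when there are fewer than maxn entries

  knitr::kable(head(x, maxn), row.names = TRUE)

}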

# Getting the from column and removing weird characters

data("subjects")

from <- subjects$from

from <- iconv(from, to="ASCII//TRANSLIT")

# Removing <[log in to unmask]> message

from <- tolower(trimws(gsub("[<].+", "", from)))

# Fixing some names...

regexp <- "Th?om(as)?( W)?\\.? Valente"

from[grepl(regexp, from, ignore.case = TRUE)] <- "Thomas W. Valente"

regexp <- "Valdis( Krebs)?"

from[grepl(regexp, from, ignore.case = TRUE)] <- "Valdis Krebs"

regexp <- "Steve Borgatti|Borgatti, Steve"

from[grepl(regexp, from, ignore.case = TRUE)] <- "Steve Borgatti"

regexp <- "Snijders, T\\.A\\.B\\.|Tom A\\.B\\. Snijders|T\\.A\\.B\\.Snijders"

from[grepl(regexp, from, ignore.case = TRUE)] <- "Tom Snijders"

regexp <- "Kathleen( M\\.)? Carley"

from[grepl(regexp, from, ignore.case = TRUE)] <- "Kathleen M. Carley"

# Capitalizing the first letter

# I learned (copied) this from stackoverflow!

# https://stackoverflow.com/questions/6364783/capitalize-the-first-letter-of-both-words-in-a-two-word-string

# from <- gsub("(^|[[:space:]])([[:alpha:]])", "\\1\\U\\2", from, perl=TRUE)

from <- stringr::str_to_title(from)

# Creating the table

rankfun(from, colnames=c("User", "Count"))

```
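
The cleaning steps in the chunk above can be tried out without the packaged data. This is a minimal sketch on an invented vector (`toy_from`) that imitates the `Name <address>` layout; it adds `trimws` and swaps `stringr::str_to_title` for the base-R `gsub` variant that is commented out above, so it is an approximation of the chunk rather than a copy of it.

```r
# Invented sender strings imitating the "Name <address>" layout
toy_from <- c(
  "Tom Valente <[log in to unmask]>",
  "Thomas W. Valente <[log in to unmask]>",
  "Borgatti, Steve <[log in to unmask]>",
  "ann lee <[log in to unmask]>"
)

# 1. Drop everything from "<" onwards, trim spaces, and lowercase
clean <- tolower(trimws(gsub("[<].+", "", toy_from)))

# 2. Collapse known spelling variants onto a single label
clean[grepl("th?om(as)?( w)?\\.? valente", clean)] <- "thomas w. valente"
clean[grepl("steve borgatti|borgatti, steve", clean)] <- "steve borgatti"

# 3. Capitalize the first letter of every word (base R, needs perl = TRUE)
clean <- gsub("(^|[[:space:]])([[:alpha:]])", "\\1\\U\\2", clean, perl = TRUE)

# 4. Count messages per sender, most active first
counts <- sort(table(clean), decreasing = TRUE)
counts
```

Both spellings of the first sender end up under one label, which is the point of the regular-expression fixes: without them the same person would be counted under several names.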

# Latest version of the cache data

```{r, results='asis', echo=FALSE}

# cat() writes the raw lines so that results='asis' renders them as markdown

cat(readLines("inst/cache/readme.md", warn = FALSE), sep = "\n")

```
