{"id":15059014,"url":"https://github.com/vosonlab/vosonsml","last_synced_at":"2025-09-13T16:38:21.565Z","repository":{"id":34221007,"uuid":"138710394","full_name":"vosonlab/vosonSML","owner":"vosonlab","description":"R package for collecting social media data and creating networks for analysis.","archived":false,"fork":false,"pushed_at":"2024-07-21T08:41:21.000Z","size":4394,"stargazers_count":79,"open_issues_count":0,"forks_count":15,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-02-10T02:09:25.137Z","etag":null,"topics":["cran","hyperlink","mastodon","network-graph","r","r-package","reddit","rstats","sna","social-media","social-network-analysis","voson","youtube"],"latest_commit_sha":null,"homepage":"https://vosonlab.github.io/vosonSML/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vosonlab.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-26T08:46:42.000Z","updated_at":"2024-11-15T01:27:30.000Z","dependencies_parsed_at":"2024-07-21T09:44:21.795Z","dependency_job_id":"74f7c8d9-330d-4a0f-a91f-56347919abc5","html_url":"https://github.com/vosonlab/vosonSML","commit_stats":{"total_commits":111,"total_committers":4,"mean_commits":27.75,"dds":"0.33333333333333337","last_synced_commit":"c6ca5956b3c1399d998245557c62bf029c7f6218"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vosonlab%2FvosonSML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/vosonlab%2FvosonSML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vosonlab%2FvosonSML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vosonlab%2FvosonSML/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vosonlab","download_url":"https://codeload.github.com/vosonlab/vosonSML/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238044095,"owners_count":19407128,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cran","hyperlink","mastodon","network-graph","r","r-package","reddit","rstats","sna","social-media","social-network-analysis","voson","youtube"],"created_at":"2024-09-24T22:35:24.057Z","updated_at":"2025-09-13T16:38:21.529Z","avatar_url":"https://github.com/vosonlab.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# vosonSML - Social Media Lab\u003cimg src=\"https://vosonlab.github.io/vosonSML/images/logo.png\" alt=\"vosonSML logo\" width=\"140px\" 
align=\"right\"/\u003e\n\n[![Github_Dev](https://img.shields.io/static/v1?label=dev\u0026message=v0.35.1\u0026logo=github)](https://github.com/vosonlab/vosonSML)\n[![Last_Commit](https://img.shields.io/github/last-commit/vosonlab/vosonSML.svg?\u0026logo=github)](https://github.com/vosonlab/vosonSML/commits/master)\n[![Build_Status](https://github.com/vosonlab/vosonSML/workflows/R-CMD-check/badge.svg)](https://github.com/vosonlab/vosonSML/actions)\n[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/vosonSML)](https://CRAN.R-project.org/package=vosonSML)\n[![CRAN_Monthly](https://cranlogs.r-pkg.org/badges/vosonSML)](https://CRAN.R-project.org/package=vosonSML)\n[![CRAN_Total](https://cranlogs.r-pkg.org/badges/grand-total/vosonSML)](https://CRAN.R-project.org/package=vosonSML)\n\nThe `vosonSML` R package is a suite of easy to use functions for\ncollecting and generating different types of networks from social media\ndata. The package supports the collection of data from the `mastodon`,\n`reddit`, `youtube` platforms and `hyperlinks` from web sites. Networks\nin the form of node and edge lists can be generated from collected data,\nsupplemented with additional metadata, and used to create graphs for\nSocial Network Analysis.\n\n## Installation Options\n\nInstalling the github version is recommended at this time as CRAN\nreleases may occur less frequently.\n\n1.  Install the latest github version:\n\n``` r\n# library(remotes)\nremotes::install_github(\"vosonlab/vosonSML\")\n```\n\n2.  Install the CRAN release version:\n\n``` r\ninstall.packages(\"vosonSML\")\n```\n\n## Getting started\n\nThe following usage examples will provide a quick start to using\n`vosonSML` functions. 
Additionally there is an [Introduction to\nvosonSML](https://vosonlab.github.io/vosonSML/articles/Intro-to-vosonSML.html)\nvignette that is a practical and explanatory guide to collecting data\nand creating networks.\n\n### General Usage\n\nThe process of authentication, data collection and creating networks in\n`vosonSML` is expressed with the three functions: *Authenticate*,\n*Collect* and *Create*. The following are some examples of their usage\nfor supported social media:\n\n[Mastodon](#mastodon-usage) \\| [Reddit](#reddit-usage) \\|\n[YouTube](#youtube-usage) \\| [Hyperlink](#hyperlink-usage) \\|\n[Supplemental Functions](#supplemental-functions)\n\n### General Options\n\n-   `verbose`: most `vosonSML` functions accept a verbosity parameter\n    that is now set to `TRUE` by default. When `FALSE` functions will\n    run silently unless there is a warning or error. If set to `TRUE`\n    then progress and summary information for the function will be\n    printed to the console.\n-   `writeToFile`: `vosonSML` functions accept a write to file\n    parameter. When set to `TRUE` the collected data will be saved to a\n    file in either the working directory or a directory set by the\n    `voson.data` option. The file is saved as an `rds` or `graphml` file\n    with a datetime-generated name in the following format:\n    `YYYY-MM-DD_HHMMSS-XXXXXX`.\n\nThe following environment options can also be used:\n\n-   `voson.data`: If set to an existing directory path the `writeToFile`\n    output files will be written to that directory instead of the\n    working directory. Can be set using\n    `options(voson.data = \"~/vsml-data\")` for example, and is cleared by\n    assigning a value of `NULL`. Directory paths can be relative to the\n    working directory e.g. `./data` or full paths.\n-   `voson.cat`: If set to `TRUE` then the verbose output of functions\n    will be printed using the base `cat()` function instead of the\n    `message()` function. 
Set by entering `options(voson.cat = TRUE)`,\n    and clear by assigning a value of `NULL`.\n\n### Authentication\n\nAuthentication objects generally *only need to be created once* unless\nyour credentials change. It is recommended to save your `mastodon` and\n`youtube` authentication objects to file after creation and then load\nthem in future sessions.\n\n*Please note in the examples provided that the \"\\~\" notation in paths\nis shorthand for the user's home directory, and the\n\".\" at the start of a file name marks it as a hidden file on some operating systems.\nYou can name and save objects however you wish.*\n\n``` r\n# youtube data api key\nyoutube_auth \u003c- Authenticate(\"youtube\", apiKey = \"xxxxxxxxxx\")\n\n# save the object after Authenticate\nsaveRDS(youtube_auth, file = \"~/.auth_yt\")\n\n# load a previously saved authentication object for use in Collect\nyoutube_auth \u003c- readRDS(\"~/.auth_yt\")\n```\n\n### \u003ca name=\"mastodon-usage\"/\u003eMastodon Usage {#mastodon-usage}\n\nThis implementation of `mastodon` collection uses the `rtoot` package\nand is most suited to collecting public posts.\n\n#### Collect threads or search for posts from server timelines\n\n`Collect` can be used to collect threads by setting the parameter\n`endpoint = \"thread\"` and providing the URLs for the starting post of\neach thread to be collected. A mastodon server does not need to be\nspecified, as the function will collect the thread posts from the server\nreferenced in each URL.\n\nThe following example collects and combines all of the posts from the 3\nthreads provided. 
The result is a named list of two dataframes, one\ncontaining `posts` and one with the metadata for referenced `users` in\nthe collection.\n\n``` r\nlibrary(vosonSML)\noptions(voson.data = \"./mast-data\")\n\nmast_auth \u003c- Authenticate(\"mastodon\")\n\n# collect thread posts belonging to the supplied mastodon\n# threads, the url of the first post in each thread should\n# be used\nmast_data \u003c- mast_auth |\u003e\n  Collect(\n    endpoint = \"thread\",\n    threadUrls = c(\n      \"https://mastodon.social/@xxxxxxxxxxx/111257xxxx48143532\",\n      \"https://mastodon.social/@xxxxxxxxxxx/111257xxxx56847171\",\n      \"https://mastodon.social/@xxxxxxxxxxx/111257xxxx32540480\"\n    ),\n    writeToFile = TRUE\n  )\n\n# Collecting post threads for mastodon urls...\n#\n# id                 | created\n# --------------------------------------------\n# 111257xxxx48143532 | 2023-10-18 18:38:11.509\n# 111257xxxx42879692 | 2023-10-18 19:44:08\n# Collected 36 posts.\n# RDS file written: ./mast-data/2023-10-18_201254-MastodonData.rds\n# Done.\n```\n\n#### Collect posts or search and collect posts from server timelines\n\n`Collect` with the parameter `endpoint = \"search\"` can be used to collect\na number of the most recent posts, or the most recent posts containing a\nhashtag from server timelines. This function requires a server to be\nspecified using the `instance` parameter.\n\nThe following example collects the most recent 100 posts made to the\n`mastodon.social` server local timeline. 
The `local = TRUE` parameter\nrestricts posts to only those made by server users.\n\n``` r\nmast_data \u003c- mast_auth |\u003e\n  Collect(\n    endpoint = \"search\",\n    instance = \"mastodon.social\",\n    local = TRUE,\n    numPosts = 100\n  )\n\n# Collecting timeline posts...\n# Requested 100 posts\n#\n# id                 | created\n# --------------------------------------------\n# 111257xxxx95457456 | 2023-10-18 20:15:36.349\n# 111257xxxx42617952 | 2023-10-18 20:12:59.92\n# Collected 120 posts.\n# Done.\n```\n\nThe next example collects the most recent 100 posts from the\n`mastodon.social` server global timeline containing the hashtag\n`#rstats`. The global timeline includes posts made by users from\n`mastodon.social` as well as posts made by users on its affiliated\nservers. The global timeline is specified by setting `local = FALSE`.\n\n``` r\nmast_data \u003c- mast_auth |\u003e\n  Collect(\n    endpoint = \"search\",\n    instance = \"mastodon.social\",\n    local = FALSE,\n    hashtag = \"rstats\",\n    numPosts = 100,\n    writeToFile = TRUE\n  )\n\n# Collecting timeline posts...\n# Hashtag: rstats\n# Requested 100 posts\n# \n# id                 | created            \n# ----------------------------------------\n# 111851xxxx79684588 | 2024-01-31 17:33:57\n# 111839xxxx72130565 | 2024-01-29 12:55:38\n# Collected 120 posts.\n# RDS file written: 2024-01-31_190125-MastodonData.rds\n# Done.\n```\n\n#### Create mastodon activity and actor network graphs\n\nThe mastodon `Create` function accepts the data from `Collect` and a\ntype parameter of `activity` or `actor` that specifies the type of\nnetwork to create from the collected data. `Create` produces two\ndataframes, one for network `nodes` and one for node relations or\n`edges`. These can then be passed to the `Graph` function to produce an\n`igraph` object.\n\n##### Activity network\n\nNodes are `posts` and edges are the relationship to other posts. 
The\nonly relationship type supported at this time is the `reply` edge.\n\n``` r\nnet_activity \u003c- mast_data |\u003e\n  Create(\"activity\") |\u003e\n  AddText(mast_data) |\u003e\n  Graph()\n\n# Generating mastodon activity network...\n# Done.\n\n# IGRAPH 7cc21ba DN-- 128 12 -- \n# + attr: type (g/c), name (v/c), post.created_at (v/n),\n# | post.visibility (v/c), account.id (v/c), account.username\n# | (v/c), account.acct (v/c), account.displayname (v/c),\n# | user.avatar (v/c), post.tags (v/x), post.tags.urls (v/x),\n# | post.reblogs_count (v/n), post.favourites_count (v/n),\n# | post.replies_count (v/n), post.url (v/c), node_type (v/c),\n# | absent (v/l), vosonTxt_post (v/c), created_at (e/n), edge_type\n# | (e/c)\n# + edges from 7cc21ba (vertex names):\n# [1] 111851xxxx32132167-\u003e111846xxxx99585000\n# + ... omitted several edges\n```\n\n##### Tag network\n\nA variation on the mastodon `activity` network is the subtype `tag`. A\ntag network is a network of tags (hashtags) found in posts, with\nrelations created from their co-occurrence with other tags within the\nsame post.\n\n``` r\nnet_tag \u003c- mast_data |\u003e\n   Create(\"activity\", subtype = \"tag\") |\u003e\n   Graph()\n   \n# Generating mastodon activity network...\n# Done.\n\n# IGRAPH 23e6e20 DN-- 94 624 -- \n# + attr: type (g/c), name (v/c), post.id (e/c), edge_type (e/c)\n# + edges from 23e6e20 (vertex names):\n#  [1] peerreviewed   -\u003eapackageaday    peerreviewed   -\u003eoss            \n#  [3] peerreviewed   -\u003erstats          apackageaday   -\u003epeerreviewed   \n#  [5] apackageaday   -\u003eoss             apackageaday   -\u003erstats         \n#  [7] oss            -\u003epeerreviewed    oss            -\u003eapackageaday   \n#  [9] oss            -\u003erstats          rstats         -\u003epeerreviewed   \n# [11] rstats         -\u003eapackageaday    rstats         -\u003eoss            \n# [13] rstats         -\u003ereproducibility reproducibility-\u003erstats         \n# [15] 
rshiny         -\u003erstats          rstats         -\u003ershiny         \n# + ... omitted several edges\n```\n\n##### Actor network\n\nNodes are authors of collected posts and edges are their relationship to\nother authors. The only relationship types supported at this time are\n`reply` and `mention` edges.\n\n``` r\nnet_actor \u003c- mast_data |\u003e\n  Create(\"actor\", inclMentions = TRUE) |\u003e\n  AddText(mast_data) |\u003e\n  Graph()\n\n# Generating mastodon actor network...\n# Done.\n\n# IGRAPH c46e984 DN-B 82 12 -- \n# + attr: type (g/c), name (v/c), user.acct (v/c), user.username\n# | (v/c), user.displayname (v/c), user.url (v/c), user.avatar\n# | (v/c), type (v/c), absent (v/l), post.id (e/c),\n# | post.created_at (e/n), edge_type (e/c), vosonTxt_post (e/c)\n# + edges from c46e984 (vertex names):\n# [1] 1096103xxxx4555149-\u003e10961030xxxx555149\n# + ... omitted several edges\n```\n\n##### Server network\n\nA variation on the mastodon `actor` network is the subtype `server`. 
A\nserver network simply groups the users into single actors as represented\nby their servers, and similarly combines their relations at the server\nlevel.\n\n``` r\nnet_server \u003c- mast_data |\u003e\n   Create(\"actor\", subtype = \"server\") |\u003e\n   Graph()\n\n# Generating mastodon actor network...\n# Done.\n\n# IGRAPH 845c991 DN-- 23 10 -- \n# + attr: type (g/c), name (v/c), n (v/n), edge_type (e/c)\n# + edges from 845c991 (vertex names):\n#  [1] fosstodon.org  -\u003efosstodon.org   fosstodon.org  -\u003efosstodon.org  \n#  [3] aus.social     -\u003eaus.social      fosstodon.org  -\u003efosstodon.org  \n#  [5] mastodon.social-\u003emastodon.social fosstodon.org  -\u003efosstodon.org  \n#  [7] fosstodon.org  -\u003efosstodon.org   fosstodon.org  -\u003efosstodon.org  \n#  [9] mastodon.social-\u003ehachyderm.io    mstdn.social   -\u003emstdn.social\n```\n\n### \u003ca name=\"youtube-usage\"/\u003eYouTube Usage {#youtube-usage}\n\n#### Authenticate and Collect comments from youtube videos\n\nYouTube uses an API key rather than an OAuth token and is simply set by\ncalling `Authenticate` with the key as a parameter.\n\n``` r\n# youtube authentication sets the api key\nauth_yt \u003c- Authenticate(\"youtube\", apiKey = \"xxxxxxxxxxxxxx\")\n```\n\nOnce the key is set then `Collect` can be used to collect the comments\nfrom specified youtube videos. The following example collects a maximum\nof 100 top-level comments and all replies from each of the 2 specified\nvideo ID's. 
It produces a dataframe with the combined comment data.\n\n``` r\nvideo_url \u003c- c(\"https://www.youtube.com/watch?v=AQzZ....yWM\",\n               \"https://www.youtube.com/watch?v=lY0Y....T88\u0026t=3152s\")\n\ncollect_yt \u003c- auth_yt |\u003e\n  Collect(videoIDs = video_url,\n          maxComments = 100)\n          \n## Collecting comment threads for YouTube videos...\n## Video 1 of 2\n## ---------------------------------------------------------------\n## ** Creating dataframe from threads of AQzZ....yWM.\n## ** Collecting replies for 1 threads with replies. Please be patient.\n## Comment replies 1 \n## ** Collected replies: 1\n## ** Total video comments: 11\n## (Video API unit cost: 5)\n## ---------------------------------------------------------------\n## Video 2 of 2\n## ---------------------------------------------------------------\n## ** Creating dataframe from threads of lY0Y....T88.\n## ** Collecting replies for 1 threads with replies. Please be patient.\n## Comment replies 6 \n## ** Collected replies: 6\n## ** Total video comments: 14\n## (Video API unit cost: 5)\n## ---------------------------------------------------------------\n## ** Total comments collected for all videos 25.\n## (Estimated API unit cost: 10)\n## Done.\n```\n\n#### Create youtube activity and actor network graphs\n\nThe youtube `Create` function accepts the data from `Collect` and a\nnetwork type parameter of `activity` or `actor`.\n\n##### Activity network\n\nNodes are video comments and edges represent whether they were directed\nto the video as a top-level comment or to another comment as a reply\ncomment.\n\n``` r\nnet_activity \u003c- collect_yt |\u003e Create(\"activity\")\n\n## Generating youtube activity network...\n## -------------------------\n## collected YouTube comments | 25\n## top-level comments         | 18\n## reply comments             | 7\n## videos                     | 2\n## nodes                      | 27\n## edges                      | 25\n## 
-------------------------\n## Done.\n```\n\n``` r\ng_activity \u003c- net_activity |\u003e Graph()\n\ng_activity\n\n## IGRAPH 5a9fb56 DN-- 27 25 -- \n## + attr: type (g/c), name (v/c), video_id (v/c), published_at (v/c),\n## | updated_at (v/c), author_id (v/c), screen_name (v/c), node_type\n## | (v/c), edge_type (e/c)\n## + edges from 5a9fb56 (vertex names):\n## [1] Ugw13lb0....o4IKFb54AaABAg-\u003eVIDEOID:AQzZ....yWM\n## [2] UgyJBlqZ....ltQTOTt4AaABAg-\u003eVIDEOID:AQzZ....yWM\n## [3] Ugysomx_....4Pqrs1h4AaABAg-\u003eVIDEOID:AQzZ....yWM\n## + ... omitted several edges\n```\n\n##### Actor network\n\nNodes are users who have posted comments and the video publishers, edges\nrepresent comments directed at other users.\n\n``` r\nnet_actor \u003c- collect_yt |\u003e Create(\"actor\")\n\n## Generating YouTube actor network...\n## Done.\n```\n\n``` r\ng_actor \u003c- net_actor |\u003e Graph()\n\ng_actor\n\n## IGRAPH 5aad4c4 DN-- 24 27 -- \n## + attr: type (g/c), name (v/c), screen_name (v/c), node_type (v/c),\n## | video_id (e/c), comment_id (e/c), edge_type (e/c)\n## + edges from 5aad4c4 (vertex names):\n##  [1] UCb9ElH....G9OxDIiSYgdg-\u003eVIDEOID:AQzZ....yWM\n##  [2] UC0DwaB....zUh-LA9sWXKYQ-\u003eVIDEOID:AQzZ....yWM\n##  [3] UCNHA8S....KauefYt1FHmjQ-\u003eUC0DwaB....zUh-LA9sWXKYQ\n## + ... 
omitted several edges\n```\n\n### \u003ca name=\"reddit-usage\"/\u003eReddit Usage {#reddit-usage}\n\nThe reddit API end-point used by `vosonSML` does not require\nauthentication but an `Authenticate` object is still used to set up the\ncollection and creation operations as part of a reddit workflow.\n\n#### Collect a thread listing from subreddit\n\nBy using the `endpoint = \"listing\"` parameter and a vector of subreddit\nnames, a list of comment threads and their metadata can be collected.\nThe number of list results returned per subreddit can be coarsely\nspecified within 25 items, by using the `max` parameter.\n\n``` r\n# specify subreddit names\nsubreddits \u003c- c(\"datascience\")\n\n# collect a listing of the 25 top threads by upvote of all time\ncollect_rd_listing \u003c- Authenticate(\"reddit\") |\u003e\n  Collect(endpoint = \"listing\", subreddits = subreddits,\n          sort = \"top\", period = \"all\", max = 25,\n          writeToFile = TRUE, verbose = TRUE)\n\n## Collecting thread listing for subreddits...\n## Waiting between 3 and 5 seconds per request.\n## Request subreddit listing: datascience (max items: 25).\n## subreddit_id | subreddit   | count\n## ----------------------------------\n## t5_2sptq     | datascience | 25   \n## Collected metadata for 25 threads in listings.\n## RDS file written: ./vsml-data/2023-04-02_073117-RedditListing.rds\n## Done.\n```\n\n#### Collect reddit threads\n\nThe reddit `Collect` function can then be used to collect comments from\nreddit threads specified by URL's.\n\n``` r\n# specify reddit threads to collect by url\nthread_urls \u003c- c(\n  \"https://www.reddit.com/r/datascience/comments/wc...5/\",\n  \"https://www.reddit.com/r/datascience/comments/wc...g/\"\n)\n\n# or use permalinks from a previously collected listing\nthread_urls \u003c- collect_rd_listing$permalink |\u003e head(n = 3)\n\n# collect comment threads with their comments sorted by best comments first\ncollect_rd \u003c- Authenticate(\"reddit\") 
|\u003e\n  Collect(threadUrls = thread_urls,\n          sort = \"best\", writeToFile = TRUE, verbose = TRUE)\n\n## Collecting comment threads for reddit urls...\n## Waiting between 3 and 5 seconds per thread request.\n## Request thread: r/datascience (k8..f8) - sort: best\n## Request thread: r/datascience (oe..nl) - sort: best\n## Request thread: r/datascience (ho..gq) - sort: best\n## HTML decoding comments.\n## thread_id | title                                         | subreddit   | count\n## -------------------------------------------------------------------------------\n## ho..gq    | xxxx xxx xx xxx xxx xxxxxxxx xxxx xxxxxxxx... | datascience | 272  \n## k8..f8    | xxxx xxxxx                                    | datascience | 77   \n## oe..nl    | xxx xxxx xxx xxxxxxxxxx                       | datascience | 179\n## Collected 528 total comments.\n## RDS file written: ./vsml-data/2023-04-02_073130-RedditData.rds\n## Done.\n```\n\n*Please note that because of the API end-point used that `Collect` is\nlimited to the first 500 comments per thread (plus 500 for each\n`continue thread` encountered). 
It is therefore suited to collecting\nonly smaller threads in their entirety.*\n\n#### Create reddit activity and actor networks\n\n##### Activity network\n\nNodes are original thread posts and comments, edges are replies directed\nto the original post and to comments made by others.\n\n``` r\n# create an activity network\nnet_activity \u003c- collect_rd |\u003e Create(\"activity\", verbose = TRUE)\n\n## Generating reddit activity network...\n## -------------------------\n## collected reddit comments | 528\n## subreddits                | 1 \n## threads                   | 3 \n## comments                  | 528\n## nodes                     | 531\n## edges                     | 528\n## -------------------------\n## Done.\n```\n\n``` r\ng_activity \u003c- net_activity |\u003e Graph()\n\ng_activity\n\n## IGRAPH 62e8305 DN-- 531 528 -- \n## + attr: type (g/c), name (v/c), thread_id (v/c), comm_id (v/c),\n## | datetime (v/c), ts (v/n), subreddit (v/c), user (v/c), node_type\n## | (v/c), edge_type (e/c)\n## + edges from 62e8305 (vertex names):\n##  [1] k8..f8.1      -\u003ek8..f8.0     k8..f8.1_1    -\u003ek8..f8.1    \n##  [3] k8..f8.1_2    -\u003ek8..f8.1     k8..f8.1_2_1  -\u003ek8..f8.1_2  \n## + ... 
omitted several edges\n```\n\n##### Actor network\n\nNodes are reddit users who have commented on threads and edges represent\nreplies to other users.\n\n``` r\n# create an actor network\nnet_actor \u003c- collect_rd |\u003e Create(\"actor\", verbose = TRUE)\n\n## Generating reddit actor network...\n## -------------------------\n## collected reddit comments | 528\n## subreddits                | 1 \n## threads                   | 3 \n## comments                  | 273\n## nodes                     | 321\n## edges                     | 531\n## -------------------------\n## Done.\n```\n\n``` r\ng_actor \u003c- net_actor |\u003e Graph()\n\ng_actor\n\n## IGRAPH 62fa45c DN-- 321 531 -- \n## + attr: type (g/c), name (v/c), user (v/c), subreddit (e/c), thread_id\n## | (e/c), comment_id (e/n), comm_id (e/c)\n## + edges from 62fa45c (vertex names):\n##  [1] 1 -\u003e1  2 -\u003e1  3 -\u003e1  1 -\u003e3  3 -\u003e1  4 -\u003e1  5 -\u003e1  6 -\u003e3  7 -\u003e1  8 -\u003e7 \n## [11] 9 -\u003e1  1 -\u003e9  10-\u003e1  11-\u003e1  1 -\u003e11 1 -\u003e1  1 -\u003e1  1 -\u003e1  1 -\u003e1  1 -\u003e1 \n## + ... 
omitted several edges\n```\n\n### \u003ca name=\"hyperlink-usage\"/\u003eHyperlink Usage {#hyperlink-usage}\n\n#### Authenticate and Collect from web sites\n\nThe `vosonSML` hyperlink collection functionality does not require\nauthentication as it does not use any web APIs. However, an\n`Authenticate` object is still used to set up the collection and\ncreation operations as part of the `vosonSML` workflow.\n\nThe hyperlink `Collect` function accepts a dataframe of seed web pages,\nas well as corresponding `type` and `max_depth` parameters for each\npage.\n\n*Please note that this implementation of hyperlink collection and\nnetworks is still in an experimental stage.*\n\n``` r\n# specify seed web pages and parameters for hyperlink collection\nseed_pages \u003c-\n   tibble::tribble(\n     ~page, ~type, ~max_depth,\n     \"http://vosonlab.net\", \"ext\", 2,\n     \"https://www.oii.ox.ac.uk\", \"ext\", 2,\n     \"https://sonic.northwestern.edu\", \"ext\", 2\n   )\n\n#  A tibble: 3 × 3\n#   page                           type  max_depth\n#   \u003cchr\u003e                          \u003cchr\u003e     \u003cdbl\u003e\n# 1 http://vosonlab.net            ext           2\n# 2 https://www.oii.ox.ac.uk       ext           2\n# 3 https://sonic.northwestern.edu ext           2\n\ncollect_web \u003c- Authenticate(\"web\") |\u003e\n  Collect(pages = seed_pages, verbose = TRUE)\n\n# Collecting web page hyperlinks...\n# *** initial call to get urls - http://vosonlab.net\n# * new domain: http://vosonlab.net\n# + http://vosonlab.net (10 secs)\n# *** end initial call\n# *** set depth: 2\n# *** loop call to get urls - nrow: 6 depth: 2 max_depth: 2\n# * new domain: http://rsss.anu.edu.au\n# + http://rsss.anu.edu.au (0.96 secs)\n# ...\n```\n\n#### Create activity and actor networks\n\n``` r\n# generate a hyperlink activity network\nnet_activity \u003c- collect_web |\u003e Create(\"activity\")\n\n# generate a hyperlink actor network\nnet_actor \u003c- collect_web |\u003e 
Create(\"actor\")\n```\n\n### \u003ca name=\"supplemental-functions\"/\u003eSupplemental Functions {#supplemental-functions}\n\n#### Merge collected data together\n\nThe `Merge` and `MergeFiles` functions allow two or more `Collect`\nobjects to be merged together provided they are of the same datasource\ntype e.g `mastodon`.\n\n``` r\nget_hashtag_data \u003c- function(tag) {\n  Authenticate(\"mastodon\") |\u003e\n    Collect(\n      endpoint = \"search\",\n      instance = \"mastodon.social\",\n      local = FALSE,\n      hashtag = tag,\n      numPosts = 100,\n      writeToFile = TRUE\n    )\n}\n\nlibrary(vosonSML)\noptions(voson.data = \"./mast-data\")\n\n# collect data\nmast_rstats \u003c- get_hashtag_data(\"rstats\")\nmast_python \u003c- get_hashtag_data(\"python\")\n\n# merge collect objects\ndata \u003c- Merge(mast_rstats, mast_python, writeToFile = TRUE)\n\n# Merging collect data...\n# RDS file written: ./mast-data/2024-07-21_150353-MastodonDataMerge.rds\n# Done.\n\n# merge files from a data directory\ndata \u003c- MergeFiles(\n  \"./mast-data\", pattern = \"*MastodonData.rds\", writeToFile = TRUE\n)\n\n# Merging collect files...\n# Matching files:\n# - ./mast-data/2024-07-21_035919-MastodonData.rds\n# - ./mast-data/2024-07-21_035925-MastodonData.rds\n# Merging collect data...\n# RDS file written: ./mast-data/2024-07-21_150544-DataMergeFile.rds\n# Done.\n```\n\n#### AddText adds collected text data to networks as node or edge attributes\n\nThe `AddText` function can be used following the creation of all\nnetworks for `mastodon`, `youtube` and `reddit`. It will add an\nattribute starting with `vosonTxt_` to nodes of `activity` networks and\nto edges of `actor` networks. 
It requires a collected `datasource` from\nwhich to extract text data.\n\n``` r\n# create activity network\nnet_activity \u003c- data |\u003e Create(\"activity\")\n\n# activity network with text data added as node attribute\nnet_activity \u003c- net_activity |\u003e AddText(data)\n  \n## Adding text data to network...Done.\n```\n\n``` r\nnames(net_activity)\n#  [1] \"post.id\"               \"post.created_at\"      \n# ..             \n# [17] \"vosonTxt_post\"\n```\n\n`AddText` can also redirect some edges in a youtube `actor` network when\nthe `repliesFromText` parameter is set, by finding user references at\nthe beginning of reply comment text. In the following example the edge\nfrom `UserC` would be redirected to `UserB`, who is referenced in the\ntext, instead of to `UserA`, who made the top-level comment both users\nare replying to.\n\n``` r\n# video comments\n# UserA: Great tutorial.\n# |- UserB: I agree, but it could have had more examples.\n# |- UserC: @UserB I thought it probably had too many.\n```\n\nThe edge between user nodes `C -\u003e A` is redirected to `C -\u003e B`.\n\n``` r\n# create actor network\nnet_actor \u003c- collect_yt |\u003e Create(\"actor\")\n\n# detects replies to users in text\nnet_actor \u003c- net_actor |\u003e\n  AddText(collect_yt, repliesFromText = TRUE)\n\n## Adding text data to network...Done.\n```\n\n#### AddVideoData requests and adds video data to networks\n\n`AddVideoData` adds video information as node attributes in youtube\n`actor` networks and replaces the video ID nodes with a user (channel\nowner or publisher). 
The `actorSubOnly` parameter can be used to only\nperform the ID substitution.\n\n``` r\n# replaces VIDEOID:xxxxxx references in actor network with their publishers\n# user id (channel ID) and adds additional collected youtube video info to actor\n# network graph as node attributes\nnet_actor \u003c- collect_yt |\u003e\n  Create(\"actor\") |\u003e \n  AddVideoData(auth_yt, actorSubOnly = FALSE)\n\nnames(net_actor)\n## [1] \"nodes\"  \"edges\"  \"videos\"\nnrow(net_actor$videos)\n## [1] 2\n```\n\n`AddVideoData` function will also add a new dataframe to the\n`actor_network` network list containing the retrieved video information\ncalled `videos`.\n\n``` r\ng_actor \u003c- net_actor |\u003e Graph()\n\n## IGRAPH 644cb17 DN-- 23 27 -- \n## + attr: type (g/c), name (v/c), screen_name (v/c), node_type (v/c),\n## | video_id (e/c), comment_id (e/c), edge_type (e/c), video_title (e/c),\n## | video_description (e/c), video_published_at (e/c)\n## + edges from 644cb17 (vertex names):\n## [1] UCb9ElH9...G9OxDIiSYgdg-\u003eUCeiiqmVK07...-wvg3IZiZQ\n## [2] UC0DwaB_...zUh-LA9sWXKYQ-\u003eUCeiiqmVK07...-wvg3IZiZQ\n## + ... omitted several edges\n```\n\n## Where to next?\n\nContinue working with the network graphs using the `igraph` package and\ncheck out some examples of plots in the [Introduction to\nvosonSML](https://vosonlab.github.io/vosonSML/articles/Intro-to-vosonSML.html)\nvignette. The `graphml` files produced by `vosonSML` are also easily\nimported into software such as [Gephi](https://gephi.org/) for further\nvisualization and exploration of networks.\n\nAs an alternative to `vosonSML` using the R command-line interface we\nhave also developed an R Shiny app called [VOSON\nDash](https://vosonlab.github.io/VOSONDash/). 
It provides a\nuser-friendly GUI for the collection of data using `vosonSML` and has\nadditional network visualization and analysis features.\n\nFor more detailed information about functions and their parameters,\nplease refer to the\n[Reference](https://vosonlab.github.io/vosonSML/reference/index.html)\npage.\n\n## Special thanks\n\nThis package would not be possible without a number of excellent\npackages created by others in the R community. We would especially like\nto thank the authors of the [dplyr](https://github.com/tidyverse/dplyr),\n[httr2](https://github.com/r-lib/httr2),\n[igraph](https://github.com/igraph/rigraph),\n[RedditExtractoR](https://github.com/ivan-rivera/RedditExtractoR) and\n[rtoot](https://gesistsa.github.io/rtoot/) packages.\n\n## Code of Conduct\n\nPlease note that the VOSON Lab projects are released with a [Contributor\nCode of\nConduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).\nBy contributing to this project, you agree to abide by its terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvosonlab%2Fvosonsml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvosonlab%2Fvosonsml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvosonlab%2Fvosonsml/lists"}