{"id":22966631,"url":"https://github.com/favstats/ica22_conf","last_synced_at":"2026-03-20T00:04:29.682Z","repository":{"id":108350770,"uuid":"495819656","full_name":"favstats/ica22_conf","owner":"favstats","description":null,"archived":false,"fork":false,"pushed_at":"2022-05-24T12:54:11.000Z","size":1835,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-17T01:44:07.064Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/favstats.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-24T12:46:57.000Z","updated_at":"2022-05-24T20:20:06.000Z","dependencies_parsed_at":"2023-03-13T14:27:43.825Z","dependency_job_id":null,"html_url":"https://github.com/favstats/ica22_conf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/favstats/ica22_conf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/favstats%2Fica22_conf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/favstats%2Fica22_conf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/favstats%2Fica22_conf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/favstats%2Fica22_conf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/favstats","download_url":"https://codeload.github.com/favstats/ica22_conf/tar.gz/refs/heads/master","sbo
m_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/favstats%2Fica22_conf/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271560484,"owners_count":24780865,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-21T02:00:08.990Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-14T20:44:53.755Z","updated_at":"2026-02-09T09:33:02.837Z","avatar_url":"https://github.com/favstats.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"---\ntitle: \"ICA22 Twitter Analysis\"\nauthor: \"Fabio\"\ndate: \"2022-05-24\"\noutput: github_document\n---\n\nThis is a short notebook outlining the code used to scrape tweets related to the ICA22 conference in Paris.\n\n```{r setup, include=FALSE}\nknitr::opts_chunk$set(echo = TRUE, message = F, error = F, warning = F)\n```\n\n\n## Packages\n\nLoad the necessary packages.\n\n```{r}\n# install pacman once if not available on your machine\n# install.packages(\"pacman\")\n\npacman::p_load(tidyverse, rtweet, ggraph, igraph, tidygraph)\n```\n\n\n## Get Data\n\nCall the Twitter API. To collect the data yourself, you need to register for a free account, which gives you a personal access token for Twitter. Check out [`rtweet`](https://github.com/mkearney/rtweet/) and follow the instructions. 
\n\n```{r, eval = F}\n# twitter_token \u003c- readRDS(\"twitter_token.rds\")\n\nrt \u003c- search_tweets(\n  \"#ICA22 OR #ica22\", n = 100000, include_rts = T, retryonratelimit = T, since='2022-05-01', until='2022-05-31'\n)\n\nsave(rt, file = \"data/rt.Rdata\")\n```\n\nLet's first look at the data structure and column names. Twitter returns over 1,200 unique tweets.\n\n```{r}\nload(\"data/rt.Rdata\")\n\nrt %\u003e% glimpse # similar to str, returns a df overview\n\n```\n\n\n## Top Ten Retweeted Tweets\n\n```{r, results=\"asis\"}\n# load(\"rt.Rdata\")\nrt %\u003e% \n  filter(!is_retweet) %\u003e% \n  select(screen_name, text, retweet_count) %\u003e% \n  filter(!str_detect(text, \"^RT\")) %\u003e% \n  mutate(text = str_replace_all(text, \"\\\\\\n\", \" \")) %\u003e% \n  arrange(desc(retweet_count)) %\u003e% \n  top_n(n = 10) %\u003e% \n  knitr::kable(., format = \"markdown\")\n```\n\n\n## Timeline\n\n```{r, fig.height = 6}\nrt %\u003e%\n  ## parse date format (\"Europe/Berlin\" is the valid tz name for German time)\n  mutate(created_at = lubridate::as_datetime(created_at, \"Europe/Berlin\")) %\u003e% \n  mutate(\n    cdate = created_at %\u003e% \n      str_extract(\"\\\\d{4}-\\\\d{2}-\\\\d{2}\") %\u003e% \n      lubridate::ymd(),\n    hour = lubridate::hour(created_at)\n  ) %\u003e% #select(created_at)\n  ## select relevant time period\n  filter(cdate \u003e= as.Date(\"2022-05-24\") \u0026 cdate \u003c= as.Date(\"2022-05-31\")) %\u003e% \n  ## count tweets per day and hour\n  group_by(cdate, hour) %\u003e%\n  tally %\u003e%\n  ungroup %\u003e%\n  ggplot(aes(hour, n)) +\n  geom_line() +\n  ## split the visualization \n  facet_wrap(~cdate, ncol = 1) +\n  ggthemes::theme_hc() +\n  scale_x_continuous(labels =  seq(5, 24, 3), breaks = seq(5, 24, 3)) +\n  # scale_y_continuous(labels = seq(0, 60, 20), \n                     # breaks = seq(0, 60, 20), \n                     # minor_breaks = seq(0, 60, 20)) +\n  ggtitle(\"Number of Tweets by Hour of the Day mentioning #ICA22\") +\n  xlab(\"Hour of the Day\") +\n  ylab(\"Number 
of Tweets\")\n```\n\n\n## Retweet Network\n\n```{r, fig.width = 15, fig.height=15}\nrt_graph \u003c- rt %\u003e% \n  ## select relevant variables\n  dplyr::select(screen_name, retweet_screen_name) %\u003e% \n  ## unnest list of retweet_screen_name\n  unnest %\u003e% \n  ## count the number of co-occurrences\n  group_by(screen_name, retweet_screen_name) %\u003e% \n  tally(sort = T) %\u003e%\n  ungroup %\u003e% \n  ## drop missing values\n  drop_na %\u003e% \n  ## filter those co-occurrences that appear at least 2 times\n  filter(n \u003e 1) %\u003e% \n  ## transforming the dataframe to a graph object\n  as_tbl_graph() %\u003e% \n  ## calculating node centrality\n  mutate(centrality = centrality_degree(mode = 'in'))\n\nrt_graph %\u003e% \n  ## create graph layout\n  ggraph(layout = \"kk\") + \n  ## define edge aesthetics\n  geom_edge_fan(aes(alpha = n, edge_width = n, color = n)) + \n  ## scale down link saturation\n  scale_edge_alpha(range = c(.5, .9)) +\n  ## define edge color and node size\n  scale_edge_color_gradient(low = \"gray50\", high = \"#1874CD\") +\n  geom_node_point(aes(size = centrality), color = \"gray30\") +\n  ## equal width and height\n  coord_fixed() +\n  ## plain theme\n  theme_void() +\n  ## title\n  ggtitle(\"#ICA22 Retweet Network\")\n\n\nrt_graph %\u003e% \n  ## create graph layout\n  ggraph(layout = \"kk\") + \n  ## define edge aesthetics\n  geom_edge_fan(aes(alpha = n, edge_width = n, color = n)) + \n  ## scale down link saturation\n  scale_edge_alpha(range = c(.5, .9)) +\n  ## define edge color and node size\n  scale_edge_color_gradient(low = \"gray50\", high = \"#1874CD\") +\n  geom_node_point(aes(size = centrality), color = \"gray30\") +\n  ## define node labels\n  geom_node_text(aes(label = name), repel = T, fontface = \"bold\") +\n  ## equal width and height\n  coord_fixed() +\n  ## plain theme\n  theme_void() +\n  ## title\n  ggtitle(\"#ICA22 Retweet Network\") +\n  theme(plot.title = element_text(size = 20, hjust = 0.5))\n\n\nrt_graph %\u003e% \n  ## create 
graph layout\n  ggraph(layout = \"circle\") + \n  ## define edge aesthetics\n  geom_edge_fan(aes(alpha = n, edge_width = n, color = n)) + \n  ## scale down link saturation\n  scale_edge_alpha(range = c(.5, .9)) +\n  ## define edge color and node size\n  scale_edge_color_gradient(low = \"gray50\", high = \"#1874CD\") +\n  geom_node_point(aes(size = centrality), color = \"gray30\") +\n  ## define node labels\n  geom_node_text(aes(label = name), repel = F, fontface = \"bold\") +\n  ## equal width and height\n  coord_fixed() +\n  ## plain theme\n  theme_void() +\n  ## title\n  ggtitle(\"#ICA22 Retweet Network\")\n\n```\n\n\n## Mentions Network\n\n```{r, fig.width = 15, fig.height=15}\nrt_graph \u003c- rt %\u003e% \n  ## remove retweets\n  filter(!is_retweet) %\u003e% \n  ## select relevant variables\n  dplyr::select(screen_name, mentions_screen_name) %\u003e% \n  ## unnest list of mentions_screen_name\n  unnest %\u003e% \n  ## count the number of co-occurrences\n  group_by(screen_name, mentions_screen_name) %\u003e% \n  tally(sort = T) %\u003e%\n  ungroup %\u003e% \n  ## drop missing values\n  drop_na %\u003e% \n  ## filter those co-occurrences that appear at least 2 times\n  filter(n \u003e 1) %\u003e% \n  ## transforming the dataframe to a graph object\n  as_tbl_graph() %\u003e% \n  ## calculating node centrality\n  mutate(centrality = centrality_degree(mode = 'in'))\n\nrt_graph %\u003e% \n  ## create graph layout\n  ggraph(layout = \"kk\") + \n  ## define edge aesthetics\n  geom_edge_fan(aes(alpha = n, edge_width = n, color = n)) + \n  ## scale down link saturation\n  scale_edge_alpha(range = c(.5, .9)) +\n  ## define edge color and node size\n  scale_edge_color_gradient(low = \"gray50\", high = \"#1874CD\") +\n  geom_node_point(aes(size = centrality), color = \"gray30\") +\n  ## equal width and height\n  coord_fixed() +\n  ## plain theme\n  theme_void() +\n  ## title\n  ggtitle(\"#ICA22 Twitter Mentions Network\")\n\nrt_graph %\u003e% \n  ## create graph layout\n  ggraph(layout = \"kk\") + 
\n  ## define edge aesthetics\n  geom_edge_fan(aes(alpha = n, edge_width = n, color = n)) + \n  ## scale down link saturation\n  scale_edge_alpha(range = c(.5, .9)) +\n  ## define edge color and node size\n  scale_edge_color_gradient(low = \"gray50\", high = \"#1874CD\") +\n  geom_node_point(aes(size = centrality), color = \"gray30\") +\n  ## define node labels\n  geom_node_text(aes(label = name), repel = T, fontface = \"bold\") +\n  ## equal width and height\n  coord_fixed() +\n  ## plain theme\n  theme_void() +\n  ## title\n  ggtitle(\"#ICA22 Twitter Mentions Network\")\n\nrt_graph %\u003e% \n  ## create graph layout\n  ggraph(layout = \"circle\") + \n  ## define edge aesthetics\n  geom_edge_fan(aes(alpha = n, edge_width = n, color = n)) + \n  ## scale down link saturation\n  scale_edge_alpha(range = c(.5, .9)) +\n  ## define edge color and node size\n  scale_edge_color_gradient(low = \"gray50\", high = \"#1874CD\") +\n  geom_node_point(aes(size = centrality), color = \"gray30\") +\n  ## define node labels\n  geom_node_text(aes(label = name), repel = F, fontface = \"bold\") +\n  ## equal width and height\n  coord_fixed() +\n  ## plain theme\n  theme_void() +\n  ## title\n  ggtitle(\"#ICA22 Twitter Mentions Network\")\n```\n\n### Smaller Mentions Network (n \u003e 2)\n\n```{r, fig.width = 15, fig.height=15}\nrt_graph2 \u003c- rt %\u003e% \n  ## select relevant variables\n  dplyr::select(screen_name, mentions_screen_name) %\u003e% \n  ## unnest list of mentions_screen_name\n  unnest %\u003e% \n  ## count the number of co-occurrences\n  group_by(screen_name, mentions_screen_name) %\u003e% \n  tally(sort = T) %\u003e%\n  ungroup %\u003e% \n  ## drop missing values\n  drop_na %\u003e% \n  ## filter those co-occurrences that appear more than 2 times\n  filter(n \u003e 2) %\u003e% \n  ## transforming the dataframe to a graph object\n  as_tbl_graph() %\u003e% \n  ## calculating node centrality\n  mutate(centrality = centrality_degree(mode = 'in'))\n\nrt_graph2 %\u003e% \n  ## create graph 
layout\n  ggraph(layout = \"kk\") + \n  ## define edge aesthetics\n  geom_edge_fan(aes(alpha = n, edge_width = n, color = n)) + \n  ## scale down link saturation\n  scale_edge_alpha(range = c(.5, .9)) +\n  ## define edge color and node size\n  scale_edge_color_gradient(low = \"gray50\", high = \"#1874CD\") +\n  geom_node_point(aes(size = centrality), color = \"gray30\") +\n  ## equal width and height\n  coord_fixed() +\n  geom_node_text(aes(label = name), repel = T, fontface = \"bold\") +\n  ## plain theme\n  theme_void() +\n  ## title\n  ggtitle(\"#ICA22 Twitter Mentions Network\")\n\n```\n\n\n## Most Frequent Hashtags\n\n```{r}\nrt_hashtags \u003c- rt %\u003e% \n  filter(!is_retweet) %\u003e% \n  select(hashtags) %\u003e% \n  ## unnest list of hashtags\n  unnest %\u003e% \n    na.omit %\u003e% \n  ## clean hashtags\n  mutate(hashtags = stringr::str_to_lower(hashtags) %\u003e% \n           str_replace_all(\"2018\", \"18\") %\u003e% \n           ## add #symbol to vector\n           paste0(\"#\", .)) %\u003e% \n  ## count each hashtag and sort\n  count(hashtags, sort = T) %\u003e% \n  filter(n \u003e 5)\n\nrt_hashtags %\u003e% \n  filter(hashtags != \"#ica22\") %\u003e%\n  mutate(hashtags = forcats::fct_reorder(hashtags, n)) %\u003e% \n  ggplot(aes(hashtags, n)) +\n  geom_bar(stat = \"identity\", alpha = .7) +\n  coord_flip() +\n  theme_minimal() +\n  ggtitle(\"Most Frequent Hashtags related to #ICA22\")\n```\n\n## Most Frequent Bigram Network\n\n```{r}\ngg_bigram \u003c- rt %\u003e%\n  ## remove retweets\n  filter(!is_retweet) %\u003e% \n  select(text) %\u003e% \n  ## remove text noise\n  mutate(text = stringr::str_remove_all(text, \"w |amp \")) %\u003e% \n  ## remove manual retweets (text starting with RT)\n  filter(!stringr::str_detect(text, \"^RT\")) %\u003e% \n  ## remove urls\n  mutate(text = stringr::str_remove_all(text, \"https?[:]//[[:graph:]]+\")) %\u003e% \n  mutate(id = 1:n()) %\u003e% \n  ## split text into words\n  tidytext::unnest_tokens(word, text, token = \"words\") %\u003e% \n  ## remove 
stop words\n  anti_join(tidytext::stop_words) %\u003e% \n  ## paste words to text by id\n  group_by(id) %\u003e% \n  summarise(text = paste(word, collapse = \" \")) %\u003e% \n  ungroup %\u003e% \n  ## again split text into bigrams (word co-occurrences or collocations)\n  tidytext::unnest_tokens(bigram, text, token = \"ngrams\", n = 2) %\u003e% \n  separate(bigram, c(\"word1\", \"word2\"), sep = \" \") %\u003e% \n  ## remove the hashtag and count bigrams \n  filter(word1 != \"ica22\", word2 != \"ica22\") %\u003e%\n  count(word1, word2, sort = T) %\u003e% \n  ## select first 50\n  slice(1:50) %\u003e% \n  drop_na() %\u003e%\n  ## create tidy graph object\n  as_tbl_graph() %\u003e% \n  ## calculate node centrality\n  mutate(centrality = centrality_degree(mode = 'in'))\n```\n\n\n```{r}\ngg_bigram %\u003e% \n  ggraph() +\n  geom_edge_link(aes(edge_alpha = n, edge_width = n)) +\n  geom_node_point(aes(size = centrality)) + \n  geom_node_text(aes(label = name),  repel = TRUE) +\n  theme_void() +\n  scale_edge_alpha(\"\", range = c(0.3, .6)) +\n  ggtitle(\"Top Bigram Network from Tweets using hashtag #ICA22\")\n```\n\n\n```{r}\nsessionInfo()\n```\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffavstats%2Fica22_conf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffavstats%2Fica22_conf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffavstats%2Fica22_conf/lists"}