{"id":15343982,"url":"https://github.com/tarilabs/joind-ex042015","last_synced_at":"2025-04-05T22:26:37.929Z","repository":{"id":29904602,"uuid":"33450310","full_name":"tarilabs/joind-ex042015","owner":"tarilabs","description":"An exercise in named versus anonymous comment analysis using Java 8 Stream API and R","archived":false,"fork":false,"pushed_at":"2015-04-06T08:02:12.000Z","size":220,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-12T04:16:07.198Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tarilabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-04-05T18:17:45.000Z","updated_at":"2015-04-06T11:29:36.000Z","dependencies_parsed_at":"2022-09-07T00:53:07.590Z","dependency_job_id":null,"html_url":"https://github.com/tarilabs/joind-ex042015","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarilabs%2Fjoind-ex042015","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarilabs%2Fjoind-ex042015/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarilabs%2Fjoind-ex042015/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tarilabs%2Fjoind-ex042015/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tarilabs","download_url":"https://codeload.github.com/tarilabs/joind-ex042015/tar.gz/refs/heads/master","host":{"nam
e":"GitHub","url":"https://github.com","kind":"github","repositories_count":247410044,"owners_count":20934553,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-01T10:53:29.056Z","updated_at":"2025-04-05T22:26:37.901Z","avatar_url":"https://github.com/tarilabs.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\r\ntitle: \"joind-ex042015\"\r\noutput: html_document\r\n---\r\nAn exercise in named versus anonymous comment analysis using Java 8 Stream API and R.\r\n\r\n# Executive summary\r\nA few days after Codemotion Rome 2015 - a very interesting Italian tech conference - a small conversation started on social media about the soundness and validity of the anonymous comments and anonymous ratings left on the talks held at the conference. My perspective is that attendees left anonymous feedback for reasons other than lack of personal confidence or, in some cases, trolling; I also undertook this analysis because I wanted some real data to exercise the Java 8 Stream API and R. 
This is to support my thesis that anonymous comments and feedback are relevant in this case.\r\n\r\n# Introduction\r\n\r\nSetting the goals:\r\n\r\n* Exercise with __Java 8 Stream API__ including custom Collector\r\n* Exercise with __R__ and R markdown\r\n* Back with data my thesis that anonymous feedback has relevance in the case illustrated above\r\n\r\nNon-goals:\r\n\r\n* Java code to be idiomatic FP style\r\n* Idiomatic statistical analysis\r\n\r\n# Case study: Event ID# 3347\r\nThis is a case study of the comments on Event ID 3347 on `joind.in`, which is Codemotion Rome 2015.\r\n\r\n## Sourcing the data with Java 8\r\nThe Java code consumes the `joind.in` API to iterate over the Event's talks, fetching all the comments for each talk and filtering out anonymous ratings with repeated comment text (possibly the result of clicking the upload form multiple times). The Java 8 Stream API is very helpful for processing this data in streams and performing some custom pre-aggregations to be used later in the analysis.\r\nTechnologies used: JAX-RS with RESTEasy, Jackson for JSON tree walking with Java 8 Stream API.\r\n\r\n## Data load and preparation in R\r\nLoading the data generated by the Java code:\r\n\r\n\r\n```r\r\nmDF \u003c- read.csv(\"data/20150404200655/3347/stats.csv\", header=FALSE, col.names=c(\"id\", \"h_avg\", \"h_commentCnt\", \"h_starCnt\", \"anon_Cnt\", \"named_Cnt\", \"anonAvg\", \"namedAvg\", \"totalAvg\"))\r\n```\r\n\r\nIntroducing a new column in the data frame to represent the share of named comments among all comments (named plus anonymous):\r\n\r\n\r\n```r\r\nmDF$namedRatio \u003c- mDF$named_Cnt / (mDF$named_Cnt+mDF$anon_Cnt)\r\n```\r\n\r\nKeeping only those talks which have at least one comment/feedback:\r\n\r\n\r\n```r\r\nmDF \u003c- mDF[mDF$anon_Cnt+mDF$named_Cnt \u003e 0,]\r\nmDF$tot_Cnt \u003c- mDF$anon_Cnt+mDF$named_Cnt\r\nsummary(mDF)\r\n```\r\n\r\n```\r\n##        id            h_avg        h_commentCnt      h_starCnt\r\n##  Min.   :13842   Min.   :0.000   Min.   : 1.000   Min.  
 :0  \r\n##  1st Qu.:14097   1st Qu.:3.000   1st Qu.: 1.000   1st Qu.:0  \r\n##  Median :14124   Median :4.000   Median : 2.000   Median :0  \r\n##  Mean   :14073   Mean   :3.486   Mean   : 3.129   Mean   :0  \r\n##  3rd Qu.:14145   3rd Qu.:5.000   3rd Qu.: 4.000   3rd Qu.:0  \r\n##  Max.   :14345   Max.   :5.000   Max.   :16.000   Max.   :0  \r\n##     anon_Cnt        named_Cnt        anonAvg         namedAvg    \r\n##  Min.   : 0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  \r\n##  1st Qu.: 0.000   1st Qu.:1.000   1st Qu.:0.000   1st Qu.:1.625  \r\n##  Median : 1.000   Median :1.000   Median :2.500   Median :4.000  \r\n##  Mean   : 1.386   Mean   :1.571   Mean   :2.362   Mean   :3.228  \r\n##  3rd Qu.: 2.000   3rd Qu.:2.000   3rd Qu.:4.625   3rd Qu.:5.000  \r\n##  Max.   :12.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  \r\n##     totalAvg       namedRatio        tot_Cnt      \r\n##  Min.   :1.000   Min.   :0.0000   Min.   : 1.000  \r\n##  1st Qu.:3.208   1st Qu.:0.2125   1st Qu.: 1.000  \r\n##  Median :4.500   Median :0.6333   Median : 2.000  \r\n##  Mean   :4.079   Mean   :0.5932   Mean   : 2.957  \r\n##  3rd Qu.:5.000   3rd Qu.:1.0000   3rd Qu.: 4.000  \r\n##  Max.   :5.000   Max.   :1.0000   Max.   
:14.000\r\n```\r\n\r\n```r\r\nrow.names(mDF) = mDF$id\r\n```\r\n\r\n### Ratings\r\n\r\nA quick look at the ratings:\r\n\r\n\u003cimg src=\"figure/unnamed-chunk-4-1.png\" title=\"plot of chunk unnamed-chunk-4\" alt=\"plot of chunk unnamed-chunk-4\" style=\"display: block; margin: auto;\" /\u003e\r\n\r\nFrom `joind.in` API: *rating: A rating from 1-5 where 5 is the best and 1 is rubbish*.\r\n\r\nThe data suggest that a large share of the talks were rated very positively by the attendees, confirming once again the success of the event!\r\n\r\n### Named Vs Anonymous ratio\r\n\r\nA quick look at the Named Vs Anonymous comments ratio:\r\n\r\n\u003cimg src=\"figure/unnamed-chunk-5-1.png\" title=\"plot of chunk unnamed-chunk-5\" alt=\"plot of chunk unnamed-chunk-5\" style=\"display: block; margin: auto;\" /\u003e\r\n\r\nThe data suggest anonymous comments and ratings are a very meaningful population in the dataset, hence they shall not be excluded from the further analysis. \r\n\r\n### Named and Anonymous ratings\r\n\r\nA visual attempt to highlight the distribution of the data, considering the Named Vs Anonymous ratio and the total average rating. 
To help visually distinguish positively-rated talks from negatively-rated talks, the following thresholds are set:\r\n\r\n\r\n```r\r\nlow \u003c- mDF$totalAvg \u003c= 2.5   # negatively-rated talk, red\r\nhigh \u003c- mDF$totalAvg \u003e 2.5   # positively-rated talk, blue\r\n```\r\n\u003cimg src=\"figure/unnamed-chunk-7-1.png\" title=\"plot of chunk unnamed-chunk-7\" alt=\"plot of chunk unnamed-chunk-7\" style=\"display: block; margin: auto;\" /\u003e\r\n\r\n### More on Named and Anonymous ratings\r\n\r\nA visual attempt, as above, but also highlighting the most commented talks: the more comments a talk has, the less transparent its label:\r\n\r\n\u003cimg src=\"figure/unnamed-chunk-8-1.png\" title=\"plot of chunk unnamed-chunk-8\" alt=\"plot of chunk unnamed-chunk-8\" style=\"display: block; margin: auto;\" /\u003e\r\n\r\n# Conclusions\r\nThe data suggest anonymous comments and ratings are a very meaningful population in the dataset; moreover, the visual representation highlights how anonymous feedback is relevant for a number of positively-rated talks. Hence, with this data at hand, I'm not convinced by the argument that anonymity is an alibi for lack of confidence or trolling. 
A further explanation could be that attendees are not keen on signing up to yet another social network just to leave feedback, and/or that they fear personal consequences from leaving severe criticism under their own name.\r\n\r\n* * * * \r\n\r\nNotes on extensions follow.\r\n\r\n# Extension\r\nAn experiment in using principal component analysis for dimensionality reduction to represent the dataset.\r\n\r\n![plot of chunk unnamed-chunk-9](figure/unnamed-chunk-9-1.png) \r\n\r\n\r\n\r\n![plot of chunk unnamed-chunk-11](figure/unnamed-chunk-11-1.png) \r\n\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftarilabs%2Fjoind-ex042015","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftarilabs%2Fjoind-ex042015","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftarilabs%2Fjoind-ex042015/lists"}