{"id":21685885,"url":"https://github.com/sodascience/bestworst_analysis","last_synced_at":"2025-03-20T11:45:24.674Z","repository":{"id":213504284,"uuid":"671544357","full_name":"sodascience/bestworst_analysis","owner":"sodascience","description":"Bayesian analysis of best-worst scaling data","archived":false,"fork":false,"pushed_at":"2023-12-21T11:37:21.000Z","size":279,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-01-25T12:09:36.384Z","etag":null,"topics":["bayesian-inference","best-worst-scaling","ranking-data","word-associations"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sodascience.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-07-27T15:01:06.000Z","updated_at":"2024-02-25T02:05:32.000Z","dependencies_parsed_at":"2023-12-21T12:34:09.977Z","dependency_job_id":"ea7a07b2-3b95-4198-afce-263b0c80fe97","html_url":"https://github.com/sodascience/bestworst_analysis","commit_stats":null,"previous_names":["sodascience/bestworst_analysis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fbestworst_analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fbestworst_analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fbestworst_analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sodascience%2Fbestworst_analysis/
manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sodascience","download_url":"https://codeload.github.com/sodascience/bestworst_analysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244609357,"owners_count":20480779,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-inference","best-worst-scaling","ranking-data","word-associations"],"created_at":"2024-11-25T16:23:31.718Z","updated_at":"2025-03-20T11:45:24.652Z","avatar_url":"https://github.com/sodascience.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bayesian analysis of best-worst word association data\nPreprocessing and analysis scripts for Bayesian analysis of best-worst scaling data, such as that generated by our [`word_norms_survey`](https://github.com/sodascience/word_norms_survey). The model in this repository allows for regression on the latent log-odds scale for the item values, which can answer questions such as:\n\n\u003e _Which properties of my words influence their association with evilness as measured by repeated best-worst rankings?_\n\n\u003e _How well does my language model predict word associations on femininity as measured by repeated best-worst rankings?_\n\n## Installation\nThis repository uses an RStudio project. Open the [`bestworst_analysis.Rproj`](bestworst_analysis.Rproj) file in RStudio to open the project. 
To run the code in this repository, first install the dependencies as follows in `R`:\n\n```r\n# install packages\npks \u003c- c(\"cmdstanr\", \"tidyverse\", \"patchwork\", \"arrow\")\ninstall.packages(pks, repos = c(\"https://mc-stan.org/r-packages/\", getOption(\"repos\")))\n\n# install stan to compile \u0026 run models\ncmdstanr::install_cmdstan()\n```\n\n## Model\nThe model is a Bayesian rank-ordered logit (ROL) model which estimates latent item values based on (partial) rankings of these items on a specific task. The model is implemented in [`stan`](https://mc-stan.org). The stan code, data preparation functions, and posterior summarization functions can be found in the [`stan/`](./stan/) subfolder.\n\nIn these models, the likelihood of observing a rank ordering $y$ of $N$ items given each item's latent \"worth\" parameter $\\theta_n$ (with items indexed in ranked order, from first to last) is:\n\n$$ P(y | \\theta) = \\prod^N_{n=1}\\left[ \\exp \\theta_n \\div \\sum^N_{m = n} \\exp \\theta_m \\right] $$\n\nTo learn more about these types of rank-ordered logit models, read:\n- For an intuitive understanding, the introduction from the [Plackett-Luce package](https://hturner.github.io/PlackettLuce/)\n  \u003e Turner, H.L., van Etten, J., Firth, D. and Kosmidis, I. (2020). Modelling Rankings in R: The PlackettLuce Package. Computational Statistics, 35, 1027-1057. URL https://doi.org/10.1007/s00180-020-00959-3.\n- For how this maps to best-worst experiments, Case 1 \u0026 the section on Models of Ranking by Repeated Best and/or Worst Choice from [Marley, Flynn, \u0026 Australia (2015)](https://www.sciencedirect.com/science/article/pii/B9780080970868431223)\n  \u003e Marley, A. A., Flynn, T. N., \u0026 Australia, V. (2015). Best worst scaling: theory and practice. International encyclopedia of the social \u0026 behavioral sciences, 2(2), 548-552.\n- For the stochastic (Bayesian) implementation: [Glickman \u0026 Hennessy (2015)](https://www.degruyter.com/document/doi/10.1515/jqas-2015-0012/html)\n  \u003e Glickman, M. 
E., \u0026 Hennessy, J. (2015). A stochastic rank ordered logit model for rating multi-competitor games and sports. Journal of Quantitative Analysis in Sports, 11(3), 131-144.\n\n## Scripts\n### Experiment data processing\nThe experiment data processing script ([`01_experiment_process.R`](01_experiment_process.R)) takes in data from a best-worst scaling experiment ([`data_raw/experiment_data/`](./data_raw/experiment_data/)) and creates a long-format version of this data which contains the following information:\n\n- __subj_id__ the (anonymous) identifier of the participant in the study\n- __trial__ the trial number of the participant\n- __association__ the association that was tested (e.g., evilness, femininity)\n- __wordtype__ the type of the words in the trial (first names, company names, non-words)\n- __option__ the option number of the words (1 to 4)\n- __word__ the word belonging to this option in the trial\n- __ranking__ how the word was ranked. 1 is best, 4 is worst, and the remaining (unranked) words are given an equal middle rank (2.5).\n\nIn addition, the following inclusion criteria are applied: \n\n- include only participants who fully passed the attention check (i.e., both best and worst answers correct)\n- remove trials with response time \u003c= 3 seconds\n- remove trials with log-response time \u003e= 4 sd (i.e., approx 27 seconds)\n\nThis reduces the total number of trials from 12341 to 10266.\n\nThis long-format data is then stored as an `rds` file in the processed data folder.\n\n### Word data preprocessing\nThe word data processing script [`02_word_preprocess.R`](02_word_preprocess.R) reads the word data from [`data_raw/word_data/`](./data_raw/word_data/) and stores it as processed data (an `rds` file) in the processed data folder.\n\n\u003e NB: for testing, the word data preprocessing script also adds a random item-level predictor to this data: `languagemodel_prediction_evilness`\n\n### Estimating item-level associations\nThe first analysis 
script [`03_estimate_log_worth.R`](03_estimate_log_worth.R) estimates log-worths for each word in a single word-type category on a single association. It produces the following plot of latent worth on a log-odds scale:\n\n![](img/evil_summary.png)\n\n\n### Predicting item-level associations using item-level predictors\nThe second analysis script [`04_regress_log_worth.R`](04_regress_log_worth.R) performs regression for the log-worths using item-level predictors from the word data. Using this approach, it is possible to perform inference for the regression parameters:\n\n```\n# A tibble: 1 × 10\n  variable                           mean median    sd   mad     q5   q95  rhat ess_bulk ess_tail\n  \u003cchr\u003e                             \u003cdbl\u003e  \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e  \u003cdbl\u003e \u003cdbl\u003e \u003cdbl\u003e    \u003cdbl\u003e    \u003cdbl\u003e\n1 languagemodel_prediction_evilness 0.142  0.144 0.272 0.257 -0.301 0.587  1.00    3201.    4100.\n```\n\n## Contributing\n\nContributions are what make the open source community an amazing place to learn, inspire, and create.\n\nAny contributions you make are greatly appreciated.\n\nTo contribute:\n1. Fork the Project\n2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)\n3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the Branch (`git push origin feature/AmazingFeature`)\n5. Open a Pull Request\n\n## Contact\n\n\u003cimg src=\"./img/word_colour-l.png\" alt=\"SoDa logo\" width=\"250px\"/\u003e\n\nThis project is developed and maintained by the [ODISSEI Social Data\nScience (SoDa)](https://odissei-soda.nl) team.\n\nDo you have questions, suggestions, or remarks? 
File an issue in the\nissue tracker or feel free to contact the team at [`odissei-soda.nl`](https://odissei-soda.nl)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsodascience%2Fbestworst_analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsodascience%2Fbestworst_analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsodascience%2Fbestworst_analysis/lists"}