{"id":19844571,"url":"https://github.com/repsychling/sim-tutorial","last_synced_at":"2025-07-15T06:42:29.596Z","repository":{"id":110422512,"uuid":"240323983","full_name":"RePsychLing/sim-tutorial","owner":"RePsychLing","description":"Tutorial for power simulation","archived":false,"fork":false,"pushed_at":"2020-08-27T14:35:07.000Z","size":2167,"stargazers_count":3,"open_issues_count":1,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-24T06:53:57.088Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RePsychLing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-13T17:47:28.000Z","updated_at":"2021-09-14T20:33:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"5e824c01-29cd-4082-aa4d-28cee05c0d44","html_url":"https://github.com/RePsychLing/sim-tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RePsychLing%2Fsim-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RePsychLing%2Fsim-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RePsychLing%2Fsim-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RePsychLing%2Fsim-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RePsychLing","download_url":"https://codeload.github.com/RePsychL
ing/sim-tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241223634,"owners_count":19929733,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T13:04:42.688Z","updated_at":"2025-02-28T20:14:50.363Z","avatar_url":"https://github.com/RePsychLing.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\ntitle: \"Simulation Tutorial\"\nauthor: \"Lisa DeBruine\"\ndate: 2020-02-17\n---\n\n## Setup\n\n### Julia\n\nLoad the packages we'll be using in Julia. To get the versions we're using, run the following in Pkg mode (type `]` at the `julia\u003e` prompt):\n\n* `add MixedModels#master`\n* `add https://github.com/RePsychLing/MixedModelsSim.jl#master`\n\n~~~~{.julia}\njulia\u003e using Pkg\n\njulia\u003e Pkg.activate(\".\")\nActivating environment at `~/Desktop/Julia/sim-tutorial/Project.toml`\n\njulia\u003e Pkg.instantiate()\n\njulia\u003e using MixedModels        # run mixed models\n\njulia\u003e using MixedModelsSim     # simulation functions for mixed models\n\njulia\u003e using RCall              # call R functions from inside Julia\n\njulia\u003e using DataFrames, Tables # work with data tables\n\njulia\u003e using Random             # random number generator\n\njulia\u003e using CSV                # write CSV files\n\n~~~~~~~~~~~~~\n\n\n\n\n\n### R\n\nAlso load the R packages we'll be using, through `RCall`.\n\n~~~~{.julia}\nR\"\"\"\nrequire(ggplot2, quietly = TRUE) # for visualisation\nrequire(dplyr, quietly = TRUE)   # for data wrangling\nrequire(tidyr, quietly = TRUE)   # for data 
wrangling\n\"\"\";\n~~~~~~~~~~~~~\n\n\n\n\n\n### Define custom functions\n\nIt's useful to be able to weave your file quickly while you're debugging, \nso set the number of simulations to a relatively low number while you're \nsetting up your script and change it to a larger number when everything\nis debugged.\n\n~~~~{.julia}\nnsims = 1000 # set to a low number for testing, high for production\n~~~~~~~~~~~~~\n\n\n~~~~\n1000\n~~~~\n\n\n\n\n\n#### Define: ggplot_betas\n\nThis function uses ggplot in R to plot the distribution of each beta value returned by `simulate_waldtests`.\nIf you pass a file name as `figname`, it also saves the plot to that file with `ggsave()`; `width` and `height` are in inches, ggsave's default unit.\n\n~~~~{.julia}\nfunction ggplot_betas(sim, figname = 0, width = 7, height = 5) \n\n    beta_df = DataFrame(columntable(sim).β)\n\n    R\"\"\"\n        p \u003c- $beta_df %\u003e%\n            gather(var, val, 1:ncol(.)) %\u003e%\n            ggplot(aes(val, color = var)) +\n            geom_density(show.legend = FALSE) +\n            facet_wrap(~var, scales = \"free\")\n\n        if (is.character($figname)) {\n            ggsave($figname, p, width = $width, height = $height)\n        }\n\n        p\n    \"\"\"\nend\n~~~~~~~~~~~~~\n\n\n~~~~\nggplot_betas (generic function with 4 methods)\n~~~~\n\n\n\n\n\n## Existing Data\n\nLoad existing data from this morning's tutorial. 
Set the contrasts and run model 4 from the tutorial.\n\n~~~~{.julia}\n# load data\nkb07 = MixedModels.dataset(\"kb07\");\n\n# set contrasts\ncontrasts = Dict(:spkr =\u003e HelmertCoding(), \n                 :prec =\u003e HelmertCoding(), \n                 :load =\u003e HelmertCoding());\n\n# define formula\nkb07_f = @formula( rt_trunc ~ 1 + spkr+prec+load + (1|subj) + (1+prec|item) );\n\n# fit model\nkb07_m = fit(MixedModel, kb07_f, kb07, contrasts=contrasts)\n~~~~~~~~~~~~~\n\n\n~~~~\nLinear mixed model fit by maximum likelihood\n rt_trunc ~ 1 + spkr + prec + load + (1 | subj) + (1 + prec | item)\n     logLik        -2 logLik          AIC             BIC       \n -1.43319251×10⁴  2.86638501×10⁴  2.86818501×10⁴  2.87312548×10⁴\n\nVariance components:\n             Column      Variance   Std.Dev.   Corr.\nitem     (Intercept)     133015.240 364.71254\n         prec: maintain   63766.936 252.52116 -0.70\nsubj     (Intercept)      88819.437 298.02590\nResidual                 462443.388 680.03190\n Number of obs: 1789; levels of grouping factors: 32, 56\n\n  Fixed-effects parameters:\n──────────────────────────────────────────────────────\n                 Estimate  Std.Error  z value  P(\u003e|z|)\n──────────────────────────────────────────────────────\n(Intercept)     2181.85      77.4681    28.16   \u003c1e-99\nspkr: old         67.879     16.0785     4.22   \u003c1e-4 \nprec: maintain  -333.791     47.4472    -7.03   \u003c1e-11\nload: yes         78.5904    16.0785     4.89   \u003c1e-5 \n──────────────────────────────────────────────────────\n~~~~\n\n\n\n\n\n### Simulate data with the same parameters\n\nUse the `simulate_waldtests()` function to run `nsims` iterations of data sampled using the parameters from `kb07_m`. Set a random seed to make the simulation reproducible; you can use your favourite number.\n\nTo use multithreading, you need to set the number of threads Julia can use. 
In Visual Studio Code, open the settings (gear icon in the lower left corner, or cmd-,) and search for \"thread\". Set `julia.NumThreads` to the number of threads you want to use, leaving at least one core free. Outside VS Code, set the `JULIA_NUM_THREADS` environment variable before starting Julia.\n\n~~~~{.julia}\n# set seed for reproducibility\nrng = MersenneTwister(8675309);\n\n# run nsims iterations\nkb07_sim = simulate_waldtests(rng, nsims, kb07_m, use_threads = true);\n~~~~~~~~~~~~~\n\n\n\n\n\n**Try**: Run the code above with and without `use_threads`.\n\nSave the results of every iteration to a CSV file (the `sim` folder must already exist).\n\n~~~~{.julia}\n# long format: one row per coefficient per iteration\nkb07_sim_df = sim_to_df(kb07_sim)\n\nCSV.write(\"sim/kb07_sim.csv\", kb07_sim_df)\n\nfirst(kb07_sim_df, 8)\n~~~~~~~~~~~~~\n\n\n~~~~\n8×6 DataFrame. Omitted printing of 1 columns\n│ Row │ iteration │ coefname       │ beta     │ se       │ z        │\n│     │ Int64     │ Symbol         │ Float64⍰ │ Float64⍰ │ Float64⍰ │\n├─────┼───────────┼────────────────┼──────────┼──────────┼──────────┤\n│ 1   │ 1         │ (Intercept)    │ 2248.0   │ 84.8585  │ 26.4912  │\n│ 2   │ 1         │ load: yes      │ 48.9212  │ 15.8225  │ 3.09187  │\n│ 3   │ 1         │ prec: maintain │ -320.632 │ 45.4898  │ -7.04844 │\n│ 4   │ 1         │ spkr: old      │ 62.8771  │ 15.8225  │ 3.9739   │\n│ 5   │ 2         │ (Intercept)    │ 2165.36  │ 79.8084  │ 27.132   │\n│ 6   │ 2         │ load: yes      │ 91.9312  │ 16.2066  │ 5.67246  │\n│ 7   │ 2         │ prec: maintain │ -353.079 │ 37.4427  │ -9.42985 │\n│ 8   │ 2         │ spkr: old      │ 46.542   │ 16.2066  │ 2.87179  │\n~~~~\n\n\n\n\n\nPlot the betas with ggplot. In the code editor or a Jupyter notebook, you can omit the file name to display the figure without saving it.\n\n~~~~{.julia}\n# just display the image\n# ggplot_betas(kb07_sim) \n\n# save the image to a file and display it (display doesn't work in weave)\nggplot_betas(kb07_sim, \"fig/kb07_betas.png\");\n~~~~~~~~~~~~~\n\n\n\n\n\nIn documents you want to weave, save the image to a file and use markdown to display the file. 
Add a semicolon to the end of the function call to stop the plot from opening in a new window during weaving.\n\n![](fig/kb07_betas.png)\n\n\n### Power calculation\n\nThe function `power_table()` from `MixedModelsSim` takes the output of `simulate_waldtests()` and calculates, for each coefficient, the proportion of simulations in which the p-value is less than `alpha`. You can set the `alpha` argument to change the default value of 0.05 (justify your alpha ;).\n\n~~~~{.julia}\npower_table(kb07_sim)\n~~~~~~~~~~~~~\n\n\n~~~~\n4×2 DataFrame\n│ Row │ coefname       │ power   │\n│     │ Symbol         │ Float64 │\n├─────┼────────────────┼─────────┤\n│ 1   │ (Intercept)    │ 1.0     │\n│ 2   │ spkr: old      │ 0.991   │\n│ 3   │ prec: maintain │ 1.0     │\n│ 4   │ load: yes      │ 0.999   │\n~~~~\n\n\n\n\n\n### Change parameters\n\nLet's say we want to check our power to detect effects of spkr, prec, and load \nthat are half the size of those in our pilot data. We can set a new vector of beta values \nwith the `β` argument to `simulate_waldtests`.\n\n~~~~{.julia}\nnewβ = copy(kb07_m.β)      # work on a copy, not the model's own coefficient vector\nnewβ[2:4] = newβ[2:4] / 2  # halve spkr, prec and load; keep the intercept\n\nkb07_sim_half = simulate_waldtests(rng, nsims, kb07_m, β = newβ, use_threads = true);\n\npower_table(kb07_sim_half)\n~~~~~~~~~~~~~\n\n\n~~~~\n4×2 DataFrame\n│ Row │ coefname       │ power   │\n│     │ Symbol         │ Float64 │\n├─────┼────────────────┼─────────┤\n│ 1   │ (Intercept)    │ 1.0     │\n│ 2   │ spkr: old      │ 0.529   │\n│ 3   │ prec: maintain │ 0.928   │\n│ 4   │ load: yes      │ 0.692   │\n~~~~\n\n\n\n\n\n\n# Simulating Data from Scratch\n\n\n## simdat_crossed\n\nThe `simdat_crossed()` function from `MixedModelsSim` lets you set up a data frame with a specified experimental design. For now, it only makes fully balanced crossed designs, but you can generate an unbalanced design by simulating data for the largest cell and deleting extra rows. 
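\n\nA minimal sketch of that approach, for a design where the older group keeps all 10 subjects but the younger group has only 7. It assumes that `simdat_crossed()` returns a table with `subj`, `age`, and `dv` columns, as the model formulas later in this tutorial suggest; `young`, `drop_subj`, and `unbal` are illustrative names. Wrapping the result in `DataFrame()` keeps the sketch working whether the function returns a data frame or a row table.\n\n~~~~{.julia}\nusing DataFrames, MixedModelsSim\n\n# simulate the largest cell: 10 subjects per age group, 30 items, both conditions\ndat = DataFrame(simdat_crossed(10, 30,\n                               subj_btwn = Dict(\"age\" =\u003e [\"O\", \"Y\"]),\n                               both_win  = Dict(\"condition\" =\u003e [\"A\", \"B\"])))\n\n# IDs of the younger subjects; everyone after the first 7 will be dropped\nyoung = unique(dat.subj[dat.age .== \"Y\"])\ndrop_subj = young[8:end]\n\n# keep every row that does not belong to a dropped subject\nunbal = filter(row -\u003e !(row.subj in drop_subj), dat)\n\n# compare the number of rows before and after\nnrow(dat), nrow(unbal)\n~~~~~~~~~~~~~\n\nEach subject contributes 30 items × 2 conditions = 60 rows, so this should leave 17 subjects and 1020 rows instead of 20 subjects and 1200 rows. Dropping whole subjects, rather than single rows, keeps every remaining subject's data complete.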
\n\nWe will set up a design where `subj_n` subjects per `age` group (O or Y) respond to `item_n` items in each of two `condition`s (A or B).\n\nSpecify between-subject, between-item, and within-subject/item factors separately, each as a `Dict` with the factor names as keys and vectors of level names as values.\n\n~~~~{.julia}\n# put between-subject factors in a Dict\nsubj_btwn = Dict(\"age\" =\u003e [\"O\", \"Y\"])\n\n# there are no between-item factors in this design, so you can omit this argument or set it to nothing\nitem_btwn = nothing\n\n# put within-subject/item factors in a Dict\nboth_win = Dict(\"condition\" =\u003e [\"A\", \"B\"])\n\n# simulate data: 10 subjects per age group, 30 items\ndat = simdat_crossed(10, 30, \n                     subj_btwn = subj_btwn, \n                     item_btwn = item_btwn, \n                     both_win = both_win);\n~~~~~~~~~~~~~\n\n\n\n\n\n\n## Fit a model\n\nNow you need to fit a model to your simulated data. Because the `dv` is just random numbers from N(0,1), there will be basically no subject or item random variance, residual variance will be near 1.0, and the estimates for all effects should be small. Don't worry: we'll specify the fixed-effect coefficients and variance parameters directly in `simulate_waldtests`, so here the fitted model mainly supplies the design and the model structure.\n\n~~~~{.julia}\n# set contrasts\ncontrasts = Dict(:age =\u003e HelmertCoding(), \n                 :condition =\u003e HelmertCoding());\n\nf1 = @formula dv ~ 1 + age * condition + (1|item) + (1|subj);\nm1 = fit(MixedModel, f1, dat, contrasts=contrasts)\n~~~~~~~~~~~~~\n\n\n~~~~\nLinear mixed model fit by maximum likelihood\n dv ~ 1 + age + condition + age \u0026 condition + (1 | item) + (1 | subj)\n   logLik   -2 logLik     AIC        BIC    \n -1735.9276  3471.8552  3485.8552  3521.4858\n\nVariance components:\n            Column    Variance   Std.Dev.  
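\nitem     (Intercept)  0.00000000 0.00000000\nsubj     (Intercept)  0.00556038 0.07456796\nResidual              1.05205446 1.02569706\n Number of obs: 1200; levels of grouping factors: 30, 20\n\n  Fixed-effects parameters:\n───────────────────────────────────────────────────────────────\n                          Estimate  Std.Error  z value  P(\u003e|z|)\n───────────────────────────────────────────────────────────────\n(Intercept)             0.0137907   0.0339813     0.41   0.6849\nage: Y                  0.00847966  0.0339813     0.25   0.8029\ncondition: B           -0.0257098   0.0296093    -0.87   0.3852\nage: Y \u0026 condition: B  -0.00811351  0.0296093    -0.27   0.7841\n───────────────────────────────────────────────────────────────\n~~~~\n\n\n\n\n\n## Simulate\n\nSet a seed for reproducibility and specify β (the fixed-effect coefficients, in the order shown in the output above), σ (the residual standard deviation), and θ (the random-effect parameters). In `MixedModels`, θ is expressed relative to σ: for intercept-only random effects like these, each random-effect standard deviation is σ × θ, so here the item and subject intercepts both have a standard deviation of 2.0. Because the factors use `HelmertCoding()` (-1/+1 for two levels), a coefficient of 0.25 corresponds to a difference of 0.5 between the two levels.\n\n~~~~{.julia}\nrng = MersenneTwister(8675309);\n\nnew_beta = [0, 0.25, 0.25, 0]  # (Intercept), age: Y, condition: B, age: Y \u0026 condition: B\nnew_sigma = 2.0                # residual SD\nnew_theta = [1.0, 1.0]         # item and subject SDs relative to σ\n\nsim1 = simulate_waldtests(rng, nsims, m1, \n                        β = new_beta, \n                        σ = new_sigma, \n                        θ = new_theta,\n                        use_threads = true);\n~~~~~~~~~~~~~\n\n\n\n\n\n\n## Explore simulation output\n\n\n~~~~{.julia}\nggplot_betas(sim1, \"fig/simbetas.png\");\n~~~~~~~~~~~~~\n\n\n\n\n\n![](fig/simbetas.png)\n\n\n## Power\n\n~~~~{.julia}\npower_table(sim1)\n~~~~~~~~~~~~~\n\n\n~~~~\n4×2 DataFrame\n│ Row │ coefname              │ power   │\n│     │ Symbol                │ Float64 │\n├─────┼───────────────────────┼─────────┤\n│ 1   │ (Intercept)           │ 0.062   │\n│ 2   │ age: Y                │ 0.132   │\n│ 3   │ condition: B          │ 0.989   │\n│ 4   │ age: Y \u0026 condition: B │ 0.056   │\n~~~~\n\nPower for the between-subject `age` effect is much lower than for the within-subject `condition` effect of the same size, because `age` can only be compared across the 20 subjects and so is tested against the subject-level random variance, whereas `condition` is compared within each subject.\n\n\n\n\n\n## Try your own design\n\nEdit `my_dat` below and make sure `my_f` is updated for your new design. Also make sure `my_beta` has the right number of elements (check `my_m.β` for the number and order). 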
You can also change `my_sigma` and `my_theta`. Set the seed in `my_rng` to your favourite number.\n\n~~~~{.julia}\n# 10 subjects and 10 items, no between- or within-subject factors\nmy_dat = simdat_crossed(10, 10)\n\nmy_f = @formula dv ~ 1 + (1|item) + (1|subj);\nmy_m = fit(MixedModel, my_f, my_dat)\n\nmy_beta = [0.0]        # one value per fixed effect: here only the intercept\nmy_sigma = 2.0\nmy_theta = [1.0, 1.0]  # one value per random intercept (item, subj)\n\nmy_rng = MersenneTwister(8675309);\n\nmy_nsims = 1000\n\nmy_sim = simulate_waldtests(my_rng, my_nsims, my_m, \n                        β = my_beta, \n                        σ = my_sigma, \n                        θ = my_theta,\n                        use_threads = true);\n\npower_table(my_sim)\n~~~~~~~~~~~~~\n\n\n~~~~\n1×2 DataFrame\n│ Row │ coefname    │ power   │\n│     │ Symbol      │ Float64 │\n├─────┼─────────────┼─────────┤\n│ 1   │ (Intercept) │ 0.083   │\n~~~~\n\n\n\n\n\n\n## Write a function to vary design parameters\n\nThe function below wraps data generation, model fitting, and simulation, so you can rerun the whole pipeline with different numbers of subjects and items. It uses the `subj_btwn` and `both_win` objects defined above and draws a random seed unless you supply one.\n\n~~~~{.julia}\nfunction mysim(subj_n, item_n, nsims = 1000, \n               beta  = [0, 0, 0, 0],\n               sigma = 2.0, \n               theta = [1.0, 1.0],\n               seed = round(Int, rand() * 1e8)\n               )\n    # generate data (uses the global subj_btwn and both_win)\n    dat = simdat_crossed(subj_n, item_n, subj_btwn = subj_btwn, both_win = both_win )\n\n    # set contrasts\n    contrasts = Dict(:age =\u003e HelmertCoding(), \n                     :condition =\u003e HelmertCoding());\n\n    # set up model\n    f = @formula dv ~ 1 + age*condition + (1|item) + (1|subj);\n    m = fit(MixedModel, f, dat, contrasts=contrasts)\n\n    # run simulation\n    rng = MersenneTwister(seed);\n\n    simulate_waldtests(\n        rng, nsims, m, \n        β = beta, \n        σ = sigma, \n        θ = theta, \n        use_threads = true\n    );\nend\n~~~~~~~~~~~~~\n\n\n~~~~\nmysim (generic function with 6 methods)\n~~~~\n\n\n\n\n\nRun simulations over a grid of subject and item numbers; the same pattern works for any argument of `mysim`.\n\n~~~~{.julia}\n# varying\nsubj_ns = [20, 30, 40]\nitem_ns = [10, 20, 30]\n\n# fixed\nnsims = 1000\nnew_beta = [0, 0.4, 0.1, 0]\nnew_sigma = 2.0\nnew_theta = [1.0, 1.0]\n\nd = DataFrame()\n\nfor 
subj_n in subj_ns\n    for item_n in item_ns\n        s = mysim(subj_n, item_n, nsims, new_beta, new_sigma, new_theta);\n        pt = power_table(s)\n        pt[!, :item_n] .= item_n\n        pt[!, :sub_n] .= subj_n\n        append!(d, pt)\n    end\nend\n\n# save the data in long format\nCSV.write(\"sim/power.csv\", d)\n\n# spread the table for easier viewing\nunstack(d, :coefname, :power)\n~~~~~~~~~~~~~\n\n\n~~~~\n9×6 DataFrame. Omitted printing of 1 columns\n│ Row │ item_n │ sub_n │ (Intercept) │ age: Y   │ age: Y \u0026 condition: B │\n│     │ Int64  │ Int64 │ Float64⍰    │ Float64⍰ │ Float64⍰              │\n├─────┼────────┼───────┼─────────────┼──────────┼───────────────────────┤\n│ 1   │ 10     │ 20    │ 0.069       │ 0.236    │ 0.041                 │\n│ 2   │ 10     │ 30    │ 0.069       │ 0.35     │ 0.03                  │\n│ 3   │ 10     │ 40    │ 0.093       │ 0.425    │ 0.044                 │\n│ 4   │ 20     │ 20    │ 0.065       │ 0.246    │ 0.049                 │\n│ 5   │ 20     │ 30    │ 0.055       │ 0.358    │ 0.051                 │\n│ 6   │ 20     │ 40    │ 0.054       │ 0.45     │ 0.056                 │\n│ 7   │ 30     │ 20    │ 0.077       │ 0.265    │ 0.054                 │\n│ 8   │ 30     │ 30    │ 0.046       │ 0.34     │ 0.047                 │\n│ 9   │ 30     │ 40    │ 0.069       │ 0.439    │ 0.049                 │\n~~~~\n\n\n\n\n\n## Convert this file \n\n~~~~{.julia}\n\n# using Weave\n\n# convert to HTML\n# weave(\"simulation_tutorial.jmd\")\n\n# convert to a Jupyter notebook\n# convert_doc(\"simulation_tutorial.jmd\", \"simulation_tutorial.ipynb\")\n\n# convert to Markdown for the README\n# weave(\"simulation_tutorial.jmd\", doctype=\"pandoc\", out_path = \"README.md\")\n~~~~~~~~~~~~~\n\n## Acknowledgements\n\nThis work was supported by the Center for Interdisciplinary Research, Bielefeld (ZiF) Cooperation Group \"Statistical models for psychological and linguistic 
data\".\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frepsychling%2Fsim-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frepsychling%2Fsim-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frepsychling%2Fsim-tutorial/lists"}