{"id":13687819,"url":"https://github.com/mvdoc/budapest-fmri-data","last_synced_at":"2025-07-05T16:38:12.649Z","repository":{"id":73067572,"uuid":"251371344","full_name":"mvdoc/budapest-fmri-data","owner":"mvdoc","description":"Quality assurance analyses of fMRI data collected while participants watched The Grand Budapest Hotel by Wes Anderson.","archived":false,"fork":false,"pushed_at":"2020-11-11T17:47:02.000Z","size":23977,"stargazers_count":28,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-05-29T10:44:23.361Z","etag":null,"topics":["dataset","fmri","fmri-dataset","grand-budapest-hotel","neuroimaging"],"latest_commit_sha":null,"homepage":"https://www.nature.com/articles/s41597-020-00735-4","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mvdoc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-03-30T16:58:06.000Z","updated_at":"2025-03-27T14:04:59.000Z","dependencies_parsed_at":"2024-01-14T16:11:02.581Z","dependency_job_id":"1c071c64-772a-46d9-a4fa-2502e326cbc1","html_url":"https://github.com/mvdoc/budapest-fmri-data","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/mvdoc/budapest-fmri-data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvdoc%2Fbudapest-fmri-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvdoc%2Fbudapest-fmri-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvdoc%2Fbudapest-fmri-data/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvdoc%2Fbudapest-fmri-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mvdoc","download_url":"https://codeload.github.com/mvdoc/budapest-fmri-data/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvdoc%2Fbudapest-fmri-data/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259824810,"owners_count":22917341,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","fmri","fmri-dataset","grand-budapest-hotel","neuroimaging"],"created_at":"2024-08-02T15:01:01.158Z","updated_at":"2025-06-14T13:39:02.390Z","avatar_url":"https://github.com/mvdoc.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"\n\n[![DOI](https://zenodo.org/badge/251371344.svg)](https://zenodo.org/badge/latestdoi/251371344)\n\n\n# An fMRI dataset in response to \"The Grand Budapest Hotel\", a socially-rich, naturalistic movie\n\nThis repository contains quality-assurance scripts for an fMRI dataset collected while 25 participants watched [The Grand Budapest Hotel](https://en.wikipedia.org/wiki/The_Grand_Budapest_Hotel) by Wes Anderson. The associated manuscript *An fMRI dataset in response to \"The Grand Budapest Hotel\", a socially-rich, naturalistic movie* by Matteo Visconti di Oleggio Castello, Vassiki Chauhan, Guo Jiahui, \u0026 M. 
Ida Gobbini is available as a preprint [here](https://www.biorxiv.org/content/10.1101/2020.07.14.203257v1).\n\nThe dataset is available on OpenNeuro: https://openneuro.org/datasets/ds003017. See below for information on how to install the dataset. If you use the dataset, please cite the corresponding paper:\n\n\u003e Visconti di Oleggio Castello, M., Chauhan, V., Jiahui, G., \u0026 Gobbini, M. I. An fMRI dataset in response to “The Grand Budapest Hotel”, a socially-rich, naturalistic movie. *Sci Data* **7**, 383 (2020). https://doi.org/10.1038/s41597-020-00735-4\n\nThis repository and associated code can be cited as follows:\n\n\u003e Visconti di Oleggio Castello, M., Chauhan,  V., Jiahui, G., \u0026 Gobbini, M. I. (2020).  *mvdoc/budapest-fmri-data*. Zenodo.  http://doi.org/10.5281/zenodo.3942173\n\n## Cloning this repository and downloading the dataset\n\nTo clone this repository, run\n\n```bash\n$ git clone https://github.com/mvdoc/budapest-fmri-data.git\n```\n\nThe OpenNeuro dataset is included in this repository as a git submodule, and it can be downloaded with [DataLad](https://www.datalad.org/) (see also the next section). Once you have cloned the repository, obtaining the data is as simple as\n\n```bash\n$ cd budapest-fmri-data\n$ datalad install data\n# If for example you want to download the data from one subject, you can run\n$ datalad get data/sub-sid000005\n# Alternatively, to get all the data, you can run\n$ datalad get data\n```\n\nThe dataset can also be installed from [DataLad](https://www.datalad.org/) to a different location by running\n\n```bash\n$ datalad install ///labs/gobbini/budapest/openneuro\n```\n\nOr it can be downloaded from the [OpenNeuro's website, dataset ds003017]( https://openneuro.org/datasets/ds003017). 
\n\nPlease refer to the [DataLad handbook](http://handbook.datalad.org/en/latest/) to learn how to use DataLad.\n\n## Setting up a python environment\n\nWe provide a conda environment file to set up an appropriate python environment for the preprocessing scripts. This environment has been tested on Linux and Mac OS X, however there's a chance it might not work on your system. Please feel free to open an issue here and we'll try to help.\n\nAssuming you have already installed [anaconda or miniconda](https://docs.anaconda.com/anaconda/install/) on your system, you can set up a new conda environment with requirements as follows (note that it can take a while):\n\n```bash\n$ conda env create -f conda-environment.yml --name budapest\n```\n\nOnce all packages have been installed, you should activate the environment and install an additional python package that we provide which contains additional helper functions:\n\n```bash\n$ conda activate budapest\n$ pip install ./code\n```\n\n## Presentation, preprocessing, and quality assurance scripts\n\nIn this repository we provide the scripts used to generate and preprocess the stimuli, to present the stimuli in the scanner, to preprocess the fMRI data, and to run quality assurance analyses. These scripts can be found in the [`scripts`](scripts) directory. 
In particular,\n\n- [`scripts/preprocessing-stimulus`](scripts/preprocessing-stimulus) contains the scripts to\n  split the movie into separate parts to be presented in the scanner, and to preprocess the audio of the movie to make it more audible in the scanner.\n- [`scripts/presentation`](scripts/presentation) contains PsychoPy presentation scripts.\n- [`scripts/preprocessing-fmri`](scripts/preprocessing-fmri) contains the scripts used to run [fMRIprep](https://fmriprep.readthedocs.io/) for preprocessing.\n- [`scripts/quality-assurance`](scripts/quality-assurance) contains scripts to run QA analyses and generate the figures reported in the data paper.\n- [`scripts/hyperalignment-and-decoding`](scripts/hyperalignment-and-decoding) contains scripts to perform hyperalignment and movie segment classification.\n- [`notebooks`](notebooks) contains Jupyter notebooks to generate figures and run additional analyses.\n\nBelow we describe the content of these directories and their role in the analyses.\n\n### Stimulus preprocessing\n\nThe movie was extracted from a DVD and converted into mkv (`libmkv 0.6.5.1`) format using [HandBrake](https://handbrake.fr/). Unfortunately, this process was not scripted. The DVD had [UPC code 024543897385](https://www.upcitemdb.com/upc/24543897385). We provide additional metadata associated with the converted movie file so that future conversions can match our stimuli as closely as possible. The information is available in [`scripts/preprocessing-stimulus/movie-file-info.txt`](scripts/preprocessing-stimulus/movie-file-info.txt). The total duration of the movie was `01:39:55.17`. 
The video and audio were encoded with the following codecs:\n\n```\nStream #0:0(eng): Video: h264 (High), yuv420p(tv, smpte170m/smpte170m/bt709, progressive), 720x480 [SAR 32:27 DAR 16:9], SAR 186:157 DAR 279:157, 30 fps, 30 tbr, 1k tbn, 60 tbc (default)\nStream #0:1(eng): Audio: ac3, 48000 Hz, stereo, fltp, 160 kb/s (default)\nStream #0:2(eng): Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s\n```\n\nOnce the movie was extracted and converted, it was split into different parts for a behavioral session and five imaging runs. The times for the behavioral session are available in [`scripts/preprocessing-stimulus/splits_behav.txt`](scripts/preprocessing-stimulus/splits_behav.txt). These first ~45 minutes of the movie were shown outside the scanner, right before the imaging session. The times of the five additional splits of the second part of the movie are available in [`scripts/preprocessing-stimulus/splits.txt`](scripts/preprocessing-stimulus/splits.txt). Each row indicates a pair of start/end times for each split.\n\nWe also provide the scripts used to generate these splits, which used `ffmpeg`. While the movies were converted, the audio was also postprocessed and passed through an audio compressor to reduce the dynamic range and make dialogues more audible in the scanner. These scripts are  [`scripts/preprocessing-stimulus/split_movie_behav.sh`](scripts/preprocessing-stimulus/split_movie_behav.sh) and [`scripts/preprocessing-stimulus/split_movie.sh`](scripts/preprocessing-stimulus/split_movie.sh), for the behavioral and imaging sessions respectively. They produce six files named `budapest_part[1-6].mp4` that were used for the experiment.\n\nDuring the first anatomical scan, subjects were shown the last five minutes of `budapest_part1.mp4` so that they could select an appropriate volume for the remaining five functional scans. 
The clip shown during the anatomical scan is generated by the script [`scripts/preprocessing-stimulus/split_part1_soundcheck.sh`](scripts/preprocessing-stimulus/split_part1_soundcheck.sh). This script generates a file named `budapest_soundcheck.mp4`. \n\n### Presentation scripts\n\nFor the behavioral session outside the scanner, subjects were shown `budapest_part1.mp4` (generated as described above) using VLC and high-quality headphones. Subjects could adjust the volume as much as they liked, and no instructions were given.\n\nAll presentation scripts used [PsychoPy](https://www.psychopy.org/). Unfortunately, we are unable to access the computer used for presentation, so we cannot provide the specific version used in our experiment. Any recent version of PsychoPy should be able to run the presentation code. Feel free to open an issue on this repository if you encounter problems.\n\nAll presentation scripts assume that the stimuli are placed in a subdirectory named `stim`.\n\nDuring the anatomical scan, subjects were shown the last five minutes of the part they saw outside the scanner. This was done so that subjects could select an appropriate volume. The presentation script used for this run is [`scripts/presentation/soundcheck.py`](scripts/presentation/soundcheck.py). The subject can decrease/increase the volume using the buttons `1` and `2` respectively. Once the script has run, it saves the volume level in a JSON file called `subjectvolume.json`. This is an example of such a file:\n\n```json\n{\n \"sid000020\": 1.0,\n \"sid000021\": 0.5,\n \"sid000009\": 0.75\n}\n```\n\nThe presentation script used for the functional imaging runs is [`scripts/presentation/show_movie.py`](scripts/presentation/show_movie.py). Some (limited) config values can be defined in the config JSON file [`scripts/presentation/config.json`](scripts/presentation/config.json). Once the presentation script is loaded, it shows a dialog box to select the subject id and the run number. 
The volume is automatically selected by loading the volume information stored in `subjectvolume.json`. Log files are stored in a subdirectory named `res`. It's possible to stop the experiment at any point using `CTRL + q`. In that case, the logs are flushed, saved, and moved to a file with suffix `__halted.txt`. \n\nThe logs save detailed (perhaps excessive) timing information about each frame. By default, useful information for extracting event files is logged with a `BIDS` log level. Thus, one can easily generate a detailed events file by grepping `BIDS`. For example:\n\n```bash\n$ grep BIDS sub-test_task-movie_run-1_20200916T114100.txt | awk '{for (i=3; i\u003cNF; i++) printf $i\"\\t\";print $NF}' | head -20\nonset\tduration\tframeidx\tvideotime\tlasttrigger\n10.008\t{duration:.3f}\t1\t0.000\t9.000\n10.009\t{duration:.3f}\t2\t0.000\t10.008\n10.011\t{duration:.3f}\t3\t0.000\t10.008\n10.013\t{duration:.3f}\t4\t0.000\t10.008\n10.015\t{duration:.3f}\t5\t0.000\t10.008\n10.019\t{duration:.3f}\t6\t0.000\t10.008\n10.021\t{duration:.3f}\t7\t0.000\t10.008\n10.032\t{duration:.3f}\t8\t0.000\t10.008\n10.045\t{duration:.3f}\t9\t0.000\t10.008\n10.059\t{duration:.3f}\t10\t0.033\t10.008\n10.072\t{duration:.3f}\t11\t0.033\t10.008\n10.085\t{duration:.3f}\t12\t0.033\t10.008\n10.099\t{duration:.3f}\t13\t0.067\t10.008\n10.112\t{duration:.3f}\t14\t0.067\t10.008\n10.125\t{duration:.3f}\t15\t0.100\t10.008\n10.139\t{duration:.3f}\t16\t0.100\t10.008\n10.152\t{duration:.3f}\t17\t0.100\t10.008\n10.165\t{duration:.3f}\t18\t0.133\t10.008\n10.179\t{duration:.3f}\t19\t0.133\t10.008\n```\n\nThe available columns are `onset` (frame onset); `duration` (containing a python format string so that duration information can be added with a trivial parser); `frameidx` (index of the frame shown); `videotime` (time of the video); `lasttrigger` (time of the last received trigger).\n\nWe provide a simplified events file with the published BIDS dataset. 
These event files were generated in the notebook [`notebooks/2020-06-08_make-event-files.ipynb`](notebooks/2020-06-08_make-event-files.ipynb).\n\n### fMRI preprocessing with fMRIprep\n\nThe dataset was preprocessed using [fMRIprep](https://fmriprep.org) (version 20.1.1) in a singularity container. To obtain the container, run the following line (assuming you have singularity installed):\n\n```bash\nVERSION=\"20.1.1\"; singularity build fmriprep-\"$VERSION\".simg docker://poldracklab/fmriprep:\"$VERSION\"\n```\n\nIn [`scripts/preprocessing-fmri`](scripts/preprocessing-fmri) we provide the scripts that were used to run fMRIprep on the Dartmouth HPC cluster ([Discovery](https://rc.dartmouth.edu/index.php/discovery-overview/)). Please consider those scripts as an example, and refer to the documentation of fMRIprep for more details on preprocessing.\n\n### Quality assurance scripts\n\nWe performed QA analyses looking at subject motion, temporal SNR (tSNR), inter-subject correlation (ISC), and time-segment classification after hyperalignment. Please refer to the manuscript for more details.\n\nThe script [`scripts/quality-assurance/compute-motion.py`](scripts/quality-assurance/compute-motion.py) and notebook [`notebooks/2020-07-07_compute-outliers-and-median-motion.ipynb`](notebooks/2020-07-07_compute-outliers-and-median-motion.ipynb) were used to inspect motion across subjects and to compute additional metrics.\n\nThe scripts [`scripts/quality-assurance/compute-tsnr-volume.py`](scripts/quality-assurance/compute-tsnr-volume.py) and [`scripts/quality-assurance/compute-tsnr-fsaverage.py`](scripts/quality-assurance/compute-tsnr-fsaverage.py) were used to estimate tSNR in each subject's native space (volume) and in fsaverage. 
The scripts load the fMRIprep-processed data and perform denoising (as described in the manuscript and implemented in [`budapestcode.utils.clean_data`](https://github.com/mvdoc/budapest-fmri-data/blob/7b9059a1ead5002368487d8376c7345acc4e5511/code/budapestcode/utils.py#L55)) prior to computing tSNR. The tSNR values in volumetric space are plotted in a violin plot across subjects in [`notebooks/2020-04-04_plot-tsnr-group.ipynb`](notebooks/2020-04-04_plot-tsnr-group.ipynb). The script [`scripts/quality-assurance/plot-tsnr-fsaverage.py`](scripts/quality-assurance/plot-tsnr-fsaverage.py) plots the median tSNR across subjects on fsaverage using [pycortex](https://gallantlab.github.io/pycortex).\n\nThe script  [`scripts/quality-assurance/compute-isc-fsaverage.py`](scripts/quality-assurance/compute-isc-fsaverage.py) computes inter-subject correlation on data projected to fsaverage, after denoising the data as described in the manuscript. The median ISC across subjects is plotted with [pycortex](https://gallantlab.github.io/pycortex) in [`scripts/quality-assurance/plot-isc-fsaverage.py`](scripts/quality-assurance/plot-isc-fsaverage.py).\n\nThe data was hyperaligned with [PyMVPA](https://www.pymvpa.org) using the script [`scripts/hyperalignment-and-decoding/hyperalignment_pymvpa_splits.py`](scripts/hyperalignment-and-decoding/hyperalignment_pymvpa_splits.py) and time-segment classification across subjects was performed using the script [`scripts/hyperalignment-and-decoding/decoding_segments_splits.py`](scripts/hyperalignment-and-decoding/decoding_segments_splits.py). Please refer to the manuscript for more details on these analyses.\n\n## Acknowledgements\n\nThis work was supported by the [NSF grant #1835200](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1835200) to M. Ida Gobbini. 
We would like to thank Jim Haxby, Yaroslav Halchenko, Sam Nastase, and the members of the Gobbini and Haxby lab for helpful discussions during the development of this project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmvdoc%2Fbudapest-fmri-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmvdoc%2Fbudapest-fmri-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmvdoc%2Fbudapest-fmri-data/lists"}