{"id":19713375,"url":"https://github.com/ctsit/nacculator","last_synced_at":"2025-04-29T18:31:10.767Z","repository":{"id":37547166,"uuid":"52021680","full_name":"ctsit/nacculator","owner":"ctsit","description":"Converts a CSV data file exported from REDCap into the NACC's UDS3 fixed-width format.","archived":false,"fork":false,"pushed_at":"2025-02-12T19:17:19.000Z","size":1163,"stargazers_count":9,"open_issues_count":11,"forks_count":24,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-04-17T23:39:57.582Z","etag":null,"topics":["adrc","csv","python","redcap"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ctsit.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null}},"created_at":"2016-02-18T16:33:08.000Z","updated_at":"2024-08-26T17:49:48.000Z","dependencies_parsed_at":"2023-10-13T12:36:50.790Z","dependency_job_id":"d99f30ae-d5a0-4fc7-88ed-63a2f5a20ba5","html_url":"https://github.com/ctsit/nacculator","commit_stats":{"total_commits":482,"total_committers":19,"mean_commits":25.36842105263158,"dds":0.5228215767634854,"last_synced_commit":"d0e69dea1e1e69d41584803c92c1c3f02865a358"},"previous_names":[],"tags_count":39,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctsit%2Fnacculator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctsit%2Fnacculator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctsit%2Fnacculator/releases","manifests_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/ctsit%2Fnacculator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ctsit","download_url":"https://codeload.github.com/ctsit/nacculator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251559848,"owners_count":21609087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adrc","csv","python","redcap"],"created_at":"2024-11-11T22:21:32.361Z","updated_at":"2025-04-29T18:31:10.189Z","avatar_url":"https://github.com/ctsit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"NACCulator\n==========\n\n[![DOI](https://zenodo.org/badge/20501/ctsit/nacculator.svg)](https://zenodo.org/badge/latestdoi/20501/ctsit/nacculator)\n\nNACCulator is a Python 3-based data converter that changes REDCap .csv exported\ndata to NACC’s fixed-width .txt format. It is configured for UDS3 forms,\nincluding FTLD and LBD (versions 3.0 and 3.1). It will perform basic data\nintegrity checks during a run: verifying that each field is the correct type\nand length, verifying that there are no illegal characters in the Char fields,\nverifying that Num fields are within the acceptable range as defined in NACC's\nData Element Dictionary for each form, and checking that no blanking rules have\nbeen violated. 
NACCulator outputs a .txt file that is immediately ready to\nsubmit to NACC's database.\n\n_Note:_ NACCulator _**requires Python 3.**_\n\n\nHOW TO Convert from REDCap to NACC\n----------------------------------\n\nTo install NACCulator, run:\n\n    $ pip3 install git+https://github.com/ctsit/nacculator.git\n\nOnce the project data is exported from REDCap to the CSV file `data.csv`, run:\n\n    $ redcap2nacc \u003cdata.csv \u003edata.txt\n\nThis command will work only in the simplest case; UDS3 IVP data only.\nNACCulator will automatically skip PTIDs with errors, so the output `data.txt`\nfile will be ready to submit to NACC.\nIn order to properly filter the data in the csv, NACCulator is expecting that\nREDCap visits (denoted by `redcap_event_name`) contain certain keywords:\n    \"initial\" for all initial visit packets (including telephone and optional modules such as lbd),\n    \"follow\" for all followups (including version 3.1 telephone and optional modules),\n    \"milestone\" for milestone packets,\n    \"neuropath\" for neuropathology packets,\n    \"tele\" for old (version 3.0) telephone followups,\n    \"covid\" for covid-related survey packets\n\nNACCulator collects data from the Z1X form first and uses that to determine the\npresence of other forms in the packet. 
The Z1X form for that record must be\nmarked \"Unverified\" or \"Complete\" for NACCulator to recognize the record, and\neach optional form must be marked as submitted within the Z1X for NACCulator to\nfind those forms.\n\n_Note: For UDS visits (the -ivp and -fvp flags), NACCulator also expects the\nA1 subject demographics form to be either Unverified or Complete._\n\n_Note: output is written to `STDOUT`; errors are written to `STDERR`; input is\nexpected to be from `STDIN` (the command line) unless a file is specified using\nthe `-file` flag._\n\n\n### Usage\n\n    $ redcap2nacc -h\n    usage: redcap2nacc [-h]\n                       [-fvp | -ivp | -tip | -tfp | -tfp3 | -np | -np10 | -m | -cv | -csf | -f {cleanPtid,replaceDrugId,fixHeaders,fillDefault,updateField,removePtid,removeDateRecord,getPtid}]\n                       [-lbd | lbdsv | -ftld] [-file FILE] [-meta FILTER_META] [-ptid PTID]\n                       [-vnum VNUM] [-vtype VTYPE]\n\n    Process redcap export data through nacculator.\n\n    optional arguments:\n      -h, --help            show this help message and exit\n      -fvp                  Set this flag to process as FVP data\n      -ivp                  Set this flag to process as IVP data\n      -tfp                  Set this flag to process as Telephone Followup Packet v3.2 data\n      -tip                  Set this flag to process as Telephone Initial Packet data\n      -tfp3                 Set this flag to process as TFP v3.0 (pre-2020) data\n      -np                   Set this flag to process as Neuropathology version 11 data\n      -np10                 Set this flag to process as Neuropathology version 10 data\n      -m                    Set this flag to process as Milestone data\n      -cv                   Set this flag to process as COVID data\n      -csf                  Set this flag to process as NACC BIDSS CSF data\n\n      -f {cleanPtid,replaceDrugId,fixHeaders,fillDefault,updateField,removePtid,removeDateRecord,getPtid}, 
--filter {cleanPtid,replaceDrugId,fixHeaders,fillDefault,updateField,removePtid,removeDateRecord,getPtid}\n                              Set this flag to run the data through the chosen filter\n      -lbd                  Set this flag to process as Lewy Body Dementia data (FORMVER = 3)\n      -lbdsv                Set this flag to process as Lewy Body Dementia short version data (FORMVER = 3.1)\n      -ftld                 Set this flag to process as Frontotemporal Lobar Degeneration data\n\n      -file FILE            Path of the csv file to be processed\n      -meta FILTER_META     Input file for the filter metadata (in case -filter is used)\n      -ptid PTID            Ptid for which you need the records\n      -vnum VNUM            Visit number for which you need the records\n      -vtype VTYPE          Visit type for which you need the records\n\n\n**Example** - Process a Neuropathology form:\n\n    $ redcap2nacc -np -file data.csv \u003edata.txt\n\n**Example** - Processing LBD Follow-up visit packets:\n\n    redcap2nacc -lbd -fvp -file data.csv \u003edata.txt\n\nBoth LBD / LBDSV and FTLD forms can have IVP or FVP arguments.\n\n**Example** - Run data through the `cleanPtid` filter:\n\n    $ redcap2nacc -f cleanPtid -meta nacculator_cfg.ini \u003cdata.csv \u003efiltered_data.csv\n\n\nHOW TO Filter Data Using NACCulator\n-----------------------------------\n\nIf your data is not clean enough to be processed by NACCulator, there are some\nbuilt in functions to clean (read: transform) the data.\n\nIn order to properly use the filters, the first step is to check and validate\nthat `nacculator_cfg.ini` has the proper settings for the filter to run. In\norder to create this file, find the `nacculator_cfg.ini.example` file and\nremove the `.example` portion, and then fill in your center's information.\nThe config file contains sections with in-code filter function name. 
Each of\nthese sections contains elements necessary for the filter to run.\nThe filters described below will discuss what is required, if anything.\n\nThe filters can be run all at once with your REDCap API token using:\n\n    $ nacculator_filters nacculator_cfg.ini\n\nYou can find more details on `nacculator_filters` under the section:\nHOW TO Acquire current-db-subjects.csv for the filters\n\n\nRUNNING ALL FILTERS ON A LOCAL FILE\n------------------------------------------------------\n\nREDCap has an export size limit that can be exceeded with a large project like\nthe ADRC. When the size of the project surpasses the REDCap limit, the\n`nacculator_filters` command will no longer work. The data must be manually\nexported from the project in chunks (whether by event or by ptid). However you\nchoose to export the data, keep in mind that all of the fields in a packet need\nto be present in the input csv you use. So, for example, the A1 and A2 forms in\nthe IVP cannot be exported and run separately through NACCulator.\n\nYou can still run all the filters using your config file on a REDCap-exported\ncsv, even when not using `nacculator_filters`. The command to use this filter\nlocally is:\n\n    $ python3 nacc/local_filters.py nacculator_cfg.ini redcap_input.csv\n\nwhere `redcap_input.csv` is the location of the file you want to filter. The\nfilter will then run as normal, creating a `run_CURRENT-DATE` folder and\ndepositing each stage of the filter process in this folder. 
The final output\nof the filter process is a csv file called `final_Update.csv` which can then\nbe run through NACCulator.\n\n\nRUNNING INDIVIDUAL FILTERS\n------------------------------------------------------\n\nThe filters can also be run one at a time on a `.csv` file with the `-f` and `-meta`\nflags.\n\nFor example, to run the fixHeaders filter:\n\n    $ redcap2nacc -f fixHeaders -meta nacculator_cfg.ini \u003cdata_input.csv \u003efiltered_output.csv\n\nIf the filter requires the config, it must be passed with the `-meta` flag like\nthe example above shows.\n\n\n* **cleanPtid**\n\n  This filter requires a section in the config called `filter_clean_ptid`. This\n  section will contain a single key `filepath` which will point to a csv \n  (usually called `current-db-subjects.csv`) file of ptids to be removed. All \n  the records whose ptid with same packet and visit num found in the passed \n  meta file will be discarded in the output file. This filter also removes \n  events that lack a visit number in REDCap.\n\n  Example meta file:\n\n      Patient ID,Packet type,Visit Num,Status\n      110001,I,1,Current\n      110001,M,M1,Current\n      110003,I,001,Current\n      110003,F,002,Current\n\n\n* **replaceDrugId**\n\n  This filter replaces the first character of non empty fields of columns\n  `drugid_1` to `drugid_30` with character \"**d**\".\n\n\n* **fixHeaders**\n\n  This filter requires a section in the config called `filter_fix_headers` with\n  as many keys as needed to replace the necessary columns. See example below.\n  This filter fixes the column names of any column found in the filter mapping.\n  This filter does not check for any data. 
It only replaces the column names\n  if found.\n\n  For example, the configuration would look like this:\n\n      [filter_fix_headers]\n      c1s_2a_npsylan: c1s_2_npsycloc\n      c1s_2a_npsylanx: c1s_2a_npsylan\n      b6s_2a1_npsylanx: c1s_2a1_npsylanx\n      fu_otherneur: fu_othneur\n      fu_otherneurx: fu_othneurxs\n      fu_strokedec: fu_strokdec\n      fukid9agd: fu_kid9agd\n      fusib17pdx: fu_sib17pdx\n\n\n* **fillDefault**\n\n  This filter is used to set some predefined fields to their corresponding\n  predefined values. Below are the current defaults :\n\n      nogds    -\u003e 0\n      formver  -\u003e 3\n\n  *If field is blank, it will be updated to default value.*\n\n\n* **updateField**\n\n  This filter is used to update fields that already had a value in the REDCap\n  export. Currently, only `adcid` is updated.\n\n\n* **fixVisitNum**\n\n  This filter is used to ensure that the `visitnum` field is always an integer.\n  It is currently only accessible from the config file when running all\n  filters.\n\n\n* **removePtid**\n\n  This filter requires a section in the config called `filter_remove_ptid` with\n  a single key called `ptid_format`. The value for that key is a regex string\n  to match ptids that are to be kept.\n  11\\d.* keeps all PTIDs that fit the format 11xxxx, such as 110001.\n\n  This filter is used to remove ptids that may have a different set of ids for\n  a different study, or help limit which ids show up in the final result.\n\n      config:\n      ptid_format: 11\\d.*\n\n* **removeDateRecord**\n\n  This filter is used to remove records who may be missing visit dates. It\n  searches for rows missing the visit day, month, or year. If any of those\n  fields are missing, it removes the row.\n\n* **getPtid**\n\n    This filter is used to get information about a single PatientID and is not\n    present in the config file. 
You need to use the `-ptid` flag to specify the\n    patient ID.\n    You can add the `-vnum` flag to get the records with a particular visit\n    number and Patient ID, or the `-vtype` flag to get the records with a\n    particular visit type and Patient ID.\n\n        $ redcap2nacc -f getPtid -ptid $SOME_PATIENT_ID -vnum $SOME_VISIT_NUM -vtype $SOME_VISIT_TYPE \u003cdata.csv \u003edata.txt\n\n\nHOW TO Acquire current-db-subjects.csv for the filters\n------------------------------------------------------\n\nThis file is a csv that lists which of your center's PTIDs are already\npresent in NACC's current database, using the patient's PTID, the packet type\n(ivp, fvp, etc.), the visit number, and the status (working or current). In\norder to get it, you need to use the contents of\n`tools/preprocess/get_subject_list.js`. The script is meant to be run on the\n\"Finalize Data\" page of the NACC UDS3 upload system.\n\nNavigate to \"Finalize Data\" and right-click anywhere on the page. Select\n\"Inspect\" or \"Inspect element\" to open the browser's Inspect panel. Click on\nthe \"Console\" tab and copy/paste the contents of `get_subject_list.js` into the\nconsole. Then, press the \"Enter\" or \"Return\" key on your keyboard. This will\ncollect all of the PTIDs in your center's Working and Current databases into a\ncsv called `current-db-subjects.csv` in your Downloads folder. You may then\nmove it to whatever location you specified in your `nacculator_cfg.ini` file.\n\nThe csv is used by the `cleanPtid` filter (config section `filter_clean_ptid`)\nto identify and cull all packets already in NACC's Current database from your\ninput csv. 
It is used to\nmake NACCulator run faster for very large databases.\n\n\nExample Workflow\n----------------\n\nOnce you have edited the `nacculator_cfg.ini` file with your API token and\ndesired filters, you can get a filtered CSV file of the raw REDCap data with:\n\n    $ nacculator_filters nacculator_cfg.ini\n\nThis will create a run folder labeled with the current date\n(`$run_CURRENT-DATE`) (for example, `run_01-01-2000`) that contains the csv and\nthe output of each filter stage, ending with `final_Update.csv`.\n\nNote: The files created by `redcap2nacc` will not be in the run folder created\nby `run_filters.py`. They will be in the base directory. The filepaths in the\nfollowing commands are modified so that the output is deposited in your\n`$run_CURRENT-DATE` folder.\n\nNext, you will need to run the actual `redcap2nacc` program to produce the\nfixed-width text file for NACC. Only one packet-type flag can be used at a\ntime, so the program must be run once for each type of packet.\n\n    $ redcap2nacc -ivp \u003c $run_CURRENT-DATE/final_Update.csv \u003e $run_CURRENT-DATE/iv_nacc_complete.txt 2\u003e $run_CURRENT-DATE/ivp_errors.txt\n    $ redcap2nacc -fvp \u003c $run_CURRENT-DATE/final_Update.csv \u003e $run_CURRENT-DATE/fv_nacc_complete.txt 2\u003e $run_CURRENT-DATE/fvp_errors.txt\n\nThis will place the text files (for example, `iv_nacc_complete.txt`) in the run\nfolder created earlier, as well as a log of each run that contains any errors\nfound (for example, `ivp_errors.txt`).\n\n\nDevelopment\n-----------\n\n### Quickstart\n\n    $ git clone https://github.com/ctsit/nacculator.git nacculator\n    $ cd nacculator\n    $ python3 -m venv venv\n    $ source venv/bin/activate\n    $ pip install -e .\n\n### Files\n\nThis is not exhaustive, but here is an explanation of some important files.\n\n* `nacc/`:\n    top-level Python package for all things NACC.\n\n* `nacc/redcap2nacc.py`:\n    converts a CSV data file exported from REDCap into NACC's UDS3 fixed-width\n    format.\n\n* 
`nacc/uds3/blanks.py`:\n    specialized library for \"Blanking Rules\".\n\n* `nacc/uds3/ivp/forms.py`:\n    UDS3 IVP forms represented as Python classes.\n\n* `tools/generator.py`:\n    generates Python objects based on NACC Data Element Dictionaries in CSV.\n    Used by developers to update the existing forms.py files as necessary.\n\n* `nacculator_cfg.ini`:\n    configuration file for the filters, built from `nacculator_cfg.ini.example`\n    in the root `nacculator/` directory.\n\n* `nacc/run_filters.py` and `tools/preprocess/run_filters.sh`:\n    pulls data from REDCap based on the settings found in `nacculator_cfg.ini`\n    (for .py) and `filters_config.cfg` (for .sh). Supports exporting data from multiple REDCap projects by adding a comma-delimited list of tokens without spaces e.g., `token=token1,token2` to `token` in the `nacculator_cfg.ini` config file.\n\n\n### Testing\n\nTo run all the tests:\n\n    $ python3 -m unittest\n\n\nTo run only the tests in a specific file:\n\n    $ python3 tests/test_$SPECIFIC_FILE.py\n\n\n### Generating Forms\n\n**Warning: the generator is currently broken due to changes in the CSV format.**\n\nYou only need to generate forms when there are new DEDs from NACC. 
The\nNACCulator install includes the current forms automatically.\n\nBefore running the generator, read the warnings in `./nacc/uds3/ivp/forms.py`\nfirst.\n\n    $ python3 tools/generator.py tools/uds3/ded/csv/ \u003enacc/uds3/ivp/forms.py\n    $ edit nacc/uds3/ivp/forms.py\n\n_Note: execute `generator.py` from the same folder as the `corrected`\nfolder, which should contain any \"corrected\" DEDs._\n\n### Resources\n\n* UDS3 forms: https://www.alz.washington.edu/NONMEMBER/UDS/DOCS/VER3/UDS3csvded.html\n* NACC forms and documentation: https://www.alz.washington.edu/NONMEMBER/NACCFormsAndDoc.html\n* UDS submission site: https://www.alz.washington.edu/MEMBER/sitesub.htm\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fctsit%2Fnacculator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fctsit%2Fnacculator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fctsit%2Fnacculator/lists"}