{"id":48584586,"url":"https://github.com/stomioka/sdtm_mapper","last_synced_at":"2026-04-08T17:41:48.063Z","repository":{"id":57465040,"uuid":"136808330","full_name":"stomioka/sdtm_mapper","owner":"stomioka","description":"AI SDTM mapping  (R for ML, Python, TensorFlow for DL)","archived":false,"fork":false,"pushed_at":"2023-12-14T18:57:16.000Z","size":19166,"stargazers_count":53,"open_issues_count":3,"forks_count":25,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-09-24T21:51:25.582Z","etag":null,"topics":["cdisc","deep-learning","machine-learning","nlp-machine-learning","sdtm"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stomioka.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-06-10T12:48:05.000Z","updated_at":"2025-09-09T04:51:51.000Z","dependencies_parsed_at":"2023-01-31T05:30:52.789Z","dependency_job_id":null,"html_url":"https://github.com/stomioka/sdtm_mapper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/stomioka/sdtm_mapper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stomioka%2Fsdtm_mapper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stomioka%2Fsdtm_mapper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stomioka%2Fsdtm_mapper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stomioka%2Fsdtm_mapper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stomioka","download_url":"https://codeloa
d.github.com/stomioka/sdtm_mapper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stomioka%2Fsdtm_mapper/sbom","scorecard":{"id":853868,"data":{"date":"2025-08-11","repo":{"name":"github.com/stomioka/sdtm_mapper","commit":"d411f3deb72376c6deeac6dcb11d01420295eecf"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.7,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev 
branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no 
dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: GNU General Public License v3.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"16 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-cjgq-5qmw-rcj6","Warn: Project is vulnerable to: GHSA-x4wf-678h-2pmq","Warn: Project is vulnerable to: PYSEC-2018-34 / GHSA-2fc2-6r4j-p65h","Warn: Project is vulnerable to: PYSEC-2021-856 / GHSA-5545-2q6w-2gh6","Warn: Project is vulnerable to: PYSEC-2019-108 / GHSA-9fq2-x9r6-wfmf","Warn: Project is vulnerable to: PYSEC-2018-33 / GHSA-cw6w-4rcx-xphc","Warn: Project is vulnerable to: PYSEC-2021-857 / GHSA-f7c7-j99h-c22f","Warn: Project is vulnerable to: 
GHSA-fpfv-jqm9-f5jm","Warn: Project is vulnerable to: PYSEC-2017-1 / GHSA-frgw-fgh6-9g52","Warn: Project is vulnerable to: PYSEC-2020-73","Warn: Project is vulnerable to: PYSEC-2020-107 / GHSA-jjw5-xxj6-pcv5","Warn: Project is vulnerable to: PYSEC-2024-110 / GHSA-jw8x-6495-233v","Warn: Project is vulnerable to: PYSEC-2020-108","Warn: Project is vulnerable to: PYSEC-2025-49 / GHSA-5rjg-fvgr-3xxf","Warn: Project is vulnerable to: GHSA-cx63-2mw6-8hw5","Warn: Project is vulnerable to: PYSEC-2022-43012 / GHSA-r9hx-vwmv-q579"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-23T23:18:30.588Z","repository_id":57465040,"created_at":"2025-08-23T23:18:30.588Z","updated_at":"2025-08-23T23:18:30.588Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31567225,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdisc","deep-learning","machine-learning","nlp-machine-learning","sdtm"],"created_at":"2026-04-08T17:41:47.533Z","updated_at":"2026-04-08T17:41:48.054Z","avatar_url":"https://github.com/stomioka.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sdtm-mapper\nSam Tomioka\n\nFeb 2019\n\n- [About](#about)\n- [Installation](#installation)\n- [Tutorials](#tutorials)\n- [Notes](#notes)\n- [Issues](#issues)\n- [Disclaimer](#disclaimer)\n- [References](#reference)\n\n## [About](#about)\n\n[**sdtm-mapper**](https://pypi.org/project/sdtm-mapper/) is a Python package to generate machine-readable CDISC SDTM mapping specifications with help from AI. It can be used for the following tasks:\n\n1. Generates an empty specification for training data from a user-provided SAS dataset. This empty specification will contain SAS dataset attributes.  You don't need to use `Proc Contents` in SAS to do this! SAS datasets may be in your AWS S3 bucket or a local folder.\n2. Runs models to generate mapping specifications.\n3. Generates your own mapping algorithms using your data. The models can be trained to generate not only the target variables but also programming pseudocode.\n\nThe first version comes with **three pre-trained models** (included in the package). 
These are feed-forward neural networks with a trainable ELMo embedding layer, trained for 34 classes on **adverse event** datasets from 18 clinical trials. Validation used 3 clinical trials until the models were optimized, and testing used 1 clinical trial. Data from the 22 clinical trials were extracted from **Medidata Rave** databases built by 3 different CROs and Sunovion Pharmaceuticals.\n\n| Models                 | Parameters | Training Acc | Validation Acc | Test Acc* |\n|------------------------|------------|--------------|----------------|----------|\n|1. Elmo+sfnn+ae+Model1.h5 | 271,142    |  0.9795        | 0.9800        | 0.9540   |\n|2. Elmo+fnn+ae+Model2.h5  | 664,870    | 0.9846      | **1.0000**         | 0.9425   |\n|3. Elmo+fnn+ae+Model3.h5  | 594,854    | **0.9966**       | **1.0000**         | **0.9666**   |\n\n**Table 1 - Performance of three models** \u003cbr\u003e\n\\* Macro accuracy accounts for system variables for 'drop'.\n\nThe high variance of the models may be due to the addition of CDASH metadata, and it is probably better to remove that metadata.\n\nImprovements to the task-specific model are explored by Peters et al. [1]:\n\n1. Freeze the context-independent representations from the pre-trained biLM, concatenate them with $ELMo^{task}_{k}$, and pass the result into the task RNN.\n2. Replace $h_k$ with $[h_k; ELMo^{task}_{k}]$. Peters et al. [1] showed improved performance in some tasks, such as SNLI and SQuAD, by including ELMo at the output of the task RNN.\n3. Add a moderate amount of dropout to ELMo.\n4. Regularize the ELMo weights by adding $\\gamma||w||^2_2$ to the loss function.\n\nThese can be considered as future enhancements for other domains where the models may not perform well.\n\n\nHere is the architecture of ELMo.\n\n![](images/README-06c97452.png)\n**Figure 1** - biLM architecture for ELMo\n\n## [Installation](#installation)\n```unix\npip install sdtm-mapper\n```\n\n## [Tutorials on Google Colab](#tutorials)\n\n1. 
[How to prepare training data using sdtm-mapper from SAS7bdat files?](https://colab.research.google.com/drive/1Kv9B4Guw74I2hFDodlsuvqiYsrLjTDu7) \n2. [Tutorial on how to use sdtm-mapper to generate mapping specifications](https://colab.research.google.com/drive/1A8rzsYq7jKhTgTki7DSzDlvdrew414j4?ts=5c78a25c) \n3. [Train your data using SDTMMapper on Model 1](https://colab.research.google.com/drive/1d73e0ZZDxVGcUgY8P_Bz1PCMuCLRpL7D): note that you need to supply your own training data.\n\n\n## [Notes](#notes)\nYou need an environment with **tensorflow**, **tensorflow-hub**, etc. installed.\n\nIf you want to contribute by adding more models for different SDTM domains, please join the [PhUSE ML Project Community](https://www.phusewiki.org/wiki/index.php?title=Machine_Learning_/_Artificial_Intelligence). Most of the work has been done on weekends or evenings. Your contributions are always welcome!\n\n**Notes about the trained models**:\n\nThe models were built and trained on raw AE datasets from clinical trials conducted by Sunovion Pharmaceuticals. The EDC system we use is Medidata RaveX. The training data contains some e-source data. The performance may not be good for your data.  You can also build your own models using the SDTMMapper tool and use your custom model for your datasets.\n\nThe old README file is found [here](https://github.com/stomioka/sdtm_mapper/blob/master/old_readme.md).\n\n\n## [Issues](#issues)\n\nFor any questions, comments, suggestions, or issues, please post them [here](https://github.com/stomioka/sdtm_mapper/issues).\n\nFor personal communication related to SDTMMapper, please contact [Sam Tomioka](sam.tomioka@sunovion.com).\n\n## [Disclaimer](#disclaimer)\nThis is not an official Sunovion Pharmaceuticals product.\n\n\n## [References](#reference)\n[1] Peters, M. et al. (2018). 
Deep contextualized word representations\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstomioka%2Fsdtm_mapper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstomioka%2Fsdtm_mapper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstomioka%2Fsdtm_mapper/lists"}