{"id":45689902,"url":"https://github.com/imsweb/data-generator","last_synced_at":"2026-03-09T20:08:00.876Z","repository":{"id":51379093,"uuid":"53259983","full_name":"imsweb/data-generator","owner":"imsweb","description":"This Java library allows to create synthetic (fabricated) NAACCR data files.","archived":false,"fork":false,"pushed_at":"2026-02-24T14:47:30.000Z","size":12223,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2026-02-24T18:53:03.041Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/imsweb.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2016-03-06T14:52:30.000Z","updated_at":"2026-02-24T14:47:35.000Z","dependencies_parsed_at":"2024-11-04T20:19:11.946Z","dependency_job_id":"6f1f3a13-4e49-4507-a83b-30ae8eff4c9c","html_url":"https://github.com/imsweb/data-generator","commit_stats":null,"previous_names":[],"tags_count":42,"template":false,"template_full_name":null,"purl":"pkg:github/imsweb/data-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imsweb%2Fdata-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imsweb%2Fdata-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imsweb%2Fdata-generator/releases","manifests_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/imsweb%2Fdata-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/imsweb","download_url":"https://codeload.github.com/imsweb/data-generator/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imsweb%2Fdata-generator/sbom","scorecard":{"id":486460,"data":{"date":"2025-08-11","repo":{"name":"github.com/imsweb/data-generator","commit":"02a416118724751d0495e9d4d1bced55a2001caf"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4.1,"checks":[{"name":"Code-Review","score":0,"reason":"Found 1/29 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"1 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/integration.yml:1","Warn: no topLevel permission defined: .github/workflows/publish.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if 
the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":9,"reason":"binaries present in source code","details":["Warn: binary detected: gradle/wrapper/gradle-wrapper.jar:1"],"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/integration.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/imsweb/data-generator/integration.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/integration.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/imsweb/data-generator/integration.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/integration.yml:26: update your workflow using https://app.stepsecurity.io/secureworkflow/imsweb/data-generator/integration.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/publish.yml:17: update your workflow using https://app.stepsecurity.io/secureworkflow/imsweb/data-generator/publish.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: 
.github/workflows/publish.yml:20: update your workflow using https://app.stepsecurity.io/secureworkflow/imsweb/data-generator/publish.yml/master?enable=pin","Info:   0 out of   5 GitHub-owned GitHubAction dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":9,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Warn: project license file does not contain an FSF or OSI license."],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Packaging","score":10,"reason":"packaging workflow detected","details":["Info: Project 
packages its releases by way of GitHub Actions.: .github/workflows/publish.yml:11"],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 3 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-19T17:56:54.552Z","repository_id":51379093,"created_at":"2025-08-19T17:56:54.552Z","updated_at":"2025-08-19T17:56:54.552Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30310076,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T20:05:46.299Z","status":"ssl_error","status_checked_at":"2026-03-09T19:57:04.425Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 
peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-24T16:05:31.963Z","updated_at":"2026-03-09T20:08:00.865Z","avatar_url":"https://github.com/imsweb.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Generator\n\n[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=imsweb_data-generator\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=imsweb_data-generator)\n[![integration](https://github.com/imsweb/data-generator/workflows/integration/badge.svg)](https://github.com/imsweb/data-generator/actions)\n[![Maven Central](https://img.shields.io/maven-central/v/com.imsweb/data-generator.svg)](https://central.sonatype.com/artifact/com.imsweb/data-generator)\n\nThis Java library can be used to create cancer-related synthetic data files.\n\n## Download\n\nThe library is available on [Maven Central](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22com.imsweb%22%20AND%20a%3A%22data-generator%22).\n\nTo include it in your Maven or Gradle project, use the group ID `com.imsweb` and the artifact ID `data-generator`.\n\nYou can check out the [release page](https://github.com/imsweb/data-generator/releases) for a list of the releases and their changes.\n\nThis library requires Java 8 or a more recent version.\n\n## Usage\n\nAs of version 1.10, the GUI Standalone component of the library has been retired. 
The free [File*Pro](https://seer.cancer.gov/tools/filepro/) software\ncan be used to generate synthetic data using a user-friendly interface.\n\nWhen embedding the library in your project, the following generators are available:\n - RecordDataGenerator can be used with generic fixed-columns layouts.\n - NaaccrDataGenerator can be used with NAACCR fixed-columns layouts.\n - NaaccrXmlDataGenerator can be used with NAACCR XML layouts.\n - NaaccrHl7DataGenerator can be used with NAACCR HL7 layouts.\n - PhysicianDataGenerator can be used to generate physicians.\n - FacilityDataGenerator can be used to generate facilities.\n\n### NAACCR Fixed-columns Generator\n\nThe NAACCR generator currently provides rules for the following fields:\n - Patient ID Number\n - Sex\n - Race 1-5\n - Spanish/Hispanic Origin\n - Social Security Number\n - Name (Last, First, Middle, Prefix, Suffix and Maiden if needed)\n - Vital Status\n - Cause of death and ICD Revision Number\n - Date of Birth, Birthplace Country and State\n - Current address\n - Computed Ethnicity\n - Registry ID\n - Tumor Record Number\n - SEER Record Number\n - Sequence Number Central\n - Date of Diagnosis\n - Primary Site and related fields\n - Age at DX\n - Date of Initial RX\n - Date of Last Contact\n - Address at DX\n - Marital Status at DX\n - Diagnostic Confirmation\n - Type of Reporting Source\n - Census fields\n - RX Summary fields\n - RX Text fields (random data only)\n - SEER Type of Follow Up\n - Primary Payer at DX\n - Tumor Marker 1, 2 and 3\n - SEER Coding System\n - Multiple Tumors fields\n - Date Conclusive DX\n - Collaborative Stage fields\n - NHIA\n - NAPIIA\n - Diagnostic Procedure Text fields (random data only)\n - Tumor Text fields (random data only)\n - Telephone\n\nHere is an example using the NAACCR generator:\n```java\n// create the generator\nNaaccrDataGenerator generator = new NaaccrDataGenerator(LayoutFactory.LAYOUT_ID_NAACCR_18_ABSTRACT);\n\n// generate a single patient with 2 
tumors\nList\u003cMap\u003cString, String\u003e\u003e patient = generator.generatePatient(2);\n\n// generate a file with 500 tumors; each patient will have a random number of tumors (mostly 1)\ngenerator.generateFile(targetFile, 500);\n```\n\nThe generator accepts an additional options object as an input to the generate methods; that object can be used to customize the\nrandom data generation of some of the fields.\n\n### NAACCR XML Generator\n\nThe NAACCR XML generator uses the same rules as the NAACCR fixed-columns generator.\n\nHere is an example using the NAACCR XML generator:\n```java\n// create the generator\nNaaccrXmlDataGenerator generator = new NaaccrXmlDataGenerator(LayoutFactory.LAYOUT_ID_NAACCR_XML_18_ABSTRACT);\n\n// generate a single patient with 2 tumors\nPatient patient = generator.generatePatient(2);\n\n// generate a file with 500 tumors; each patient will have a random number of tumors (mostly 1)\ngenerator.generateFile(targetFile, 500);\n```\n\nThe generator accepts an additional options object as an input to the generate methods; that object can be used to customize the\nrandom data generation of some of the fields.\n\n### NAACCR HL7 Generator\n\nThe NAACCR HL7 generator provides rules for the following segments:\n - Control Segment (MSH)\n - Patient Identifier Segment (PID)\n - Next of Kin Segment (NK1)\n - Patient Visit Segment (PV1)\n - Common Order Segment (ORC)\n - Observation Request Segment (OBR)\n - Observation/Result Segment (OBX)\n\nHere is an example using the NAACCR HL7 generator:\n```java\n// create the generator\nNaaccrHl7DataGenerator generator = new NaaccrHl7DataGenerator(LayoutFactory.LAYOUT_ID_NAACCR_HL7_2_5_1);\n\n// generate a single message\nHl7Message message = generator.generateMessage();\n\n// generate a file with 10 messages\ngenerator.generateFile(targetFile, 10);\n```\n\nThe generator accepts an additional options object as an input to the generate methods; that object can be used to customize the\nrandom data generation 
of some of the fields.\n\n### Physician and Facility Data Generator\n\nThese generators don't require a layout; they can be used to generate physicians and facilities. The data is created from publicly available NPI data files.\n\n## Defining Variables\n\nThis library supports variables and file formats through the [layout framework](https://github.com/imsweb/layout). A layout object must be used\nto initialize one of the data generator objects (although some of them support providing just the layout ID).\n\n## Creating Random Data\n\nThe library uses rules to create data; each rule is responsible for assigning one or more fields. The NAACCR generators come with a set of basic rules to assign some of the NAACCR fields;\nthe generic generator doesn't come with any rules.\n\nThe library uses three ways to assign values:\n\n1. ***Constant values***: the rule always assigns the same value to the field.\n2. ***Random values from a list***: the rule assigns a random value from a specific list of values.\n3. 
***Random values based on a frequency***: the rule uses a frequency (usually from a CSV file) to get the value to assign; this results in more common values being assigned more often.\n\nIn addition to those assignment mechanisms, each rule might depend on the values assigned by previous rules.\n\nThe default NAACCR rules use frequencies extracted from the SEER data.\n\nTo learn more about the default NAACCR rules, check out the [rule package](https://github.com/imsweb/data-generator/tree/master/src/main/java/com/imsweb/datagenerator/naaccr/rule).\n\n## About SEER\n\nThis library was developed through the [SEER](http://seer.cancer.gov/) program.\n\nThe Surveillance, Epidemiology and End Results program is a premier source for cancer statistics in the United States.\nThe SEER program collects information on incidence, prevalence and survival from specific geographic areas representing\na large portion of the US population and reports on all these data plus cancer mortality data for the entire country.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimsweb%2Fdata-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimsweb%2Fdata-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimsweb%2Fdata-generator/lists"}