{"id":34045686,"url":"https://github.com/usnistgov/sdnist","last_synced_at":"2026-04-06T07:03:03.793Z","repository":{"id":40289372,"uuid":"427405761","full_name":"usnistgov/SDNist","owner":"usnistgov","description":"SDNist: Benchmark data and evaluation tools for data synthesizers.","archived":false,"fork":false,"pushed_at":"2025-07-16T15:14:30.000Z","size":55728,"stargazers_count":39,"open_issues_count":3,"forks_count":15,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-22T19:46:26.004Z","etag":null,"topics":["dataset","differential-privacy","privacy","python","python3","synthetic-data","synthetic-data-generator"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/usnistgov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":"deidentification 
techniques/Open_Source_Techniques_for_Privacy_Newcomers.pdf","publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-11-12T15:22:28.000Z","updated_at":"2025-11-21T11:36:44.000Z","dependencies_parsed_at":"2023-01-31T17:01:03.036Z","dependency_job_id":"8201a694-a7f6-4c94-81f6-a0c07462e733","html_url":"https://github.com/usnistgov/SDNist","commit_stats":{"total_commits":273,"total_committers":9,"mean_commits":"30.333333333333332","dds":0.4212454212454212,"last_synced_commit":"e2fd29e81889963be4b3855ae0dd370711f865bc"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/usnistgov/SDNist","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/usnistgov%2FSDNist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/usnistgov%2FSDNist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/usnistgov%2FSDNist/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/usnistgov%2FSDNist/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/usnistgov","download_url":"https://codeload.github.com/usnistgov/SDNist/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/usnistgov%2FSDNist/sbom","scorecard":{"id":912713,"data":{"date":"2025-08-11","repo":{"name":"github.com/usnistgov/SDNist","commit":"f048b1c2ca3815239982ac181ec8cd01840fc351"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and 
uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":5,"reason":"5 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 5","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Code-Review","score":3,"reason":"Found 5/15 approved changesets -- score normalized to 3","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses 
fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":9,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Warn: project license file does not contain an FSF or OSI license."],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact v2.5.0 not signed: https://api.github.com/repos/usnistgov/SDNist/releases/232893157","Warn: release artifact v2.3.0 not signed: https://api.github.com/repos/usnistgov/SDNist/releases/109055997","Warn: release artifact v2.2.0 not signed: https://api.github.com/repos/usnistgov/SDNist/releases/99389721","Warn: release artifact v2.1.1 not signed: https://api.github.com/repos/usnistgov/SDNist/releases/96020678","Warn: release artifact v2.5.0 does not have provenance: https://api.github.com/repos/usnistgov/SDNist/releases/232893157","Warn: release artifact v2.3.0 does not have provenance: https://api.github.com/repos/usnistgov/SDNist/releases/109055997","Warn: release artifact v2.2.0 does not 
have provenance: https://api.github.com/repos/usnistgov/SDNist/releases/99389721","Warn: release artifact v2.1.1 does not have provenance: https://api.github.com/repos/usnistgov/SDNist/releases/96020678"],"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 24 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-24T19:50:40.985Z","repository_id":40289372,"created_at":"2025-08-24T19:50:40.985Z","updated_at":"2025-08-24T19:50:40.985Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31463016,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"online","status_checked_at":"2026-04-06T02:00:07.287Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","differential-privacy","privacy","python","python3","synthetic-data","synthetic-data-generator"],"created_at":"2025-12-13T23:17:32.721Z","updated_at":"2026-04-06T07:03:03.786Z","avatar_url":"https://github.com/usnistgov.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Python Version](https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)\n\n# SDNist v2.5: Deidentified Data Report Tool\n\n## [SDNist is the official software package for engaging in the NIST Collaborative Research Cycle](https://pages.nist.gov/privacy_collaborative_research_cycle)\n\nWelcome! SDNist is a python package that provides benchmark data and evaluation metrics for deidentified data generators. 
This version of SDNist only supports using the [NIST ACS Data Excerpts](https://github.com/usnistgov/SDNist/tree/main/BenchmarkData), a geographically partitioned, limited-feature data set.  Future versions of SDNist will be extended to support additional NIST Excerpt Benchmark data sets. \n\nThe deidentified data report evaluates the utility and privacy of a given deidentified dataset and generates a summary quality report, with the performance of the deidentified dataset enumerated and illustrated for each utility and privacy metric.\n\n# [Preview sample reports produced by the tool here.](https://github.com/usnistgov/SDNist/tree/main/sample-reports)\n\nThis tool is being actively developed. Please [raise an Issue](https://github.com/usnistgov/SDNist/issues) if you catch a bug or have feature suggestions.\n\n\n### Project Team  \n**Karan Bhagat**, *Knexus Research* - Developer *sdnist.report* package  \n\n**Damon Streat**, *Knexus Research* - Developer \n\n**Christine Task**, *Knexus Research* - Project technical lead\n\n**Gary Howarth**, *NIST* - Project PI [gary.howarth@nist.gov](mailto:gary.howarth@nist.gov)\n\n### Acknowledgements\n\nSDNist v2 grew from [SDNist v1](https://github.com/usnistgov/Differential-Privacy-Temporal-Map-Challenge-assets), developed in partnership with Saurus Technologies under CRADA CN-21-0143.\n\n### Reporting Issues\nHelp us improve the package and this guide by reporting issues [here](https://github.com/usnistgov/SDNist/issues).\n\n### Temporal Map Challenge Environment\n\nSDNist v2.0 and above does not support the Temporal Map Challenge environment.\n\nTo run the testing environment from the [*NIST PSCR Differential Privacy Temporal Map Challenge*](https://www.nist.gov/ctl/pscr/open-innovation-prize-challenges/past-prize-challenges/2020-differential-privacy-temporal) for the Chicago Taxi data sprint or the American Community Survey sprint, please go to the [Temporal Map Challenge assets 
repository](https://github.com/usnistgov/Differential-Privacy-Temporal-Map-Challenge-assets).\n\n\nSetting Up the SDNIST Report Tool\n------------------------\n\n### Brief Setup Instructions\n\nSDNist is compatible with Python versions from 3.9 to 3.12. If you have installed a previous version of the SDNist library, we recommend installing v2.5 in a virtual environment. v2.5 can be installed via [Release 2.5](https://github.com/usnistgov/SDNist/releases/tag/v2.5.0) or via the PyPI server: `pip install sdnist` or, if you already have a version installed, `pip install --upgrade sdnist`.\n\nThe NIST ACS Data Excerpt data will download on the fly.\n\n\n\n### Detailed Setup Instructions Using PyPI\n\n1. The SDNist Report Tool is a part of the sdnist Python library that can be installed on a user’s macOS, Windows, or Linux machine.\n\n\n2. The sdnist library requires Python to be installed on the user's machine. It supports Python versions from 3.9 to 3.12. Check whether an installation exists on the machine by executing the following command in your terminal on Mac/Linux or PowerShell on Windows:\n   ```\n    c:\\\\\u003e python -V\n   ```\n    If Python is already installed, the above command should return the currently installed version. If Python is not found or the version is below 3.9, then you can download Python from the [Python website](https://www.python.org/downloads/).\n\n\n3.  Create a local directory/folder on the machine to set up the SDNist library. This guide assumes the local directory to be sdnist-project; an example of a complete file path is c:\\\\sdnist-project:\n    ```\n    c:\\\\sdnist-project\u003e     \n    ```\n\n\n4. In the already-opened terminal or PowerShell window, execute the following command to create a new Python environment. The sdnist library will be installed in this newly created Python environment:\n\n    ```\n    c:\\\\sdnist-project\u003e python -m venv venv\n    ```\n\n5. 
The new Python environment will be created in the sdnist-project directory, and the files of the environment should be in the venv directory. To check whether a new Python environment was created successfully, use the following command to list all directories in the sdnist-project directory, and make sure the venv directory exists.\n\n    **macOS/Linux:**\n    ```\n    sdnist-project\u003e ls\n    ```\n    **Windows:**\n    ```\n    c:\\\\sdnist-project\u003e dir\n    ```\n\n6. Now activate the Python environment so that the sdnist library can be installed into it.\n\n    **macOS/Linux:**\n    ```\n    sdnist-project\u003e . venv/bin/activate\n    ```\n    The Python virtual environment should now be activated. You should see the environment name (**venv** in this case) prepended to the terminal prompt as below:  \n    ```\n    (venv) sdnist-project\u003e\n    ```\n\n    **Windows:**\n    ```\n    c:\\\\sdnist-project\u003e . venv/Scripts/activate\n    ```\n    The Python virtual environment should now be activated. You should see the environment name (**venv** in this case) prepended to the command/PowerShell prompt as below:  \n    ```\n    (venv) c:\\\\sdnist-project\u003e\n    ```\n\n    On Windows, some users may encounter the following error, particularly on new machines, because executing scripts is disabled by default on some Windows machines:\n    ```\n    C:\\\\sdnist-project\\\\venv\\\\Scripts\\\\Activate.ps1 cannot be loaded because running scripts is disabled on this system.\n    ```\n    Run the following command to let Windows execute scripts:\n    ```\n    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope LocalMachine\n    ```\n\n7. Install the sdnist Python library:\n       ```\n       (venv) c:\\\\sdnist-project\u003e pip install sdnist\n       ```\n\n8. 
Installation is successful if executing the following command outputs a help menu for the sdnist.report package:\n       ```\n       (venv) c:\\\\sdnist-project\u003e python -m sdnist.report -h\n       ```\n       Output:\n       ```\n        usage: __main__.py [-h] [--labels LABELS] [--data-root DATA_ROOT]\n                           PATH_DEIDENTIFIED_DATASET TARGET_DATASET_NAME\n        \n        positional arguments:\n          PATH_DEIDENTIFIED_DATASET\n                                Location of deidentified dataset (csv or parquet\n                                file).\n          TARGET_DATASET_NAME   Select name of the target dataset that was used to\n                                generated given deidentified dataset.\n        \n        options:\n          -h, --help            show this help message and exit\n          --labels LABELS       This argument is used to add meta-data to help\n                                identify which deidentified data was was evaluated in\n                                the report. The argument can be a string that is a\n                                plain text label for the file, or it can be a file\n                                path to a json file containing [label, value] pairs.\n                                This labels will be included in the printed report.\n          --data-root DATA_ROOT\n                                Path of the directory to be used as the root for the\n                                target datasets.\n        \n        Choices for Target Dataset Name:\n          [DATASET NAME]        [FILENAME]\n          MA                    ma2019\n          TX                    tx2019\n          NATIONAL              national2019\n          SBO                   sbo_target\n       ```\n\n9. These instructions install sdnist into a virtual environment. 
The virtual environment must be activated (step 6) each time a new terminal window is used with sdnist.\n\n\nGenerate Data Quality Report\n---------------------------\n\n1.  The sdnist.report package requires a path to the deidentified dataset file and the name of the target dataset from which the deidentified dataset file was created. Following is the command line usage of the sdnist.report package:\n      ```\n      python -m sdnist.report PATH_DEIDENTIFIED_DATASET TARGET_DATASET_NAME\n      ```\n\n      The above command is just an example usage signature of the package. Steps 3 through 5 show the actual commands to run the tool, where the parameter PATH_DEIDENTIFIED_DATASET is replaced with the path of the deidentified dataset file on your machine, and the parameter TARGET_DATASET_NAME is replaced with one of the bundled dataset names (MA, TX, or NATIONAL).\n\n      A deidentified dataset file can be anywhere on your machine. You only need the path of the file to pass it as an argument to the sdnist.report package. For illustration purposes, this guide assumes that an example deidentified dataset file named syn_tx.csv, generated from the bundled dataset TX, is present in the sdnist-project directory. You can also use the bundled toy deidentified datasets for generating some toy evaluation reports using the sdnist.report package by following steps 5 and 6 in the next section, Setup Data for SDNIST Report Tool.\n\n     The sdnist.report package comes bundled with three target datasets: MA, TX, and NATIONAL. If these datasets are not available locally, the package will download them automatically when you run any one of the commands in steps 3 through 5 for the first time. In case of any trouble while downloading the datasets, please refer to the next section, Setup Data for SDNIST Report Tool.\n\n\n2.  
If you have closed the terminal or the PowerShell window that was used for the tool setup, open a new one, and after navigating to the sdnist-project directory, run the activate script as explained in step 6 of the Setting Up the SDNIST Report Tool section.\n\n\n3.  Use the following command to generate a data quality report for the example deidentified dataset (syn_tx.csv) that is generated using the bundled dataset TX:\n      ```\n      (venv) c:\\\\sdnist-project\u003e python -m sdnist.report syn_tx.csv TX\n      ```\n      At the completion of the process initiated by the above command, an .html report will open in the default web browser on your machine. Likewise, .html report files will be available in the reports directory created automatically in the sdnist-project directory.\n\n\n4.  Use the following command to generate a data quality report for the example deidentified dataset (syn_ma.csv) that is generated using the bundled dataset MA:\n      ```\n      (venv) c:\\\\sdnist-project\u003e python -m sdnist.report syn_ma.csv MA\n      ```\n\n\n5.  Use the following command to generate a data quality report for the example deidentified dataset (syn_national.csv) that is generated using the bundled dataset NATIONAL:\n      ```\n      (venv) c:\\\\sdnist-project\u003e python -m sdnist.report syn_national.csv NATIONAL\n      ```\n    \n6. Use the following command to generate a data quality report for the example deidentified dataset (syn_sbo.csv) that is generated using the bundled dataset SBO:\n      ```\n      (venv) c:\\\\sdnist-project\u003e python -m sdnist.report syn_sbo.csv SBO \n      ```   \n    \n7. 
Starting from version 2.1, SDNist allows users to add labels for the deidentified dataset used to generate the report:\n    * To add a single string label to the report, use the command line option **--labels** followed by a string as given in the following example command:\n      ```\n      (venv) c:\\\\sdnist-project\u003e python -m sdnist.report syn_national.csv NATIONAL --labels used_epsilon_1\n      ```\n      This is how the string label *used_epsilon_1* will appear in the report:\n      ![string label in report](readme_resource/string_label.png)\n    * To add multiple string labels to the report, use the command line option **--labels** followed by a path to the JSON file containing the labels:\n      ```\n      (venv) c:\\\\sdnist-project\u003e python -m sdnist.report syn_national.csv NATIONAL --labels example_labels.json\n      ```\n      Where example_labels.json can be:\n      ```\n        {\n          \"epsilon\": \"1\",\n          \"delta\": \"10^-5\",\n          \"created on\": \"March 3, 2023\",\n          \"deidentification method\": \"example_method\"\n        }\n      ```\n      This is how the labels in *example_labels.json* will appear in the report:\n     ![multiple labels in report](readme_resource/multiple_labels.png)\n8. The following are all the parameters offered by the sdnist.report package:\n\n     - **PATH_DEIDENTIFIED_DATASET**: The absolute or relative path to the deidentified dataset .csv or parquet file. If the provided path is relative, it should be relative to the current working directory. This guide assumes the current working directory is sdnist-project.\n     - **TARGET_DATASET_NAME**: This should be the name of one of the datasets bundled with the sdnist.report package. 
It is the name of the dataset from which the input deidentified dataset was generated, and it can be one of the following:\n       - MA\n       - TX\n       - NATIONAL\n       - SBO\n\n     - **--data-root**: The absolute or relative path to the directory containing the bundled dataset, or the directory where the bundled dataset should be downloaded to if it is not available locally. The default directory is set to **BenchmarkData**.\n     - **--labels**: This argument is used to add metadata to help identify which deidentified data was evaluated in the report.  The argument can be a string that is a plain text label for the file, or it can be a file path to a JSON file containing label-value pairs. \n\nSetup Data for SDNIST Report Tool\n---------------------------------\n\n1.  The sdnist.report package comes with built-in datasets. The package will automatically download the datasets from GitHub if they are not already available locally on your machine. You should see the following message in your terminal or PowerShell window when the datasets are downloaded by the sdnist.report package:\n      ```\n      (venv) c:\\\\sdnist-project\u003e python -m sdnist.report syn_tx.csv TX\n\n      Downloading all SDNist datasets from:  \n      (link change) https://github.com/usnistgov/SDNist/releases/download/v2.2.0/BenchmarkData.zip ...  \n      ...5%, 47352 KB, 8265 KB/s, 5 seconds elapsed\n      ```\n\n      Follow the next subsection, Download Data Manually, if the sdnist.report package is unable to download the datasets.\n\n\n2. All the datasets required by the sdnist.report package are installed into the sdnist_toy_data directory, which should now be present inside the sdnist-project directory. sdnist_toy_data is also a data root directory. You can use some other directory as a data root by providing the --data-root argument to the sdnist.report package. 
If you provide a --data-root argument with a path, the sdnist.report package will look for datasets in the data root directory you have specified, and the package will download them if they are not present in the data root.\n\n\n3. The sdnist.report package also needs a deidentified dataset that it can evaluate against its original counterpart. Since the sdnist.report package comes bundled with the datasets, the deidentified dataset should be generated using the bundled datasets.\n\n   You can download a copy of the datasets from GitHub: [NIST ACS Data Excerpts](https://github.com/usnistgov/SDNist/tree/main/BenchmarkData). This copy is similar to the one bundled with the sdnist.report package, but it contains more documentation and a description of the datasets.\n\n\n4. You can download the toy deidentified datasets from GitHub: [Sdnist Toy Deidentified Dataset](https://github.com/usnistgov/SDNist/releases/download/v2.1.1/toy_deidentified_data.zip). Unzip the downloaded file, and move the unzipped toy_deidentified_dataset directory to the sdnist-project directory.\n\n\n5. Each toy deidentified dataset file is generated using the [NIST ACS Data Excerpts](https://github.com/usnistgov/SDNist/tree/main/BenchmarkData/ACSDataExcerpts). The syn_ma.csv, syn_tx.csv, and syn_national.csv deidentified dataset files are created from target datasets MA (ma2019.csv), TX (tx2019.csv), and NATIONAL (national2019.csv), respectively. You can use one of the toy deidentified dataset files for testing whether the sdnist.report package is installed correctly on your system.\n\n\n6. 
Use the following commands for generating reports if you are using a toy deidentified dataset file:\n\n   For evaluating the Massachusetts dataset:\n   ```\n   (venv) c:\\\\sdnist-project\u003e python -m sdnist.report toy_deidentified_data/syn_ma.csv MA\n   ```\n\n   For evaluating the Texas dataset:\n   ```\n   (venv) c:\\\\sdnist-project\u003e python -m sdnist.report toy_deidentified_data/syn_tx.csv TX\n   ```\n\n   For evaluating the national dataset:\n   ```\n   (venv) c:\\\\sdnist-project\u003e python -m sdnist.report toy_deidentified_data/syn_national.csv NATIONAL\n   ```\n\n7.  A deidentified dataset can be a .csv or a parquet file, and the path of this file is required\nby the sdnist.report package to generate a data quality report.\n\n\nDownload Data Manually\n----------------------\n\n1.  If the sdnist.report package is not able to download the datasets, you can download them from GitHub: [NIST ACS Data Excerpts](link).\n2.  Unzip the **BenchmarkData.zip** file and move the unzipped **BenchmarkData** directory to the **sdnist-project** directory.\n3.  Delete the **BenchmarkData.zip** file once the data is successfully extracted from the zip.\n\nCiting SDNist Deidentified Data Report Tool\n-------------------------------------------\n\nIf you publish work that utilizes the SDNist Deidentified Data Report Tool, please cite the software. Citation recommendation:  \n\u003e Task C., Bhagat K., and Howarth G.S. 
(2023), SDNist v2: Deidentified Data Report Tool, \n\u003e National Institute of Standards and Technology, \n\u003e https://doi.org/10.18434/mds2-2943.\n\nCredits \n----------\n\n- [Christine Task](mailto:christine.task@knexusresearch.com) - Project technical lead - christine.task@knexusresearch.com\n- [Karan Bhagat](https://github.com/kbtriangulum) - Contributor\n- [David Lee](https://www.linkedin.com/in/david-lee-13872922/) - Documentation\n- [Gary Howarth](https://www.nist.gov/people/gary-howarth) - Project PI - gary.howarth@nist.gov","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fusnistgov%2Fsdnist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fusnistgov%2Fsdnist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fusnistgov%2Fsdnist/lists"}