{"id":23421395,"url":"https://github.com/nhsdigital/artificial-data-plug-and-play","last_synced_at":"2025-08-28T04:39:01.859Z","repository":{"id":190189050,"uuid":"682043494","full_name":"NHSDigital/artificial-data-plug-and-play","owner":"NHSDigital","description":"Get up and running with experimenting on artificial NHS data!","archived":false,"fork":false,"pushed_at":"2023-08-25T15:06:54.000Z","size":1313,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-09T09:44:39.798Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://digital.nhs.uk/services/artificial-data","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NHSDigital.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-23T10:11:24.000Z","updated_at":"2023-08-29T13:27:55.000Z","dependencies_parsed_at":"2023-10-12T23:50:40.877Z","dependency_job_id":null,"html_url":"https://github.com/NHSDigital/artificial-data-plug-and-play","commit_stats":null,"previous_names":["nhsdigital/artificial-data-plug-and-play"],"tags_count":0,"template":true,"template_full_name":"NHSDigital/rap-package-template","purl":"pkg:github/NHSDigital/artificial-data-plug-and-play","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NHSDigital%2Fartificial-data-plug-and-play","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NHSDigital%2Fartificial-data-plug-and-play/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/NHSDigital%2Fartificial-data-plug-and-play/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NHSDigital%2Fartificial-data-plug-and-play/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NHSDigital","download_url":"https://codeload.github.com/NHSDigital/artificial-data-plug-and-play/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NHSDigital%2Fartificial-data-plug-and-play/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272439512,"owners_count":24935397,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-28T02:00:10.768Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-23T02:15:00.106Z","updated_at":"2025-08-28T04:39:01.835Z","avatar_url":"https://github.com/NHSDigital.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Artificial Data Plug and Play\n\nGet up and running with experimenting on artificial NHS data!\n\n\u003e **This material is maintained by the [NHS England Data Science team](mailto:datascience@nhs.net)**.\n\u003e\n\u003e See our other work here: [NHS England Analytical Services](https://github.com/NHSDigital/data-analytics-services).\n\nTo contact us raise an issue on Github or via [email](mailto:datascience@nhs.net) 
and we will respond promptly.\n\n\n## What is artificial data?\nArtificial data sets provide users with large volumes of data that share some of the characteristics of real data while protecting patient confidentiality. They are designed to model the structure of real data but are completely artificial – they do not contain any actual patient records. We are piloting this new service with a limited number of artificial data sets.\n\nYou can find out more about the pilot on the [NHS website](https://digital.nhs.uk/services/artificial-data).\n\n## What is this repo for?\nThis repo contains some example code for getting started with using artificial data with minimal setup. \n\nIt was created using the [rap-package-template](https://github.com/NHSDigital/rap-package-template/tree/main) which provides a neat way to create new repositories for Reproducible Analytical Pipelines.\n\n### What does the repo contain?\nThe repo contains the following files and directories:\n```\n|- sql                  # Code for interacting with SQL\n|- src                  # Source code for data ingestion, cleaning, processing, etc\n|- templates            # Templates for excel reporting\n|- tests                # Test modules\n|- pyproject.toml       # Configuration\n|- plug_and_play.ipynb  # Plug and play notebook\n|- requirements.txt     # Python dependencies to be installed via pip\n|- ...                  # Additional repo files (e.g. .gitignore)\n```\n\n**Note:** because this repo was created from the [rap-package-template](https://github.com/NHSDigital/rap-package-template/tree/main) there are a number of files / folders that persist from that template. \nThese have been left in the repo so that you can fork the repo and adapt it to your own needs! \n\nFor the plug and play tutorial, the main file you'll be interacting with is [plug_and_play.ipynb](./plug_and_play.ipynb). See below for instructions on how to get set up to run the tutorial. 
\n\n\n## How do I get started?\n\nIf you are setting up the tutorial in an environment which is provisioned out of the box (such as Google Colab or GitHub Codespaces), see *Quick start*.\nMore detailed instructions can be found in *Full setup*.\n\n### Quick start\nThe easiest way to run the tutorial is in an environment which is provisioned out of the box.\nClicking one of the buttons below will open the repo in the respective environment with all the dependencies set up so you can just get coding!\n\n\u003ca href=\"https://github.com/codespaces/new?template_repository=NHSDigital/artificial-data-plug-and-play\" target=\"_parent\"\u003e\u003cimg src=\"https://github.com/codespaces/badge.svg\" width=\"200\" alt=\"Open In Codespaces\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://colab.research.google.com/github/NHSDigital/artificial-data-plug-and-play/blob/main/plug_and_play.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" width=\"150\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e\n\n\n### Full setup\nPrerequisites:\n- A bash terminal (although similar instructions will work in PowerShell)\n- Python \u003e= 3.10\n- An IDE or text editor (such as VS Code or PyCharm)\n\nOpen a terminal and execute the following:\n1. Navigate to a directory you want to create the tutorial repo in (using `cd DESTINATION_DIRECTORY`)\n1. Clone the repo using `git clone https://github.com/NHSDigital/artificial-data-plug-and-play.git`\n1. Open the repo in the terminal using `cd artificial-data-plug-and-play` and create a virtual environment via `python -m venv .venv` (note you don't have to do this in a virtual environment, but it is recommended)\n1. Activate the environment and install the requirements `. .venv/bin/activate \u0026\u0026 pip install -r requirements.txt`\n1. (Optional) Install jupyter via `pip install jupyter`. This will allow you to use jupyter notebooks through the classic web interface.\n1. 
Open the tutorial\n    - Using jupyter if you installed it using the command above `jupyter notebook plug_and_play.ipynb`\n    - Alternatively, you can open the notebook in your IDE of choice (for example using [VS Code](https://code.visualstudio.com/docs/datascience/jupyter-notebooks))\n\nYou should now be ready to run the plug and play!\n\n## See also\nHere are some other related projects that are worth checking out:\n1. [Reproducible Analytical Pipeline example](https://github.com/NHSDigital/RAP_example_pipeline_python/tree/main)  which uses artificial HES data to create a simple stats publication\n1. [Codebase to generate artificial data](https://github.com/NHSDigital/artificial-data-generator) written for Databricks using Python / PySpark","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnhsdigital%2Fartificial-data-plug-and-play","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnhsdigital%2Fartificial-data-plug-and-play","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnhsdigital%2Fartificial-data-plug-and-play/lists"}