https://github.com/samhollings/nhs_data_cleansing
A repo of reusable functions for cleansing data
cleansing data data-cleaning data-cleansing preprocessing pyspark python python3
- Host: GitHub
- URL: https://github.com/samhollings/nhs_data_cleansing
- Owner: SamHollings
- License: mit
- Created: 2025-01-06T15:55:41.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-26T20:57:18.000Z (over 1 year ago)
- Last Synced: 2025-02-26T21:28:36.294Z (over 1 year ago)
- Topics: cleansing, data, data-cleaning, data-cleansing, preprocessing, pyspark, python, python3
- Language: Python
- Homepage:
- Size: 52.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Code of conduct: CODE_OF_CONDUCT.md
# nhs_data_cleansing
[Workflow status](https://github.com/SamHollings/nhs_data_cleansing/actions/workflows/main.yml) [Code style: black](https://github.com/psf/black)
[License: MIT](https://opensource.org/licenses/MIT)
## Description
This repo builds the `nhs_data_cleansing` Python package, which contains generic functions for data cleansing, written against the PySpark library and its data structures.
The functions can be seen in [`src`](src).
ToDo: Add Sphinx documentation (or something similar that is built automatically).
## Installation
```bash
pip install nhs_data_cleansing
```
## Usage
In most cases, add `nhs_data_cleansing` to your project's list of dependencies/requirements, then install the package.
> [!NOTE]
> It's best practice to pin a specific version of the library in your list of dependencies, so that your existing work is not affected when the package is updated.
> The pinned version number may need to be updated in the future, particularly if you want to use newer functionality.
### pip
Add `nhs_data_cleansing` to a `requirements.txt` file within the project, then run `pip install -r requirements.txt`.
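
To decide which version to pin, one option is to ask Python which version is currently installed in your environment. The sketch below uses only the standard library's `importlib.metadata`. The helper names `installed_version` and `pinned_requirement` are illustrative and are not part of `nhs_data_cleansing`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


def pinned_requirement(package_name: str) -> Optional[str]:
    """Build a 'name==version' line suitable for requirements.txt, or None."""
    found = installed_version(package_name)
    if found is None:
        return None
    return f"{package_name}=={found}"


# Prints a pinned requirement line if the package is installed, otherwise None.
print(pinned_requirement("nhs_data_cleansing"))
```

If a line is printed, it can be copied into `requirements.txt` in place of the unpinned package name.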
### Foundry
Add `nhs_data_cleansing` to the `conda_recipe/meta.yaml` file, following the [Foundry "python libraries" guidance](https://www.palantir.com/docs/foundry/transforms-python/use-python-libraries).
## Contact
## Licence
Unless stated otherwise (and in keeping with the [NHS Open Source Policy](https://github.com/nhsx/open-source-policy/blob/main/open-source-policy.md#b-readmes)), the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation. The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.
## Contribution
If you want to help build and improve this package, see the [contributing guidelines](CONTRIBUTE.md)
---
This README has been built in line with guidance from the [NHS Open Source Policy](https://github.com/nhsx/open-source-policy/blob/main/open-source-policy.md#b-readmes) and [govcookiecutter](https://github.com/best-practice-and-impact/govcookiecutter/tree/main).