# nhs_data_cleansing
[![CI](https://github.com/SamHollings/nhs_data_cleansing/actions/workflows/main.yml/badge.svg)](https://github.com/SamHollings/nhs_data_cleansing/actions/workflows/main.yml) ![Static Badge](https://img.shields.io/badge/status-development-blue) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Description
This repo builds the `nhs_data_cleansing` Python package, which contains generic data-cleansing functions built on the PySpark library and its data structures.

The functions can be seen in [`src`](src).

ToDo: Add sphinx documentation (or something similar, automatically built)

## Installation
```bash
pip install nhs_data_cleansing
```

## Usage
Generally, simply add `nhs_data_cleansing` to your list of dependencies/requirements, then install the package.

> [!NOTE]
> It's best practice to pin a specific version of the library in your list of dependencies, so that your existing work is not affected when the package is updated.
> The version number may need to be raised in the future, particularly if you want to use newer functionality.

### pip
Add `nhs_data_cleansing` to a `requirements.txt` file within the project, then run `pip install -r requirements.txt`.
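
To pin a version as the note above suggests, it helps to know which version is currently installed. The following is a minimal sketch using only the Python standard library; the helper name `pinned_requirement` is hypothetical and is not part of `nhs_data_cleansing`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def pinned_requirement(package_name: str) -> Optional[str]:
    """Return a 'name==version' line for an installed package.

    Hypothetical helper for illustration only. Returns None when the
    package is not installed in the current environment.
    """
    try:
        return f"{package_name}=={version(package_name)}"
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a pin line if the package is installed here, otherwise None.
    print(pinned_requirement("nhs_data_cleansing"))
```

If the package is installed, the printed line can be copied into `requirements.txt` in place of the unpinned name.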

### Foundry
Add `nhs_data_cleansing` to the `conda_recipe/meta.yaml` file, following the [Foundry "python libraries" guidance](https://www.palantir.com/docs/foundry/transforms-python/use-python-libraries).

## Contact

## Licence
Unless stated otherwise (and in keeping with the [NHS Open Source Policy](https://github.com/nhsx/open-source-policy/blob/main/open-source-policy.md#b-readmes)), the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation. The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.

## Contribution
If you want to help build and improve this package, see the [contributing guidelines](CONTRIBUTE.md).

---
This readme has been built in line with guidance from the [NHS Open Source Policy](https://github.com/nhsx/open-source-policy/blob/main/open-source-policy.md#b-readmes) and [govcookiecutter](https://github.com/best-practice-and-impact/govcookiecutter/tree/main).