https://github.com/okp4/data-selector
Select (or filter) data of different types and in multiple dimensions.
https://github.com/okp4/data-selector
Last synced: 26 days ago
JSON representation
Select (or filter) data of different types and in multiple dimensions.
- Host: GitHub
- URL: https://github.com/okp4/data-selector
- Owner: okp4
- License: bsd-3-clause
- Created: 2022-07-26T09:32:32.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-03-27T20:05:21.000Z (about 1 year ago)
- Last Synced: 2024-11-04T19:42:31.523Z (6 months ago)
- Language: Python
- Size: 294 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 13
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-ccamel - okp4/data-selector - Select (or filter) data of different types and in multiple dimensions. (Python)
README
# Data Selector
> Tool to select or filter data of different types and in multiple dimensions.
[](https://github.com/okp4/data-selector/releases)
[](https://github.com/okp4/data-selector/actions/workflows/lint.yml)
[](https://github.com/okp4/data-selector/actions/workflows/build.yml)
[](https://github.com/okp4/data-selector/actions/workflows/test.yml)
[](https://codecov.io/gh/okp4/data-selector)
[](https://conventionalcommits.org)
[](https://github.com/okp4/.github/blob/main/CODE_OF_CONDUCT.md)
[](https://opensource.org/licenses/BSD-3-Clause)## Purpose
The `Data Selector` tool allows you to select a sample of the dataset, specifying a number of rows. It is possible to delete or keep columns. Another parameter allows the selection of data in a column according to its value(s).
The output result is saved in a `.csv` file.
## Usage
The usage is given as follows:
```sh
Usage: data-selector [OPTIONS] COMMAND [ARGS]...Data selection interactive tool.
Options:
--help Show this message and exit.Commands:
selector Tool to select Data to Keep/Delete
version Print the application version information
```To use the command to select data in file:
```sh
Usage: data-selector selector [OPTIONS]Tool to select Data to Keep/Delete
Options:
-i, --input FILE Data file to convert [required]
-out, --output TEXT name for the output files [required]
-f, --force Overwrite existing files
-s, --file_sep TEXT File separator (csv).
-row, --nb_rows INTEGER Number of rows to import from input_file.
-keep, --columns_to_keep FILE Path to file with columns to keep.
-delete, --columns_to_delete FILE
Path to file with columns to delete.
-values, --values_to_keep FILE Path to file with columns and data to keep.
--help Show this message and exit.
```### Specification for json parameter files
For the -keep and the `-delete` option, the template is given below :
```json
[
"",
"",
"",
"",
""
]
```**column#x** are the columns you want to select/delete.
**Note that you can add as many columns as needed.**For the `values` option, the template is given below :
```json
{
"":["", "":["", "":[""]
}
```**column#x** are the columns you want to select/delete.
**value** is a list of the values you want to keep on this column.
**Note that you can add as many columns as needed.**### Build & run docker image (locally)
Build a local docker image using the following command line:
```sh
docker build -t data-selector .
```Once built, you can run the container locally with the following command line:
```sh
docker run -ti --rm -v :/DATA data-selector selector -i DATA/ -out DATA/ -s -keep DATA/ -delete DATA/ -values DATA/
```-v allows to mount a volume and to use your local data on the docker environment.
**your_path**: Local directory where the data (data to be selected, and parameter json files) are stored
**path_to_data**: The name of the file to select data from (in the Directory).
**out_name**: The name you want to give to the output file.
**file_sep**: File separator of the input file.
**path_to_select_columns**: Path towards json parametrization file.
**path_to_delete_columns**: Path towards json parametrization file.
**path_to_select_data_columns**: Path towards json parametrization file.
## Build
Project is built by [poetry](https://python-poetry.org).
```sh
poetry install
```## Lint
### Python linting
> ⚠️ Be sure to write code compliant with linters or else you'll be rejected by the CI.
**Code linting** is performed by [flake8](https://flake8.pycqa.org).```sh
poetry run flake8 --count --show-source --statistics
```**Static type check** is performed by [mypy](http://mypy-lang.org/).
```sh
poetry run mypy .
```## Unit Test
> ⚠️ Be sure to write tests that succeed or else you'll be rejected by the CI.
Unit tests are performed by the [pytest](https://docs.pytest.org) testing framework.```sh
poetry run pytest -v
```## You want to get involved? 😍
Please check out OKP4 health files :
- [Contributing](https://github.com/okp4/.github/blob/main/CONTRIBUTING.md)
- [Code of conduct](https://github.com/okp4/.github/blob/main/CODE_OF_CONDUCT.md)