Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/okp4/data-selector
Select (or filter) data of different types and in multiple dimensions.
- Host: GitHub
- URL: https://github.com/okp4/data-selector
- Owner: okp4
- License: bsd-3-clause
- Created: 2022-07-26T09:32:32.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-27T20:05:21.000Z (8 months ago)
- Last Synced: 2024-08-01T13:35:12.466Z (3 months ago)
- Language: Python
- Size: 294 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 13
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-ccamel - okp4/data-selector - Select (or filter) data of different types and in multiple dimensions. (Python)
README
# Data Selector
> Tool to select or filter data of different types and in multiple dimensions.
[![version](https://img.shields.io/github/v/release/okp4/data-selector?style=for-the-badge&logo=github)](https://github.com/okp4/data-selector/releases)
[![lint](https://img.shields.io/github/actions/workflow/status/okp4/data-selector/lint.yml?branch=main&label=lint&style=for-the-badge&logo=github)](https://github.com/okp4/data-selector/actions/workflows/lint.yml)
[![build](https://img.shields.io/github/actions/workflow/status/okp4/data-selector/build.yml?branch=main&label=build&style=for-the-badge&logo=github)](https://github.com/okp4/data-selector/actions/workflows/build.yml)
[![test](https://img.shields.io/github/actions/workflow/status/okp4/data-selector/test.yml?branch=main&label=test&style=for-the-badge&logo=github)](https://github.com/okp4/data-selector/actions/workflows/test.yml)
[![codecov](https://img.shields.io/codecov/c/github/okp4/data-selector?style=for-the-badge&logo=codecov)](https://codecov.io/gh/okp4/data-selector)
[![conventional commits](https://img.shields.io/badge/Conventional%20Commits-1.0.0-yellow.svg?style=for-the-badge&logo=conventionalcommits)](https://conventionalcommits.org)
[![contributor covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg?style=for-the-badge)](https://github.com/okp4/.github/blob/main/CODE_OF_CONDUCT.md)
[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg?style=for-the-badge)](https://opensource.org/licenses/BSD-3-Clause)
## Purpose
The `Data Selector` tool selects a sample of a dataset by limiting the number of rows read. It can also keep or delete columns, and select the data in a column according to its value(s).
The output is saved in a `.csv` file.
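The three operations described above can be pictured with a minimal sketch using only the Python standard library. This is not the tool's implementation: the `select_data` helper, its parameter names, the sample data, and the order in which the operations are applied (row limit first, then value filter, then column selection) are all assumptions made for illustration.

```python
import csv
import io


def select_data(rows, nb_rows=None, columns_to_keep=None,
                columns_to_delete=None, values_to_keep=None):
    """Hypothetical helper: select rows and columns from an iterable of dict rows."""
    selected = []
    for index, row in enumerate(rows):
        # Stop once the requested number of input rows has been read.
        if nb_rows is not None and index >= nb_rows:
            break
        # Skip the row if any filtered column holds a value that is not allowed.
        if values_to_keep and any(
            row.get(column) not in allowed
            for column, allowed in values_to_keep.items()
        ):
            continue
        # Apply the column selection: keep first, then delete.
        if columns_to_keep is not None:
            row = {name: row[name] for name in columns_to_keep if name in row}
        if columns_to_delete is not None:
            row = {name: value for name, value in row.items()
                   if name not in columns_to_delete}
        selected.append(row)
    return selected


# Made-up sample data, standing in for an input CSV file.
sample = io.StringIO(
    "city,country,population\n"
    "Paris,FR,2100000\n"
    "Lyon,FR,520000\n"
    "Berlin,DE,3600000\n"
)
result = select_data(
    csv.DictReader(sample),
    nb_rows=3,
    columns_to_delete=["population"],
    values_to_keep={"country": ["FR"]},
)
print(result)
```

Here the two French rows survive the value filter and lose their `population` column. The real tool may apply these steps in a different order.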
## Usage
The usage is given as follows:
```sh
Usage: data-selector [OPTIONS] COMMAND [ARGS]...

  Data selection interactive tool.

Options:
  --help  Show this message and exit.

Commands:
  selector  Tool to select Data to Keep/Delete
  version   Print the application version information
```
To select data from a file, use the `selector` command:
```sh
Usage: data-selector selector [OPTIONS]

  Tool to select Data to Keep/Delete

Options:
  -i, --input FILE                Data file to convert  [required]
  -out, --output TEXT             name for the output files  [required]
  -f, --force                     Overwrite existing files
  -s, --file_sep TEXT             File separator (csv).
  -row, --nb_rows INTEGER         Number of rows to import from input_file.
  -keep, --columns_to_keep FILE   Path to file with columns to keep.
  -delete, --columns_to_delete FILE
                                  Path to file with columns to delete.
  -values, --values_to_keep FILE  Path to file with columns and data to keep.
  --help                          Show this message and exit.
```
### Specification for JSON parameter files
For the `-keep` and `-delete` options, the template is given below:
```json
[
  "<column#1>",
  "<column#2>",
  "<column#3>",
  "<column#4>",
  "<column#5>"
]
```
**column#x** are the columns you want to keep/delete.
**Note that you can add as many columns as needed.**
For the `-values` option, the template is given below:
```json
{
  "<column#1>": ["<value>", "<value>"],
  "<column#2>": ["<value>", "<value>"],
  "<column#3>": ["<value>"]
}
```
**column#x** are the columns whose data you want to select.
**value** is a list of the values you want to keep in this column.
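As a sketch of how such parameter files could be prepared, the snippet below writes both kinds of file with Python's `json` module. It assumes the shapes described above: a JSON list of column names for `-keep`/`-delete`, and a JSON object mapping each column name to a list of values for `-values`. The column names, values, and file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical column names and values, chosen only for illustration.
columns_to_keep = ["city", "country"]
values_to_keep = {"country": ["FR", "DE"], "city": ["Paris"]}

# Write to a temporary directory so the sketch leaves no files elsewhere.
out_dir = Path(tempfile.mkdtemp())
keep_path = out_dir / "keep.json"
values_path = out_dir / "values.json"

keep_path.write_text(json.dumps(columns_to_keep, indent=2), encoding="utf-8")
values_path.write_text(json.dumps(values_to_keep, indent=2), encoding="utf-8")

# Read the files back to check they have the expected shapes:
# a list of column names, and a mapping from column name to a list of values.
loaded_keep = json.loads(keep_path.read_text(encoding="utf-8"))
loaded_values = json.loads(values_path.read_text(encoding="utf-8"))
print(loaded_keep)
print(loaded_values)
```

Generating the files this way avoids hand-editing mistakes such as a missing bracket or a trailing comma.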
**Note that you can add as many columns as needed.**
### Build & run docker image (locally)
Build a local docker image using the following command line:
```sh
docker build -t data-selector .
```
Once built, you can run the container locally with the following command line:
```sh
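docker run -ti --rm -v <your_path>:/DATA data-selector selector -i DATA/<path_to_data> -out DATA/<out_name> -s <file_sep> -keep DATA/<path_to_select_columns> -delete DATA/<path_to_delete_columns> -values DATA/<path_to_select_data_columns>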
```
`-v` mounts a volume so that your local data can be used inside the Docker container.
- **your_path**: Local directory where the data (the data to select from and the JSON parameter files) is stored.
- **path_to_data**: Name of the file to select data from (in that directory).
- **out_name**: Name you want to give to the output file.
- **file_sep**: Separator used in the input file.
- **path_to_select_columns**: Path to the JSON parameter file for the `-keep` option.
- **path_to_delete_columns**: Path to the JSON parameter file for the `-delete` option.
- **path_to_select_data_columns**: Path to the JSON parameter file for the `-values` option.
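The following sketch shows one way to assemble that `docker run` command from the values listed above. It only builds and prints the command and does not run Docker. The `build_docker_command` helper and the example paths are hypothetical, and the sketch assumes each value is substituted into the position matching its flag (`-i`, `-out`, `-s`, `-keep`, `-delete`, `-values`).

```python
import shlex


def build_docker_command(your_path, path_to_data, out_name, file_sep,
                         path_to_select_columns=None,
                         path_to_delete_columns=None,
                         path_to_select_data_columns=None):
    """Hypothetical helper: build the argument list for the docker run command."""
    command = [
        "docker", "run", "-ti", "--rm",
        "-v", f"{your_path}:/DATA",          # mount the local directory on /DATA
        "data-selector", "selector",
        "-i", f"DATA/{path_to_data}",
        "-out", f"DATA/{out_name}",
        "-s", file_sep,
    ]
    # Only add the parameter-file options that were actually provided.
    optional = (
        ("-keep", path_to_select_columns),
        ("-delete", path_to_delete_columns),
        ("-values", path_to_select_data_columns),
    )
    for flag, file_name in optional:
        if file_name is not None:
            command += [flag, f"DATA/{file_name}"]
    return command


# Example values are made up for illustration.
command = build_docker_command(
    "/home/me/data", "input.csv", "sample", ";",
    path_to_select_columns="keep.json",
)
# shlex.join quotes arguments such as ";" so the line is safe to paste in a shell.
print(shlex.join(command))
```

Keeping the command as a list of arguments means it could also be passed directly to `subprocess.run` without going through a shell.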
## Build
The project is built with [poetry](https://python-poetry.org).
```sh
poetry install
```
## Lint
### Python linting
> ⚠️ Be sure to write code compliant with linters or else you'll be rejected by the CI.
**Code linting** is performed by [flake8](https://flake8.pycqa.org).
```sh
poetry run flake8 --count --show-source --statistics
```
**Static type checking** is performed by [mypy](http://mypy-lang.org/).
```sh
poetry run mypy .
```
## Unit Test
> ⚠️ Be sure to write tests that succeed or else you'll be rejected by the CI.
Unit tests are performed by the [pytest](https://docs.pytest.org) testing framework.
```sh
poetry run pytest -v
```
## You want to get involved? 😍
Please check out the OKP4 health files:
- [Contributing](https://github.com/okp4/.github/blob/main/CONTRIBUTING.md)
- [Code of conduct](https://github.com/okp4/.github/blob/main/CODE_OF_CONDUCT.md)