Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
Awesome Lists | Featured Topics | Projects
https://github.com/josemariasosa/rubik

A set of very useful tools for data wrangling and processing that could be used with the Python library Pandas. This tools allows the user to give to a Pandas DataFrame any kind of complex structured, being able to arrange columns and rows as if they were part of a Rubik's cube.
https://github.com/josemariasosa/rubik
Last synced: about 1 month ago
JSON representation
Host: GitHub
URL: https://github.com/josemariasosa/rubik
Owner: josemariasosa
Created: 2019-07-04T17:16:00.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-12-27T19:10:06.000Z (12 months ago)
Last Synced: 2023-12-27T21:29:18.534Z (12 months ago)
Language: Python
Size: 51.8 KB
Stars: 4
Watchers: 1
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

        # rubik

A set of very useful tools for data wrangling and data processing that could be used with the Python library Pandas. This set of tools allows the user to give to a Pandas DataFrame any kind of complex structure, being able to arrange columns and rows as if they were part of a Rubik's cube.

Share **rubik** with all your panda friends!

## List of Content

1. [**fillna_list()**](https://github.com/josemariasosa/rubik#1-the-rkfillna_list-function)

2. [**concat_to_list()**](https://github.com/josemariasosa/rubik#2-the-rkconcat_to_list-function)

3. [**ungroup_list()**](https://github.com/josemariasosa/rubik#3-the-rkungroup_list-function)

4. [**ungroup_dict()**](https://github.com/josemariasosa/rubik#4-the-rkungroup_dict-function)

5. [**list_to_columns()**](https://github.com/josemariasosa/rubik#5-the-rklist_to_columns-function)

6. [**groupto_list()**](https://github.com/josemariasosa/rubik#6-the-rkgroupto_list-function)

7. [**groupto_tuple()**](https://github.com/josemariasosa/rubik#7-the-rkgroupto_tuple-function)

8. [**groupto_sorted_tuple()**](https://github.com/josemariasosa/rubik#8-the-rkgroupto_sorted_tuple-function)

9. [**groupto_dict()**](https://github.com/josemariasosa/rubik#9-the-rkgroupto_dict-function)

10. [**groupto_set()**](https://github.com/josemariasosa/rubik#10-the-rkgroupto_set-function)

11. [**groupto_sorted_set()**](https://github.com/josemariasosa/rubik#11-the-rkgroupto_sorted_set-function)

12. [**extend_column()**](https://github.com/josemariasosa/rubik#12-the-rkextend_column-function)

13. [**table()**](https://github.com/josemariasosa/rubik#13-the-rktable-function)

14. [**flat_list()**](https://github.com/josemariasosa/rubik#14-the-rkflat_list-function)

15. [**chunkify()**](https://github.com/josemariasosa/rubik#15-the-rkchunkify-function)

16. [**fillna_dict()**](https://github.com/josemariasosa/rubik#16-the-rkfillna_dict-function)

## Test and use rubik

### Install rubik

To install rubik from the terminal, first create a virtual environment `venv`, then use the `pip install` command:

```bash

python3 -m venv venv

source venv/bin/activate

pip install git+https://github.com/josemariasosa/rubik

```

### Check version

To make sure the installation was correct, check Rubik's version using the following command from the terminal:

```bash

python -c 'import rubik; print(rubik.__version__)'

# 2.1.0

```

### Use rubik in your scripts

Import the module using the `rk` alias for rubik.

```python

import rubik as rk

import pandas as pd

```

## Available Functions:

This is a list of the functions with very simple examples for it use.

### 0. The header

```python

import pandas as pd

from operator import itemgetter

pd.set_option('display.min_rows', 30)

pd.set_option('display.max_rows', 60)

pd.set_option('display.max_columns', 20)

```

After the Pandas Version 0.23, the used must explicitly specify the number of columns that will be printed in the standard output. When Pandas library is loaded the `set_option` method set the default to 20.

---

### 1. The `rk.fillna_list` Function

`rk.fillna_list(data_frame, column_name)`

From any column in a DataFrame, replace the `NaN` values with empty lists.

#### 1.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

column_name - A String with the column name we are going to modify.

```

#### 1.2 Example:

The **original table** is:

| Entry | Id        | Roles     |

|-------|-----------|-----------|

| 0     | user-123  | NaN       |

| 1     | user-452  | [1]       |

| 2     | user-21   | [5, 2]    |

| 3     | user-621  | NaN       |

| 4     | user-5512 | [3, 4]    |

| 5     | user-25   | [1, 2, 3] |

The **new table** is:

| Entry | Id        | Roles     |

|-------|-----------|-----------|

| 0     | user-123  | [ ]       |

| 1     | user-452  | [1]       |

| 2     | user-21   | [5, 2]    |

| 3     | user-621  | [ ]       |

| 4     | user-5512 | [3, 4]    |

| 5     | user-25   | [1, 2, 3] |

The **code** is:

```python

new = rk.fillna_list(original, 'Roles')

```

---

### 2. The `rk.concat_to_list` Function

`rk.concat_to_list(data_frame, column_list, column_new_name)`

Concatenate multiple columns of a data frame into a single list.

#### 2.1 Arguments:

```

data_frame      - The DataFrame we are going to work with.

column_list     - A List with the column names we are going to work with.

column_new_name - A String with the column name we are going to create.

```

#### 2.2 Example:

The **original table** is:

| Entry | Id        | Role 1 | Role 2 |

|-------|-----------|--------|--------|

| 0     | user-123  | 1      | 2      |

| 1     | user-452  | 1      | 3      |

| 2     | user-21   | 5      | 2      |

| 3     | user-621  | 3      | 1      |

| 4     | user-5512 | 3      | 4      |

| 5     | user-25   | 1      | 3      |

The **new table** is:

| Entry | Id        | Roles  |

|-------|-----------|--------|

| 0     | user-123  | [1, 2] |

| 1     | user-452  | [1, 3] |

| 2     | user-21   | [5, 2] |

| 3     | user-621  | [3, 1] |

| 4     | user-5512 | [3, 4] |

| 5     | user-25   | [1, 3] |

The **code** is:

```python

new = rk.concat_to_list(original, ['Role 1', 'Role 2'], 'Roles')

```

---

### 3. The `rk.ungroup_list` Function

`rk.ungroup_list(data_frame, column_name)`

This function unnest a 'Series of Lists' in a Pandas data frame.

⚡️ Note that the number of rows for the result may increase.

#### 3.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

column_name - A String with the column name we are going to modify.

```

#### 3.2 Example:

The **original table** is:

| Entry | Id       | Roles  |

|-------|----------|--------|

| 0     | user-123 | [1, 2] |

| 1     | user-452 | [5, 7] |

| 2     | user-21  | [3]    |

The **new table** is:

| Entry | Id       | Roles |

|-------|----------|-------|

| 0     | user-123 | 1     |

| 0     | user-123 | 2     |

| 1     | user-452 | 5     |

| 1     | user-452 | 7     |

| 2     | user-21  | 3     |

The **code** is:

```python

new = rk.ungroup_list(original, 'Roles')

```

---

### 4. The `rk.ungroup_dict` Function

`rk.ungroup_dict(data_frame, column_name, prefix=False)`

This function flatten a data frame with dictionaries in a column.

#### 4.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

column_name - A String with the column name we are going to modify.

prefix - Use the prefix argument as follow:

    - False: default, regular behavior, column names are the dict keys.

    - True: Use as prefix the original column name followed by an underscore.

    - String: The user can give any prefix.

```

#### 4.2 Example:

The **original table** is:

| Entry | Id        | Roles                       |

|-------|-----------|-----------------------------|

| 0     | user-123  | {"main": 1, "secondary": 2} |

| 1     | user-452  | {"main": 3, "secondary": 1} |

| 2     | user-21   | {"main": 7}                 |

| 3     | user-621  | {"main": 2, "secondary": 6} |

| 4     | user-5512 | {"main": 7, "secondary": 5} |

| 5     | user-25   | {"main": 3}                 |

The **new table** is:

| Entry | Id        | main | secondary |

|-------|-----------|------|-----------|

| 0     | user-123  | 1    | 2         |

| 1     | user-452  | 3    | 1         |

| 2     | user-21   | 7    | NaN       |

| 3     | user-621  | 2    | 6         |

| 4     | user-5512 | 7    | 5         |

| 5     | user-25   | 3    | NaN       |

The **code** is:

```python

new = rk.ungroup_dict(original, 'Roles')

```

---

### 5. The `rk.list_to_columns` Function

`rk.list_to_columns(data_frame, column_name)`

This function creates multiple columns from a single column with lists.

#### 5.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

column_name - A String with the column name we are going to modify.

```

#### 5.2 Example:

The **original table** is:

| Entry | Id        | Roles  |

|-------|-----------|--------|

| 0     | user-123  | [1, 2] |

| 1     | user-452  | [1, 3] |

| 2     | user-21   | [5, 2] |

| 3     | user-621  | [3, 1] |

| 4     | user-5512 | [3, 4] |

| 5     | user-25   | [1, 3] |

The **new table** is:

| Entry | Id        | Roles_1 | Roles_2 |

|-------|-----------|---------|---------|

| 0     | user-123  | 1       | 2       |

| 1     | user-452  | 1       | 3       |

| 2     | user-21   | 5       | 2       |

| 3     | user-621  | 3       | 1       |

| 4     | user-5512 | 3       | 4       |

| 5     | user-25   | 1       | 3       |

The **code** is:

```python

new = rk.list_to_columns(original, 'Roles')

```

---

### 6. The `rk.groupto_list` Function

`rk.groupto_list(data_frame, column_list, column_name)`

Group a variable (column_name) in to a single list in regards of agroup of variables (column_list).

#### 6.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

column_list - A List with the column names we are going to work with as pivot columns.

column_name - A String with the column name we are going to modify.

```

#### 6.2 Example:

The **original table** is:

| Entry | Id       | Roles |

|-------|----------|-------|

| 0     | user-123 | 1     |

| 0     | user-123 | 2     |

| 1     | user-452 | 5     |

| 1     | user-452 | 7     |

| 2     | user-21  | 3     |

The **new table** is:

| Entry | Id       | Roles  |

|-------|----------|--------|

| 0     | user-123 | [1, 2] |

| 1     | user-452 | [5, 7] |

| 2     | user-21  | [3]    |

The **code** is:

```python

new = rk.groupto_list(original, ['Entry', 'Id'], 'Roles')

```

---

### 7. The `rk.groupto_tuple` Function

`rk.groupto_tuple(data_frame, column_list, column_name)`

Group a variable (column_name) into a tuple in regards of a group of variables (column_list).

#### 7.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

column_list - A List with the column names we are going to work with as pivot columns.

column_name - A String with the column name we are going to modify.

```

#### 7.2 Example:

The **original table** is:

| Entry | Id       | Roles |

|-------|----------|-------|

| 0     | user-123 | 1     |

| 0     | user-123 | 2     |

| 1     | user-452 | 5     |

| 1     | user-452 | 7     |

| 2     | user-21  | 3     |

The **new table** is:

| Entry | Id       | Roles  |

|-------|----------|--------|

| 0     | user-123 | (1, 2) |

| 1     | user-452 | (5, 7) |

| 2     | user-21  | (3, )  |

The **code** is:

```python

new = rk.groupto_tuple(original, ['Entry', 'Id'], 'Roles')

```

---

### 8. The `rk.groupto_sorted_tuple` Function

`rk.groupto_sorted_tuple(data_frame, column_list, column_name, n=0)`

Group a variable (column_name) in to a single tuple in regards of a group of variables (column_list). Sort a list of tuples by the first, second, or n-1 element.

#### 8.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

column_list - A List with the column names we are going to work with.

column_name - A String with the column name we are going to modify.

n           - An integer with the index of the sorting value (default n = 0).

```

#### 8.2 Example:

The **original table** is:

| Entry | Id       | Roles |

|-------|----------|-------|

| 0     | user-123 | 2     |

| 0     | user-123 | 1     |

| 1     | user-452 | 7     |

| 1     | user-452 | 5     |

| 2     | user-21  | 3     |

The **new table** is:

| Entry | Id       | Roles  |

|-------|----------|--------|

| 0     | user-123 | (1, 2) |

| 1     | user-452 | (5, 7) |

| 2     | user-21  | (3, )  |

The **code** is:

```python

new = rk.groupto_sorted_tuple(original, ['Entry', 'Id'], 'Roles')

```

---

### 9. The `rk.groupto_dict` Function

`rk.groupto_dict(data_frame, column_list, column_new_name)`

Generate new column with dictionaries having values of othe columns.

#### 9.1 Arguments:

```

data_frame      - The DataFrame we are going to work with.

column_list     - A List with the column names we are going to work with.

column_new_name - A String with the column name we are going to create.

```

#### 9.2 Example:

The **original table** is:

| Entry | Id        | main | secondary |

|-------|-----------|------|-----------|

| 0     | user-123  | 1    | 2         |

| 1     | user-452  | 3    | 1         |

| 2     | user-21   | 7    | 3         |

| 3     | user-621  | 2    | 6         |

| 4     | user-5512 | 7    | 5         |

| 5     | user-25   | 3    | 3         |

The **new table** is:

| Entry | Id        | Roles                       |

|-------|-----------|-----------------------------|

| 0     | user-123  | {"main": 1, "secondary": 2} |

| 1     | user-452  | {"main": 3, "secondary": 1} |

| 2     | user-21   | {"main": 7, "secondary": 3} |

| 3     | user-621  | {"main": 2, "secondary": 6} |

| 4     | user-5512 | {"main": 7, "secondary": 5} |

| 5     | user-25   | {"main": 3, "secondary": 3} |

The **code** is:

```python

new = rk.groupto_dict(original, ['main', 'secondary'], 'Roles')

```

---

### 10. The `rk.groupto_set` Function

`rk.groupto_set(data_frame, column_list, column_name)`

Group a variable (column_name) in to a single set in regards of a group of variables (column_list).

**The returned object type is List, not an actual set.**

#### 10.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

column_list - A List with the column names we are going to work with.

column_name - A String with the column name we are going to modify.

```

#### 10.2 Example:

The **original table** is:

| Entry | Id       | Roles |

|-------|----------|-------|

| 0     | user-123 | 2     |

| 0     | user-123 | 1     |

| 1     | user-452 | 7     |

| 1     | user-452 | 7     |

| 2     | user-21  | 3     |

The **new table** is:

| Entry | Id       | Roles  |

|-------|----------|--------|

| 0     | user-123 | [2, 1] |

| 1     | user-452 | [7]    |

| 2     | user-21  | [3]    |

The **code** is:

```python

new = rk.groupto_set(original, ['Entry', 'Id'], 'Roles')

```

---

### 11. The `rk.groupto_sorted_set` Function

`rk.groupto_sorted_set(data_frame, column_list, column_name)`

Group a variable (column_name) into a sorted set in regards of a group of variables (column_list).

**The returned object type is List, not an actual set.**

#### 11.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

column_list - A List with the column names we are going to work with.

column_name - A String with the column name we are going to modify.

```

#### 11.2 Example:

The **original table** is:

| Entry | Id       | Roles |

|-------|----------|-------|

| 0     | user-123 | 2     |

| 0     | user-123 | 1     |

| 1     | user-452 | 7     |

| 1     | user-452 | 7     |

| 2     | user-21  | 3     |

The **new table** is:

| Entry | Id       | Roles  |

|-------|----------|--------|

| 0     | user-123 | [1, 2] |

| 1     | user-452 | [7]    |

| 2     | user-21  | [3]    |

The **code** is:

```python

new = rk.groupto_sorted_set(original, ['Entry', 'Id'], 'Roles')

```

---

### 12. The `rk.extend_column` Function

`rk.extend_column(data_frame, col_name_1, col_name_2, col_new_name)`

Expand 2 Pandas Series with every element being lists into a single column with lists.

#### 12.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

col_name_1 - A String with the column name we are going to modify.

col_name_2 - A String with the column name we are going to modify.

col_new_name - A String with the column name we are going to create.

```

#### 12.2 Example:

The **original table** is:

| Entry | Id        | Roles1 | Roles2 |

|-------|-----------|--------|--------|

| 0     | user-123  | [1, 2] | [ ]    |

| 1     | user-452  | [3, 1] | [2]    |

| 2     | user-21   | [7]    | [5, 4] |

| 3     | user-621  | [2, 6] | [ ]    |

| 4     | user-5512 | [7, 5] | [1]    |

| 5     | user-25   | [3]    | [4, 5] |

The **new table** is:

| Entry | Id        | Roles    |

|-------|-----------|-----------|

| 0     | user-123  | [1, 2]    |

| 1     | user-452  | [3, 1, 2] |

| 2     | user-21   | [7, 5, 4] |

| 3     | user-621  | [2, 6]    |

| 4     | user-5512 | [7, 5, 1] |

| 5     | user-25   | [3, 4, 5] |

The **code** is:

```python

new = rk.extend_column(original, 'Roles1', 'Roles2', 'Roles')

```

---

### 13. The `rk.table` Function

`rk.table(_list)`

This function works like table() in R. It returns a data frame with the frequency of the elements in a given list.

The response is a Pandas DataFrame.

#### 13.1 Arguments:

```

_list  - The List we are going to work with.

```

#### 13.2 Example:

The **original list** is:

`[100, 103, 555, 102, 100, 100, 100, 102, 103, 103]`

The **new table** is:

| value | freq |

|-------|------|

| 100   | 4    |

| 103   | 3    |

| 102   | 2    |

| 555   | 1    |

The **code** is:

```python

original = [100, 103, 555, 102, 100, 100, 100, 102, 103, 103]

new = rk.table(original)

```

---

### 14. The `rk.flat_list` Function

`rk.flat_list(_list)`

Flatten a list with nested lists.

#### 14.1 Arguments:

```

_list  - The List we are going to work with.

```

#### 14.2 Example:

The **original list** is:

`[[100,[103, [555]]], 102]`

The **new list** is:

`[100, 103, 555, 102]`

The **code** is:

```python

original = [[100,[103, [555]]], 102]

new = rk.flat_list(original)

print(new)

# [100, 103, 555, 102]

```

---

### 15. The `rk.chunkify` Function

`rk.chunkify(chunk_this_list, chunk_size)`

Create smaller chunks in the same list.

#### 15.1 Arguments:

```

chunk_this_list  - The List we are going to work with.

chunk_size       - An integer with the number of elements in a chunk.

```

#### 15.2 Example:

The **original list** is:

`[100, 103, 555, 102, 100, 100, 100, 102, 103]`

The **new list** is:

`[[100, 103], [555, 102], [100, 100], [100, 102], [103]]`

The **code** is:

```python

original = [100, 103, 555, 102, 100, 100, 100, 102, 103, 103]

new = rk.chunkify(original, 2)

print(new)

# [[100, 103], [555, 102], [100, 100], [100, 102], [103]]

```

---

### 16. The `rk.fillna_dict` Function

`rk.fillna_dict(data_frame, column_name)`

From any column in a DataFrame, replace the NaN values with empty dictionaries.

#### 16.1 Arguments:

```

data_frame  - The DataFrame we are going to work with.

column_name - A String with the column name we are going to modify.

```

#### 16.2 Example:

The **original table** is:

| Entry | Id        | Roles     |

|-------|-----------|-----------|

| 0     | user-123  | NaN       |

| 1     | user-452  | NaN       |

| 2     | user-21   | {'r': 1}  |

| 3     | user-621  | NaN       |

| 4     | user-5512 | {'r': 2}  |

| 5     | user-25   | NaN       |

The **new table** is:

| Entry | Id        | Roles     |

|-------|-----------|-----------|

| 0     | user-123  | { }       |

| 1     | user-452  | { }       |

| 2     | user-21   | {'r': 1}  |

| 3     | user-621  | { }       |

| 4     | user-5512 | {'r': 2}  |

| 5     | user-25   | { }       |

The **code** is:

```python

new = rk.fillna_dict(original, 'Roles')

```

---

Get the code of the last version [here](https://github.com/josemariasosa/rubik/blob/master/rubik.py).

## Versions:

- version - 2.2.3 *'Pareciera ser todo más oscuro acá abajo.'*

    1. `flat_list` is more compatible with Pandas.

    2. I removed the versions funny names from the code.

- version - 2.2.2 *'My guitar is not too loud!'*

    1. Fixing edge case for the `flat_list` function.

- version - 2.2.1 *'Never stop until the cube is done.'*

    1. Fixing edge case for the `ungroup_dict` function using math.

    https://docs.python.org/3/library/math.html#math.isnan

    2. New function. fillna_dict.

- version - 2.2 *'Pandemic leisure.'*

    1. Updating function. For `ungroup_dict`, the user may use a prefix for the new columns that will be created.

- version - 2.1 *'This is the end of a decade.'*

    1. (deleted) New function. Expand a column with a list, into multiple columns.

    2. Updating function. chunkify receives now a list or a DataFrame.

- version - 2.0. *'PyCon Latam 2019 - Puerto Vallarta.'*

    1. New function names. Again! In compliance with PEP8.

    2. Create the rubik Package for git.

    3. pip install git+https://github.com/josemariasosa/rubik

- version - 1.3.2. *'New job. New opportunities.'*

    1. Displaying a DataFrame in the standard output in a pretty way.

        - Once the display.max_rows is exceeded, the display.min_rows

        options determines how many rows are shown in the truncated

        repr.

- version - 1.3.1. *'Just a little bit higher. Not too much.'*

    1. Standardizing names and the format.

- version - 1.3. *'I should not be high in classes.'*

            

	1. Improvements in the flatDict function.

        - Avoid crashing names with the dictionary keys.

    2. Adding the chunkify function.

- version - 0. *'I am not the original one, but I'm old, thought.'*