https://github.com/nicomignoni/tab2img
A tool to convert tabular data into images, in order to be used by CNNs Inspired by the "DeepInsight" paper.
https://github.com/nicomignoni/tab2img
cnn data-preprocessing deepinsight tabular-data
Last synced: 6 months ago
JSON representation
A tool to convert tabular data into images, in order to be used by CNNs Inspired by the "DeepInsight" paper.
- Host: GitHub
- URL: https://github.com/nicomignoni/tab2img
- Owner: nicomignoni
- License: mit
- Created: 2020-11-12T18:08:15.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-02-11T11:54:56.000Z (almost 5 years ago)
- Last Synced: 2024-11-09T02:18:45.592Z (about 1 year ago)
- Topics: cnn, data-preprocessing, deepinsight, tabular-data
- Language: Python
- Homepage:
- Size: 495 KB
- Stars: 25
- Watchers: 2
- Forks: 6
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# tab2img: from tabular data to images
A tool to convert tabular data into images for CNN. Inspired by the [DeepInsight](https://www.nature.com/articles/s41598-019-47765-6) paper.
## Installation
```
pip install tab2img
```
## Background
In the [paper](https://www.nature.com/articles/s41598-019-47765-6) "*DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture*" the autors propose a method to convert tabular data into images, in order to utilize the power of convolutional neural network (CNN) for non-image structured data.
The Figure illustrates the main idea: given a training dataset $X \in \mathbb{R}^{m \times n}$ with $m$ samples and $n$ features, we are required to find a function $M \in \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times d \times d}$, where $d = \lceil \sqrt{n} \rceil$.
There are numerous ways to choose $M$. In this implementation, the features are organized with respect to the correlation vector $\rho(X,Y)$, where $Y \in \mathbb{R}^{1 \times m}$ is the target vector.
Given $X$ and $Y$ as
$$
X = \begin{bmatrix} x^{(1)}_1 & \cdots & x^{(1)}_n \\\ \vdots & \ddots & \vdots \\\ x^{(m)}_1 & \cdots & x^{(m)}_n \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\\ \vdots \\\ y_m \end{bmatrix}
$$
Vector $\rho_i$ express the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) for the $i$-th feature, i.e.,
$$
\rho_i = \rho(X_i, Y), \quad X_i = \begin{bmatrix} x^{(1)}_i \\\ \vdots \\\ x^{(m)}_i \end{bmatrix}
$$
In this case, being $X$ a sample, the correlation coefficient is implemented as
$$
\rho(x,y) = \frac{\sum_{i=1}^{n}(x_{i}-{\bar {x}})(y_{i}-{\bar{y}})}{{\sqrt{\sum_{i=1}^{n}(x_{i}-{\bar{x}})^{2}}}{\sqrt{\sum_{i=1}^{n}(y_{i}-{\bar{y}})^{2}}}}
$$
At this point, $\rho_1, \dots, \rho_n$ are sorted from the greatest to the smallest, generating the vector of indices
$$
J = \left[ J_k \in \mathbb{N}: \ \rho(X_{J_k}, Y) > \rho(X_{J_{k-1}}, Y), \ k = 2,\dots,n \right]
$$
Eventually, the final tensor $M$ is
$$
M = \begin{bmatrix} X_{J_1} & X_{J_2} & X_{J_5} & \cdots \\\ X_{J_3} & X_{J_4} & X_{J_7} & \cdots \\\ X_{J_6} & X_{J_8} & X_{J_9} & \cdots \\\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}
$$
The mapping from $J_k$ to the right row and column $(r,c)_k$ of $M$ is
$$
(r, c)_ k = \begin{cases} (\sqrt{k}, \sqrt{k}) & \text{if} \sqrt{k} \in \mathbb{N} \\\ (\lceil\sqrt{k}\rceil, \lceil\sqrt{k}\rceil - \frac{1}{2}(\lceil\sqrt{k}\rceil^2 - k)) & \text{if} \sqrt{k} \notin \mathbb{N} \ \text{and} \ \lceil\sqrt{k}\rceil^2 - k = 0 \mod{2} \\\ (\lceil\sqrt{k}\rceil - \frac{1}{2}(\lceil\sqrt{k}\rceil^2 - k), \lceil\sqrt{k}\rceil) & \text{if} \sqrt{k} \notin \mathbb{N} \ \text{and} \ \lceil\sqrt{k}\rceil^2 - k \neq 0 \mod{2} \end{cases}
$$
## Example
```python
from sklearn.datasets import fetch_covtype
from tab2img.converter import Tab2Img
dataset = fetch_covtype()
train = dataset.data
target = dataset.target
model = Tab2Img()
images = model.fit_transform(train, target)
```