https://github.com/soda-inria/carte

Repository for CARTE: Context-Aware Representation of Table Entries

classification data-science graph-transformer machine-learning regression transformers


Host: GitHub
URL: https://github.com/soda-inria/carte
Owner: soda-inria
License: bsd-3-clause
Created: 2024-05-31T07:03:55.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-04-04T06:35:13.000Z (3 months ago)
Last Synced: 2025-04-12T21:19:32.690Z (3 months ago)
Topics: classification, data-science, graph-transformer, machine-learning, regression, transformers
Language: Python
Homepage: https://soda-inria.github.io/carte/
Size: 161 MB
Stars: 119
Watchers: 10
Forks: 14
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

[![Downloads](https://img.shields.io/pypi/dm/carte-ai)](https://pypi.org/project/carte-ai/)

[![PyPI Version](https://img.shields.io/pypi/v/carte-ai)](https://pypi.org/project/carte-ai/)

[![Python Version](https://img.shields.io/pypi/pyversions/carte-ai)](https://pypi.org/project/carte-ai/)

[![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

![Code Coverage](https://img.shields.io/badge/coverage-81%25-brightgreen)

[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Benchmark-yellow)](https://huggingface.co/datasets/inria-soda/carte-benchmark)

[![arXiv](https://img.shields.io/badge/arXiv-2402.16785-blue.svg)](https://arxiv.org/pdf/2402.16785)

# CARTE: Pretraining and Transfer for Tabular Learning

![CARTE_outline](carte_ai/data/etc/outline_carte.jpg)

This repository contains the implementation of the paper CARTE: Pretraining and Transfer for Tabular Learning.

CARTE is a pretrained model for tabular data. It treats each table row as a star graph and trains a graph transformer on top of this representation.
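To make the star-graph idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the library's implementation: the `StarGraph` class, the `row_to_star_graph` helper, the example row, and the choice to skip missing cells are all assumptions made for this example. The row becomes a center node, each cell value becomes a leaf, and each edge is labeled with the column name.

```python
from dataclasses import dataclass, field


@dataclass
class StarGraph:
    """Hypothetical container: one center node, one leaf per cell."""
    center: str
    leaves: list = field(default_factory=list)
    # Each edge is (center, leaf_index, column_name).
    edges: list = field(default_factory=list)


def row_to_star_graph(row: dict) -> StarGraph:
    """Turn one table row (column -> value) into a star graph."""
    graph = StarGraph(center="row")
    for column, value in row.items():
        if value is None:  # assumption of this sketch: skip missing cells
            continue
        graph.leaves.append(value)
        graph.edges.append((graph.center, len(graph.leaves) - 1, column))
    return graph


# Made-up example row, for illustration only.
graph = row_to_star_graph(
    {"name": "Chianti", "country": "Italy", "price": 12.5, "vintage": None}
)
for center, leaf_index, column in graph.edges:
    print(f"{center} --[{column}]--> {graph.leaves[leaf_index]!r}")
```

In the actual library, this conversion is handled by `Table2GraphTransformer`, which is shown in the step-by-step example further down.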

## Colab Examples (give it a try):

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1PeltEmNLehQ26VQtFJhl7OxnzCS8rPMT?usp=sharing)

* CARTERegressor on Wine Poland dataset

* CARTEClassifier on Spotify dataset

  

Other datasets are available for testing: [datasets](https://huggingface.co/datasets/inria-soda/carte-benchmark/tree/main/data_raw)

> [!WARNING]
> This library is currently in a phase of active development. All features are subject to change without prior notice. If you are interested in collaborating, please feel free to reach out by opening an issue or starting a discussion.

### 01 Install 🚀

The library has been tested on Linux, macOS and Windows.

CARTE-AI can be installed from [PyPI](https://pypi.org/project/carte-ai):


    pip install carte-ai
    pip install huggingface_hub



#### Post installation check

After a correct installation, you should be able to import the module without errors:

```python

import carte_ai

```
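If the import fails, it can help to check which versions are actually installed. The sketch below uses only the standard library; `installed_version` is a hypothetical helper written for this example, and the distribution names are the ones used in the `pip install` commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for dist in ("carte-ai", "huggingface_hub"):
    print(dist, installed_version(dist) or "not installed")
```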

### 02 CARTE-AI example on sampled data step by step ➡️

#### 1️⃣ Load the Data 💽

```python
import pandas as pd
from carte_ai.data.load_data import wina_pl

num_train = 128  # Example: set the number of training groups/entities
random_state = 1  # Set a random seed for reproducibility

# Load the Wina Poland dataset, already split into train and test sets
X_train, X_test, y_train, y_test = wina_pl(num_train, random_state)
print("Wina Poland dataset:", X_train.shape, X_test.shape)
```

![sample](images/data_wina.png)

#### 2️⃣ Convert Table 2 Graph 🪵

The basic preparation steps are:

- preprocess the raw data

- load the prepared data and configs; set the train/test split

- generate a graph for each table entry (row) using the Table2GraphTransformer

- create an estimator and run inference

```python
import fasttext
from huggingface_hub import hf_hub_download
from carte_ai import Table2GraphTransformer

# Download the fastText English model and get its local path
model_path = hf_hub_download(repo_id="hi-paris/fastText", filename="cc.en.300.bin")

preprocessor = Table2GraphTransformer(fasttext_model_path=model_path)

# Fit and transform the training data
X_train = preprocessor.fit_transform(X_train, y=y_train)

# Transform the test data
X_test = preprocessor.transform(X_test)
```

![sample](images/t2g.png)

#### 3️⃣ Make Predictions🔮

For learning, CARTE currently follows the scikit-learn interface (`fit`/`predict`). The process is:

- Define the parameters

- Set the estimator

- Run `fit` to train the model and `predict` to make predictions

```python
from sklearn.metrics import r2_score

from carte_ai import CARTERegressor, CARTEClassifier
# Assumed import path: the original snippet used config_directory without defining it
from carte_ai.configs.directory import config_directory

# Define some parameters
fixed_params = dict()
fixed_params["num_model"] = 10  # 10 models for the bagging strategy
fixed_params["disable_pbar"] = False  # Set to True to hide progress bars
fixed_params["random_state"] = 0
fixed_params["device"] = "cpu"
fixed_params["n_jobs"] = 10
fixed_params["pretrained_model_path"] = config_directory["pretrained_model"]

# Define the estimator and run fit/predict
estimator = CARTERegressor(**fixed_params)  # CARTERegressor for regression
estimator.fit(X=X_train, y=y_train)
y_pred = estimator.predict(X_test)

# Obtain the R2 score on predictions
score = r2_score(y_test, y_pred)
print(f"\nThe R2 score for CARTE: {score:.4f}")
```
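The `num_model` comment above mentions a bagging strategy. As a generic illustration of what bagging means for regression (this is not CARTE's internal code), the sketch below fits several deliberately trivial models on bootstrap resamples and averages their predictions. The helper names and the use of bootstrap resampling are assumptions of this example; how CARTE builds and combines its models may differ.

```python
import random
from statistics import fmean


def fit_mean_model(y):
    """A deliberately trivial 'model': always predicts the training mean."""
    mean = fmean(y)
    return lambda n_rows: [mean] * n_rows


def bagged_predict(y_train, n_rows, num_model=10, random_state=0):
    """Average the predictions of num_model models fit on bootstrap resamples."""
    rng = random.Random(random_state)
    all_preds = []
    for _ in range(num_model):
        # Bootstrap: sample the training targets with replacement
        sample = rng.choices(y_train, k=len(y_train))
        model = fit_mean_model(sample)
        all_preds.append(model(n_rows))
    # Average over models, row by row
    return [fmean(per_row) for per_row in zip(*all_preds)]


preds = bagged_predict([1.0, 2.0, 3.0, 4.0], n_rows=3)
print(preds)
```

Averaging several models usually reduces the variance of the final prediction, at the cost of training time that grows with the number of models.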

![sample](images/performance.png)
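For readers unfamiliar with the metric, the R2 score compares the model's squared error with that of a baseline that always predicts the mean of the targets: 1.0 is a perfect fit and 0.0 is no better than the mean. The function below is a minimal plain-Python sketch of that definition, written for illustration. The name `r2` is hypothetical, and it does not handle a constant target (zero total variance), which scikit-learn's `r2_score` treats as a special case.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of the predict-the-mean baseline
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect predictions
print(r2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # always predicting the mean
```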

### 03 Reproducing paper results ⚙️

➡️ [Installation instructions for the paper setup](INSTALL.md)

### 04 Contribute to the package 🚀

➡️ [Read the contribution guidelines](CONTRIBUTIONS.md)

### 05 Star History ⭐️

![Star History Chart](https://api.star-history.com/svg?repos=soda-inria/carte&type=Date)

### 06 CARTE-AI references 📚

```
@article{kim2024carte,
  title={CARTE: pretraining and transfer for tabular learning},
  author={Kim, Myung Jun and Grinsztajn, L{\'e}o and Varoquaux, Ga{\"e}l},
  journal={arXiv preprint arXiv:2402.16785},
  year={2024}
}
```
