{"id":18553898,"url":"https://github.com/soda-inria/carte","last_synced_at":"2025-04-12T21:19:48.337Z","repository":{"id":242026420,"uuid":"808484740","full_name":"soda-inria/carte","owner":"soda-inria","description":"Repository for CARTE: Context-Aware Representation of Table Entries","archived":false,"fork":false,"pushed_at":"2025-04-04T06:35:13.000Z","size":168564,"stargazers_count":119,"open_issues_count":0,"forks_count":14,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-12T21:19:32.690Z","etag":null,"topics":["classification","data-science","graph-transformer","machine-learning","regression","transformers"],"latest_commit_sha":null,"homepage":"https://soda-inria.github.io/carte/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/soda-inria.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-31T07:03:55.000Z","updated_at":"2025-04-09T15:36:45.000Z","dependencies_parsed_at":"2024-12-21T22:12:11.475Z","dependency_job_id":"51a4ef7d-641e-4170-ad6d-7996680c078a","html_url":"https://github.com/soda-inria/carte","commit_stats":null,"previous_names":["soda-inria/carte"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soda-inria%2Fcarte","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soda-inria%2Fcarte/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soda-inria%2Fcarte/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/soda-inria%2Fcarte/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/soda-inria","download_url":"https://codeload.github.com/soda-inria/carte/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248632112,"owners_count":21136629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","data-science","graph-transformer","machine-learning","regression","transformers"],"created_at":"2024-11-06T21:18:47.501Z","updated_at":"2025-04-12T21:19:48.314Z","avatar_url":"https://github.com/soda-inria.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Downloads](https://img.shields.io/pypi/dm/carte-ai)](https://pypi.org/project/carte-ai/)\n[![PyPI Version](https://img.shields.io/pypi/v/carte-ai)](https://pypi.org/project/carte-ai/)\n[![Python Version](https://img.shields.io/pypi/pyversions/carte-ai)](https://pypi.org/project/carte-ai/)\n[![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)\n![Code Coverage](https://img.shields.io/badge/coverage-81%25-brightgreen)\n[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Benchmark-yellow)](https://huggingface.co/datasets/inria-soda/carte-benchmark)\n[![arXiv](https://img.shields.io/badge/arXiv-2402.16785-blue.svg)](https://arxiv.org/pdf/2402.16785)\n\n\n\n# CARTE: \u003cbr /\u003ePretraining and Transfer for 
Tabular Learning\n\n![CARTE_outline](carte_ai/data/etc/outline_carte.jpg)\n\nThis repository contains the implementation of the paper CARTE: Pretraining and Transfer for Tabular Learning.\n\nCARTE is a pretrained model for tabular data. It treats each table row as a star graph and trains a graph transformer on top of this representation.\n\n## Colab Examples (give it a try):\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1PeltEmNLehQ26VQtFJhl7OxnzCS8rPMT?usp=sharing)\n* CARTERegressor on Wine Poland dataset\n* CARTEClassifier on Spotify dataset\n  \nOther datasets are available for testing: [datasets](https://huggingface.co/datasets/inria-soda/carte-benchmark/tree/main/data_raw)\n\n\n\u003e [!WARNING]\n\u003e This library is under active development. All features are subject to change without prior notice. If you are interested in collaborating, please reach out by opening an issue or starting a discussion.\n\n\n### 01 Install 🚀\n\nThe library has been tested on Linux, macOS and Windows.\n\nCARTE-AI can be installed from [PyPI](https://pypi.org/project/carte-ai):\n\n\u003cpre\u003e\npip install carte-ai\npip install huggingface_hub\n\u003c/pre\u003e\n\n#### Post-installation check\nAfter a correct installation, you should be able to import the module without errors:\n\n```python\nimport carte_ai\n```\n\n### 02 CARTE-AI example on sampled data step by step ➡️\n\n#### 1️⃣ Load the Data 💽\n```python\nimport pandas as pd\nfrom carte_ai.data.load_data import *\n\nnum_train = 128  # Example: set the number of training groups/entities\nrandom_state = 1  # Set a random seed for reproducibility\nX_train, X_test, y_train, y_test = wina_pl(num_train, random_state)\nprint(\"Wina Poland dataset:\", X_train.shape, X_test.shape)\n```\n![sample](images/data_wina.png)\n\n#### 2️⃣ Convert Table 2 Graph 🪵\n\nThe basic preparations are:\n- preprocess raw data\n- load the prepared 
data and configs; set train/test split\n- generate a graph for each table entry (row) using the Table2GraphTransformer\n- create an estimator and run inference\n\n```python\nimport fasttext\nfrom huggingface_hub import hf_hub_download\nfrom carte_ai import Table2GraphTransformer\n\nmodel_path = hf_hub_download(repo_id=\"hi-paris/fastText\", filename=\"cc.en.300.bin\")\n\npreprocessor = Table2GraphTransformer(fasttext_model_path=model_path)\n\n# Fit and transform the training data\nX_train = preprocessor.fit_transform(X_train, y=y_train)\n\n# Transform the test data\nX_test = preprocessor.transform(X_test)\n```\n![sample](images/t2g.png)\n\n#### 3️⃣ Make Predictions 🔮\nCARTE currently exposes a scikit-learn-style interface (fit/predict). The process is:\n- Define parameters\n- Set the estimator\n- Run 'fit' to train the model and 'predict' to make predictions\n\n```python\nfrom sklearn.metrics import r2_score\nfrom carte_ai import CARTERegressor, CARTEClassifier\n\n# Define some parameters\nfixed_params = dict()\nfixed_params[\"num_model\"] = 10 # 10 models for the bagging strategy\nfixed_params[\"disable_pbar\"] = False # Set to True to hide the progress bar\nfixed_params[\"random_state\"] = 0\nfixed_params[\"device\"] = \"cpu\"\nfixed_params[\"n_jobs\"] = 10\nfixed_params[\"pretrained_model_path\"] = config_directory[\"pretrained_model\"]\n\n\n# Define the estimator and run fit/predict\n\nestimator = CARTERegressor(**fixed_params) # CARTERegressor for Regression\nestimator.fit(X=X_train, y=y_train)\ny_pred = estimator.predict(X_test)\n\n# Obtain the r2 score on predictions\n\nscore = r2_score(y_test, y_pred)\nprint(f\"\\nThe R2 score for CARTE: {score:.4f}\")\n```\n![sample](images/performance.png)\n\n### 03 Reproducing paper results ⚙️\n\n➡️ [installation instructions for the paper setup](INSTALL.md)\n\n### 04 Contribute to the package 🚀\n\n➡️ [read the contribution guidelines](CONTRIBUTIONS.md)\n\n### 05 Star History ⭐️\n\n![Star History 
Chart](https://api.star-history.com/svg?repos=soda-inria/carte\u0026type=Date)\n\n### 06 CARTE-AI references 📚\n\n```\n@article{kim2024carte,\n  title={CARTE: pretraining and transfer for tabular learning},\n  author={Kim, Myung Jun and Grinsztajn, L{\\'e}o and Varoquaux, Ga{\\\"e}l},\n  journal={arXiv preprint arXiv:2402.16785},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoda-inria%2Fcarte","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsoda-inria%2Fcarte","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoda-inria%2Fcarte/lists"}