{"id":20484434,"url":"https://github.com/sap-samples/salt","last_synced_at":"2026-03-09T15:30:54.760Z","repository":{"id":261151173,"uuid":"879967554","full_name":"SAP-samples/salt","owner":"SAP-samples","description":"Source code and data for the paper \"SALT: Sales Autocompletion Linked Business Tables Dataset\"","archived":false,"fork":false,"pushed_at":"2025-01-10T23:59:09.000Z","size":889,"stargazers_count":12,"open_issues_count":1,"forks_count":1,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-05T16:19:24.991Z","etag":null,"topics":["business","data","dataset","linked","tables"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SAP-samples.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-28T21:49:43.000Z","updated_at":"2025-02-03T14:06:18.000Z","dependencies_parsed_at":"2024-11-05T01:20:42.881Z","dependency_job_id":"7b1b2e98-c3eb-45a7-8757-0f063837b3b5","html_url":"https://github.com/SAP-samples/salt","commit_stats":null,"previous_names":["sap-samples/salt"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SAP-samples/salt","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SAP-samples%2Fsalt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SAP-samples%2Fsalt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SAP-samples%2Fsalt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SAP-samples%2Fsalt/man
ifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SAP-samples","download_url":"https://codeload.github.com/SAP-samples/salt/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SAP-samples%2Fsalt/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30301109,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T14:33:48.460Z","status":"ssl_error","status_checked_at":"2026-03-09T14:33:48.027Z","response_time":61,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["business","data","dataset","linked","tables"],"created_at":"2024-11-15T16:22:22.197Z","updated_at":"2026-03-09T15:30:54.749Z","avatar_url":"https://github.com/SAP-samples.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\nSPDX-FileCopyrightText: 2017 Free Software Foundation Europe e.V. 
\u003chttps://fsfe.org\u003e\n\nSPDX-License-Identifier: CC-BY-NC-4.0\n--\u003e\n\n# SALT: Sales Autocompletion Linked Business Tables Dataset\n[![made-with-python](https://img.shields.io/badge/Made%20with-Python-red.svg)](#python)\n[![License](https://img.shields.io/badge/license-CC--BY--NC--SA--4.0-blue)]()\n[![arXiv](https://img.shields.io/badge/arXiv-2501.03413-29d634.svg)](https://arxiv.org/abs/2501.03413)\n[![REUSE status](https://api.reuse.software/badge/github.com/sap-samples/salt)](https://api.reuse.software/info/github.com/sap-samples/salt)\n\n\n\n\n#### News\n- **07/10/2025: 🎉🎉🎉 Dataset is now integrated into [RelBench](https://github.com/snap-stanford/relbench) 🎉🎉🎉**\n- 01/11/2025: Updated paper (some results changed due to minor dataset changes, screenshots added to appendix) \n- 12/22/2024: Data available via Hugging Face only\n- 12/19/2024: Train/test split provided\n- 12/15/2024: Preliminary dataset available on Hugging Face.\n- 12/13/2024: Provided data \n- 10/29/2024: Preliminary repository created\n\n\n## Description\nThis repository contains the dataset from our paper [**SALT: Sales Autocompletion Linked Business Tables Dataset**](https://openreview.net/forum?id=UZbELpkWIr#discussion) presented at [NeurIPS'24 Table Representation Workshop](https://table-representation-learning.github.io/).\n\n### Abstract\nFoundation models, particularly those that incorporate Transformer architectures, have demonstrated exceptional performance in domains such as natural language processing and image processing. Adapting these models to structured data, like tables, however, introduces significant challenges. These difficulties are even more pronounced when addressing multi-table data linked via foreign key, which is prevalent in the enterprise realm and crucial for empowering business use cases. 
Despite its substantial impact, research focusing on such linked business tables within enterprise settings remains a significantly important yet underexplored domain.\nTo address this, we introduce a curated dataset sourced from an Enterprise Resource Planning (ERP) system, featuring extensive linked tables. This dataset is specifically designed to support research endeavors in table representation learning. By providing access to authentic enterprise data, our goal is to potentially enhance the effectiveness and applicability of models for real-world business contexts.\n\n### Dataset on Hugging Face\n\nThe dataset is now also available via the Hugging Face dataset platform. Check out: [SALT](https://huggingface.co/datasets/sap-ai-research/SALT)\n\n```python\nfrom datasets import load_dataset\n\nds = load_dataset(\"sap-ai-research/SALT\")\n```\n\n### Usage\n\n#### Example of loading the tables with Hugging Face datasets\nUnless the pandas library is already installed, install it with:\n\n```bash\npip install pandas\n```\n\n```python\nfrom datasets import load_dataset\n\ndataset_name = \"sap-ai-research/SALT\"\nsplit = \"train\"  # use \"train\" or \"test\"\nsalesdocuments = load_dataset(dataset_name, \"salesdocuments\", split=split)\nsalesdocument_items = load_dataset(dataset_name, \"salesdocument_items\", split=split)\ncustomers = load_dataset(dataset_name, \"customers\", split=split)\naddresses = load_dataset(dataset_name, \"addresses\", split=split)\n\n# you can also load the joined table which combines the four tables in one\njoined_table = load_dataset(dataset_name, \"joined_table\", split=split)\n```\n\n\n### Information\n![Table Schema of SALT Dataset](images/schema.svg?raw=true \"SALT Schema\")\n*Table Schema of SALT Dataset*\n\n![Screenshot of a Salesorder Input Mask](images/SAP_S4HANA_SalesOrder_App.png?raw=true \"Salesorder Input Mask\")\n*Example Input Mask of a Salesorder App using SAP S/4HANA*\n\n## Requirements\nN/A\n\n## Known Issues\nNo known 
issues\n\n### Authors:\n - [Tassilo Klein](https://tjklein.github.io/)\n - [Clemens Biehl](https://www.linkedin.com/in/clemens-biehl-43a39a117/)\n - [Margarida Costa](https://www.linkedin.com/in/mariamargaridacosta/)\n - [André Sreš](https://www.linkedin.com/in/andr%C3%A9-sre%C5%A1-937096160/)\n - [Jonas Kolk](https://www.linkedin.com/in/jonas-kolk-b8a94b123/)\n - [Johannes Hoffart](https://www.linkedin.com/in/johanneshoffart/)\n\n## Citations\nIf you use this dataset in your research or want to refer to our work, please cite:\n\n```\n@inproceedings{\nklein2024salt,\ntitle={{SALT}: Sales Autocompletion Linked Business Tables Dataset},\nauthor={Tassilo Klein and Clemens Biehl and Margarida Costa and Andre Sres and Jonas Kolk and Johannes Hoffart},\nbooktitle={NeurIPS 2024 Third Table Representation Learning Workshop},\nyear={2024},\nurl={https://openreview.net/forum?id=UZbELpkWIr}\n}\n```\n\n## Roadmap\n- [x] Integration into [RelBench](https://relbench.stanford.edu/), Feb'25\n- [x] Release dataset\n\n## How to obtain support\n[Create an issue](https://github.com/SAP-samples/SALT/issues) in this repository if you find a bug or have questions about the content.\n \nFor additional support, [ask a question in SAP Community](https://answers.sap.com/questions/ask.html).\n\n## Contributing\nIf you wish to contribute code, offer fixes or improvements, please send a pull request. Due to legal reasons, contributors will be asked to accept a DCO when they create the first pull request to this project. This happens in an automated fashion during the submission process. SAP uses [the standard DCO text of the Linux Foundation](https://developercertificate.org/).\n\n## License\nCopyright (c) 2024 SAP SE or an SAP affiliate company. All rights reserved. 
This project is licensed under the CC-BY-NC-SA Software License, version 4.0 except as noted otherwise in the [LICENSE](LICENSES/CC-BY-NC-4.0.txt) file.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsap-samples%2Fsalt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsap-samples%2Fsalt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsap-samples%2Fsalt/lists"}