{"id":28511255,"url":"https://github.com/esborisova/tableeval-study","last_synced_at":"2026-01-31T16:02:33.104Z","repository":{"id":294413667,"uuid":"848309252","full_name":"esborisova/TableEval-Study","owner":"esborisova","description":"Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data","archived":false,"fork":false,"pushed_at":"2025-07-25T07:06:30.000Z","size":73998,"stargazers_count":2,"open_issues_count":3,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-07-25T12:54:34.837Z","etag":null,"topics":["benchmarking","multimodal-large-language-models","table-understanding"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/esborisova.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-27T14:25:10.000Z","updated_at":"2025-07-25T07:06:33.000Z","dependencies_parsed_at":"2025-07-25T09:19:53.336Z","dependency_job_id":null,"html_url":"https://github.com/esborisova/TableEval-Study","commit_stats":null,"previous_names":["esborisova/tableeval-study"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/esborisova/TableEval-Study","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esborisova%2FTableEval-Study","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esborisova%2FTableEval-Study/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esborisova%2FTableEval-Study/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esborisova%2FTableEval-Study/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/esborisova","download_url":"https://codeload.github.com/esborisova/TableEval-Study/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/esborisova%2FTableEval-Study/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28947567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-31T14:26:55.697Z","status":"ssl_error","status_checked_at":"2026-01-31T14:26:52.545Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmarking","multimodal-large-language-models","table-understanding"],"created_at":"2025-06-08T23:38:25.050Z","updated_at":"2026-01-31T16:02:33.099Z","avatar_url":"https://github.com/esborisova.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction\nThis repository contains code for the paper [Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. 
Non-Scientific Data](https://aclanthology.org/2025.trl-1.10/).\n\nWe investigate the effectiveness of both *text-based* and *multimodal* LLMs on table understanding tasks through a cross-domain and cross-modality evaluation. Specifically, we compare their performance on tables from *scientific* vs. *non-scientific* contexts and examine their robustness on tables represented as *images* vs. *text*. Additionally, we conduct an interpretability analysis to measure context usage and input relevance. We also introduce the **TableEval** benchmark, comprising **3017** tables from scholarly publications, Wikipedia, and financial reports, where each table is provided in five different formats: **Image**, **Dictionary**, **HTML**, **XML**, and **LaTeX**. For more details, please refer to the paper.\n\n# TableEval dataset\n\nThe TableEval corpus was developed for benchmarking (M)LLM performance across different table modalities. It contains six data subsets, comprising 3017 tables and 11312 instances in total. Tables are available as PNG images and in four textual formats: HTML, XML, LaTeX, and Dictionary (Dict). All task annotations are taken from the source datasets. 
\n\n**The dataset can be downloaded from Hugging Face 🤗:** https://huggingface.co/datasets/katebor/TableEval\n\n# Models\n\n| Model            | 🤗 HF checkpoint           | Size (B) | Vision |\n|------------------|----------------------------|----------|--------|\n| Gemini-2.0-Flash | --                         | --       | ✅     |\n| LLaVA-NeXT       | llama3-llava-next-8b-hf    | 8        | ✅     |\n| Qwen2.5-VL       | Qwen2.5-VL-3B-Instruct     | 3        | ✅     |\n|                  | Qwen2.5-VL-7B-Instruct     | 7        | ✅     |\n| Idefics3         | Idefics3-8B-Llama3         | 8        | ✅     |\n| Llama-3          | Llama-3.2-3B-Instruct      | 3        | ❌     |\n| Qwen2.5          | Qwen2.5-3B-Instruct        | 3        | ❌     |\n|                  | Qwen2.5-14B-Instruct       | 14       | ❌     |\n| Mistral-Nemo     | Mistral-Nemo-Instruct-2407 | 12       | ❌     |\n\n# Interpretability\n\nThe code, instructions, and examples of saliency maps are available [here](https://github.com/esborisova/Table-Understanding-Evaluation-Study/tree/main/explanations).\n\n# Evaluation pipeline\n\nAll instructions on how to run the evaluation are provided in this [README.md](https://github.com/esborisova/Table-Understanding-Evaluation-Study/tree/main/src/evaluation) file.\n\n# Repository structure\n```\n    ├── src\n    │   ├── application    # data preparation scripts\n    │   ├── evaluation     # evaluation pipeline and code for running interpretability tools\n    │   └── utils          # functions used for data preparation\n    └── explanations       # interpretability analysis results\n```\n# 
Citation\n```bibtex\n@inproceedings{borisova-etal-2025-table,\n    title = \"Table Understanding and (Multimodal) {LLM}s: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data\",\n    author = {Borisova, Ekaterina  and\n      Barth, Fabio  and\n      Feldhus, Nils  and\n      Abu Ahmad, Raia  and\n      Ostendorff, Malte  and\n      Ortiz Suarez, Pedro  and\n      Rehm, Georg  and\n      M{\\\"o}ller, Sebastian},\n    editor = \"Chang, Shuaichen  and\n      Hulsebos, Madelon  and\n      Liu, Qian  and\n      Chen, Wenhu  and\n      Sun, Huan\",\n    booktitle = \"Proceedings of the 4th Table Representation Learning Workshop\",\n    month = jul,\n    year = \"2025\",\n    address = \"Vienna, Austria\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2025.trl-1.10/\",\n    pages = \"109--142\",\n    ISBN = \"979-8-89176-268-8\",\n    abstract = \"Tables are among the most widely used tools for representing structured data in research, business, medicine, and education. Although LLMs demonstrate strong performance in downstream tasks, their efficiency in processing tabular data remains underexplored. In this paper, we investigate the effectiveness of both text-based and multimodal LLMs on table understanding tasks through a cross-domain and cross-modality evaluation. Specifically, we compare their performance on tables from scientific vs. non-scientific contexts and examine their robustness on tables represented as images vs. text. Additionally, we conduct an interpretability analysis to measure context usage and input relevance. We also introduce the TableEval benchmark, comprising 3017 tables from scholarly publications, Wikipedia, and financial reports, where each table is provided in five different formats: Image, Dictionary, HTML, XML, and LaTeX. 
Our findings indicate that while LLMs maintain robustness across table modalities, they face significant challenges when processing scientific tables.\"\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fesborisova%2Ftableeval-study","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fesborisova%2Ftableeval-study","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fesborisova%2Ftableeval-study/lists"}