{"id":13653150,"url":"https://github.com/Alpha-Innovator/ChartVLM","last_synced_at":"2025-04-23T06:31:17.316Z","repository":{"id":223318034,"uuid":"749755049","full_name":"Alpha-Innovator/ChartVLM","owner":"Alpha-Innovator","description":"Official Repository of ChartX \u0026 ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning","archived":false,"fork":false,"pushed_at":"2024-09-26T06:43:22.000Z","size":4211,"stargazers_count":213,"open_issues_count":6,"forks_count":19,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-04-08T03:17:11.008Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Alpha-Innovator.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-29T10:31:30.000Z","updated_at":"2025-03-22T12:04:44.000Z","dependencies_parsed_at":"2024-06-02T05:23:44.378Z","dependency_job_id":"ef545cca-fce3-482f-8fe8-87a012dd8410","html_url":"https://github.com/Alpha-Innovator/ChartVLM","commit_stats":null,"previous_names":["unimodal4reasoning/chartvlm","alpha-innovator/chartvlm"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alpha-Innovator%2FChartVLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alpha-Innovator%2FChartVLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Alpha-Innovator%2FChartVLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/Alpha-Innovator%2FChartVLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Alpha-Innovator","download_url":"https://codeload.github.com/Alpha-Innovator/ChartVLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250384888,"owners_count":21421811,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T02:01:06.483Z","updated_at":"2025-04-23T06:31:12.298Z","avatar_url":"https://github.com/Alpha-Innovator.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\n\u003ch1\u003eChartX \u0026 ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning\u003c/h1\u003e\n\n[[ Related Paper ]](https://arxiv.org/abs/2402.12185) [[ Website ]](https://unimodal4reasoning.github.io/DocGenome_page/) [[ Dataset (Google Drive) ]](https://drive.google.com/file/d/1d6zyH3kIwgepTqR0fc67xzyUtblrvOIX/view) [[ Dataset (Hugging Face) ]](https://huggingface.co/datasets/U4R/ChartX/viewer)\n\n[[ Models 🤗 (Hugging Face) ]](https://huggingface.co/U4R/ChartVLM-base)\n\u003c/div\u003e\n\n# ChartX \u0026 ChartVLM\nRecently, many versatile Multi-modal Large Language Models (MLLMs) have emerged. However, their capacity to query information depicted in visual charts and engage in reasoning based on the queried contents remains under-explored. 
In this paper, to comprehensively and rigorously benchmark the ability of off-the-shelf MLLMs in the chart domain, we construct ChartX, a multi-modal evaluation set covering 18 chart types, 7 chart tasks, 22 disciplinary topics, and high-quality chart data. In addition, we develop ChartVLM to offer a new perspective on handling multi-modal tasks that strongly depend on interpretable patterns, such as reasoning tasks in the field of charts or geometric images. We evaluate the chart-related ability of mainstream MLLMs and our ChartVLM on the proposed ChartX evaluation set. Extensive experiments demonstrate that ChartVLM surpasses both versatile and chart-related large models, achieving results comparable to GPT-4V. We believe that our study can pave the way for further exploration in creating a more comprehensive chart evaluation set and developing more interpretable multi-modal models.\n\n## Release\n- **Structuring Chart-oriented Representation Metric (SCRM)**: You can refer to [Evaluation](https://github.com/UniModal4Reasoning/ChartVLM/blob/1dfda1372c888e98c197b5873dcc6e3aaa13cf39/eval/README.md?plain=1#L27) and [eval_SE_ChartX.py](https://github.com/UniModal4Reasoning/ChartVLM/blob/main/eval/eval_SE_ChartX.py) for evaluating the chart-related data extraction ability of VLMs. Please see [StructChart](https://arxiv.org/abs/2309.11268) for more technical details of the SCRM evaluation metric.\n\n- [2024/2/21] 🔥 We have released the ChartX benchmark [data](https://drive.google.com/file/d/1d6zyH3kIwgepTqR0fc67xzyUtblrvOIX/view) to evaluate the chart-related capabilities of existing MLLMs. We divide the entire ChartX benchmark into 4,848 validation samples ([ChartX_annotation_val.json](https://drive.google.com/file/d/13jwSO8kaAnbPujECQK9x2QA_TXByzzYH/view?usp=sharing)) and 1,152 test samples ([ChartX_annotation_test.json](https://drive.google.com/file/d/1kOEi5Kca7WnFhBGyJlBIEtlgaIk004o0/view?usp=sharing)). 
Results reported in Tables 2 to 5 are evaluated on the test samples. The evaluation log is shown [here](eval/eval_result_SE_on_ChartX.log).\n\n\u003cdiv align=center\u003e\n\u003cimg src=\"assets/motivation.png\" height=\"85%\"\u003e\n\u003c/div\u003e\n\n------------------------\n\n\u003cdiv align=\"center\"\u003e\n\u003ch1\u003eChartX Evaluation Set\u003cbr\u003e\u003c/h1\u003e\n\u003c/div\u003e\n\n## Overall\nWe collected 48K multi-modal chart samples covering **22 topics**, **18 chart types**, and **7 tasks**. Each sample in this dataset includes four modalities: image, CSV, Python code, and text description.\n\n\u003cdetails\u003e\n\u003csummary\u003e 18 chart types:\u003c/summary\u003e\n\nGeneral Chart Types = ['bar chart', 'bar_num chart', 'line chart', 'line_num chart', 'pie chart'],\n\nFine-grained Chart Types = ['radar chart', 'histogram', 'box plot', 'treemap', 'bubble chart', 'area chart', '3D-bar chart', 'multi-axes', 'ring chart', 'rose chart'],\n\nDomain-specific Chart Types = ['heatmap', 'candlestick chart', 'funnel chart']\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e 22 chart topics:\u003c/summary\u003e\n\nmajor_categories = [\n\"Business and Finance\",\n\"Healthcare and Health\",\n\"Science and Engineering\",\n\"Social Media and the Web\",\n\"Government and Public Policy\",\n\"Education and Academics\",\n\"Environment and Sustainability\",\n\"Arts and Culture\",\n\"Retail and E-commerce\",\n\"Tourism and Hospitality\",\n\"Human Resources and Employee Management\",\n\"Agriculture and Food Production\",\n\"Energy and Utilities\",\n\"Transportation and Logistics\",\n\"Real Estate and Housing Market\",\n\"Manufacturing and Production\",\n\"Sports and Entertainment\",\n\"Social Sciences and Humanities\",\n\"Law and Legal Affairs\",\n\"Technology and the Internet\",\n\"Charity and Nonprofit Organizations\",\n\"Food and Beverage Industry\"\n]\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e 7 chart tasks 
(Employed eval metric):\u003c/summary\u003e\n\n4 close-ended = ['Structural Extraction (SCRM)', 'Chart Type (EM)', 'Chart Title (EM)', 'QA (GPT-acc)']\n\n3 open-ended = ['Description (GPT-score)', 'Summarization (GPT-score)', 'Redrawing code (GPT-score)']\n\n\u003c/details\u003e\n\n## ChartX Download\n\n\u003cdetails\u003e\n\u003csummary\u003e Data Download\u003c/summary\u003e\n\nPlease download the official [ChartX Evaluation Set](https://drive.google.com/file/d/1d6zyH3kIwgepTqR0fc67xzyUtblrvOIX/view?usp=sharing) dataset and organize the downloaded files as follows:\n```\nChartX\n├── 3D-Bar\n│   ├── code\n│   ├── csv\n│   ├── png\n│   ├── txt\n├── area_chart\n│   ├── code\n│   ├── csv\n│   ├── png\n│   ├── txt\n....\n....\n├── rose\n│   ├── code\n│   ├── csv\n│   ├── png\n│   ├── txt\n```\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e Visualization of Data Distribution\u003c/summary\u003e\n\n\u003cdiv align=center\u003e\n\u003cimg src=\"assets/tsne.png\" height=\"85%\"\u003e\n\u003c/div\u003e\n\n\u003c/details\u003e\n\n\n------------------------\n\n\u003cdiv align=\"center\"\u003e\n\u003ch1\u003eChartVLM\u003cbr\u003e\u003c/h1\u003e\n\u003c/div\u003e\n\n\n## ChartVLM Overview\n- **(1)** To enhance the interpretability of the chart model on cognition tasks (e.g., answering questions based on a chart image), ChartVLM first performs the base perception task (e.g., structural extraction from the given chart image to predicted CSV data) and then completes the other cognition tasks (e.g., chart redrawing, description, summarization, and QA) based on the extracted structural data. 
\n- **(2)** To select the task that users expect to perform based on their prompts, we design an instruction adapter, which can cover a variety of user instructions, as illustrated in the figure below.\n\n\n\u003cdiv align=center\u003e\n\u003cimg src=\"assets/chartvlm.png\" height=\"85%\"\u003e\n\u003c/div\u003e\n\n\n## Installation for ChartVLM\n* Clone this repository.\n    ```shell\n    git clone https://github.com/UniModal4Reasoning/ChartVLM.git\n    ```\n* Install the required Python libraries.\n    ```shell\n    pip install -r requirements.txt\n    ```\n\n## Pre-trained Checkpoints of ChartVLM\nPlease refer to Hugging Face to download our pre-trained weights for [ChartVLM-large](https://huggingface.co/U4R/ChartVLM-large) and [ChartVLM-base](https://huggingface.co/U4R/ChartVLM-base).\n\n\u003cdetails\u003e\n\u003csummary\u003eYou need to organize the downloaded checkpoints as follows:\u003c/summary\u003e\n\n```\nChartVLM-base (or your customized name)\n├── instruction_adapter\n│   ├── mlp_classifier.pth\n│   ├── vectorizer.pkl\n├── base_decoder\n│   ├── type_title\n│   │   ├── files of type_title base_decoder\n│   ├── files of base_decoder\n├── auxiliary_decoder\n│   ├── base\n│   │   ├── files of pretrained auxiliary_decoder\n│   ├── files of auxiliary_decoder lora_weights\n```\n\u003c/details\u003e\n\n## Training ChartVLM\nPlease refer to [instruction adapter](adapter/README.md), [base decoder](base_decoder/README.md), and [auxiliary decoder](auxiliary_decoder/README.md) for more details on model training.\n\n## Evaluation\nPlease refer to [eval](eval/README.md) for details on evaluating all tasks.\n\n\n\u003cdetails\u003e\n\u003csummary\u003e Evaluation Results for Structural Extraction (SE) task\u003c/summary\u003e\n\n\u003cdiv align=center\u003e\n\u003cimg src=\"assets/radar_se.png\" height=\"650\"\u003e\n\u003c/div\u003e\n\n\u003c/details\u003e\n\n\n\n\u003cdetails\u003e\n\u003csummary\u003e Evaluation Results for QA task\u003c/summary\u003e\n\n\u003cdiv 
align=center\u003e\n\u003cimg src=\"assets/radar_qa.png\" height=\"650\"\u003e\n\u003c/div\u003e\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e Evaluation Results for Description task\u003c/summary\u003e\n\n\u003cdiv align=center\u003e\n\u003cimg src=\"assets/radar_desc.png\" height=\"650\"\u003e\n\u003c/div\u003e\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003e Evaluation Results for Summarization task\u003c/summary\u003e\n\n\u003cdiv align=center\u003e\n\u003cimg src=\"assets/radar_summ.png\" height=\"650\"\u003e\n\u003c/div\u003e\n\n\u003c/details\u003e\n\n## Citation\nIf you find our work useful in your research, please consider citing ChartX \u0026 ChartVLM:\n```bibtex\n@article{xia2024chartx,\n  title={ChartX \\\u0026 ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning},\n  author={Xia, Renqiu and Zhang, Bo and Ye, Hancheng and Yan, Xiangchao and Liu, Qi and Zhou, Hongbin and Chen, Zijun and Dou, Min and Shi, Botian and Yan, Junchi and others},\n  journal={arXiv preprint arXiv:2402.12185},\n  year={2024}\n}\n```","funding_links":[],"categories":["Datasets-or-Benchmark"],"sub_categories":["多模态-跨模态"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAlpha-Innovator%2FChartVLM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAlpha-Innovator%2FChartVLM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAlpha-Innovator%2FChartVLM/lists"}