{"id":31365722,"url":"https://github.com/peter-gy/autovistype","last_synced_at":"2026-05-02T20:36:07.097Z","repository":{"id":310816123,"uuid":"993651228","full_name":"peter-gy/AutoVisType","owner":"peter-gy","description":"Probing vision-language model alignment with human expert visual grouping over stratified sample of VIS30K dataset.","archived":false,"fork":false,"pushed_at":"2025-09-26T10:55:46.000Z","size":15107,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-26T12:29:21.021Z","etag":null,"topics":["data-visualization","google-genai","langchain","llm-benchmarking","marimo","meta-llama","mistral","multi-label-classification","openai","polars","qwen","uv","vis30k","vision-language-model","visual-stimuli","visualization-categorization","vlm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/peter-gy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-31T08:06:37.000Z","updated_at":"2025-09-26T10:55:49.000Z","dependencies_parsed_at":"2025-09-26T12:19:58.722Z","dependency_job_id":"3d53d03c-565f-40b6-9352-cd4cd916fa6d","html_url":"https://github.com/peter-gy/AutoVisType","commit_stats":null,"previous_names":["peter-gy/autovistype"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/peter-gy/AutoVisType","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peter-gy%2FAutoVisType","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peter-gy%2FAutoVisType/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peter-gy%2FAutoVisType/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peter-gy%2FAutoVisType/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/peter-gy","download_url":"https://codeload.github.com/peter-gy/AutoVisType/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peter-gy%2FAutoVisType/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32549383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-02T19:18:06.202Z","status":"ssl_error","status_checked_at":"2026-05-02T19:16:21.335Z","response_time":132,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","google-genai","langchain","llm-benchmarking","marimo","meta-llama","mistral","multi-label-classification","openai","polars","qwen","uv","vis30k","vision-language-model","visual-stimuli","visualization-categorization","vlm"],"created_at":"2025-09-27T09:48:23.132Z","updated_at":"2026-05-02T20:36:07.063Z","avatar_url":"https://github.com/peter-gy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![arXiv](https://img.shields.io/badge/arXiv-2509.05718-b31b1b.svg)](https://arxiv.org/abs/2509.05718)\n[![Open in molab](https://molab.marimo.io/molab-shield.png)](https://molab.marimo.io/notebooks/nb_P78Fecf4gZkYE4MCKXcanW)\n\n[![Poster](assets/poster.png)](./assets/poster.pdf)\n\nThis repository implements the research described in our submission to the IEEE VIS 25 poster track. The project investigates whether state-of-the-art Vision-Language Models (VLMs) can align with human-centric, stimuli-based categorization of data visualizations.\n\n---\n\n## Overview\n\nUnlike previous work that focuses on task-based data interpretation, this research probes whether VLMs can categorize visualizations based purely on their **essential visual stimuli** as perceived by human experts, independent of specific data interpretation tasks. We evaluate VLMs against [Chen et al.'s image-based typology](https://arxiv.org/abs/2403.05594) derived from expert analysis of the [VIS30K dataset](https://ieeexplore.ieee.org/abstract/document/9337213).\n\n## Key Research Questions\n\n- Can VLMs approximate human cognitive processes in visualization categorization?\n- How well do VLMs grasp the \"essential stimuli\" that drive human expert categorization?\n- What are the current limitations of strictly stimuli-based AI visual understanding in the visualization domain?\n\n## Implementation\n\nThe research methodology is implemented using two interactive [Marimo](https://marimo.io/) notebooks.\n\n### 📊 `src/inference.py`\n\nInteractive notebook for running VLM inference on visualization images:\n\n- Loads and samples the VIS30K dataset (stratified sampling of 305 images)\n- Implements zero-shot categorization using structured prompts\n- Supports concurrent processing with rate limiting and caching\n- Outputs structured predictions for purpose, encoding, and dimensionality\n\n### 📈 `src/evaluation.py`\n\nInteractive notebook for comprehensive evaluation and analysis:\n\n- Computes multi-label classification metrics (Accuracy, Hamming Loss, Jaccard Score, Precision/Recall/F1)\n- Generates confusion matrices and performance visualizations\n- Provides interactive exploration of results by model, feature, and difficulty\n- Constructs comparative analysis across all evaluated models\n\n## Evaluated Vision-Language Models (13 Total)\n\n### Google GenAI\n\n- `gemini-2.0-flash`\n- `gemini-2.5-flash-preview-05-20`\n- `gemini-2.5-pro-preview-05-06`\n\n### OpenAI\n\n- `gpt-4.1`\n- `gpt-4.1-mini`\n- `gpt-4.1-nano`\n- `o4-mini`\n\n### Meta LLaMA (via OpenRouter)\n\n- `llama-4-scout`\n- `llama-4-maverick`\n\n### Mistral AI (via OpenRouter)\n\n- `mistral-small-3.1-24b-instruct`\n- `mistral-medium-3`\n- `pixtral-large-2411`\n\n### Qwen (via OpenRouter)\n\n- `qwen2.5-vl-32b-instruct`\n\n## Dataset \u0026 Evaluation Framework\n\n- **Dataset**: [VIS30K](https://github.com/VisImageNavigator/VisImageNavigator.github.io/blob/54ab2319cca6a9e9056ce9cb5a337e920711b15e/public/dataset/vispubData30_updated_07112024.csv) with expert annotations (6,803 images)\n- **Sample**: Stratified sample of 305 images across encoding types, dimensionalities, and difficulty levels\n- **Features Evaluated**:\n  - **Purpose**: `gui`, `schematic`, `vis`\n  - **Encoding**: Various encoding types (bar, line, scatter, etc.)\n  - **Dimensionality**: 2D, 3D, others\n- **Setting**: Zero-shot evaluation with structured JSON output\n\n## Key Findings\n\n- **Purpose Identification**: VLMs achieve reasonable accuracy ($\u003e0.7$) for high-level categorization\n- **Dimensionality**: Performance varies with complexity, showing challenges with nuanced spatial reasoning\n- **Encoding Recognition**: Most challenging task for all VLMs ($\u003c0.4$ accuracy), highlighting the difficulty of discerning fine-grained visual stimuli\n- **Difficulty Impact**: Performance decreases with expert-assessed image complexity across all models\n\n## Getting Started\n\n1. **Install Dependencies**:\n\n```bash\nuv sync\n```\n\n2. **Set up API Keys**:\n\nCopy the example environment variables file and fill in the missing values:\n\n```bash\ncp .env.example .env\n```\n\n3. **Run Inference Notebook**:\n\n```bash\nuv run marimo run src/inference.py\n```\n\n4. **Run Evaluation Notebook**:\n\n```bash\nuv run marimo run src/evaluation.py\n```\n\n## Research Implications\n\nThis work is a precursor to a more comprehensive study that will provide insights for:\n\n- **AI Development**: Understanding current VLM limitations in abstract visual reasoning\n- **Human-AI Collaboration**: Informing the design of visualization tools that leverage human perceptual strengths\n- **Visualization Research**: Establishing benchmarks for AI alignment with human-centric frameworks\n\n## Future Work\n\n- One-shot and few-shot prompting experiments\n- Full VIS30K dataset evaluation\n- Model uncertainty quantification\n- Parameter sensitivity analysis\n- Determinism evaluation across multiple runs\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpeter-gy%2Fautovistype","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpeter-gy%2Fautovistype","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpeter-gy%2Fautovistype/lists"}