{"id":28653879,"url":"https://github.com/tiger-ai-lab/structeval","last_synced_at":"2025-07-05T18:06:44.375Z","repository":{"id":295357005,"uuid":"911787354","full_name":"TIGER-AI-Lab/StructEval","owner":"TIGER-AI-Lab","description":null,"archived":false,"fork":false,"pushed_at":"2025-05-25T04:19:45.000Z","size":545053,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-25T05:43:47.797Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TIGER-AI-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-03T21:12:05.000Z","updated_at":"2025-05-25T03:04:25.000Z","dependencies_parsed_at":"2025-05-25T05:54:30.980Z","dependency_job_id":null,"html_url":"https://github.com/TIGER-AI-Lab/StructEval","commit_stats":null,"previous_names":["tiger-ai-lab/structeval"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TIGER-AI-Lab/StructEval","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FStructEval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FStructEval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FStructEval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FStructEval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TIGER-AI-
Lab","download_url":"https://codeload.github.com/TIGER-AI-Lab/StructEval/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TIGER-AI-Lab%2FStructEval/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259599331,"owners_count":22882357,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-13T07:07:58.580Z","updated_at":"2025-06-13T07:07:59.135Z","avatar_url":"https://github.com/TIGER-AI-Lab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# StructEval\n\nStructEval is a framework for evaluating language models on structured outputs, supporting rendering and evaluation of generated code.\n\n## Installation\n\n### Installation with conda\n\n```bash\n# Create and activate a conda environment with Python 3.12\nconda create -n structeval python=3.12\nconda activate structeval\n\n# Install all required packages (required)\npip install -r requirements.txt\n\n# Install LLM-Engines separately (required for inference)\npip install git+https://github.com/jdf-prog/LLM-Engines.git\n\n# Install Playwright browsers (required for rendering)\nplaywright install\n\n# Install the package in development mode (optional)\npip install -e .\n```\n\n### System Dependencies (optional reading)\n\nThe following system packages will be installed automatically through conda:\n- `ghostscript` and `poppler`: Required for PDF processing\n- `nodejs`: Required for Playwright\n- `graphviz`: Required for visualization\n- `imagemagick`: 
Required for image processing\n\nIf you encounter any issues with system dependencies, you can install them manually using your system's package manager:\n\n```bash\n# Ubuntu/Debian\nsudo apt-get update\nsudo apt-get install ghostscript poppler-utils nodejs graphviz imagemagick\n\n# CentOS/RHEL\nsudo yum install ghostscript poppler nodejs graphviz ImageMagick\n\n# macOS\nbrew install ghostscript poppler node graphviz imagemagick\n```\n\n## CLI Usage\n\nStructEval provides a command-line interface for running inference, rendering, and evaluation.\n\n### Basic Commands\n\n```bash\n# Run inference\npython -m structeval.cli inference \\\n    --llm_model_name \"model_name\" \\\n    --llm_engine \"engine_name\" \\\n    --input_path \"path/to/input.json\" \\\n    --output_path \"path/to/output.json\"\n\n# Render outputs\npython -m structeval.cli render \\\n    --input_path \"path/to/inference_output.json\" \\\n    --img_output_path \"path/to/rendered_images\" \\\n    --non_renderable_output_dir \"path/to/non_renderable_files\"\n\n# Run evaluation\npython -m structeval.cli evaluate \\\n    --vlm_model_name \"model_name\" \\\n    --vlm_engine \"engine_name\" \\\n    --input_path \"path/to/inference_output.json\" \\\n    --output_path \"path/to/evaluation_output.json\" \\\n    --img_path \"path/to/rendered_images\" \\\n    --non_renderable_output_dir \"path/to/non_renderable_files\"\n```\n\n### Command Parameters\n\n#### Inference\n- `--llm_model_name`: Name of the language model (e.g., \"meta-llama/Llama-3.1-8B-Instruct\", \"gpt-4.1-mini\")\n- `--llm_engine`: Engine for running inference (e.g., \"vllm\", \"openai\")\n- `--input_path`: Path to the input dataset JSON file\n- `--output_path`: Path to save inference results\n\n#### Render\n- `--input_path`: Path to the inference output JSON\n- `--img_output_path`: Directory to save rendered images\n- `--non_renderable_output_dir`: Directory to save non-renderable outputs\n\n#### Evaluate\n- `--vlm_model_name`: Name of the 
vision language model for evaluation (e.g., \"gpt-4.1-mini\")\n- `--vlm_engine`: Engine for evaluation (e.g., \"openai\")\n- `--input_path`: Path to the inference output JSON\n- `--output_path`: Path to save evaluation results\n- `--img_path`: Path to the directory containing rendered images\n- `--non_renderable_output_dir`: Directory containing non-renderable outputs\n\n## Helper Scripts\n\nThe repository includes helper scripts for running full experiments:\n\n### run_inference.py\nRun inference across multiple models:\n\n```bash\npython -m structeval.run_inference\n```\n\n### run_render.py\nRender outputs from inference results:\n\n```bash\npython -m structeval.run_render\n```\n\n### run_evaluation.py\nEvaluate rendered outputs:\n\n```bash\npython -m structeval.run_evaluation\n```\n\nThese scripts can be configured by editing the model lists and file paths within them.\n\n## Input Format\n\nThe input JSON should be an array of task objects with the following structure:\n\n```json\n[\n  {\n    \"task_id\": \"000500\",\n    \"query\": \"Please output JSON code:\\n\\nTask:\\n...\",\n    \"feature_requirements\": \"\",\n    \"task_name\": \"Text to JSON\",\n    \"input_type\": \"Text\",\n    \"output_type\": \"JSON\",\n    \"query_example\": \"\",\n    \"VQA\": [],\n    \"raw_output_metric\": [\n      \"novel.title\",\n      \"novel.author.name\",\n      \"novel.characters[0].name\"\n    ],\n    \"rendering\": false\n  }\n]\n```\n\n### Input Fields:\n- `task_id`: Unique identifier for the task\n- `query`: The prompt sent to the model\n- `task_name`: Name of the task (e.g., \"Text to JSON\", \"Text to Angular\")\n- `input_type`: Type of input (e.g., \"Text\")\n- `output_type`: Expected output format (e.g., \"JSON\", \"Angular\")\n- `VQA`: Array of visual question-answer pairs for evaluating renderable outputs\n- `raw_output_metric`: Keys or elements to check in the output\n- `rendering`: Boolean indicating if the output should be rendered visually\n\n## Output 
Format\n\n### Inference Output\nThe inference process adds a `generation` field to each task in the input:\n\n```json\n[\n  {\n    \"task_id\": \"000500\",\n    \"query\": \"Please output JSON code:\\n\\nTask:\\n...\",\n    \"feature_requirements\": \"\",\n    \"task_name\": \"Text to JSON\",\n    \"input_type\": \"Text\",\n    \"output_type\": \"JSON\",\n    \"query_example\": \"\",\n    \"VQA\": [],\n    \"raw_output_metric\": [...],\n    \"rendering\": false,\n    \"generation\": \"```json\\n{\\n  \\\"novel\\\": {\\n    \\\"title\\\": \\\"The Obsidian Labyrinth\\\",\\n    \\\"author\\\": {\\n      \\\"name\\\": \\\"Anya Petrova\\\",\\n      \\\"birth_year\\\": 1978\\n    },\\n    ...\\n  }\\n}\\n```\"\n  }\n]\n```\n\n### Evaluation Output\nThe evaluation result contains additional scoring fields:\n\n```json\n[\n  {\n    \"task_id\": \"000500\",\n    \"query\": \"Please output JSON code:\\n\\nTask:\\n...\",\n    \"feature_requirements\": \"\",\n    \"task_name\": \"Text to JSON\",\n    \"input_type\": \"Text\",\n    \"output_type\": \"JSON\",\n    \"VQA\": [],\n    \"raw_output_metric\": [...],\n    \"rendering\": false,\n    \"generation\": \"...\",\n    \"output_file\": \"experiment_results/model-name/non_renderable_files/000500.json\",\n    \"render_score\": 1,\n    \"VQA_score\": null,\n    \"key_validation_score\": 1.0,\n    \"final_eval_score\": 1.0\n  }\n]\n```\n\n### Evaluation Fields:\n- `output_file`: Path to the rendered output or extracted JSON file\n- `render_score`: Score indicating if the output was rendered successfully (0 or 1)\n- `VQA_score`: Score from visual question-answering evaluation (for renderable outputs)\n- `key_validation_score`: Score from validating expected keys in JSON output (for non-renderable outputs)\n- `raw_output_eval`: Array of boolean values indicating whether each raw output metric was satisfied\n- `raw_output_score`: Score from the raw output evaluation\n- `final_eval_score`: Overall evaluation score between 0 and 
1","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftiger-ai-lab%2Fstructeval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftiger-ai-lab%2Fstructeval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftiger-ai-lab%2Fstructeval/lists"}