{"id":28512404,"url":"https://github.com/cluebenchmark/math24o","last_synced_at":"2026-03-06T18:03:03.082Z","repository":{"id":283466966,"uuid":"951832202","full_name":"CLUEbenchmark/Math24o","owner":"CLUEbenchmark","description":"Math24o: 高中奥林匹克数学竞赛测评集 High School Olympiad Mathematics Chinese Benchmark ","archived":false,"fork":false,"pushed_at":"2025-03-27T09:00:22.000Z","size":43,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-04T01:36:29.916Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CLUEbenchmark.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-20T10:08:26.000Z","updated_at":"2025-04-03T08:17:09.000Z","dependencies_parsed_at":"2025-07-04T01:31:00.031Z","dependency_job_id":"3df92a4e-eb9c-4786-b12e-4bdb645872e7","html_url":"https://github.com/CLUEbenchmark/Math24o","commit_stats":null,"previous_names":["cluebenchmark/math24o"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CLUEbenchmark/Math24o","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FMath24o","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FMath24o/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FMath24o/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FMath24o/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CLUEbenchmark","download_url":"https://codeload.github.com/CLUEbenchmark/Math24o/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FMath24o/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30189483,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T17:33:53.563Z","status":"ssl_error","status_checked_at":"2026-03-06T17:33:51.678Z","response_time":250,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-09T00:37:56.000Z","updated_at":"2026-03-06T18:03:03.064Z","avatar_url":"https://github.com/CLUEbenchmark.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Math24o\n\nMath24o is a mathematical reasoning benchmark of Chinese-language problems. It evaluates the mathematical reasoning ability of large language models (LLMs) at the level of the high school Mathematical Olympiad.\n\nThe benchmark uses problems from the 2024 preliminary round. A program automatically checks whether a model's answer matches the reference answer, which gives an objective measure of the model's accuracy.\n\nThe benchmark is intended as a reference for future model development and to help improve model reliability on complex mathematical tasks.\n\n\n\n# Getting Model Responses and Prompts\n\nFull Input: \u003cquestion\u003e + \"\\n\" + \u003cspecial_prompt\u003e\n\nSpecial prompt used (given to the model in Chinese):\n\n 请把你的最终答案放在\\boxed{}内，即使用\\boxed{你的最终答案}这个格式，注意\\boxed{}里只能是整数或小数。\n\nEnglish translation of the special prompt:\n \n Please put your final answer in \\boxed{}, using the 
format \\boxed{your final answer}. Note that only integers or decimals are allowed inside \\boxed{}.\n\nFull example (the input is given to the model in Chinese):\n\n 设函数 $$f : \\{1, 2, 3 \\} \\to\\{2, 3, 4 \\}$$ 满足 $$f \\left( f \\left( x \\right)-1 \\right)=f \\left( x \\right)$$ ，则这样的函数有多少个？\n\n 请把你的最终答案放在\\boxed{}内，即使用\\boxed{你的最终答案}这个格式，注意\\boxed{}里只能是整数或小数。\n\n\n# 🏆 Main Results\n\n| Rank | Model | Organization | Score | Access | Release Date |\n|----|--------------------------------|----------|------|----------|----------|\n| 1 | o3-mini(high) | OpenAI | 85.71 | API | 2025.03.12 |\n| 2 | Gemini-2.0-Flash-Thinking-Exp-01-21 | Google | 71.43 | API | 2025.03.12 |\n| 3 | QwQ-Max-Preview | Alibaba Cloud | 66.67 | Official website | 2025.03.12 |\n| 3 | QwQ-32B | Alibaba Cloud | 66.67 | Model | 2025.03.12 |\n| 3 | o1 | OpenAI | 66.67 | API | 2025.03.12 |\n| 4 | DeepSeek-R1 | DeepSeek | 57.14 | API | 2025.03.12 |\n| 4 | Claude 3.7 Sonnet | Anthropic | 57.14 | POE | 2025.03.12 |\n\n Note: the scores above are the accuracy when each model generates an answer only once. Users can re-run the evaluation themselves using the questions and answers.\n\n# ✨ Auto Evaluation\n\n After pasting the answers of all models under test into model_answers, save the model_answers file. Then return to the terminal and run the following commands in order:\n\n## Install the required Python packages\n\n pip install -r requirements.txt\n\n## Run the evaluation script\n\n python auto_evaluation.py\n\nThe terminal then prints the average score of the models under test.\n\nYou can also run the following in the terminal to see the detailed evaluation result for each problem:\n\n## Open output.xlsx (you can also open it manually)\n\n output.xlsx\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcluebenchmark%2Fmath24o","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcluebenchmark%2Fmath24o","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcluebenchmark%2Fmath24o/lists"}