{"id":13564765,"url":"https://github.com/google-deepmind/mathematics_dataset","last_synced_at":"2025-05-14T09:11:53.516Z","repository":{"id":45048864,"uuid":"177970295","full_name":"google-deepmind/mathematics_dataset","owner":"google-deepmind","description":"This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty.","archived":false,"fork":false,"pushed_at":"2024-12-23T14:21:10.000Z","size":62,"stargazers_count":1868,"open_issues_count":2,"forks_count":255,"subscribers_count":64,"default_branch":"master","last_synced_at":"2025-04-11T19:13:17.281Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-deepmind.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-03-27T10:23:40.000Z","updated_at":"2025-04-11T08:05:00.000Z","dependencies_parsed_at":"2025-04-11T16:52:36.641Z","dependency_job_id":"5f2fa760-1819-4a07-a8a9-40447f1e7c12","html_url":"https://github.com/google-deepmind/mathematics_dataset","commit_stats":null,"previous_names":["google-deepmind/mathematics_dataset","deepmind/mathematics_dataset"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-deepmind%2Fmathematics_dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-deepmind%2Fmathematics_dataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-deepmind%2Fmathematics_dataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-deepmind%2Fmathematics_dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-deepmind","download_url":"https://codeload.github.com/google-deepmind/mathematics_dataset/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254110374,"owners_count":22016391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:01:35.574Z","updated_at":"2025-05-14T09:11:53.497Z","avatar_url":"https://github.com/google-deepmind.png","language":"Python","funding_links":[],"categories":["Python","A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"# Mathematics Dataset\n\nThis dataset code generates mathematical question and answer pairs, from a range\nof question types at roughly school-level difficulty. This is designed to test\nthe mathematical learning and algebraic reasoning skills of learning models.\n\nOriginal paper: [Analysing Mathematical\nReasoning Abilities of Neural Models](https://openreview.net/pdf?id=H1gR5iR5FX)\n(Saxton, Grefenstette, Hill, Kohli).\n\n## Example questions\n\n```\nQuestion: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.\nAnswer: 4\n\nQuestion: Calculate -841880142.544 + 411127.\nAnswer: -841469015.544\n\nQuestion: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).\nAnswer: 54*a - 30\n\nQuestion: Let e(l) = l - 6. Is 2 a factor of both e(9) and 2?\nAnswer: False\n\nQuestion: Let u(n) = -n**3 - n**2. Let e(c) = -2*c**3 + c. Let l(j) = -118*e(j) + 54*u(j). What is the derivative of l(a)?\nAnswer: 546*a**2 - 108*a - 118\n\nQuestion: Three letters picked without replacement from qqqkkklkqkkk. Give prob of sequence qql.\nAnswer: 1/110\n```\n\n## Pre-generated data\n\n[Pre-generated files](https://console.cloud.google.com/storage/browser/mathematics-dataset)\n\n### Version 1.0\n\nThis is the version released with the original paper. It contains 2 million\n(question, answer) pairs per module, with questions limited to 160 characters in\nlength, and answers to 30 characters in length. Note the training data for each\nquestion type is split into \"train-easy\", \"train-medium\", and \"train-hard\". This\nallows training models via a curriculum. The data can also be mixed together\nuniformly from these training datasets to obtain the results reported in the\npaper. Categories:\n\n* **algebra** (linear equations, polynomial roots, sequences)\n* **arithmetic** (pairwise operations and mixed expressions, surds)\n* **calculus** (differentiation)\n* **comparison** (closest numbers, pairwise comparisons, sorting)\n* **measurement** (conversion, working with time)\n* **numbers** (base conversion, remainders, common divisors and multiples,\n  primality, place value, rounding numbers)\n* **polynomials** (addition, simplification, composition, evaluating, expansion)\n* **probability** (sampling without replacement)\n\n## Getting the source\n\n### PyPI\n\nThe easiest way to get the source is to use pip:\n\n```shell\n$ pip install mathematics_dataset\n```\n\n### From GitHub\n\nAlternately you can get the source by cloning the mathematics_dataset\nrepository:\n\n```shell\n$ git clone https://github.com/deepmind/mathematics_dataset\n$ pip install --upgrade mathematics_dataset/\n```\n\n## Generating examples\n\nGenerated examples can be printed to stdout via the `generate` script. For\nexample:\n\n```shell\npython -m mathematics_dataset.generate --filter=linear_1d\n```\n\nwill generate example (question, answer) pairs for solving linear equations in\none variable.\n\nWe've also included `generate_to_file.py` as an example of how to write the\ngenerated examples to text files. You can use this directly, or adapt it for\nyour generation and training needs.\n\n## Dataset Metadata\nThe following table is necessary for this dataset to be indexed by search\nengines such as \u003ca href=\"https://g.co/datasetsearch\"\u003eGoogle Dataset Search\u003c/a\u003e.\n\u003cdiv itemscope itemtype=\"http://schema.org/Dataset\"\u003e\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eproperty\u003c/th\u003e\n    \u003cth\u003evalue\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ename\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"name\"\u003eMathematics Dataset\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eurl\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"url\"\u003ehttps://github.com/deepmind/mathematics_dataset\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003esameAs\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"sameAs\"\u003ehttps://github.com/deepmind/mathematics_dataset\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003edescription\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"description\"\u003eThis dataset consists of mathematical question and answer pairs, from a range\nof question types at roughly school-level difficulty. This is designed to test\nthe mathematical learning and algebraic reasoning skills of learning models.\\n\n\\n\n## Example questions\\n\n\\n\n```\\n\nQuestion: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.\\n\nAnswer: 4\\n\n\\n\nQuestion: Calculate -841880142.544 + 411127.\\n\nAnswer: -841469015.544\\n\n\\n\nQuestion: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).\\n\nAnswer: 54*a - 30\\n\n```\\n\n\\n\nIt contains 2 million\n(question, answer) pairs per module, with questions limited to 160 characters in\nlength, and answers to 30 characters in length. Note the training data for each\nquestion type is split into \"train-easy\", \"train-medium\", and \"train-hard\". This\nallows training models via a curriculum. The data can also be mixed together\nuniformly from these training datasets to obtain the results reported in the\npaper. Categories:\\n\n\\n\n* **algebra** (linear equations, polynomial roots, sequences)\\n\n* **arithmetic** (pairwise operations and mixed expressions, surds)\\n\n* **calculus** (differentiation)\\n\n* **comparison** (closest numbers, pairwise comparisons, sorting)\\n\n* **measurement** (conversion, working with time)\\n\n* **numbers** (base conversion, remainders, common divisors and multiples,\\n\n  primality, place value, rounding numbers)\\n\n* **polynomials** (addition, simplification, composition, evaluating, expansion)\\n\n* **probability** (sampling without replacement)\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eprovider\u003c/td\u003e\n    \u003ctd\u003e\n      \u003cdiv itemscope itemtype=\"http://schema.org/Organization\" itemprop=\"provider\"\u003e\n        \u003ctable\u003e\n          \u003ctr\u003e\n            \u003cth\u003eproperty\u003c/th\u003e\n            \u003cth\u003evalue\u003c/th\u003e\n          \u003c/tr\u003e\n          \u003ctr\u003e\n            \u003ctd\u003ename\u003c/td\u003e\n            \u003ctd\u003e\u003ccode itemprop=\"name\"\u003eDeepMind\u003c/code\u003e\u003c/td\u003e\n          \u003c/tr\u003e\n          \u003ctr\u003e\n            \u003ctd\u003esameAs\u003c/td\u003e\n            \u003ctd\u003e\u003ccode itemprop=\"sameAs\"\u003ehttps://en.wikipedia.org/wiki/DeepMind\u003c/code\u003e\u003c/td\u003e\n          \u003c/tr\u003e\n        \u003c/table\u003e\n      \u003c/div\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ecitation\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"citation\"\u003ehttps://identifiers.org/arxiv:1904.01557\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-deepmind%2Fmathematics_dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-deepmind%2Fmathematics_dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-deepmind%2Fmathematics_dataset/lists"}