# CodeContests

CodeContests is a competitive programming dataset for machine learning. This
dataset was used when training
[AlphaCode](https://deepmind.com/blog/article/Competitive-programming-with-AlphaCode).
AlphaCode has been published in
[Science](https://www.science.org/doi/10.1126/science.abq1158), with a preprint
on [arXiv](https://arxiv.org/abs/2203.07814).

It consists of programming problems, from a variety of sources:

Site        | URL                         | Source
----------- | --------------------------- | ------
Aizu        | https://judge.u-aizu.ac.jp  | [CodeNet](https://github.com/IBM/Project_CodeNet)
AtCoder     | https://atcoder.jp          | [CodeNet](https://github.com/IBM/Project_CodeNet)
CodeChef    | https://www.codechef.com    | [description2code](https://github.com/ethancaballero/description2code)
Codeforces  | https://codeforces.com      | [description2code](https://github.com/ethancaballero/description2code) and Codeforces
HackerEarth | https://www.hackerearth.com | [description2code](https://github.com/ethancaballero/description2code)

Problems include test cases in the form of paired inputs and outputs, as well as
both correct and incorrect human solutions in a variety of languages.

## Install bazel

First [install bazel](https://docs.bazel.build/versions/main/install.html)
and verify that the project builds correctly (we only support Linux with clang,
but other platforms might work):

```sh
bazel build -c opt :print_names_and_sources
```

## Downloading the dataset

[Install the Cloud SDK](https://cloud.google.com/sdk/docs/quickstart), which
provides the `gsutil` utility. You can then download the full data (~3GiB)
with, e.g.:

```
gsutil -m cp -r gs://dm-code_contests /tmp
```

The data consists of `ContestProblem` protocol buffers in
[Riegeli](https://github.com/google/riegeli) format.
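Because each problem stores its test cases as paired inputs and expected
outputs, checking whether a candidate program solves a problem essentially
reduces to run-and-compare. The following is a minimal, hypothetical sketch of
that idea (the repository's real harness in the `execution` subdirectory
compiles and sandboxes solutions far more carefully; the toy tests and
solutions below are invented for illustration):

```python
import subprocess
import sys

# Hypothetical stand-ins for one problem's paired test cases and two human
# solutions, one correct and one incorrect, mirroring the dataset's structure.
tests = [
    {"input": "1 2\n", "output": "3\n"},
    {"input": "5 7\n", "output": "12\n"},
]
correct = "a, b = map(int, input().split()); print(a + b)"
incorrect = "a, b = map(int, input().split()); print(a - b)"

def solves(solution: str) -> bool:
    """Run `solution` on each test input and compare stdout to the
    expected output; any mismatch or nonzero exit counts as a failure."""
    for t in tests:
        result = subprocess.run(
            [sys.executable, "-c", solution],
            input=t["input"], capture_output=True, text=True, timeout=10,
        )
        if result.returncode != 0 or result.stdout != t["output"]:
            return False
    return True

print(solves(correct))
print(solves(incorrect))
```

The incorrect solutions in the dataset are useful precisely because a good
filter must reject them, as `solves(incorrect)` does here.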
See `contest_problem.proto`
for the protocol buffer definition and documentation of its fields.

The dataset contains three splits:

Split      | Filename
---------- | ----------------------------------------
Training   | `code_contests_train.riegeli-*-of-00128`
Validation | `code_contests_valid.riegeli`
Test       | `code_contests_test.riegeli`

There is example code for iterating over the dataset in C++ (in
`print_names.cc`) and Python (in `print_names_and_sources.py`). For example, you
can print the source and name of each problem in the validation data by
[installing bazel](https://docs.bazel.build/versions/main/install.html) and then
running:

```
bazel run -c opt \
  :print_names_and_sources /tmp/dm-code_contests/code_contests_valid.riegeli
```

Or do the same for the training data with the following command (which will
print around 13000 lines of output):

```
bazel run -c opt \
  :print_names_and_sources /tmp/dm-code_contests/code_contests_train.riegeli*
```

## Executing and evaluating solutions

The `execution` subdirectory contains code for executing a solution and
evaluating whether it solves a problem. `solve_example` demonstrates this
functionality, and can be run with, e.g.:

```
bazel run -c opt execution:solve_example -- \
  --valid_path=/tmp/dm-code_contests/code_contests_valid.riegeli
```

Note: for the last command you should see one `Compilation failed` and two
`Compilation succeeded`. If you see three `Compilation failed`, there is likely
an issue with the Python version used; please install and try several versions
before reporting a bug.

The execution code defaults to using Python 3.9 and 2.7, located at
`/usr/bin/python3.9` and `/usr/bin/python2.7`, with standard libraries at
`/usr/lib/python3.9` and `/usr/lib/python2.7`.
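If you are not sure which interpreters are installed or where they live,
Python's standard `shutil.which` reports the absolute paths that the execution
flags expect. This is a small convenience sketch, not part of the repository,
and the version list is illustrative:

```python
import shutil

def interpreter_path(name: str):
    """Return the absolute path of `name` on PATH, or None if absent."""
    return shutil.which(name)

# Report where each candidate interpreter lives, if anywhere.
for name in ("python3.9", "python3.10", "python3.11", "python2.7"):
    print(f"{name}: {interpreter_path(name) or 'not found'}")
```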
These defaults can be changed
with the flags defined in `py_locations.cc`, for example:

```
bazel run -c opt execution:solve_example -- \
  --valid_path=/tmp/dm-code_contests/code_contests_valid.riegeli \
  --python3_path=/usr/bin/python3.10 --python3_library_paths=/usr/lib/python3.10
```

On Debian/Ubuntu you can install specific Python versions with

```
sudo apt install python3.9 python3.10 python3.11
```

and you can check whether a given version is installed by seeing whether
`which` produces any output:

```
which python3.11
```

Note that the Python used for building with bazel and the Python used for
executing inside the sandbox can be different.

### Note on data and sandbox consistency

The incorrect and correct solutions attached to problems are not guaranteed to
compile and execute in exactly the same way as on their original contest
website (for example, compiler versions, flags, or library versions may
differ). Some of the solutions will fail to compile or will produce sandbox
violations, especially if they are incorrect.

### FAQ

We recommend running the following before reporting bugs; it wipes out the
bazel state and sometimes fixes transient errors.

```
bazel clean --expunge
rm -rf ~/.cache/bazel
```

## Supported platforms

This repository is supported on Linux, compiled with clang.

People on MacOS have reported this error:
https://github.com/deepmind/code_contests/issues/5

People on Windows have reported this error:
https://github.com/deepmind/code_contests/issues/9

## Citing this work

If you use this dataset or code, please cite this paper:

```
@article{
  doi:10.1126/science.abq1158,
  author = {Yujia Li  and David Choi  and Junyoung Chung  and Nate Kushman  and Julian Schrittwieser  and R{\'e}mi Leblond  and Tom Eccles  and James Keeling  and Felix Gimeno  and Agustin Dal Lago  and Thomas Hubert  and Peter Choy  and Cyprien de Masson d’Autume  and Igor Babuschkin  and Xinyun Chen  and Po-Sen Huang  and Johannes Welbl  and
Sven Gowal  and Alexey Cherepanov  and James Molloy  and Daniel J. Mankowitz  and Esme Sutherland Robson  and Pushmeet Kohli  and Nando de Freitas  and Koray Kavukcuoglu  and Oriol Vinyals },
  title = {Competition-level code generation with AlphaCode},
  journal = {Science},
  volume = {378},
  number = {6624},
  pages = {1092-1097},
  year = {2022},
  doi = {10.1126/science.abq1158},
  URL = {https://www.science.org/doi/abs/10.1126/science.abq1158},
  eprint = {https://www.science.org/doi/pdf/10.1126/science.abq1158},
  abstract = {Programming is a powerful and ubiquitous problem-solving tool. Systems that can assist programmers or even generate programs themselves could make programming more productive and accessible. Recent transformer-based neural network models show impressive code generation abilities yet still perform poorly on more complex tasks requiring problem-solving skills, such as competitive programming problems. Here, we introduce AlphaCode, a system for code generation that achieved an average ranking in the top 54.3\% in simulated evaluations on recent programming competitions on the Codeforces platform. AlphaCode solves problems by generating millions of diverse programs using specially trained transformer-based networks and then filtering and clustering those programs to a maximum of just 10 submissions. This result marks the first time an artificial intelligence system has performed competitively in programming competitions. Computer programming competitions are popular tests among programmers that require critical thinking informed by experience and creating solutions to unforeseen problems, both of which are key aspects of human intelligence but challenging to mimic by machine learning models. Using self-supervised learning and an encoder-decoder transformer architecture, Li et al.
developed AlphaCode, a deep-learning model that can achieve approximately human-level performance on the Codeforces platform, which regularly hosts these competitions and attracts numerous participants worldwide (see the Perspective by Kolter). The development of such coding platforms could have a huge impact on programmers’ productivity. It may even change the culture of programming by shifting human work to formulating problems, with machine learning being the main one responsible for generating and executing codes. —YS Modern machine learning systems can achieve average human-level performance in popular competitive programming contests.}}
```

## License

The code is licensed under the
[Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

All non-code materials provided are made available under the terms of the CC BY
4.0 license
([Creative Commons Attribution 4.0 International license](https://creativecommons.org/licenses/by/4.0/legalcode)).

We gratefully acknowledge the contributions of the following:

*   Codeforces materials are sourced from http://codeforces.com.
*   Description2Code materials are sourced from:
    [Description2Code Dataset](https://github.com/ethancaballero/description2code),
    licensed under the
    [MIT open source license](https://opensource.org/licenses/MIT), copyright
    not specified.
*   CodeNet materials are sourced from:
    [Project_CodeNet](https://github.com/IBM/Project_CodeNet), licensed under
    [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0), copyright not
    specified.

Use of third-party software, libraries, code, or data may be governed by
separate terms and conditions or license provisions. Your use of such
third-party software, libraries, or code may be subject to those terms.
We make no
representations here with respect to rights or abilities to use any such
materials.

## Disclaimer

This is not an official Google product.