{"id":35015607,"url":"https://github.com/mdda/cryptic-crossword-reasoning-verifier","last_synced_at":"2026-05-20T08:31:00.501Z","repository":{"id":273257676,"uuid":"919124361","full_name":"mdda/cryptic-crossword-reasoning-verifier","owner":"mdda","description":null,"archived":false,"fork":false,"pushed_at":"2026-01-11T18:50:18.000Z","size":435,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-11T19:43:58.566Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mdda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-19T19:04:33.000Z","updated_at":"2026-01-11T18:50:22.000Z","dependencies_parsed_at":"2025-01-19T20:42:28.525Z","dependency_job_id":null,"html_url":"https://github.com/mdda/cryptic-crossword-reasoning-verifier","commit_stats":null,"previous_names":["mdda/cryptic-crossword-reasoning-verifier"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mdda/cryptic-crossword-reasoning-verifier","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdda%2Fcryptic-crossword-reasoning-verifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdda%2Fcryptic-crossword-reasoning-verifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdda%2Fcryptic-crossw
ord-reasoning-verifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdda%2Fcryptic-crossword-reasoning-verifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mdda","download_url":"https://codeload.github.com/mdda/cryptic-crossword-reasoning-verifier/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdda%2Fcryptic-crossword-reasoning-verifier/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33251981,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-20T04:48:54.280Z","status":"ssl_error","status_checked_at":"2026-05-20T04:48:10.851Z","response_time":356,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-27T05:19:24.168Z","updated_at":"2026-05-20T08:31:00.479Z","avatar_url":"https://github.com/mdda.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# cryptic-crossword-reasoning-verifier\n\n## Code Quick-start : \n\n* Have a look at the Jupyter Notebooks (pre-rendered) in `./notebooks`\n* And (for the main `solver` library) in `./solver` \n\n## Papers\n\nThis repo is a copy of the relevant code for the paper:\n\n* [\"A Reasoning-Based Approach to Cryptic Crossword Clue Solving\"](http://arxiv.org/abs/2506.04824) - Andrews \u0026 Witteveen (2025)\n  + Accepted 
at ICML 2025 (Forty-second International Conference on Machine Learning).\n\nRelated Papers :\n\n* [\"Generating Code to Verify Cryptic Crossword Reasoning\"](https://openreview.net/forum?id=2nC7zy7adD) - Andrews \u0026 Witteveen (2025)\n  + Accepted at the [ICLR 2025 Deep Learning for Code Workshop](https://dl4c.github.io/)\n  + [Workshop Poster](https://iclr.cc/media/PosterPDFs/ICLR%202025/34846.png)\n  + [Workshop listing includes SlidesLive Video](https://iclr.cc/virtual/2025/34846)\n\n* [\"Proving that Cryptic Crossword Clue Answers are Correct\"](https://arxiv.org/abs/2407.08824) - Andrews \u0026 Witteveen (2024)\n  + Accepted at the [ICML 2024 Workshop on LLMs and Cognition](https://llm-cognition.github.io/)\n  + [Explainer Video on YouTube](https://www.youtube.com/watch?v=vLITb6XDTQ8)\n\n## Overview\n\nhttps://en.wikipedia.org/wiki/Cryptic_crossword\n\nA good cryptic clue contains three elements:\n* a precise definition\n* a fair subsidiary indication\n* nothing else\n\n\n## \"Lay Summary\"\n\nWe're interested in a type of puzzle common in major newspapers (in the UK, and elsewhere) : Cryptic Crosswords.\nEach cryptic clue hints at its unique answer in two ways : a 'regular crossword definition' and 'wordplay'.\nBecause the wordplay and definition must have the same answer, \nsolvers know whether they've got the answer correct (even without having other answers in the crossword grid).\n\nOur work uses a large language model (LLM) to guess possible answers, \nand then justify the reasoning, finally delivering its solution in the Python programming language.\nBy getting the reasoning as a small computer program,\nwe can easily tell whether the LLM has got the reasoning correct, \nand this enables our method to beat ChatGPT and other models.\n\nAlthough we focused on Cryptic Crosswords, our ideas could also be applied to other linguistic tasks, \nopening them up to methods that are more commonly used for mathematics and programming problems.\n\n\n## Get 
external libraries/data\n\n### Cryptonite\n\n* Key benchmark Times/Telegraph dataset\n\n```bash\nwget https://github.com/aviaefrat/cryptonite/blob/main/data/cryptonite-official-split.zip?raw=true\nunzip 'cryptonite-official-split.zip?raw=true' -d data_orig\nrm 'cryptonite-official-split.zip?raw=true'  # remove the downloaded archive (local filename, not the URL)\n```\n\n### FastText\n\n* Used for word/phrase embeddings\n\n```bash\n# https://github.com/facebookresearch/fastText/issues/512\nuv pip install wheel setuptools # Since newer pip behaves a bit more strictly?\n#pip install fasttext # FAILS with gcc error\ngit clone https://github.com/facebookresearch/fastText.git\npushd fastText\nuv pip install .\npopd\n# Test by importing 'fasttext' within python\n\n# DO THIS ONCE : \n  # rm cc.en.300.bin.gz\n  # https://fasttext.cc/docs/en/crawl-vectors.html#adapt-the-dimension\n  import fasttext\n  import fasttext.util\n  ft = fasttext.load_model('cc.en.300.bin')\n  ft.get_dimension() # 300\n  fasttext.util.reduce_model(ft, 100)\n  ft.get_dimension() # 100\n  ft.save_model('cc.en.100.bin')  # Use this in the code...\n\n```\n\n### Decrypt Dataset (Guardian)\n\n* [jsrozner/decrypt: Repository for paper Decrypting Cryptic Crosswords](https://github.com/jsrozner/decrypt)\n\nIncludes dictionary, names, `deits_anag_indic` (anagram indicator word list)\n\n```bash\nwget https://github.com/jsrozner/decrypt/raw/main/data/guardian_2020_10_08.json.zip\nunzip guardian_2020_10_08.json.zip -d data_orig\n#... Except this doesn't have any across/down information -\u003e SAD\n# Issue (2) : Publish clues as json with information to reconstruct puzzles fully removed \n```\n\n### Crossword Word List\n\nThis project uses the UK Advanced Cryptics Dictionary, Copyright (c) 2009 J Ross Beresford. 
\nFor license information see `UKACD.txt` after download.\n\n```bash\n# https://cfajohnson.com/wordfinder/singlewords\n# This is actually a slightly OLDER version than the one found via rdeits\n#   the rdeits version was used for the paper results...\nwget https://cfajohnson.com/wordfinder/UKACD17.tgz\ntar -xzf ./UKACD17.tgz UKACD17.TXT\nmv UKACD17.TXT UKACD.txt # expected location\n```\n\n### Indicator Word Lists\n\nIndicator word lists (included via [rdeits/cryptics installation](https://github.com/rdeits/cryptics/)) are from:\n* http://sutherland-studios.com.au/puzzles/anagram.php\n* http://www.crosswordunclued.com/2008/09/dictionary.html\n\n```bash\n# Pull in the CrypticCrosswords library for its abbreviations and actions data\n# (cloned into solver/CrypticCrosswords.jl, the path the commands below expect)\ngit clone https://github.com/rdeits/CrypticCrosswords.jl.git solver/CrypticCrosswords.jl\n\necho '# NEW ADDITIONS' \u003e\u003e solver/CrypticCrosswords.jl/corpora/indicators/InitialSubstring\necho 'briefly' \u003e\u003e solver/CrypticCrosswords.jl/corpora/indicators/InitialSubstring\necho 'most of' \u003e\u003e solver/CrypticCrosswords.jl/corpora/indicators/InitialSubstring\n```\n\n## Using the Gemini-LLM\n\nThe `gemini-1.5-flash-001` model is used via `solver/llm.py`, \nand will use the VertexAI credentials you provide in `./key-vertexai-iam.json`, \nand/or the API key provided in `./config.yaml`\n\n```bash\nexport GOOGLE_APPLICATION_CREDENTIALS=\"./key-vertexai-iam.json\"\n```\n\nThe code also allows for usage of the \\$FREE Gemini API \n(for which you'll need to add a `free=True` flag to the `get_model()` calls).\n\n\n## Library installation\n\n```bash\nuv venv ~/env312\n. 
~/env312/bin/activate\n```\n\n```bash\nuv pip install jupyterlab jupytext ipywidgets\nuv pip install omegaconf numpy\nuv pip install levenshtein pytz  # pytz=Timezone stuff for logging\nuv pip install -U google-generativeai   # Some complaints about tensorflow-metadata and protobuf\nuv pip install -U vertexai\nuv pip install redis\n\n# For the unsloth stuff (Gemma, etc)\nuv pip install tf-keras\nuv pip install \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\"\nuv pip install --no-deps \"xformers\u003c0.0.26\" trl peft accelerate bitsandbytes\n```\n\n\n## Examining / Running the notebooks\n\n* NB: To just have a look at the notebook outputs, see : `./notebooks/*.ipynb` (as expected)\n\n`jupytext` has been used within JupyterLab for the notebooks : This means that the actual saved-to-github \ncode is the `.py` files in the main directory, which should be run in JupyterLab (say) using the \n`jupytext` plugin, and choosing `Open as Notebook` on the `.py` file.\n\nThe local notebook content is stored to `cache-notebooks`, and not checked into the repo.  i.e. 
the following was done:\n```bash\njupytext --set-formats cache-notebooks//ipynb,py XYZ.py\n```\n\n## Citing this work\n\n```bibtex\n@inproceedings{andrews-cryptic-reasoning-2025,\n  title={A Reasoning-Based Approach to {Cryptic Crossword} Clue Solving},\n  author={Martin Andrews and Sam Witteveen},\n  booktitle={Forty-second International Conference on Machine Learning},\n  year={2025},\n  url={https://openreview.net/forum?id=kBTgizDiCq},\n  url_arxiv={http://arxiv.org/abs/2506.04824}\n}\n```\n\n\n\n### Acknowledgements\n\n* Support for this research was provided by the Google AI Developer Programs team, including access to the Gemini models and GPUs on Google Cloud Platform.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdda%2Fcryptic-crossword-reasoning-verifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmdda%2Fcryptic-crossword-reasoning-verifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdda%2Fcryptic-crossword-reasoning-verifier/lists"}