{"id":24826840,"url":"https://github.com/andrewimpellitteri/creativity_bench","last_synced_at":"2025-03-26T00:47:25.979Z","repository":{"id":274541403,"uuid":"923235648","full_name":"andrewimpellitteri/creativity_bench","owner":"andrewimpellitteri","description":"A benchmark for the creativity of LLMs based on Gwern's post","archived":false,"fork":false,"pushed_at":"2025-02-03T23:13:19.000Z","size":1255,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-04T00:23:29.662Z","etag":null,"topics":["ai","benchmark","creativity","evaluation","llm","llms","llms-benchmarking","writing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andrewimpellitteri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-27T21:36:56.000Z","updated_at":"2025-02-03T23:13:22.000Z","dependencies_parsed_at":"2025-01-27T23:38:30.863Z","dependency_job_id":null,"html_url":"https://github.com/andrewimpellitteri/creativity_bench","commit_stats":null,"previous_names":["andrewimpellitteri/creativity_bench"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewimpellitteri%2Fcreativity_bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewimpellitteri%2Fcreativity_bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewimpellitteri%2Fcreativity_bench/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewimpellitteri%2Fcreativity_bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andrewimpellitteri","download_url":"https://codeload.github.com/andrewimpellitteri/creativity_bench/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245568586,"owners_count":20636803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","benchmark","creativity","evaluation","llm","llms","llms-benchmarking","writing"],"created_at":"2025-01-30T21:37:32.554Z","updated_at":"2025-03-26T00:47:25.974Z","avatar_url":"https://github.com/andrewimpellitteri.png","language":"Python","readme":"# LLM Creativity Benchmark\nA comprehensive evaluation suite for measuring creative capabilities in large language models (LLMs). 
This benchmark assesses multiple key dimensions of creativity through structured tests and quantitative metrics.\n\nBased on [this](https://gwern.net/creative-benchmark) post by Gwern.\n\n![results](model_comparison.png)\n\n## Key Features\n- **Free Association Test**: Measures lexical originality and vocabulary estimation\n- **Telephone Game**: Quantifies semantic drift through iterative paraphrasing\n- **Camel's Back Challenge**: Tests narrative coherence under multiple edits\n- **DRY (Don't Repeat Yourself) Test**: Evaluates output diversity across prompts\n- **Extreme Style Transfer Test**: Takes a set of stories with genre labels, asks an LLM to summarize each one, then asks it to write a story using only the summary and a randomly chosen different genre label; the score is based on how different the new-genre versions are from the original.\n- **Composite Creativity Score**: Combined metric aggregating multiple dimensions\n\nSupports `ollama` for local generation as well as the OpenAI API (the URL is set to Hugging Face).\n\n`config.py` has some stories and genre labels for extreme style transfer generation, but these are very basic and can be improved.\n\nI have only run the benchmark on a few small models as it takes a while to run and I am GPU poor :(\n\n## Installation and Usage\n\nClone the repository and run:\n\n```\npip install -r requirements.txt\n```\n\nSet the `HF_TOKEN` environment variable to your Hugging Face token and run `cli.py`:\n\n```\nusage: cli.py [-h] [--model MODEL] [--prompt PROMPT] [--save] [--use_api] [--n N]\n\nRun the LLM Creativity Benchmark and output the results.\n\noptions:\n  -h, --help       show this help message and exit\n  --model MODEL    Name of the model to benchmark.\n  --save           Save results as JSON file in 'runs' directory\n  --use_api        Use Hugging Face API for generation\n  --n N            Number of benchmark runs\n```\n\nTo visualize the results of all evaluation runs:\n\n```\npython visualize_results.py\n```\n\n## Contributing\n\nPlease feel free to 
contribute by submitting pull requests or issues. I am hoping to implement all of the benchmarks mentioned in Gwern's post. Any feedback on how to improve the benchmark suite is welcome.\n\nAlso, if anyone would like to run the benchmark on different models and submit the results, please do so!\n\n## License\n\nMIT","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewimpellitteri%2Fcreativity_bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandrewimpellitteri%2Fcreativity_bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewimpellitteri%2Fcreativity_bench/lists"}