{"id":25477307,"url":"https://github.com/datamine/rlhf-signatures","last_synced_at":"2025-11-06T03:30:32.030Z","repository":{"id":277757295,"uuid":"933128692","full_name":"Datamine/rlhf-signatures","owner":"Datamine","description":null,"archived":false,"fork":false,"pushed_at":"2025-02-15T22:08:10.000Z","size":0,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-15T22:25:37.471Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Datamine.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-15T08:00:19.000Z","updated_at":"2025-02-15T22:08:13.000Z","dependencies_parsed_at":"2025-02-15T22:25:39.688Z","dependency_job_id":"752fa4fe-0f4f-4b1c-a702-7750c58d57fa","html_url":"https://github.com/Datamine/rlhf-signatures","commit_stats":null,"previous_names":["datamine/rlhf-signatures"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Datamine%2Frlhf-signatures","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Datamine%2Frlhf-signatures/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Datamine%2Frlhf-signatures/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Datamine%2Frlhf-signatures/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Datamine","download_url":"https://codeload.github
.com/Datamine/rlhf-signatures/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239481803,"owners_count":19646107,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-18T13:39:16.256Z","updated_at":"2025-11-06T03:30:31.995Z","avatar_url":"https://github.com/Datamine.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RLHF Signatures\n\n### Getting Started\n\nThis project uses the UV Python package manager. [Install UV locally](https://docs.astral.sh/uv/getting-started/installation/),\nthen run inside this directory:\n\n```\nuv venv\nsource .venv/bin/activate\nuv sync\n```\n\nThat will install all the required packages to this virtual environment, and you'll be able to run all the scripts herein.\nThis project is configured for Python 3.12, so you may need to fiddle with the configuration if your system has a lower Python version.\n(Practically, it shouldn't make a difference.)\n\n### Running Your Own Experiments\n\nFor replicating my results, you can follow these steps:\n\n1. Ensure that you have the following environment variables set:\n```\nexport OPENAI_API_KEY=\"...\"\nexport DEEPSEEK_API_KEY=\"...\"\nexport ANTHROPIC_API_KEY=\"...\"\nexport GEMINI_API_KEY=\"...\"\nexport TOGETHER_AI_API_KEY=\"...\"\n```\n\n2. Optional: run `generate_questions.py` to generate questions.csv. This will generate the same questions as in the `questions/` directory.\nYou can modify the wording inside `generate_questions.py` to generate your own question sets. 
I recommend truncating your question set into a `questions_short.csv` version\nfor testing. One thing that's **very important** is that the questions here are designed to be symmetric:\nfor every question that asks the LLM to choose between Option A and Option B, there's an identical question that asks the LLM to choose between Option B and Option A.\nAs the later sections show, some models have significant ordering biases, so creating question sets that are perfectly symmetrical with respect to ordering is\nimportant to counteract any biasing effects.\n\n3. Run `get_answers.py questions/questions.csv`. This will query all the APIs with the questions from `questions.csv`. (You can use a different Questions file, of course. But beware of line 38: the script assumes that the actual question itself is in the fourth column of the CSV.)\nIt is designed to be robust to errors/interruptions, and therefore handles\nevery single row as a separate file-write. It will skip over all previously written questions. Pay attention to the logs in case of rate limiting, as I encountered with Gemini 2.0 Pro.\n\n4. Run `validate_answers.py dataset/` to check all the answers for validity. There's a small amount of cleanup that may be necessary: Gemini's models terminate all their answers with a newline, and various models will insert periods, and occasionally answer with a whole sentence.\nThis takes only a few minutes to review and clean up in Excel or Vim -- I recommend cleaning up the data yourself, as it's a good chance to look closely at it, run a sanity check, and notice any patterns.\n\n5. Run `naive_order_bias.py` on each of your answer files to test whether the LLM is systematically biased toward answering with the first (or second) option.\nIt outputs a Binomial Test and a Chi-Square Test against the Null Hypothesis that there is no ordering bias.\n\n6. 
Run `paired_order_bias.py` on each of your answer files as a third ordering test: it runs a paired t-test to evaluate whether the mean difference\nin proportions (between the two orderings, Option A vs Option B and Option B vs Option A) is statistically different from zero. The t-statistic indicates\nwhether or not the ordering has an effect on the choice. (What's different about this test is that it explicitly uses the symmetrical ordering pairs, whereas the Binomial and Chi-Square tests don't.)\n\n7. Run `bradley_terry.py` to compute the Bradley-Terry parameters to evaluate the relative strength of preference for each\nof the Options within a given model run. You can run it on an individual file (`python bradley_terry.py answers/...`) to get the full\nstatistics printout, or run it on a whole directory (`python bradley_terry.py answers/`) to output a spreadsheet of all the strength parameters,\ni.e. as a table over all the Options and the Models.\n\n### TODO\n\n1. Find a good non-RLHF benchmark model.\nI couldn't find a good non-RLHF benchmark model.\nI tried [EleutherAI/GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b), but it was too bad to be useful.\nI couldn't find a hosted version of [Bloom-176B](https://huggingface.co/bigscience/bloom).\n\n2. Combine `naive_order_bias.py` and `paired_order_bias.py` into one file and have them output to a spreadsheet, rather than printing the results.\n\n3. Write an explanation of the results section.\n\n4. Rewrite `get_answers.py` to not have a magic number for the index of the Question in the spreadsheet row, but rather infer it from the header.\n\n### FAQ\n\n**Q: How do I contact you with my question?**\n\nA: Write to contact@johnloeber.com\n\n**Q: How much did it cost to run these experiments?**\n\nA: Maybe $20 in total. My calls to Llama-405B cost me only $0.10 total via TogetherAI. 
DeepSeek cost me ~50 cents, even with their reasoning model.\nOpenAI managed to charge me $15 for 200,000 output tokens, which had me mystified until I remembered that they don't just charge for output, but also\nfor the Chain-of-Thought tokens, since O1 is a reasoning model.\n\n**Q: How can I contribute?**\n\nA: Feel free to file issues, PRs, etc. I have GitHub notifications turned off, so email me if there's something I should look at.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatamine%2Frlhf-signatures","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatamine%2Frlhf-signatures","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatamine%2Frlhf-signatures/lists"}