{"id":30288578,"url":"https://github.com/da03/moderation_issue","last_synced_at":"2025-08-16T22:37:56.916Z","repository":{"id":282936594,"uuid":"788175130","full_name":"da03/moderation_issue","owner":"da03","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-17T23:44:46.000Z","size":3929,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-17T19:11:40.405Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/da03.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-17T23:19:00.000Z","updated_at":"2024-09-13T13:16:01.000Z","dependencies_parsed_at":"2025-03-17T19:11:42.895Z","dependency_job_id":"c7fd100a-be62-4eec-997c-32f116428315","html_url":"https://github.com/da03/moderation_issue","commit_stats":null,"previous_names":["da03/moderation_issue"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/da03/moderation_issue","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/da03%2Fmoderation_issue","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/da03%2Fmoderation_issue/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/da03%2Fmoderation_issue/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/da03%2Fmoderation_issue/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/da03","download_url":"https://codeload
.github.com/da03/moderation_issue/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/da03%2Fmoderation_issue/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270781214,"owners_count":24643808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-16T22:37:56.310Z","updated_at":"2025-08-16T22:37:56.906Z","avatar_url":"https://github.com/da03.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenAI Moderation API 429 or 500 Error Despite Not Reaching Usage Limit\n\nWhen the input to the Moderation API is long, it raises a 429 rate limit error (or sometimes 500 error) even without actually reaching the rate limit. Note that this cannot be resolved by waiting and resending the same request, as the same error would be raised.\n\n## Code to Reproduce the Error\n\nBelow is a minimum code snippet that can reproduce the error using a request of 6k Chinese characters. 
This example can also be found in a single file `test_moderation.py`:\n\n```\nimport requests\nfrom openai import OpenAI\n\nclient = OpenAI()  # reads OPENAI_API_KEY from the environment\nsource = requests.get(\"https://raw.githubusercontent.com/da03/moderation_issue/main/example.txt\")\n\n# True: send 6,000 characters (raises a 429/500 error)\n# False: send 5,999 characters (succeeds)\nflag_produce_error = True\nif flag_produce_error:\n    text = source.text[:6000]\nelse:\n    text = source.text[:5999]\n\nprint(f'Number of characters: {len(text)}')\ntry:\n    response = client.moderations.create(input=text)\nexcept Exception as e:\n    print('error', e)\n```\n\nWith `flag_produce_error` set to `True`, the request uses 6,000 characters and fails with a 429 rate limit error (or a 500 server error). With it set to `False`, the request uses 5,999 characters and succeeds.\n\n\n## Hypothesis: Encoding Issues\n\nInspired by [pondin6666](https://community.openai.com/u/pondin6666), I suspect that these errors are linked to encoding issues, particularly with non-Latin characters. For example, if we replace the Chinese text in the above example with ASCII letters and digits, even an input of one million characters works:\n\n```\nimport random\nimport string\nfrom openai import OpenAI\n\nclient = OpenAI()\n\n# One million random ASCII uppercase letters and digits\ntext = ''.join(random.choices(string.ascii_uppercase + string.digits, k=1000000))\nprint(f'Number of characters: {len(text)}')\ntry:\n    response = client.moderations.create(input=text)\nexcept Exception as e:\n    print('error', e)\n```\n\n\n## Evidence Supporting the Hypothesis\n\nI used the Moderation API to flag toxic data in the [WildChat dataset](https://huggingface.co/datasets/allenai/WildChat). 
During this process, I collected statistics on the languages of failing examples and observed that most errors involved inputs containing non-Latin characters, such as Korean and Chinese. The statistics below compare each language's share of errors with its share of the dataset:\n\n### Error Distribution by Language\n\n- **Korean**: 66.44% of errors (0.51% of dataset)\n- **Chinese**: 10.96% of errors (13.54% of dataset)\n- **English**: 6.85% of errors (54.92% of dataset) (mostly containing special characters like ψ or •)\n- **Russian**: 3.94% of errors (11.77% of dataset)\n- **Japanese**: 2.40% of errors (0.53% of dataset)\n- **Hindi**: 2.23% of errors (0.03% of dataset)\n\nThis suggests a possible correlation between non-Latin characters and an increased likelihood of receiving a 429 or 500 error from the Moderation API. Korean, Japanese, and Hindi in particular account for a far larger share of errors than of the dataset. To enable others to verify my results, I have included failing examples in [failing_examples](failing_examples).\n\n\n## Workaround\n\nWhile awaiting a permanent fix or clarification from OpenAI, I've implemented a temporary workaround that breaks large inputs into smaller segments, moderates each segment separately, and takes the maximum score per category across segments as the result. In case others encounter the same issue, I have included my workaround in [workaround.py](workaround.py) as part of this repo.\n\n## Joining the Discussion\n\nhttps://community.openai.com/t/moderation-raises-429-rate-limit-error-for-long-input/718609\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fda03%2Fmoderation_issue","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fda03%2Fmoderation_issue","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fda03%2Fmoderation_issue/lists"}