{"id":19542035,"url":"https://github.com/bigscience-workshop/shadesofbias","last_synced_at":"2026-03-19T10:21:45.605Z","repository":{"id":243693806,"uuid":"813168288","full_name":"bigscience-workshop/ShadesofBias","owner":"bigscience-workshop","description":"Evaluation for Shades of Bias in Text","archived":false,"fork":false,"pushed_at":"2024-10-21T23:31:16.000Z","size":10663,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-11-11T03:13:00.352Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bigscience-workshop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-10T15:42:36.000Z","updated_at":"2024-10-21T23:31:19.000Z","dependencies_parsed_at":"2024-06-10T18:20:34.317Z","dependency_job_id":"7bc40949-64b4-4cc3-83fa-3408418fb35f","html_url":"https://github.com/bigscience-workshop/ShadesofBias","commit_stats":null,"previous_names":["bigscience-workshop/shadesofbias"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2FShadesofBias","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2FShadesofBias/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2FShadesofBias/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigscience-workshop%2FShadesofBias/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bigscience-workshop","download_url":"https://codeload.github.com/bigscience-workshop/ShadesofBias/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233083811,"owners_count":18622563,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T03:12:51.801Z","updated_at":"2026-03-04T05:01:45.229Z","avatar_url":"https://github.com/bigscience-workshop.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ShadesofBias\nThis repository provides the scripts and code used in the [Shades of Bias in Text Dataset](https://huggingface.co/datasets/LanguageShades/BiasShades).\nIt includes code for processing the data and for evaluating bias in language models across languages.\n\n## Data Processing\n\n**process_dataset/map_dataset.py** takes https://huggingface.co/datasets/LanguageShades/BiasShadesRaw and normalizes/formats it to produce https://huggingface.co/datasets/LanguageShades/BiasShadesRaw\n\n**process_dataset/extract_vocabulary.py** takes https://huggingface.co/datasets/LanguageShades/BiasShadesRaw and aligns each statement with its corresponding template slots, writing the results (including how well the alignment worked) to https://huggingface.co/datasets/LanguageShades/LanguageCorrections\n\n## Evaluation\n\n### HF Endpoints\nTo use HF Endpoints, navigate to [Shades](https://ui.endpoints.huggingface.co/LanguageShades/endpoints) if you have access. If not, copy the .env file into your root directory.\n\n### Example Script\nRun `example_logprob_evaluate.py` to iterate through the dataset for a given model and compute the log probability of the biased sentences. If you have the .env file, `load_endpoint_url(model_name)` will load the endpoint for that model, provided one has been created for it.\n\nRun `generation_evaluate.py` to iterate through the dataset, formatting each instance with a specified prompt from `prompts/`. The prompt language can differ from the original language of the statement; it defaults to English unless otherwise specified. If you have the .env file, `load_endpoint_url(model_name)` will load the endpoint for that model, provided one has been created for it.\n\n#### Add more prompts\nFollow the examples in `prompts/` to create a `.txt` file for a new prompt. Mark the input field with `{input}` in the text file.\n\n### Base Models\nSee the current [Proposed Model List](https://docs.google.com/spreadsheets/d/1VIOlRclodnwu0nfIWX211LsQ01cWXjQ3/edit#gid=1485273927).\n\n### 'Aligned' models\nTodo\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbigscience-workshop%2Fshadesofbias","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbigscience-workshop%2Fshadesofbias","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbigscience-workshop%2Fshadesofbias/lists"}