{"id":19381191,"url":"https://github.com/j2kun/fkl-sdm16","last_synced_at":"2025-10-09T15:33:49.535Z","repository":{"id":71944350,"uuid":"49983638","full_name":"j2kun/fkl-SDM16","owner":"j2kun","description":"Code and experiments for \"A confidence-based approach for balancing fairness and accuracy\"","archived":false,"fork":false,"pushed_at":"2020-06-09T03:20:24.000Z","size":838,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-09T15:33:04.761Z","etag":null,"topics":["fairness","machine-learning","research-paper"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/1601.05764","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/j2kun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-01-19T21:15:32.000Z","updated_at":"2023-06-21T02:48:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"fc6c245a-1083-4f63-a874-08263ee4e42e","html_url":"https://github.com/j2kun/fkl-SDM16","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/j2kun/fkl-SDM16","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j2kun%2Ffkl-SDM16","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j2kun%2Ffkl-SDM16/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j2kun%2Ffkl-SDM16/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j2kun%2Ffkl-SDM16/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/j2kun","download_url":"https://codeload.github.com/j2kun/fkl-SDM16/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j2kun%2Ffkl-SDM16/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001645,"owners_count":26083147,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fairness","machine-learning","research-paper"],"created_at":"2024-11-10T09:16:16.955Z","updated_at":"2025-10-09T15:33:49.530Z","avatar_url":"https://github.com/j2kun.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Code and experiments for \"A confidence-based approach for balancing fairness and accuracy\"\n\nAll experiments used in this paper were implemented in Python 3 with the following\ndependencies:\n\n    numpy\n    matplotlib\n    scikit-learn\n\n\n## One-click rerun of all experiments\n\nTo re-run all experiments used in the paper, run the following from the command line:\n\n    ./run-all.sh\n\nThis will re-run all the experiments and output the data to plaintext\nfiles in the results/ subdirectory.\n\nTo generate all plots used in the paper, run the following from the command line:\n\n    python plot-all.py\n\n\n## Datasets\n\nThe datasets are given 
the following names:\n\n    adult\n    german\n    singles\n\n### Loading into Python\n\nFor each dataset there is a data loader module and a baseline (see the\nBaselines section below). We will use `adult` as the prototype, and unless\notherwise stated all datasets operate the same way with `adult` replaced by the\ndataset name. The raw data files are `adult.train` and `adult.test`. If\npreprocessing occurred to split a dataset into training and testing subsets,\nthen the unprocessed data files are in the `preprocessing/` subdirectory along\nwith Python scripts to perform the (randomized) preprocessing. Additional\npreprocessing is performed to turn categorical features into (possibly many)\nbinary features.\n\nTo load a dataset, you can run the following commands from the base directory\nof the project:\n\n    $ python\n    Python 3.3.3 (default, Dec 30 2013, 23:51:18) \n    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin\n    \u003e\u003e\u003e from data import adult\n    \u003e\u003e\u003e trainingData, testData = adult.load()\n    \u003e\u003e\u003e adult.protectedIndex\n    1\n    \u003e\u003e\u003e len(trainingData)\n    32561\n    \u003e\u003e\u003e trainingData[0]\n    ((39, 1, 0, 0, 0, 0, 0, 1, 0, 0, 13, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n    0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2174, 0, 40, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n    0, 0, 0, 0, 0), -1)\n\n## An example experiment\n\nAn example experiment that tests the logistic regression learner on the German dataset:\n\n```\nPython 3.6.3 (default, Oct  4 2017, 06:09:15) \n[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin\n\u003e\u003e\u003e from data import german\n\u003e\u003e\u003e train, test = german.load()\n\u003e\u003e\u003e from margin import *\n\u003e\u003e\u003e def lrLearner(train, protectedIndex, protectedValue):\n...    
marginAnalyzer = lrSKLMarginAnalyzer(train, protectedIndex, protectedValue)\n...    shift = marginAnalyzer.optimalShift()\n...    print('best shift is: %r' % (shift,))\n...    return marginAnalyzer.conditionalShiftClassifier(shift)\n... \n\u003e\u003e\u003e h = lrLearner(train, german.protectedIndex, german.protectedValue)\nbest shift is: -0.19250157835095894\n\u003e\u003e\u003e from errorfunctions import signedStatisticalParity, labelError, individualFairness\n\u003e\u003e\u003e labelError(test, h)\n0.25825825825825827 \n```\n\nCopy-pastable:\n\n```\nfrom data import german\nfrom errorfunctions import signedStatisticalParity, labelError, individualFairness\nfrom margin import *\n\ntrain, test = german.load()\ndef lrLearner(train, protectedIndex, protectedValue):\n    marginAnalyzer = lrSKLMarginAnalyzer(train, protectedIndex, protectedValue)\n    shift = marginAnalyzer.optimalShift()\n    print('best shift is: %r' % (shift,))\n    return marginAnalyzer.conditionalShiftClassifier(shift)\n \nh = lrLearner(train, german.protectedIndex, german.protectedValue)\nlabelError(test, h)\n```\n\n## Experiments\n\nThe experiments are organized by method, using the acronyms from the paper.  So\nrandom relabeling (RR) is in the `experiment-RR.py` file. Each experiment has a\n`runAll()` function that runs all of the experiments for every dataset and\nlearner (SVM, logistic regression, and AdaBoost). Note that boosting and SVM\ntake ~5-30 minutes per run on large datasets, and each experiment averages over\n10 runs.\n\n## Plots\n\nThe main plots in the paper are produced by the MarginAnalyzer class in\n`margin.py`. See the `MarginAnalyzer.plotMarginHistogram` and\n`MarginAnalyzer.plotTradeoff` functions for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fj2kun%2Ffkl-sdm16","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fj2kun%2Ffkl-sdm16","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fj2kun%2Ffkl-sdm16/lists"}