{"id":16518622,"url":"https://github.com/warisgill/tracefl","last_synced_at":"2025-06-26T02:03:14.824Z","repository":{"id":252986731,"uuid":"842116607","full_name":"warisgill/TraceFL","owner":"warisgill","description":"TraceFL is a novel mechanism for Federated Learning that achieves interpretability by tracking neuron provenance. It identifies clients responsible for global model predictions, achieving 99% accuracy across diverse datasets (e.g., medical imaging) and neural networks (e.g., GPT).","archived":false,"fork":false,"pushed_at":"2024-11-12T00:41:33.000Z","size":4155,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-26T02:03:14.256Z","etag":null,"topics":["accountability","debugging","differential-privacy","explainability","explainable-ai","federated-learning","interpretability","interpretability-and-explainability","machine-learning","software-engineering","testing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/warisgill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-13T17:48:04.000Z","updated_at":"2025-04-28T02:12:34.000Z","dependencies_parsed_at":"2025-02-13T06:43:23.570Z","dependency_job_id":null,"html_url":"https://github.com/warisgill/TraceFL","commit_stats":null,"previous_names":["warisgill/tracefl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/warisgill/TraceFL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/warisgill%2FTraceFL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/warisgill%2FTraceFL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/warisgill%2FTraceFL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/warisgill%2FTraceFL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/warisgill","download_url":"https://codeload.github.com/warisgill/TraceFL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/warisgill%2FTraceFL/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261984644,"owners_count":23240302,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accountability","debugging","differential-privacy","explainability","explainable-ai","federated-learning","interpretability","interpretability-and-explainability","machine-learning","software-engineering","testing"],"created_at":"2024-10-11T16:37:01.905Z","updated_at":"2025-06-26T02:03:14.798Z","avatar_url":"https://github.com/warisgill.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance\n\n\u003e **Accepted at the 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)** [[arXiv Paper Link](https://arxiv.org/pdf/2312.13632)]\n\nFor questions or feedback, please contact [waris@vt.edu](mailto:waris@vt.edu). 
The `code` is built on the [Flower FL Framework](https://flower.ai/), the most widely used FL framework.\n# 1. TraceFL\nTraceFL is a `tool` designed to provide **interpretability** in Federated Learning (FL) by identifying the clients responsible for specific predictions made by a global model.\n\n![alt text](image.png)\n### 1.1 Overview\nFederated Learning (FL) enables multiple clients (e.g., `hospitals`) to collaboratively train a global model without sharing their raw data. However, this distributed and privacy-preserving setup makes it challenging to attribute a model's predictions to specific clients. Understanding which clients are most responsible for a model's output is crucial for `debugging`, `accountability`, and `incentivizing` high-quality contributions.\n\nTraceFL addresses this challenge by dynamically tracking the significance of neurons in a global model's prediction and mapping them back to the corresponding neurons in each participating client's model. This process allows FL developers to localize the clients most responsible for a prediction without accessing their raw training data.\n### 1.2 Key Features\n- **Neuron Provenance:** A novel technique that tracks the flow of information from individual clients to the global model, identifying the most influential clients for each prediction.\n- **High Accuracy:** TraceFL achieves 99% accuracy in localizing responsible clients in both image and text classification tasks.\n- **Wide Applicability:** Supports multiple neural network architectures, including CNNs (e.g., ResNet, DenseNet) and any transformer model from the HuggingFace library (e.g., BERT, GPT).\n- **Scalability and Robustness:** Efficiently scales to thousands of clients and maintains high accuracy under varying data distributions and differential privacy settings.\n- **No Client-Side Instrumentation Required:** Runs entirely on the central server, without needing access to clients' training data or modifications to the underlying fusion 
algorithm.\n# 2. Running TraceFL\n\n\u003e The `.sh` scripts (e.g., `job_training_all_exps.sh`) and `TraceFL/tracefl/conf/base.yaml` provided in this artifact can be used to regenerate any experiment result presented in the paper.\n\nThe experiments cover various aspects of federated learning, including:\n1. **Image and Text Classification**: Evaluating the performance of different models and datasets in federated settings.\n2. **Differential Privacy**: Analyzing the impact of differential privacy on model training and TraceFL's localizability.\n3. **Scalability**: Testing the scalability of TraceFL with varying numbers of clients and rounds.\n4. **Dirichlet Alpha Tuning**: Exploring the effects of different Dirichlet alpha values on data distribution, TraceFL's localizability, and model performance.\n### 2.1 Experiments Configuration Overview\n- **Image Classification**:\n  - Models: ResNet18, DenseNet121\n  - Datasets: MNIST, CIFAR-10, PathMNIST, OrganAMNIST\n  - Number of Rounds: 25-50\n- **Text Classification**:\n  - Models: OpenAI GPT, Google BERT\n  - Datasets: DBPedia, Yahoo Answers\n  - Number of Rounds: 25\n### 2.2 Differential Privacy Analysis\nThese experiments evaluate the impact of differential privacy on TraceFL by applying different noise levels and clipping norms.\n- **Models**: DenseNet121, OpenAI GPT\n- **Datasets**: MNIST, PathMNIST, DBPedia\n- **Noise Levels**: 0.0001, 0.0003, 0.0007, 0.0009, 0.001, 0.003\n- **Clipping Norms**: 15, 50\n- **Number of Rounds**: 15\n### 2.3 Scalability Experiments\nScalability tests involve running experiments with varying numbers of clients and rounds to assess how well TraceFL scales.\n- **Model**: OpenAI GPT\n- **Dataset**: DBPedia\n- **Number of Clients**: 200, 400, 600, 800, 1000\n- **Clients per Round**: 10, 20, 30, 40, 50\n- **Number of Rounds**: 15, 100\n\n### 2.4 Dirichlet Alpha Experiments\nThese experiments explore the effect of different Dirichlet alpha values on data partitioning, model training, 
and TraceFL's localizability.\n- **Models**: OpenAI GPT, DenseNet121\n- **Datasets**: Yahoo Answers, DBPedia, PathMNIST, OrganAMNIST, MNIST, CIFAR-10\n- **Dirichlet Alpha Values**: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1\n- **Number of Clients**: 100\n- **Clients per Round**: 10\n- **Number of Rounds**: 15\n### 2.5 Results and Log Files\nEach experiment's output is logged in the `logs` directory, providing detailed information about the training process and results.\n\n# 3. Potential Use Cases of TraceFL\n- **Debugging and Fault Localization:** Identify and isolate faulty or malicious clients responsible for incorrect or suspicious predictions in federated learning models.\n- **Enhancing Model Quality, Fairness, and Incentivization:** Improve model performance by rewarding high-quality clients, ensuring fair client contributions, and incentivizing continued participation from beneficial clients.\n- **Client Accountability and Security:** Increase accountability by tracing model decisions back to specific clients, deterring malicious behavior, and ensuring secure contributions.\n- **Optimized Client Selection and Efficiency:** Dynamically select the most beneficial clients for training to enhance model performance and reduce communication overhead.\n- **Interpretable Federated Learning in Sensitive Domains:** Provide transparency and interpretability in federated learning models, crucial for compliance, trust, and ethical considerations in domains like healthcare and finance.\n\n# 4. 
Citation\nBibTeX\n```\n@inproceedings{gill2025tracefl,\n  title = {{TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance}},\n  author = {Gill, Waris and Anwar, Ali and Gulzar, Muhammad Ali},\n  booktitle = {2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)},\n  year = {2025},\n  organization = {IEEE},\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwarisgill%2Ftracefl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwarisgill%2Ftracefl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwarisgill%2Ftracefl/lists"}